FTP delete not atomic / verified / transactional (?)

I know there are interruptions; there’s no need to go looking for them especially. My point remains that the journal & journal recovery should take care of the “problems” an interruption created when things resume as usual.

And as we concluded in the other thread, this problem is not FTP related; it seems to be a generic flaw in the compact & journal process. FTP (AFTP) itself seems to be working exactly as intended and designed.

Anyway, is there any soft way to ask Duplicati to stop? As far as I remember, it ignores a friendly HUP and requires a hard kill… Is there any other way to signal “please stop as soon as it’s safe”?

I have also noticed two interesting things lately. For some reason Duplicati seems to leave one idle FTPS session open during backup operations. It’s closed when Duplicati finishes, so it’s certainly not a problem, just a bit strange. I haven’t figured out why that happens; maybe it’s used only for the main directory content checks. It opens an FTPS session, then uploads each file in a separate session without closing the first one.

I also saw this in the logs, which is quite interesting. Does the FTPS client fail to recognize that the session is already active and logged in?

FluentFTP.FtpAuthenticationException: Already logged in. QUIT first.
   at Duplicati.Library.Main.AsyncDownloader.AsyncDownloaderEnumerator.AsyncDownloadedFile.get_TempFile()
   at Duplicati.Library.Main.Operation.CompactHandler.DoCompact(LocalDeleteDatabase db, Boolean hasVerifiedBackend, IDbTransaction& transaction, BackendManager sharedBackend)
   at Duplicati.Library.Main.Operation.CompactHandler.Run()
   at Duplicati.Library.Main.Controller.RunAction[T](T result, String[]& paths, IFilter& filter, Action`1 method)
   at Duplicati.Library.Main.Controller.Compact()
   at Duplicati.CommandLine.Commands.Compact(TextWriter outwriter, Action`1 setup, List`1 args, Dictionary`2 options, IFilter filter)
   at Duplicati.CommandLine.Program.ParseCommandLine(TextWriter outwriter, Action`1 setup, Boolean& verboseErrors, String[] args)
   at Duplicati.CommandLine.Program.RunCommandLine(TextWriter outwriter, TextWriter errwriter, Action`1 setup, String[] args)

In the GUI, you use the Stop button. Safest is “Stop after current file”. The newer way is the so-called “Stop now”, which doesn’t really mean right now, because in either case the files that are in progress finish uploading.

If you’re on the command-line client (and I think you are), then I know of no equivalent way to ask nicely, although it’s conceivable that the third-party Duplicati Client abort command is better than a process kill.

Process kill can be done using different signals: sending Ctrl-C, SIGTERM, SIGKILL and so on. But again, a correctly designed and implemented program should also deal with a hard kill without problems.
taskkill can be used with or without /f (force); without it, taskkill just tells the program to quit as soon as it suits the program. As an example with compact, I would assume that means either almost immediately, by rolling back the latest changes (a few seconds maybe, depending on the remote), or a bit slower, by finishing uploads and running deletes, but before starting the next download cycle. Even so, this could take anywhere from seconds to several minutes.
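To illustrate what “quit as soon as it suits the program” could look like in code, here is a minimal sketch of cooperative shutdown on Ctrl-C, assuming a CancellationToken-based design (RunCompactAsync and the safe-point placement are illustrative, not Duplicati’s actual API):

  // Minimal sketch: turn Ctrl-C into a clean stop at the next safe point.
  // RunCompactAsync is a stand-in, not Duplicati's real API.
  using System;
  using System.Threading;
  using System.Threading.Tasks;

  class GracefulStopSketch
  {
      static async Task Main()
      {
          using var cts = new CancellationTokenSource();
          Console.CancelKeyPress += (s, e) =>
          {
              e.Cancel = true;   // keep the process alive...
              cts.Cancel();      // ...and request a clean stop instead
          };

          try
          {
              await RunCompactAsync(cts.Token);
          }
          catch (OperationCanceledException)
          {
              // Roll back any open transaction here, then exit cleanly.
              Console.WriteLine("Stopped at a safe point.");
          }
      }

      static async Task RunCompactAsync(CancellationToken token)
      {
          while (true)
          {
              token.ThrowIfCancellationRequested(); // safe point between cycles
              await Task.Delay(1000, token);        // stand-in for one work cycle
          }
      }
  }

The key point is that cancellation is only honored between work cycles, so a stop lands on a transaction boundary rather than mid-operation.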

I agree, but let me quote myself from yesterday…

The posts are repeating and going in circles. One major block of dialog was from August 2021 above.
I’m not going to post the whole enormous conversation again, but I’ll make a couple of points instead.

If anyone wants a random kill tester to try, you’re welcome to my perl script which uses taskkill /f.
This is part of the “test” work where I keep asking for volunteers (also developers, writers, and more).
The theory is that testing is more easily done by less experienced people, who learn as they do tests.
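For the curious, the idea also fits in a few lines of C# (a rough sketch only; the command line, path, and timings are placeholders, and my actual tester remains the perl script):

  // Sketch of a random-kill tester: start a backup, wait a random
  // while, kill it hard, repeat, and watch whether recovery holds up.
  using System;
  using System.Diagnostics;
  using System.Threading;

  var rng = new Random();
  for (int run = 0; run < 100; run++)
  {
      var p = Process.Start("Duplicati.CommandLine.exe",
                            "backup <storage-url> C:\\data");
      Thread.Sleep(rng.Next(5_000, 120_000)); // wait 5 s .. 2 min at random
      if (!p.HasExited)
          p.Kill();                           // hard kill, like taskkill /f
      p.WaitForExit();
      // The next run must recover cleanly; capture logs when it doesn't.
  }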

The goal is to run with enough logging and debugging so as to figure out a step-by-step for developer investigation (file an issue on it, with steps). Then we need more developers capable of coding the fix.
This has been a gap lately. My own well-described interrupted compact issue hasn’t been fixed in two years.

Duplicati is a community effort, reliant on volunteers for all of its improvements, and its very existence.
Anybody should be capable of doing something positive, even if it’s occasional forum help. Please do.


Today I saw three similar failures; I mean that Duplicati first deleted files and then says that THE FILES are missing. Well, what did you expect would follow from deleting files? Very systemically, logically and obviously broken.

And this has absolutely nothing to do with backend durability. Interestingly, all the files reported missing are dindex files.

Backup started at 30.3.2022 8.22.45
Checking remote backup ...
  Listing remote folder ...
Missing file: duplicati-**i6b91a0fac8524e8cbff58e80c2a9df64**.dindex.zip.aes
Missing file: duplicati-ib8905cf1660142558fc9c4b66aa08207.dindex.zip.aes
Missing file: duplicati-ia056ba6143b542d58e4a1df96011694d.dindex.zip.aes
Missing file: duplicati-i632c7a10e43d424a826cd6611fd51ed8.dindex.zip.aes
Missing file: duplicati-i73c682c718be4ef4aca4d6ddb669f4f4.dindex.zip.aes
Missing file: duplicati-ie5c2ff0c09684da6acb9dc98e7d6b72c.dindex.zip.aes
Missing file: duplicati-if2b8627f93504d25a719927c8605088b.dindex.zip.aes
Found 7 files that are missing from the remote storage, please run repair

And before that, Duplicati crashed with this error message:

FluentFTP.FtpAuthenticationException: Already logged in. QUIT first.
   at Duplicati.Library.Main.AsyncDownloader.AsyncDownloaderEnumerator.AsyncDownloadedFile.get_TempFile()
   at Duplicati.Library.Main.Operation.CompactHandler.DoCompact(LocalDeleteDatabase db, Boolean hasVerifiedBackend, IDbTransaction& transaction, BackendManager sharedBackend)
   at Duplicati.Library.Main.Operation.CompactHandler.Run()
   at Duplicati.Library.Main.Controller.RunAction[T](T result, String[]& paths, IFilter& filter, Action`1 method)
   at Duplicati.Library.Main.Controller.Compact()
   at Duplicati.CommandLine.Commands.Compact(TextWriter outwriter, Action`1 setup, List`1 args, Dictionary`2 options, IFilter filter)
   at Duplicati.CommandLine.Program.ParseCommandLine(TextWriter outwriter, Action`1 setup, Boolean& verboseErrors, String[] args)
   at Duplicati.CommandLine.Program.RunCommandLine(TextWriter outwriter, TextWriter errwriter, Action`1 setup, String[] args)

A crash isn’t technically a hard kill, but recovery can follow exactly the same protocol.

And before the crash, Duplicati said this, which almost makes me laugh…

  Deleting file duplicati-b58b6e364501b4225b449007843aa24dc.dblock.zip.aes (32,00 MT) ...
  Deleting file duplicati-**i6b91a0fac8524e8cbff58e80c2a9df64**.dindex.zip.aes (1,75 MT) ...
  Deleting file duplicati-b086a7738ee5644e8a0f76db19be4f409.dblock.zip.aes (24,33 MT) ...
  Deleting file duplicati-ib8905cf1660142558fc9c4b66aa08207.dindex.zip.aes (2,47 MT) ...
  Deleting file duplicati-bb8eedf9d1aac427e8fd33027c491189e.dblock.zip.aes (3,28 MT) ...
  Deleting file duplicati-ia056ba6143b542d58e4a1df96011694d.dindex.zip.aes (352,50 KT) ...
  Deleting file duplicati-b8c9a6ccf684449afbfb8525f3cb72632.dblock.zip.aes (2,51 MT) ...
  Deleting file duplicati-i632c7a10e43d424a826cd6611fd51ed8.dindex.zip.aes (347,67 KT) ...
  Deleting file duplicati-b35bdde309cd14959b6fc7396de9f4739.dblock.zip.aes (3,04 MT) ...
  Deleting file duplicati-i73c682c718be4ef4aca4d6ddb669f4f4.dindex.zip.aes (338,78 KT) ...
  Deleting file duplicati-b7c0b93b414f144a2b169b47cba35207c.dblock.zip.aes (2,84 MT) ...
  Deleting file duplicati-ie5c2ff0c09684da6acb9dc98e7d6b72c.dindex.zip.aes (350,29 KT) ...
  Deleting file duplicati-bb6c5652dd17641f8bd5033802560f1e8.dblock.zip.aes (3,30 MT) ...
  Deleting file duplicati-if2b8627f93504d25a719927c8605088b.dindex.zip.aes (317,39 KT) ...

Which actually proves all of my previous points right. The journal log isn’t working, isn’t being handled correctly, and/or the recovery fails. On top of that there are other code smells which cause secondary problems, like the failing database rebuild… All are now 100% confirmed and proven issues.

OK, nothing new here. Classic problems; I’ve seen all these similar failures with other projects and programs (including my own) countless times. Usually these go: ffffff sssss grauh, some rage, then frantic fixing, a bit of extra logging (of the transaction handling), adding a few test cases, and then the problem is gone forever.

** The stars around filenames are only there to emphasize them in the log entries; they don’t really exist in the names.

This is going in circles and needs information. There’s an open issue. Is below what you saw?

Please keep the above rather-user-friendly description in mind for below and your own checks.

Was that immediately before (the missing file notices at the next backup start)? Then you hit that issue.
Probably. The smoking gun is to get a database bug report that is missing rows that rolled back,
or to open a DB copy in DB Browser for SQLite and look at RemoteOperation and Remotevolume.
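If it helps, the same check can be scripted. A minimal sketch in C# using Microsoft.Data.Sqlite; the table and column names follow this thread’s discussion, so treat the exact schema as an assumption and verify it against your own database:

  // List volumes whose State isn't Verified, from a read-only DB copy.
  // Schema assumption: Remotevolume(Name, Type, State) as discussed above.
  using System;
  using Microsoft.Data.Sqlite;

  using var con = new SqliteConnection("Data Source=backup-copy.sqlite;Mode=ReadOnly");
  con.Open();

  var cmd = con.CreateCommand();
  cmd.CommandText = @"SELECT State, Type, Name
                      FROM Remotevolume
                      WHERE State <> 'Verified'
                      ORDER BY Type, Name";

  using var reader = cmd.ExecuteReader();
  while (reader.Read())
      Console.WriteLine($"{reader.GetString(0),-10} {reader.GetString(1),-8} {reader.GetString(2)}");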

When using terminology, please try to use it in the usual way. Once, when you wrote “job log”, I thought
Viewing the log files of a backup job was what you meant. Technically, below is what a journal log is:

Temporary Files Used By SQLite (there are many journals related to atomic commit and rollback)
Atomic Commit In SQLite
Transaction (which is an SQL concept, so trying to keep destination and database aligned is DIY)
Write-Ahead Logging is not yet used by Duplicati (a pull request exists). It uses a rollback journal.

There are some things Duplicati controls, and some it doesn’t. SQL lets it do commit and rollback.
Journal logs are under the covers. I suspect you don’t mean the DB journal log, just Duplicati’s records.
See my above remark on the Remotevolume table, which is the (wrong?) record of what should exist.
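To make that division of labor concrete: the rollback journal is invisible to the application, which only ever sees commit versus rollback, and nothing in SQLite can undo a remote delete. A tiny sketch of that gap (generic SQLite from C#, not Duplicati code; the table name just follows the discussion above):

  using Microsoft.Data.Sqlite;

  using var con = new SqliteConnection("Data Source=demo.sqlite");
  con.Open();

  var ddl = con.CreateCommand();
  ddl.CommandText = "CREATE TABLE IF NOT EXISTS Remotevolume(Name TEXT, State TEXT)";
  ddl.ExecuteNonQuery();

  using (var tx = con.BeginTransaction())
  {
      var cmd = con.CreateCommand();
      cmd.Transaction = tx;
      cmd.CommandText = "UPDATE Remotevolume SET State = 'Deleting' WHERE Name = @n";
      cmd.Parameters.AddWithValue("@n", "duplicati-iXXXX.dindex.zip.aes");
      cmd.ExecuteNonQuery();

      // If the process dies before Commit, SQLite's rollback journal undoes
      // the UPDATE on the next open -- but a remote delete already issued
      // is NOT undone. Keeping those two aligned is the application's job.
      tx.Commit();
  }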

Ideally, you figure out steps to reproduce, and write up an issue (I went over all of that yesterday).
But in this case there is a known issue with steps to reproduce, so you can see if it fits your case.

Does it? If so, you have an issue that’s in the issue backlog along with a whole lot of other issues.

If you can demonstrate that this is actually what you hit, that raises the priority for developer work. However (as explained yesterday and seemingly daily), all progress depends on our volunteers…

Please be gentle on those trying to help. Rough language on people or their code drives them off.
This is a Civilized Place for Public Discussion (or so I hope, and there is more good advice there).

@ts678

The smoking gun is to get a database bug report that is missing rows that rolled back,
or open a DB copy in DB Browser for SQLite and look at RemoteOperation and Remotevolume.

I dropped one database in “error state” to you guys… I’ve got my own programs, source, processes and platform problems that I need to take care of, so I’m not going to split the effort. But if this helps, it would be great.

I’m very well aware of how SQLite3 works; I’m using it in many production projects. And it’s extremely rare that there’s a real database problem. Exactly the same applies to PostgreSQL and Microsoft SQL Server: if things don’t work, it’s usually something other than a database problem. Ahem, and exactly the same applies to NTFS. Sure, there are filesystem issues, but it’s quite unlikely that NTFS or ext4 would have a flaw causing this many problems.

Steps to reproduce: it seems that if the program aborts with the file transfer exception, the logs aren’t correctly updated. So in that process alone there are multiple overlapping code smells. Unfortunately, it’s true that the really “fun” bugs usually aren’t caused by a single factor but by a combination of 3-5 different failures, and then it starts to be fun. Especially when it happens extremely rarely, like once a year, but causes catastrophic damage / results.

I gave a long analysis via PM (and am still trying to find out what to share in the clear), but summary:

is actually not a log (SQLite or otherwise), but the State field in the Remotevolume table for these dindex files.
The talk of that started roughly here in the open issue I’ve pointed to. There’s a workaround proposed.
Or maybe it’s the best-and-proper solution. Someone with more expertise in this area should weigh in.

Anyway, after looking at a writeup of what the text log file had versus what the database had, it looked very much like the known open issue. The Operation table showed a backup and a compact, but later backups simply did a list operation (RemoteOperation table), suggesting they stopped on the pre-backup list check.

Looking at what list returned (Data column), it had 7 fewer dindex files than Remotevolume had in Verified.
The text log shows 14 deletes (7 dblock and 7 dindex) that aren’t in the RemoteOperation table, but the dblock files drew no missing-file complaints, unlike the 7 dindex that did. What probably happened is that cleanup at the start of the next backup recovered from the interruption by deleting dblock files whose State was Deleting, whereas the dindex files were still at State Verified, so were expected to still appear in a list.
The proposed patch (from 2020) sets dindex to Deleting too, but it needs expert review before going in…
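For what it’s worth, here is a sketch of the ordering as I understand the proposal. This is illustrative C#, not the actual patch or CompactHandler code; IBackend and the method shape are my inventions. The idea: mark both files Deleting and commit that durably before touching the remote.

  using Microsoft.Data.Sqlite;

  // Hypothetical minimal abstraction so the sketch compiles; not Duplicati's.
  interface IBackend { void Delete(string remoteName); }

  static class CompactSketch
  {
      // Illustrative ordering only -- not the real patch or Duplicati code.
      public static void DeleteVolumePair(SqliteConnection con, IBackend backend,
                                          string dblockName, string dindexName)
      {
          using (var tx = con.BeginTransaction())
          {
              var cmd = con.CreateCommand();
              cmd.Transaction = tx;
              cmd.CommandText =
                  "UPDATE Remotevolume SET State = 'Deleting' WHERE Name IN (@b, @i)";
              cmd.Parameters.AddWithValue("@b", dblockName);
              cmd.Parameters.AddWithValue("@i", dindexName); // the key point: dindex too
              cmd.ExecuteNonQuery();
              tx.Commit(); // durable BEFORE any remote delete
          }

          // If we die anywhere past this point, both files are already in
          // State = 'Deleting', so the next run's cleanup can finish the
          // deletes instead of reporting the dindex as a missing file.
          backend.Delete(dblockName);
          backend.Delete(dindexName);
      }
  }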

Even if this is not the best solution (or maybe it is), now that the issue is showing up, it’s time to move. Developers with C# and SQL skills have been scarce for a while, but one person has recently begun…

Note: I just confirmed this first with the latest version. I haven’t caught up with the latest posts in this thread yet… I’ll do that now, and probably edit additions onto the end of this post, or make a separate post if the comments run longer.

It seems that this problem has been getting less and less severe. But it’s still systemically broken, if only slightly. Hopefully this is the last (?) serious issue (?).

Just confirmed that this problem still exists with the latest version:

Backup started at 25.4.2022 9.03.57
Checking remote backup ...
  Listing remote folder ...
Missing file: duplicati-i21862c88272342a1833f36b2abbf43d0.dindex.zip.aes
Missing file: duplicati-i251b4e38a06648a79c7511e4da22bb9e.dindex.zip.aes
Missing file: duplicati-i4886e4c2e5034d7a8ee45a240f2595b4.dindex.zip.aes
Missing file: duplicati-i6005188a35a643c8bad3e7e1434ce585.dindex.zip.aes
Missing file: duplicati-i8998fd2e8ddc4b8db268d9ba916dad71.dindex.zip.aes
Missing file: duplicati-i3920cc35c1f04b4e9ab5a93046e1dadb.dindex.zip.aes
Missing file: duplicati-i24678ba53c9d4b0686a50a2cf2281b8c.dindex.zip.aes
Missing file: duplicati-i3813ead0c82a4eb1a653a12036cefadc.dindex.zip.aes
Missing file: duplicati-ieee39330f1c24f26878736bcc940ddb6.dindex.zip.aes
Missing file: duplicati-i1564005832344ba9a8dae947767b884f.dindex.zip.aes
Found 10 files that are missing from the remote storage, please run repair
Backend verification failed, attempting automatic cleanup => Found 10 files that are missing from the remote storage, please run repair
Failed to read local db ...\backup.sqlite, error: database is locked
database is locked => database is locked
database is locked
Fatal error => The process cannot access the file because it is being used by another process.

This is the case where it should recover using the journal. It probably (hopefully!) tries to do that, but there’s another uncommitted transaction on a different database connection blocking the action.
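That matches classic SQLite behavior: a second connection holding an uncommitted write transaction makes other writers fail with “database is locked” once their busy timeout runs out. A self-contained sketch of the mechanism (generic SQLite, not Duplicati’s code):

  using System;
  using Microsoft.Data.Sqlite;

  // Two connections to the same database file.
  using var a = new SqliteConnection("Data Source=demo.sqlite");
  using var b = new SqliteConnection("Data Source=demo.sqlite");
  a.Open(); b.Open();

  var ddl = a.CreateCommand();
  ddl.CommandText = "CREATE TABLE IF NOT EXISTS Remotevolume(Name TEXT, State TEXT)";
  ddl.ExecuteNonQuery();

  // Connection A starts a write transaction and leaves it uncommitted...
  var txA = a.BeginTransaction();
  var wr = a.CreateCommand();
  wr.Transaction = txA;
  wr.CommandText = "INSERT INTO Remotevolume VALUES ('x.dindex.zip.aes', 'Deleting')";
  wr.ExecuteNonQuery();

  // ...so a write attempt on connection B eventually gives up and throws
  // "database is locked", exactly like the auto-cleanup above.
  var wb = b.CreateCommand();
  wb.CommandText = "DELETE FROM Remotevolume WHERE State = 'Deleted'";
  try { wb.ExecuteNonQuery(); }
  catch (SqliteException e) { Console.WriteLine(e.Message); }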

What led to this situation? Well, the classic: a compaction was started, and ever since, the backup hasn’t been running, due to the systemic logic error.


Separate but small thing: I’m wondering why this triggers a warning and doesn’t work “as usual”.

Hi, I’m wondering why the restore gives this warning? It’s already creating a bunch of missing folders; why does only one of the folders give a warning?
Downloading restoring …
Creating missing folder D:\restore_point\ for file D:\restore_point\some_path\something.dat
Downloading restoring …
Exit code: 2

So if it isn’t a problem, why is it causing the exit level to change from 0?

Anyway, as a backup source, that file is referenced by name instead of by path alone. But that shouldn’t cause warnings / errors during restore, AFAIK.

Was it interrupted? Got logs? That’s the known route to this that I’ve been linking to.
That one is easily fixed by repair, but if the repair runs via --auto-cleanup, it trips itself…

Database is locked when performing auto-cleanup #4631

Yes, it was interrupted. Which logs? The interruption itself isn’t particularly helpful; the log just ends there. But I’m sure I’m going to see these error messages repeatedly in the future, as I have in the past. I can do a debug run / provide the database when yet another instance is in the same classic broken state. I actually took a copy of the database and data before repairing, yet it’s obvious that the repair just added the 541-byte empty files. I think we’ve discussed this matter over and over again during the past few years.

And as I guessed, I’ve got at least two instances right now in that error state again. → Because the program is systemically and logically broken. Yep, actually the number is three. Duh!

Off-topic: in another project, an obvious one-minute bug fix just took a week of more-or-less circular discussion. Phew.

I’ve got a debug (profiling) log; I can submit it privately, just in case it contains something that shouldn’t be published. Sending … (edit) sent …

After this trivial (?) issue is fixed, I think there’s only one serious issue left to fix:

Btw, if a debug log of this restore failing when everything is “good” would help, I can obviously send debug log(s) too…

The log began days later, so it was less interesting. A log of the compact and the backup after it would be better.
To the extent that information was given, this looks like the usual problem of an interrupted compact.

It’s possible that a simple patch will take care of it, but if things are truly “systemically” broken, then
sorting that out would be better (but would take a whole lot more time and expertise than I have…).

Since this open issue is over two years old, are there any new developer-volunteers who will help?

Duplicati only progresses thanks to its volunteers in code, test, docs, forum, and many other areas.
Right now, help is short in many areas. Any community members who can do anything, please do.

Based on the latest information on the forum, and a suitably small backup set, I confirmed this again, now in steps, with a restore test between each step.

  1. Compact fails - Restore still works
  2. Backup stops working - Restore still works
  3. Run repair - Restore still works
  4. Run compact again - Now the backup is broken.

So it’s exactly as I said in the very first post: a systemic logical error which makes the backups unrestorable.

It’s fixed when it’s fixed. Acknowledged. And it’s exactly as bad a trap as I said it would be.

But it’s still interesting why the test command claims it’s not broken.

Can you explain a bit more about the exact steps you take while testing a restore? I have ranted here about the best way to use restore to test a backup:

In other words, testing a restore while the live, original data is available can be very misleading, as Duplicati steals missing blocks from the live data unless specifically ordered otherwise.
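If memory serves, the option for that is --no-local-blocks, so a restore test to a scratch directory would look something like this (exact syntax from memory, so please check it against the manual):

  Duplicati.CommandLine.exe restore <storage-url> "*" --restore-path=D:\restore_test --no-local-blocks=true

That way every block really has to come from the remote storage.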


Exact steps for anything would help. If an easily reproduced case is found, that would be wonderful.
Steps 3 and 4 seem to say restore test worked, and compact ran, but restore test failed somehow.
Right now, there’s not even an error message given about the final failure. Sadly, nothing to work with…

So this is finally getting fixed? Great (!) - Please check all the cases where stuff is getting deleted.

Yet AFAIK, this still doesn’t fix the problem with database rebuilding. If there are some old junk files in the path, that shouldn’t completely prevent rebuild & restore. As far as I remember, it’s also a two-tiered problem: A) first leave junk, B) then get confused by the junk.

When those are fixed, I would say that Duplicati is “ready”, because the well-known corruption flaws would then be fixed, as far as I know.

I haven’t seen any other problems than those. And I see those repeatedly.

Just a minor wish: if a single file is added to the backup set instead of a directory, the restore process always gives Code: 2, because it created “a missing directory”. Well, I don’t see that as being a reason to change the exit code or report an “error” at all.

Then perhaps you can point to an issue or describe it better than

Perhaps you are seeing problem 4485 above here, involving index files that the database forgot?
That now has steps to reproduce, and is awaiting some developer to actually investigate and fix that.
If it’s something else, please describe it, or preferably point to existing forum topics or GitHub issues.

I really love this thread about stopnow() (and related questions & issues). Some of my problems with Duplicati are surely caused by hard kills, but to be honest, the only case where I see a hard kill creating real problems is during compacting. I’ve got so much data that I’m not actually worried about the other hard kill / stopnow problems; it seems those are at least somewhat adequately cleaned up. Unless that’s exactly what’s creating the junk that causes the compaction to fail later. Keep up the great work; now I’m seeing a deep dive into the topics I’ve been watching for years! And to be honest, I’ve made no effort whatsoever to run Duplicati in a timeframe where it couldn’t be stopped (hard-killed) by something else. That’s where the journal rollback should happen, and the state should be restored on the next start. As far as I know, that works if the kill happens during a backup.

In my statistical experience, it’s not nearly as bad as it could be. And it shouldn’t take much to fix the last (?) major issue with data corruption. It could be just two lines in the wrong order or one line missing, etc. That’s my experience with transaction handling: only one thing wrong is all it takes. Didn’t someone already catch a remote-file delete operation that wasn’t written into the journal? → Making proper recovery after a stop automatically impossible.

These are just my opinions and experiences; I’ve got a lot of small-to-medium Duplicati backup sets which are updated several times a day and restore-tested regularly.

If possible, signals (SIGINT / SIGTERM) should of course make it stop as soon as cleanly possible. But recovery from a hard kill / SIGKILL / a USB drive or system power cable disconnected abruptly should also work.

Also, about not optimizing the restore: I’m quite happy with restore speeds, as long as the backup isn’t corrupted and it doesn’t try to “recover blocks” or whatever it was called, which is insanely slow. I’m not sure all the people complaining about restore speed have recognized the difference between these issues. There are several tiers: restore, database rebuild, and finally the block recovery attempt. I’m also happy with the database rebuild speed, as long as the recovery step doesn’t happen.

And if we want to get into very minor issues: it seems that FTP leaves the last kept-alive connection idle until it times out. A proper open, use, close cycle isn’t followed with FTP.

To sum it up, thank you guys! And I’m really happy to see Kenneth @kenkendk commenting on things around here.

Anyway, the situation seems better. And I have to say that the title of this thread is totally misleading, because it initially made me think the problems were caused by a bad FTP library that didn’t return correct state information to the calling process.

Most importantly, if there’s the will to do fixes, we’re not that far off from reliable operation.

Having said that, I’ll run full restore tests again, starting right now.