FTP delete not atomic / verified / transactional (?)

Let me see if I can puzzle out what “lost” means; or please clarify, rather than just putting it in quotation marks.

In this topic, manifest files suddenly came up yesterday. A manifest is never a standalone file deleted from the destination the way a dblock, dindex, or dlist file would be; it’s a file included inside each of those file types. The other topic where you mentioned a manifest also showed one here, without any explanation of “lost”.

Sorry, I have no idea why you’re pointing to manifest files. Do you see a manifest file problem or a problem with some other type of file? Got error messages? Got logs? Got the time history of what happened when?

Repair should never recreate a manifest file without a containing file, and compact should never delete one. These are internal files. Have you found a problem at the file internals level, or is it a dblock/dindex problem which seems more likely based on the known issues? You can view a log file (live log is fine) for file names.

Repair also has a limited ability to recreate destination files. It can do dindex and dlist, but dblock is tougher.

If you kept a log file, look in it and see whether it has a complaint with “Missing file” in its text. If there is some history from the previous backup, post that to show how it ended; the issue might only be noticed on the next backup.
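If it helps, here is a trivial sketch of that check (the log path is hypothetical, and a plain text search such as findstr or grep does the same job):

```python
# Minimal sketch: scan a Duplicati --log-file for "Missing file" complaints.
# The path below is hypothetical; point it at your own log file.
log_path = r"C:\logs\duplicati-job.log"

with open(log_path, encoding="utf-8", errors="replace") as f:
    for line in f:
        if "Missing file" in line:
            print(line.rstrip())
```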

If you see a manifest file named in your log, please post it. I don’t think you will though. They’re internal files.

Have you left the problem system alone, or have you already done things to it? A look by you (or a bug report) would have been helpful. If the system has been left alone, please copy the DB or create a DB bug report snapshot. Saved log files are a more reliable view, but they pair well with a DB snapshot when looking for inconsistencies, such as a dindex file that was actually deleted but whose deletion the DB then forgets about by rollback. That open issue was referenced before.
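As a rough sketch of that cross-check (the paths are hypothetical, and the Remotevolume table/column names are as in recent Duplicati databases, so treat this as illustrative rather than authoritative):

```python
import re
import sqlite3

# Hypothetical paths: a copy of the job database and a saved server-side FTP log.
db_path = r"C:\backup-copy\backup.sqlite"
ftp_log = r"C:\logs\ftp-server.log"

# Every file the server-side log says was deleted (DELE commands).
deleted = set()
with open(ftp_log, encoding="utf-8", errors="replace") as f:
    for line in f:
        m = re.search(r"DELE\s+/?(\S+)", line)
        if m:
            deleted.add(m.group(1))

# Files the local DB still believes are present (State 'Verified' in Remotevolume).
con = sqlite3.connect(db_path)
for (name,) in con.execute("SELECT Name FROM Remotevolume WHERE State = 'Verified'"):
    if name in deleted:
        # The inconsistency described above: deleted at the destination,
        # but the rollback made the DB forget the deletion.
        print("Deleted remotely but still Verified in DB:", name)
con.close()
```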

Confirmed with the latest canary version. 100% obvious bug:

– Server side log –
File was first added

2022-02-24T23:24:57.136Z >> [FTP Session 7226] SIZE /duplicati-id32b2f754a1a4aa2bc1019c3a7962731.dindex.zip.aes
2022-02-24T23:24:57.214Z >> [FTP Session 7226] STOR duplicati-id32b2f754a1a4aa2bc1019c3a7962731.dindex.zip.aes
2022-02-24T23:24:57.870Z >> [FTP Session 7226] SIZE duplicati-id32b2f754a1a4aa2bc1019c3a7962731.dindex.zip.aes

Then compaction deleted the file
2022-03-14T00:43:45.826Z >> [FTP Session 8959] DELE duplicati-id32b2f754a1a4aa2bc1019c3a7962731.dindex.zip.aes
Then Duplicati hung due to “missing files”, which is obviously a bug…
Then repair fixed the issue by uploading the index file back…

2022-03-25T09:42:34.825Z >> [FTP Session 8295] SIZE /duplicati-id32b2f754a1a4aa2bc1019c3a7962731.dindex.zip.aes
2022-03-25T09:42:34.965Z >> [FTP Session 8295] STOR duplicati-id32b2f754a1a4aa2bc1019c3a7962731.dindex.zip.aes
2022-03-25T09:42:35.028Z >> [FTP Session 8295] SIZE duplicati-id32b2f754a1a4aa2bc1019c3a7962731.dindex.zip.aes

Very clearly broken by a bad implementation, not a back-end durability issue.

Edit continued:

I checked more logs; there are actually similar problems with multiple backup sets as well. Which is, well, just more proof that it’s logically and systemically broken.

Similar to below, but does the timing fit, meaning was the “missing files” error right after a compact?
You can also check job logs, system logs, and maybe FTP logs to try to find an interruption.
Error during compact forgot a dindex file deletion, getting Missing file error next run. #4129

I know there are interruptions; there’s no need to look for those specifically. My point is still that the journal and journal recovery should take care of the “problems” the interruption created when things resume as usual.

And as we concluded in the other thread, this problem is not FTP-related; it seems to be a generic flaw in the compact & journal process. FTP (AFTP) itself seems to be working exactly as intended and designed.

Anyway, is there any soft way to ask Duplicati to stop? As far as I remember, it ignores a friendly HUP and requires a hard kill… Is there any other way to signal “please stop as soon as it’s safe”?

I’ve also noticed two interesting things lately. For some reason Duplicati seems to leave one idle FTPS session open during backup operations. It’s closed when Duplicati finishes, so it’s certainly not a problem, just a bit strange. I haven’t worked out why it happens; maybe that session is used only for the main directory content checks, because Duplicati opens an FTPS session and then uploads each file in a separate session without closing the first one.

I also saw this in the logs, which is quite interesting. Does the FTPS client fail to recognize that the session is already active and logged in?

FluentFTP.FtpAuthenticationException: Already logged in. QUIT first.
   at Duplicati.Library.Main.AsyncDownloader.AsyncDownloaderEnumerator.AsyncDownloadedFile.get_TempFile()
   at Duplicati.Library.Main.Operation.CompactHandler.DoCompact(LocalDeleteDatabase db, Boolean hasVerifiedBackend, IDbTransaction& transaction, BackendManager sharedBackend)
   at Duplicati.Library.Main.Operation.CompactHandler.Run()
   at Duplicati.Library.Main.Controller.RunAction[T](T result, String[]& paths, IFilter& filter, Action`1 method)
   at Duplicati.Library.Main.Controller.Compact()
   at Duplicati.CommandLine.Commands.Compact(TextWriter outwriter, Action`1 setup, List`1 args, Dictionary`2 options, IFilter filter)
   at Duplicati.CommandLine.Program.ParseCommandLine(TextWriter outwriter, Action`1 setup, Boolean& verboseErrors, String[] args)
   at Duplicati.CommandLine.Program.RunCommandLine(TextWriter outwriter, TextWriter errwriter, Action`1 setup, String[] args)

In the GUI, you use the Stop button. The safest is “Stop after current file”. The newer way is the so-called “Stop now”, which doesn’t really mean right now, because in either case the files that are in progress finish uploading.

If you’re on the command-line client (and I think you are), then I know of no equivalent way to ask nicely, although it’s conceivable that the third-party Duplicati Client abort command is better than a process kill.

A process kill can be done using different signals: Ctrl-C, SIGTERM, SIGKILL, and so on. But again, a correctly designed and implemented program should also handle a hard kill without problems.
taskkill can also be used with or without /f, and without force it just tells the program to quit as soon as it suits the program. As an example with compact, I would assume that means either almost immediately, by rolling back the latest changes (a few seconds maybe, depending on the remote), or a bit more slowly, by finishing the uploads and running the deletes, but stopping before the next download cycle. Even so, this can take quite a while, anywhere from seconds to several minutes.
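Purely as an illustration of that “stop as soon as it suits the program” idea (this is not Duplicati’s code, and do_one_cycle is a made-up placeholder): a cooperative stop flag set by the signal handler and checked between work units, with anything uncommitted rolled back.

```python
import signal
import sqlite3

# Illustration only (not Duplicati's code): a cooperative stop flag set by the
# signal handler and checked between work units, rolling back anything uncommitted.
stop_requested = False

def request_stop(signum, frame):
    # Only record the request; the worker decides when it is safe to stop.
    global stop_requested
    stop_requested = True

signal.signal(signal.SIGINT, request_stop)   # Ctrl-C
signal.signal(signal.SIGTERM, request_stop)  # polite termination request

def do_one_cycle(con, unit):
    # Placeholder for one compact cycle: download, repack, upload, delete, record.
    pass

def compact(db_path, work_units):
    con = sqlite3.connect(db_path)
    try:
        for unit in work_units:
            if stop_requested:
                con.rollback()        # drop uncommitted bookkeeping, keep last good state
                break
            do_one_cycle(con, unit)
            con.commit()              # commit only once this cycle is fully done
    finally:
        con.close()
```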

I agree, but let me quote myself from yesterday…

The posts are repeating and going in circles. One major block of dialog was from August 2021 above.
I’m not going to post the whole enormous conversation again, but I’ll make a couple of points instead.

If anyone wants a random kill tester to try, you’re welcome to my perl script which uses taskkill /f.
This is part of the “test” work where I keep asking for volunteers (also developers, writers, and more).
The theory is that testing is more easily done by less experienced people, who learn as they do tests.

The goal is to run with enough logging and debugging so as to figure out a step-by-step for developer investigation (file an issue on it, with steps). Then we need more developers capable of coding the fix.
This has been a gap lately. My own well-described interrupted compact issue has not been fixed in two years.

Duplicati is a community effort, reliant on volunteers for all of its improvements, and its very existence.
Anybody should be capable of doing something positive, even if it’s occasional forum help. Please do.


Today I saw three similar failures. I mean that Duplicati first deleted files and then says that THE FILES are missing. Well, what did you expect would follow from deleting the files? Very systemically, logically, and obviously broken.

And this has absolutely nothing to do with backend durability. Interestingly, all the files reported missing are dindex files.

Backup started at 30.3.2022 8.22.45
Checking remote backup ...
  Listing remote folder ...
Missing file: duplicati-**i6b91a0fac8524e8cbff58e80c2a9df64**.dindex.zip.aes
Missing file: duplicati-ib8905cf1660142558fc9c4b66aa08207.dindex.zip.aes
Missing file: duplicati-ia056ba6143b542d58e4a1df96011694d.dindex.zip.aes
Missing file: duplicati-i632c7a10e43d424a826cd6611fd51ed8.dindex.zip.aes
Missing file: duplicati-i73c682c718be4ef4aca4d6ddb669f4f4.dindex.zip.aes
Missing file: duplicati-ie5c2ff0c09684da6acb9dc98e7d6b72c.dindex.zip.aes
Missing file: duplicati-if2b8627f93504d25a719927c8605088b.dindex.zip.aes
Found 7 files that are missing from the remote storage, please run repair

And before that, Duplicati crashed with this error message:

FluentFTP.FtpAuthenticationException: Already logged in. QUIT first.
   at Duplicati.Library.Main.AsyncDownloader.AsyncDownloaderEnumerator.AsyncDownloadedFile.get_TempFile()
   at Duplicati.Library.Main.Operation.CompactHandler.DoCompact(LocalDeleteDatabase db, Boolean hasVerifiedBackend, IDbTransaction& transaction, BackendManager sharedBackend)
   at Duplicati.Library.Main.Operation.CompactHandler.Run()
   at Duplicati.Library.Main.Controller.RunAction[T](T result, String[]& paths, IFilter& filter, Action`1 method)
   at Duplicati.Library.Main.Controller.Compact()
   at Duplicati.CommandLine.Commands.Compact(TextWriter outwriter, Action`1 setup, List`1 args, Dictionary`2 options, IFilter filter)
   at Duplicati.CommandLine.Program.ParseCommandLine(TextWriter outwriter, Action`1 setup, Boolean& verboseErrors, String[] args)
   at Duplicati.CommandLine.Program.RunCommandLine(TextWriter outwriter, TextWriter errwriter, Action`1 setup, String[] args)

A crash isn’t technically a hard kill, but recovery can follow exactly the same protocol.

And before the crash, Duplicati printed the following, which almost makes me laugh…

  Deleting file duplicati-b58b6e364501b4225b449007843aa24dc.dblock.zip.aes (32,00 MT) ...
  Deleting file duplicati-**i6b91a0fac8524e8cbff58e80c2a9df64**.dindex.zip.aes (1,75 MT) ...
  Deleting file duplicati-b086a7738ee5644e8a0f76db19be4f409.dblock.zip.aes (24,33 MT) ...
  Deleting file duplicati-ib8905cf1660142558fc9c4b66aa08207.dindex.zip.aes (2,47 MT) ...
  Deleting file duplicati-bb8eedf9d1aac427e8fd33027c491189e.dblock.zip.aes (3,28 MT) ...
  Deleting file duplicati-ia056ba6143b542d58e4a1df96011694d.dindex.zip.aes (352,50 KT) ...
  Deleting file duplicati-b8c9a6ccf684449afbfb8525f3cb72632.dblock.zip.aes (2,51 MT) ...
  Deleting file duplicati-i632c7a10e43d424a826cd6611fd51ed8.dindex.zip.aes (347,67 KT) ...
  Deleting file duplicati-b35bdde309cd14959b6fc7396de9f4739.dblock.zip.aes (3,04 MT) ...
  Deleting file duplicati-i73c682c718be4ef4aca4d6ddb669f4f4.dindex.zip.aes (338,78 KT) ...
  Deleting file duplicati-b7c0b93b414f144a2b169b47cba35207c.dblock.zip.aes (2,84 MT) ...
  Deleting file duplicati-ie5c2ff0c09684da6acb9dc98e7d6b72c.dindex.zip.aes (350,29 KT) ...
  Deleting file duplicati-bb6c5652dd17641f8bd5033802560f1e8.dblock.zip.aes (3,30 MT) ...
  Deleting file duplicati-if2b8627f93504d25a719927c8605088b.dindex.zip.aes (317,39 KT) ...

Which actually proves all of my previous points right. The journal log isn’t working or isn’t being handled correctly, and/or the recovery fails. On top of that there are other code smells which cause secondary problems, like the database rebuild failing… All of these are now 100% confirmed and proven issues.

OK, nothing new here. These are classic problems; I’ve seen similar failures with other projects and programs (including my own) countless times. Usually these go: ffffff sssss grauh, some rage, then frantic fixing, a bit of extra logging (of the transaction handling), adding a few test cases, and then the problem is gone forever.

** The asterisks around a filename are only there to emphasize it in the log entry; they don’t exist in the real filename.

This is going in circles and needs information. There’s an open issue. Is below what you saw?

Please keep the above, rather user-friendly description in mind for what follows and for your own checks.

Was that immediately before (missing file notices at next backup start)? Then you hit that issue.
Probably. The smoking gun is to get a database bug report that is missing rows that rolled back,
or open a DB copy in DB Browser for SQLite and look at RemoteOperation and Remotevolume.
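For anyone following along, a quick look at those two tables could be something like this (the path is hypothetical, and table and column names are as in recent Duplicati databases, so adjust if yours differ):

```python
import sqlite3

con = sqlite3.connect(r"C:\backup-copy\backup.sqlite")  # hypothetical copy of the job DB

# What the database thinks exists at the destination, grouped by type and state.
for row in con.execute(
    "SELECT Type, State, COUNT(*) FROM Remotevolume GROUP BY Type, State"
):
    print(row)

# The most recent operations Duplicati recorded against the destination.
for row in con.execute(
    "SELECT Timestamp, Operation, Path FROM RemoteOperation ORDER BY ID DESC LIMIT 20"
):
    print(row)
con.close()
```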

When using terminology, please try to use it in the usual way. Once, when you wrote “job log”, I thought
Viewing the log files of a backup job was what you meant. Technically, the links below cover what a journal log actually is:

Temporary Files Used By SQLite (there are many journals related to atomic commit and rollback)
Atomic Commit In SQLite
Transaction (which is an SQL concept, so trying to keep destination and database aligned is DIY)
Write-Ahead Logging is not yet used by Duplicati (a pull request exists). It uses a rollback journal (a quick check is sketched below).
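A quick way to confirm which mode a database copy uses (the path is hypothetical):

```python
import sqlite3

# Check the journal mode of a copy of the job database (hypothetical path).
con = sqlite3.connect(r"C:\backup-copy\backup.sqlite")
mode = con.execute("PRAGMA journal_mode").fetchone()[0]
print(mode)  # "delete" means a rollback journal; "wal" would mean write-ahead logging
con.close()
```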

There are some things Duplicati controls, and some it doesn’t. SQL lets it do commit and rollback.
Journal logs are under the covers. I suspect you don’t mean DB journal log, just Duplicati records.
See my above remark on Remotevolume table, which is the (wrong?) record of what should exist.

Ideally, you figure out steps to reproduce and write up an issue (I went over all of that yesterday).
But in this case there is a known issue with steps to reproduce, so you can see if it fits your case.

Does it? If so, you have an issue that’s in the issue backlog along with a whole lot of other issues.

If you can demonstrate that this is actually what you hit, that raises the priority for developer work; however (as explained yesterday, and seemingly daily), all progress depends on our volunteers…

Please be gentle on those trying to help. Rough language on people or their code drives them off.
This is a Civilized Place for Public Discussion (or so I hope, and there is more good advice there).

@ts678

The smoking gun is to get a database bug report that is missing rows that rolled back,
or open a DB copy in DB Browser for SQLite and look at RemoteOperation and Remotevolume.

I dropped one database in “error state” to you guys… I’ve got my own programs, source, processes, and platform problems that I need to take care of, so I’m not going to split my effort, but if this helps, that would be great.

I’m very well aware of how SQLite3 works; I’m using it in many production projects, and it’s extremely rare that there’s a real database problem. Exactly the same applies to PostgreSQL and Microsoft SQL Server: if things don’t work, it’s usually something other than a database problem. Ahem, and exactly the same applies to NTFS. Sure, there are filesystem issues, but it’s quite unlikely that NTFS or ext4 would have a problem that would cause this many failures.

Steps to reproduce: it seems that if the program aborts with the file transfer exception, the records aren’t correctly updated. So in that process alone there are multiple overlapping code smells. Unfortunately it’s true that the really “fun” bugs usually aren’t caused by a single factor but by a combination of 3–5 different failures, and then it starts to get interesting, especially if it happens extremely rarely, like once a year, yet causes catastrophic damage.

I gave a long analysis via PM (and am still trying to find out what to share in the clear), but summary:

It is actually not a log (SQLite or otherwise), but the State field in the Remotevolume table for these dindex files.
The discussion of that started roughly here in the open issue I’ve pointed to. There’s a workaround proposed.
Or maybe it’s the best and proper solution. Someone with more expertise in this area should weigh in.

Anyway, after looking at a write-up of what the text log file had versus what the database had, it looked very much like the known open issue. The Operation table showed a backup and a compact, but later backups simply did a list operation (RemoteOperation table), suggesting they stopped at the pre-backup list check.

Looking at what list returned (the Data column), there were 7 fewer dindex files than Remotevolume had marked Verified. The text log shows 14 deletes (7 dblock and 7 dindex) that aren’t in the RemoteOperation table, but the dblock files didn’t get missing-file complaints, unlike the 7 dindex files that did.

What probably happened is that the cleanup at the start of the next backup recovered from the interruption by deleting the dblock files, which had State set to Deleting, whereas the dindex files were still at State Verified and so were expected to still be in a list. The proposed patch (from 2020) sets dindex to Deleting too, but it needs expert review before going in…
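To see whether your own database shows the same pattern, a sketch along these lines might help (the DB path and the missing-file name are placeholders; table and column names follow recent Duplicati databases, so treat this as illustrative only):

```python
import sqlite3

# Rough sketch, assuming a copy of the job database plus the dindex names that the
# backup reported as "Missing file".
db_path = r"C:\backup-copy\backup.sqlite"   # hypothetical path
missing = [
    # paste the names from the "Missing file" lines here
    "duplicati-i6b91a0fac8524e8cbff58e80c2a9df64.dindex.zip.aes",
]

con = sqlite3.connect(db_path)
for name in missing:
    state = con.execute(
        "SELECT State FROM Remotevolume WHERE Name = ?", (name,)
    ).fetchone()
    deletes = con.execute(
        "SELECT COUNT(*) FROM RemoteOperation WHERE Operation = 'delete' AND Path = ?",
        (name,),
    ).fetchone()[0]
    # The suspicious pattern: State still 'Verified' and no recorded delete,
    # even though the server-side log shows a DELE for the same file.
    print(name, "State:", state[0] if state else "not in DB", "recorded deletes:", deletes)
con.close()
```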

Even if this is not the best solution (or maybe it is), now that the issue is showing up, it’s time to move. Developers with C# and SQL skills have been scarce for a while, but one person has recently begun…

Note: I just confirmed this with the latest version first. I haven’t caught up with the latest posts in this thread yet… I’ll do that now, and probably edit additions onto the end of this post, or make a separate post if the comments get longer.

It seems that this problem has been becoming less and less of a problem, but it’s still systemically broken, just not by much. Hopefully this is the last (?) serious issue (?).

Just confirmed that this problem still exists with the latest version:

Backup started at 25.4.2022 9.03.57
Checking remote backup ...
  Listing remote folder ...
Missing file: duplicati-i21862c88272342a1833f36b2abbf43d0.dindex.zip.aes
Missing file: duplicati-i251b4e38a06648a79c7511e4da22bb9e.dindex.zip.aes
Missing file: duplicati-i4886e4c2e5034d7a8ee45a240f2595b4.dindex.zip.aes
Missing file: duplicati-i6005188a35a643c8bad3e7e1434ce585.dindex.zip.aes
Missing file: duplicati-i8998fd2e8ddc4b8db268d9ba916dad71.dindex.zip.aes
Missing file: duplicati-i3920cc35c1f04b4e9ab5a93046e1dadb.dindex.zip.aes
Missing file: duplicati-i24678ba53c9d4b0686a50a2cf2281b8c.dindex.zip.aes
Missing file: duplicati-i3813ead0c82a4eb1a653a12036cefadc.dindex.zip.aes
Missing file: duplicati-ieee39330f1c24f26878736bcc940ddb6.dindex.zip.aes
Missing file: duplicati-i1564005832344ba9a8dae947767b884f.dindex.zip.aes
Found 10 files that are missing from the remote storage, please run repair
Backend verification failed, attempting automatic cleanup => Found 10 files that are missing from the remote storage, please run repair
Failed to read local db ...\backup.sqlite, error: database is locked
database is locked => database is locked
database is locked
Fatal error => The process cannot access the file because it is being used by another process.

This is the case where it should recover using the journal. Now it probably (hopefully!) tries to do that, but there’s another uncommitted transaction on a different database connection blocking the action.
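Just to illustrate that failure mode (this is not Duplicati’s code): with SQLite, a second connection that tries to write while another connection still holds an uncommitted write transaction gets exactly this “database is locked” error.

```python
import sqlite3

# Illustration of the failure mode only (not Duplicati code): connection "a" holds
# an uncommitted write transaction, so connection "b" is locked out of writing.
db = "locked-demo.sqlite"

a = sqlite3.connect(db)
a.execute("CREATE TABLE IF NOT EXISTS t (x)")
a.commit()
a.execute("INSERT INTO t VALUES (1)")   # not committed: "a" now holds the write lock

b = sqlite3.connect(db, timeout=1)      # short timeout so the error appears quickly
try:
    b.execute("INSERT INTO t VALUES (2)")
except sqlite3.OperationalError as e:
    print(e)                            # "database is locked"
finally:
    a.rollback()
    a.close()
    b.close()
```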

What led to this situation? Well, the classic: a compaction was started, and ever since then the backup hasn’t been running, due to the systemic logic error.


A separate but small thing: I’m wondering why this triggers a warning and doesn’t work “as usual”.

Hi, I’m wondering why the restore gives this warning. It’s already creating a bunch of missing folders, so why does only one of the folders give a warning?
Downloading restoring …
Creating missing folder D:\restore_point\ for file D:\restore_point\some_path\something.dat
Downloading restoring …
Exit code: 2

So if it isn’t a problem, why is it causing the exit code to change from 0?

Anyway, as a backup source, that file is referenced by name instead of by path alone, but that shouldn’t cause warnings or errors during restore, AFAIK.

Was it interrupted? Got logs? That’s the known path, which I’ve been linking to, that leads to this.
That one is easily fixed by repair, but if the repair runs via --auto-cleanup, it trips itself…

Database is locked when performing auto-cleanup #4631

Yes, it was interrupted. Which logs? The interruption itself isn’t particularly helpful; the log just ends there. But I’m sure I’ll keep seeing these error messages in the future, as I have in the past. I can do a debug run / provide a database when yet another instance is in the same classic broken state. Actually, I took a copy of the database and the data before repairing, yet it’s obvious that the repair just added the 541-byte empty files. I think we’ve discussed this matter over and over again during the past few years.

And as I guessed, I’ve got at least two instances in that error state again right now → because the program is systemically and logically broken. Yep, actually the number is three. Duh!

Off-topic: in another project, an obvious fix that took about one minute still took one week of more-or-less circular discussion. Phew.

I’ve got a debug (profiling) log; I can submit it privately, just in case it contains something that shouldn’t be published. Sending … (edit) sent …

After this trivial (?) issue is fixed, then I think there’s only one serious issue to fix:

Btw, if a debug log of this restore failing while everything is “good” would help, I can obviously send debug log(s) as well…

The log began days later, so it was less interesting; a log of the compact and of the backup after it would be better.
To the extent that information was given, this looks like the usual problem of an interrupted compact.

It’s possible that a simple patch will take care of it, but if things are truly “systemically” broken, then
sorting that out would be better (but would take a whole lot more time and expertise than I have…).

Since this open issue is over two years old, are there any new developer-volunteers who will help?

Duplicati only progresses thanks to its volunteers in code, test, docs, forum, and many other areas.
Right now, help is short in many areas. Any community members who can do any thing, please do.

Based on the latest information on the forum, and using a suitably small backup set, I confirmed this again, now in steps, with a restore test between each step.

  1. Compact fails - Restore still works
  2. Backup stops working - Restore still works
  3. Run repair - Restore still works
  4. Run compact again - Now backup is broken.

So it’s exactly as I said in the very first post: a systemic logical error which makes the backups unrestorable.

It’s fixed when it’s fixed; acknowledged. And it’s exactly as bad a trap as I said it would be.

But it’s still interesting why the test claims it’s not broken.

Can you explain a bit more about the exact steps you take while testing a restore? I have ranted here about the best way to use restore to test a backup:

In other words, testing a restore while the live, original data is available can be very misleading, as Duplicati steals missing blocks from the live data unless specifically ordered otherwise.


Exact steps for anything would help. If an easily reproduced case is found, that would be wonderful.
Steps 3 and 4 seem to say restore test worked, and compact ran, but restore test failed somehow.
Right now, there’s not even an error message given about final failure. Sadly, nothing to work with…