How to recover a backup set after hardware failure?

list-broken-files has been running for more than four hours now. I am going to kill the process and will not try running it again. I may try purge-broken-files, with or without --dry-run, but I will not do it today.

I’m doing a test list-broken-files run on a small backup (3G source, 2G dest, 6M sqlite file) and will do the same on a much larger backup later, and I’ll let you know how long my runs took.

Thank you, I appreciate that.

I did run purge-broken-files with --dry-run, but that one also took more than four hours and I had to kill it because I could not leave it running through the night (for unrelated reasons).

I hope that in the future Duplicati 2 will have heuristics in place to detect situations like this, so that if a backup produces the same warnings, say, 10 times in a row, it could propose an operation to resolve the situation. I understand that the range of possible problems is huge, and although my case seems simple, the heuristic would have to deal with a lot of cases. Anyway, if a backup has failed and produced this kind of set of empty files on a local disk, and no other problems exist, it should be fairly simple to resolve the problem.

Looking at the file system, the backup set I have this trouble with has a 401,0 GB source and a 417,9 GB target. The SQLite file is 121,7 MB. The Duplicati 2 front page, however, says:

Source:    2,94 GT
Backup:    10,41 GT / 170 Versions

Ouch. That GT has to mean GB; I believe the “T” here stands for “tavu”, which is “byte” in Finnish. The biggest disk I have is 1T.

Sorry this took so long - real life keeps interrupting my computer time!


So I ran a list-broken-files command on my small local disk job (3G source, 2G dest, 6M sqlite file) and in less than a minute all it returned was the expected message (since I don’t have any broken files):

Listing remote folder ...
Return code: 0

Wanting to know a little bit more, I ran it again with the following “Commandline arguments”:

--console-log-level=profiling
--console-log-filter=-*.Database.*

And got a little bit more detail:

The operation ListBrokenFiles has started
Starting - Running ListBrokenFiles
No broken filesets found in database, checking for missing remote files
Starting - RemoteOperationList
Backend event: List - Started:  ()
  Listing remote folder ...
Backend event: List - Completed:  (190 bytes)
RemoteOperationList took 0:00:00:00.032
Skipping operation because no files were found to be missing, and no filesets were recorded as broken.
Running ListBrokenFiles took 0:00:00:21.129
Whole SMTP communication: Connected
...
C: QUIT
S: 221 2.0.0 Service closing transmission channel

Email sent successfully using server: smtp://0.0.0.0:0
Return code: 0
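(In case it’s useful, the same run from a terminal rather than the GUI Commandline tool would look roughly like the sketch below; the exact binary name depends on your install, and the target URL, database path, and passphrase are just placeholders:)

duplicati-cli list-broken-files "file:///path/to/backup/target" \
    --dbpath=/path/to/job-database.sqlite \
    --passphrase="<backup passphrase>" \
    --console-log-level=profiling \
    --console-log-filter="-*.Database.*"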

I ran it again on my larger local SSH WiFi job (98G source, 92G dest, 101G sqlite file) and, after adding the console parameters above and removing my filters (because it complained about having them), I got these results in under a minute:

The operation ListBrokenFiles has started
Starting - Running ListBrokenFiles
No broken filesets found in database, checking for missing remote files
Starting - RemoteOperationList
Backend event: List - Started:  ()
  Listing remote folder ...
Backend event: List - Completed:  (6.53 KB)
RemoteOperationList took 0:00:00:03.674
Skipping operation because no files were found to be missing, and no filesets were recorded as broken.
Running ListBrokenFiles took 0:00:00:25.525
Whole SMTP communication: Connected
...
S: 221 2.0.0 Service closing transmission channel

Email sent successfully using server: smtp://0.0.0.0:0
Return code: 0

I’ll have another go at it in a bit after I’ve forced some missing (or 0 length) files.

Is it possible you’ve got some symlinks / mount points that might be causing a loop?

But why would such a loop affect only list-broken-files and nothing else?

The source (/data/$USER/) only has three symlinks, each of them for a lock file in the profile folder of a Mozilla product. The target (/media/$USER/$DISK/Duplicati/data/) has no symlinks at all.

It shouldn’t, but that’s not what I meant. Maybe I misunderstood your earlier post - I thought you were commenting on it being odd that Duplicati reported a source of 2,94 GT (TB) when your biggest disk is 1T.

Of course I completely forgot that you could have more than one disk included in a single backup. :blush:

Well, the source is a folder on an LVM logical volume, which sits on a single disk, and the target is a folder in a partition on a single disk. No other disks are involved.

I finally did some list-broken-files test runs on my smaller local drive backup with forced bad files, and in all cases it finished in under 10 seconds, so I’m not sure why yours would be taking so long.

I’ll try again on my larger backup to see if the number of source files is potentially the issue (perhaps once a bad/missing archive file is found, it takes a long time to figure out which of your backed-up source files are affected).

Note that in my case the 0 sized file error was “Listed as Verified with size 0” while in your case it was “Uploaded as Verified with size 0”. Plus I only deleted a dblock, not any related dindex or dlist files, so my test may be further from your scenario than is useful…


Running the list-broken-files command (with --console-log-level=profiling and --console-log-filter=-*.Database.*) on my small local disk backup with a forced-missing dblock file took about 10 seconds and returned the block below.

(Note that adding --full-result would have listed all the affected files instead of just the first 5.)

The operation ListBrokenFiles has started
Starting - Running ListBrokenFiles
No broken filesets found in database, checking for missing remote files
Starting - RemoteOperationList
Backend event: List - Started:  ()
  Listing remote folder ...
Backend event: List - Completed:  (190 bytes)
RemoteOperationList took 0:00:00:00.033
Marked 1 remote files for deletion
0	: 06/01/2018 10:49:02	(8 match(es))
	/source/[path to file 1] (1.15 MB)
	/source/[path to file 2] (1.26 MB)
	/source/[path to file 3] (2.60 MB)
	/source/Data/[path to file 4] (0 bytes)
	/source/Data/[path to file 5] (7.76 MB)
	 ... and 3 more, (use --full-result to list all)
Running ListBrokenFiles took 0:00:00:01.739
Return code: 0
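(If you want the complete list rather than just the first 5 files, re-running with --full-result should do it; again a sketch with placeholder target URL and database path:)

duplicati-cli list-broken-files "file:///path/to/backup/target" \
    --dbpath=/path/to/job-database.sqlite \
    --full-result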

Running it with a forced 0 size dblock file again took about 10 seconds and returned:

The operation ListBrokenFiles has started
Starting - Running ListBrokenFiles
No broken filesets found in database, checking for missing remote files
Starting - RemoteOperationList
Backend event: List - Started:  ()
  Listing remote folder ...
Backend event: List - Completed:  (191 bytes)
RemoteOperationList took 0:00:00:00.034
remote file duplicati-b6eaccfa0e6604202b512b11c12e3f220.dblock.zip.aes is listed as Verified with size 0 but should be 5598781, please verify the sha256 hash "Ms4HjtmsO01jk07sTBE2QB9J2+7NPkuCAOnMZHmUD9U="
Skipping operation because no files were found to be missing, and no filesets were recorded as broken.
Running ListBrokenFiles took 0:00:00:00.458
Return code: 0

@kenkendk or @Pectojin, when a 0 (or otherwise incorrectly) sized file is detected on the backup, is there anything that can be done other than the following?

  • delete the wrongly sized files
  • run list-broken-files to see what source file backups will be affected
  • run purge-broken-files to remove references to these bad files

I mentioned earlier that occasionally I get an error. I now realize that this happens when Duplicati happens to pick one of the zero-size files for verification.

I’m not sure if I understood your comment on --full-result right, but I tried adding --full-result to the backup set settings, and I still see only five of the warnings in the Result listing. However, when I open the first Message line right after the Result, I see this message that I don’t think I have seen before:

Failed to process file duplicati-i672488a1535041cdae235a1b50b74945.dindex.zip.aes
System.Exception: Tiedosto duplicati-i672488a1535041cdae235a1b50b74945.dindex.zip.aes ladattiin palvelimelta, mutta koko oli 0. Odotettu koko oli 36621.
  at Duplicati.Library.Main.BackendManager.GetForTesting (System.String remotename, Int64 size, System.String hash) <0x406fa4e0 + 0x00107> in <filename unknown>:0 
  at Duplicati.Library.Main.Operation.TestHandler.DoRun (Int64 samples, Duplicati.Library.Main.Database.LocalTestDatabase db, Duplicati.Library.Main.BackendManager backend) <0x406f58d0 + 0x012ab> in <filename unknown>:0

(The Finnish exception message says that the file duplicati-i672488a1535041cdae235a1b50b74945.dindex.zip.aes was downloaded from the server, but its size was 0; the expected size was 36621.)

I believe it is the only way.

I think technically it would be possible to reconstruct the file from the hashes and the files on disk, assuming the files still exist on disk. However, I’m not aware of any such functionality.

It would be cool if it could attempt to repair it from the disk, though. We have the file hashes in the dlist files, so we could be certain that the file on disk had not changed before attempting to fix the remote file.

The way I understood the above is that I should not remove the empty files, but instead try the purge-broken-files operation.

This, however, seems to indicate that I should do both. Is that right?

Technically yes - running purge-broken-files should be all that’s needed. However, I suspect that process sees a zero (or otherwise wrong) length dblock file and notes it as having the wrong size but does NOT trigger the fileset (set of 1 dindex + 1 dlist + 1 dblock) to be considered broken. (This is what happened in my 2nd example above with the message "Skipping operation because no files were found to be missing, and no filesets were recorded as broken.")

Whether that’s by design (better safe than sorry) or just a bug, I’m not sure (hopefully @kenkendk can comment on that). If it’s a bug, then perhaps we can do an update that considers incorrectly sized files as breaking a fileset, just as missing ones currently do.

So to get around the issue, I figure removing the wrongly sized files (either by moving them somewhere else or deleting them outright) will let them fall into the “missing” category, thus letting list-broken-files and purge-broken-files do their job, roughly as sketched below.
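(A rough, untested sketch of that sequence, using the target path from your earlier post, a quarantine folder instead of outright deletion, and placeholders for the job database and passphrase; I would run the purge with --dry-run first:)

# Move the zero-byte archive files out of the target folder (quarantine rather than delete).
mkdir -p ~/duplicati-quarantine
find "/media/$USER/$DISK/Duplicati/data/" -maxdepth 1 -name 'duplicati-*' -size 0 -exec mv {} ~/duplicati-quarantine/ \;

# See which backed-up files would be affected, then purge the references to the now-missing archives.
duplicati-cli list-broken-files "file:///media/$USER/$DISK/Duplicati/data/" --dbpath=/path/to/job-database.sqlite --passphrase="<passphrase>"
duplicati-cli purge-broken-files "file:///media/$USER/$DISK/Duplicati/data/" --dbpath=/path/to/job-database.sqlite --passphrase="<passphrase>" --dry-run
duplicati-cli purge-broken-files "file:///media/$USER/$DISK/Duplicati/data/" --dbpath=/path/to/job-database.sqlite --passphrase="<passphrase>"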

And just to confirm, their job is to remove certain versions of files from your backup. If you only have a single version of a file backed up, then that file will be removed from the backup entirely. (If the file still exists on your source, it should get re-backed up with your next run.)


Absolutely! Though realistically the chances of repairing the missing / incorrectly sized archive are probably pretty low, due to needing ALL the blocks in the file to be available.

I would expect this process to more likely be something that uses the dlist hashes to identify files that haven’t changed since the broken version and then creates new archives containing the re-uploaded blocks. This would allow the unchanged files to continue to be listed in the initial backup version rather than in the next backup to be run.

Or even better, put new blocks in for ALL the missing blocks: where hashes haven’t changed, everything is fine; where they HAVE changed (or the data is missing due to file deletions), fill the blocks with zeros and flag the files as corrupted. We could even include a corruption percentage.

So rather than flat out removing files (that might be only partially corrupted in a way that the user might find acceptable) from the backup we instead make them available for the user to decide whether or not they want to try a recovery.

Of course that process isn’t as clear-cut as it sounds because future versions of files might be dependent on the missing blocks so if a single initial-backup-and-not-changed-for-a-while block goes missing, every future version of the file (until that block was changed) would have to be flagged with a corruption percentage.

And all that is just me thinking off the top of my head - I’m sure there are plenty of other potential issues I’m not considering… so that’s likely why the current process is to just get rid of them. :frowning:

This is getting funny.

I moved the three files away and tried the list-broken-files command again, but it was still stuck. I decided to see whether giving the passphrase on the command line would change anything, and to my surprise it did. I had been running the list-broken-files command without any options at all. Now that I added the --passphrase option, I got an error message about a missing database file; the error referred to the file by the wrong name. I had renamed the databases early on to match the backup set names, but apparently the command tried to look the database up by its original name? Anyway, I added --dbpath with the right filename, and now the command lists lots of files.
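(For the record, the invocation that finally produced a file listing looked roughly like this; the actual passphrase and database path are replaced with placeholders here:)

duplicati-cli list-broken-files "file:///media/$USER/$DISK/Duplicati/data/" \
    --passphrase="<backup passphrase>" \
    --dbpath="/home/$USER/.config/Duplicati/<renamed job database>.sqlite"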

I will try purge-broken-files next.

This is the result:

No broken filesets found in database, checking for missing remote files
  Listing remote folder ...
ignoring remote file listed as Deleted: duplicati-20180219T050028Z.dlist.zip.aes
ignoring remote file listed as Deleted: duplicati-20180219T175110Z.dlist.zip.aes
Marked 3 remote files for deletion
Found 29 broken filesets with 485 affected files, purging files
Purging 87 file(s) from fileset 20.5.2018 6:29:06
Starting purge operation
Replacing fileset duplicati-20180520T032906Z.dlist.zip.aes with duplicati-20180520T032907Z.dlist.zip.aes which has with 87 fewer file(s) (64,19 MT reduction)

System.Exception: Unexpected number of remote volumes marked as deleted. Found 1 filesets, but 0 volumes
  at Duplicati.Library.Main.Database.LocalDeleteDatabase+<DropFilesetsFromTable>d__5.MoveNext () <0x41ce9e80 + 0x007f7> in <filename unknown>:0 
  at System.Linq.Buffer`1[TElement]..ctor (IEnumerable`1 source) <0x41ce98f0 + 0x00273> in <filename unknown>:0 
  at System.Linq.Enumerable.ToArray[TSource] (IEnumerable`1 source) <0x41ce9870 + 0x0002f> in <filename unknown>:0 
  at Duplicati.Library.Main.Operation.PurgeFilesHandler.DoRun (Duplicati.Library.Main.Database.LocalPurgeDatabase db, IFilter filter, System.Action`3 filtercommand, Single pgoffset, Single pgspan) <0x41cd4a30 + 0x015bb> in <filename unknown>:0 
  at Duplicati.Library.Main.Operation.PurgeFilesHandler.Run (Duplicati.Library.Main.Database.LocalPurgeDatabase db, Single pgoffset, Single pgspan, System.Action`3 filtercommand) <0x41cd49d0 + 0x0004f> in <filename unknown>:0 
  at Duplicati.Library.Main.Operation.PurgeBrokenFilesHandler.Run (IFilter filter) <0x41c949e0 + 0x01ba3> in <filename unknown>:0 
  at Duplicati.Library.Main.Controller+<>c__DisplayClass39_0.<PurgeBrokenFiles>b__0 (Duplicati.Library.Main.PurgeBrokenFilesResults result) <0x41c947a0 + 0x0006f> in <filename unknown>:0 
  at Duplicati.Library.Main.Controller.RunAction[T] (Duplicati.Library.Main.T result, System.String[]& paths, IFilter& filter, System.Action`1 method) <0x41c2efc0 + 0x0022f> in <filename unknown>:0 

What now?

You’re not the only one having these issues. I’ve posted this, after getting mad about how that worked out:

Anyway, even after that, it seems that the i-files (dindex) are surprisingly often corrupted, but funnily enough the b-files (dblock) aren’t. So there’s something fishy going on, for sure.

I have never taken this as anything but a beta. I don’t expect it to be ready yet. As I mentioned earlier, I do not depend on Duplicati 2 alone for my backups; I am running it alongside Déjà Dup on Ubuntu, and on my Windows machine alongside the old Duplicati version.

I do think, though, that the old version should be fully supported until version 2 is ready.

I think it is high time to go back to this comment I made:

When the problem occurred, my initial reaction was to run the backup again to see if I would get error messages indicating a massive problem. But when I ran the backup again, I only got warnings, which gave me a false impression. After several re-runs, I realized that I only get errors when the new run tries to verify those zero-size files. So I really should not have tried running the backup at all after the initial problem appeared. And then you gave me lots of advice on how to add more options to the backup set to get more information into the log. So I re-ran and re-ran and re-ran the backup over and over again, when I really should not have, since it was already broken.

I strongly believe that when a problem occurs, the user must have access to the full list of error and warning messages. Even if the UI shows only three dots by default, the user must have a way to open the full list hidden behind those dots, instead of having to re-run the backup with new options added to it. This is a very basic need, and it is my biggest grievance at this point.