Any way to recover backup (repair and purge-broken-files don't help)?

Hi,

I have one backup (among many others) that started failing about a month ago. I haven’t been able to get it ‘unstuck’, so I’m asking here in case someone can suggest something else to try.

I am running Duplicati - 2.0.7.1_beta_2023-05-25

The backup is via SFTP/SSH to a host on the local network. That machine had issues with the HDD the backups were stored on, and that HDD had to be replaced – so it’s not particularly surprising that something got broken. However, the actual damage to the files was minimal, so I’m hopeful this can still be ‘repaired’.

Normally, when I had issues with backups before, some combination of the ‘repair’ and ‘purge-broken-files’ commands would get them ‘unstuck’ and working again. That doesn’t seem to work in this case, though.

So the details:

Attempting to run backup

log file:

2023-11-20 13:20:58 -04 - [Information-Duplicati.Library.Main.Controller-StartingOperation]: The operation Backup has started
2023-11-20 13:28:31 -04 - [Information-Duplicati.Library.Main.BasicResults-BackendEvent]: Backend event: List - Started:  ()
2023-11-20 13:28:37 -04 - [Information-Duplicati.Library.Main.BasicResults-BackendEvent]: Backend event: List - Completed:  (931 bytes)
2023-11-20 13:28:37 -04 - [Warning-Duplicati.Library.Main.Operation.FilelistProcessor-MissingFile]: Missing file: duplicati-i4539710a65934bffa2f3522ac1bbbcab.dindex.zip.aes
2023-11-20 13:28:37 -04 - [Error-Duplicati.Library.Main.Operation.FilelistProcessor-MissingRemoteFiles]: Found 1 files that are missing from the remote storage, please run repair
2023-11-20 13:28:37 -04 - [Error-Duplicati.Library.Main.Operation.BackupHandler-FatalError]: Fatal error
Duplicati.Library.Interface.RemoteListVerificationException: Found 1 files that are missing from the remote storage, please run repair
   at Duplicati.Library.Main.Operation.FilelistProcessor.VerifyRemoteList(BackendManager backend, Options options, LocalDatabase database, IBackendWriter log, IEnumerable`1 protectedFiles)
   at Duplicati.Library.Main.Operation.BackupHandler.PreBackupVerify(BackendManager backend, String protectedfile)
   at Duplicati.Library.Main.Operation.BackupHandler.<RunAsync>d__20.MoveNext()

Duplicati command output (in the UI):

Running commandline entry
Finished!

            
Backup started at 11/20/2023 1:20:58 PM
Checking remote backup ...
  Listing remote folder ...
Missing file: duplicati-i4539710a65934bffa2f3522ac1bbbcab.dindex.zip.aes
Found 1 files that are missing from the remote storage, please run repair
Fatal error => Found 1 files that are missing from the remote storage, please run repair


ErrorID: MissingRemoteFiles
Found 1 files that are missing from the remote storage, please run repair
Return code: 100

Attempting repair

log file:

2023-11-20 14:20:04 -04 - [Information-Duplicati.Library.Main.Controller-StartingOperation]: The operation Repair has started
2023-11-20 14:20:18 -04 - [Information-Duplicati.Library.Main.BasicResults-BackendEvent]: Backend event: List - Started:  ()
2023-11-20 14:20:19 -04 - [Information-Duplicati.Library.Main.BasicResults-BackendEvent]: Backend event: List - Completed:  (931 bytes)
2023-11-20 14:20:29 -04 - [Error-Duplicati.Library.Main.Operation.RepairHandler-CleanupMissingFileError]: Failed to perform cleanup for missing file: duplicati-i4539710a65934bffa2f3522ac1bbbcab.dindex.zip.aes, message: Internal consistency check failed, generated index block has wrong hash, uiR9PEaZbVL8UyjRvZGKY52HSC8fLkpeV3+RVJBbkmc= vs 8ABF0EvjURZh3nonRt1C77PCoTNTDfyNI1/hTuLNaMk=
System.Exception: Internal consistency check failed, generated index block has wrong hash, uiR9PEaZbVL8UyjRvZGKY52HSC8fLkpeV3+RVJBbkmc= vs 8ABF0EvjURZh3nonRt1C77PCoTNTDfyNI1/hTuLNaMk=
   at Duplicati.Library.Main.Operation.RepairHandler.RunRepairRemote()

UI command output

Running commandline entry
Finished!

            
  Listing remote folder ...
Failed to perform cleanup for missing file: duplicati-i4539710a65934bffa2f3522ac1bbbcab.dindex.zip.aes, message: Internal consistency check failed, generated index block has wrong hash, uiR9PEaZbVL8UyjRvZGKY52HSC8fLkpeV3+RVJBbkmc= vs 8ABF0EvjURZh3nonRt1C77PCoTNTDfyNI1/hTuLNaMk= => Internal consistency check failed, generated index block has wrong hash, uiR9PEaZbVL8UyjRvZGKY52HSC8fLkpeV3+RVJBbkmc= vs 8ABF0EvjURZh3nonRt1C77PCoTNTDfyNI1/hTuLNaMk=
Return code: 0

Attempting purge-broken-files

log file:

2023-11-20 14:22:16 -04 - [Information-Duplicati.Library.Main.Controller-StartingOperation]: The operation PurgeBrokenFiles has started
2023-11-20 14:23:58 -04 - [Information-Duplicati.Library.Main.Operation.ListBrokenFilesHandler-NoBrokenFilesetsInDatabase]: No broken filesets found in database, checking for missing remote files
2023-11-20 14:23:58 -04 - [Information-Duplicati.Library.Main.BasicResults-BackendEvent]: Backend event: List - Started:  ()
2023-11-20 14:23:58 -04 - [Information-Duplicati.Library.Main.BasicResults-BackendEvent]: Backend event: List - Completed:  (931 bytes)
2023-11-20 14:23:58 -04 - [Information-Duplicati.Library.Main.Operation.ListBrokenFilesHandler-MarkedRemoteFilesForDeletion]: Marked 1 remote files for deletion
2023-11-20 14:25:11 -04 - [Information-Duplicati.Library.Main.Operation.PurgeBrokenFilesHandler-NoBrokenSetsButMissingRemoteFiles]: Found no broken filesets, but 1 missing remote files. Purging from database.

UI output

Running commandline entry
Finished!

            
  Listing remote folder ...
Return code: 0

Repeating repair and purge-broken-files doesn’t produce any new results – the backup is still broken in the same way.

Is there something else I can try to get this backup working again?

Thank you!

Hello

From the little I have looked at your problem, I’d say that the database seems to be incorrect.
Since the idea in this situation is to repair the backend based on the database, that is probably why it fails.
I currently have no idea why, or how to fix it.
However, renaming the database, recreating it, and then trying to repair the backend seems a bit risky to do at first (it could be a last resort if there is no hope of fixing it otherwise).

Maybe you could try to generate a bug report and post the link here.

Creating a bug report will give some specific information that might help guide a recovery attempt.

Well, yes, but trying things sometimes involves deleting damage, and even that might not succeed.
Rather than flying completely blind, some specific data might help plan the next moves to attempt…

Keeping copies of important things such as the database can help reset for a fresh try, but can you actually copy the entire destination? It’s possibly large, but having a copy would ease some worries.

Does this backup have important historical information (that’s often why one has backups…), or is it simply that you don’t want to endure the work of creating it fresh? Sometimes total re-uploads can be avoided…

Hi gpatel-fr, ts678,

thanks for your replies!

First of all, apologies for not including this information with the original post – this particular backup isn’t huge (~100GB) and I can copy it somewhere if needed; it is also not particularly important to me (I have a copy on another destination among other things) and is ‘okay’ to lose.

It’s more a matter of general principle – whether I can trust Duplicati with the data that actually matters to me (not that I know of any alternatives to use instead).

I am willing to try the full database rebuild if that’s what’s required – although I seem to have seen posts on the forum saying that this can take ‘forever’ (i.e. much longer than the initial backup)?

This backup contains many files – at least several hundred thousand.

I did the ‘bug report’ as you described (thanks! I didn’t even know about this option) – it resulted in a 1.5GB archive with the actual sqlite DB inside being over 3GB in size.

I could share that except that it appears to contain a ton of sensitive information – e.g. file names in plain text – so I’m definitely not going to post it publicly. I can private-message a link (assuming this forum has private messages) if looking at this huge dump would actually be helpful.

Thank you!

When you damage the backup files, typically all bets are off as to whether the backup is recoverable. There is no redundant storage of the data. Best practice is to have multiple backups, done differently.
Software can also have bugs, but that is a different issue from damaging the files that contain the backup…

Duplicati.CommandLine.RecoveryTool.exe

This tool can be used in very specific situations, where you have to restore data from a corrupted backup.

would have been an option if we couldn’t get back to normal use and history was critical to maintain. Basically, keep the damaged backup for special recovery of old data, while starting fresh to keep the new.
Hopefully we won’t need to do that here, but we don’t yet know what success we will be able to have.

I doubt it. If you see any source file names, what table are they in? Destination names aren’t sensitive.

EDIT:

The File view in your original database is full of source paths. In the bug report, they wind up looking like:

[screenshot]

That happens only when the backend is very damaged. An enhancement is available in the current Duplicati code to make this time more manageable. If you go this way, I can create a temporary build including this change that you can install over your current Duplicati.

Source file names are very much in there unfortunately.

An example (I’ve replaced stuff I didn’t want publicly shared with Xs):
x:\XXXXX\file-copies\XXXX\XXX\AMD\Chipset_Software\Binaries\GPIO2 Driver\WTx64\

I just opened the sqlite file with a text viewer. Maybe a database view doesn’t show this stuff, but it’s very much in there (it could be the transaction log or whatever – I don’t know the sqlite format).

Yes, this makes sense; however, I am not sure these files were actually damaged. It’s far more likely that Duplicati wasn’t able to finish some kind of read/write operation correctly (this is how I realized something was wrong with the drive – it started throwing errors; I then did a full copy and there weren’t many files that were actually damaged, and I’m not 100% sure whether any of them were related to this specific backup).

Thanks, appreciate the offer and may need to take you up on it.

However, it has always been my experience that updates (to anything, not specifically Duplicati) are far more likely to break something I’m using than to actually help me. And given that this install manages backups I care much more about than this broken one, I will have to think about whether risking an update is worth it.

The code just copies the original database, creates a new obfuscated table from the FileLookup and File tables, and then drops these two original tables without vacuuming the database, so they are still there in a deleted state.
Nobody thought of that in all this time :confused:
Thanks for noticing it.
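
For reference, a minimal sketch of scrubbing a copy of the bug report database before sharing it, assuming Python’s standard sqlite3 module (the file names here are made up); VACUUM rewrites the file and discards the free pages that may still hold the dropped tables. Run it on a copy, not on the original:

import shutil
import sqlite3

# Work on a copy so the original bug report stays untouched.
shutil.copyfile("bugreport.sqlite", "bugreport-scrubbed.sqlite")

con = sqlite3.connect("bugreport-scrubbed.sqlite")
con.execute("VACUUM")  # rebuilds the database file, dropping pages that still hold deleted table contents
con.close()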

Glad to be of help!

So what could be the next steps in trying to fix my backup?

I have VACUUM-ed the bugreport database and it looks like it doesn’t contain source file names anymore (or at least they’re not obvious). I’m still not comfortable sharing it publicly (I’m not at all sure there isn’t another data leak somewhere) but I can private-message a link to you or @ts678 if you want to take a look?

Or should I just skip to the step where I try to re-create the database? If re-creating is the way to go, are you saying that you can make a build that will be much faster at it than the Duplicati - 2.0.7.1_beta_2023-05-25 version I’m running now?

If you are working for a secret service I don’t want to look at your data even by private message :slight_smile:

that’s a possibility.

What I said is that the current version can be very slow in some cases of a very damaged backend. If yours is not badly damaged, the experimental version will not be faster.
I have triggered a build of current master, results can be found here:

You need a GitHub login to retrieve them.

Not at all :slight_smile: I just generally prefer not to post my private information publicly if I can avoid it. And in this case we have already confirmed a significant (in my opinion) private data leak, so I feel extra caution is warranted.

Please let me know if you or @ts678 want to take a look and I’ll set up a share and drop you a link privately.

For the database-rebuilding attempt – if I just save the current XXX.sqlite file for this backup, I should be able to revert to it if the rebuild doesn’t work? Or do I need to back up the entire set of backup files as well (e.g. in case the database rebuild also modifies them)?

Also – thanks for clarifying the performance issue with the rebuild – I do not think the backup files are significantly damaged, so I think I can try with my current Duplicati version.

I don’t think an attempt to rebuild a database can change the destination, but some other repairs could.
It has two distinct functions. The one that tries to make the database match the destination can change either.

You did report a missing dindex file, which means at least one dblock will have to be read. That’s quick; however, under severe damage there is a slow exhaustive search from 90% to 100% on the progress bar…

This is what is being improved, meaning that if there’s only minor damage, timings are near normal.

Go ahead.

The idea is simple in principle, even if highly unclear if one is not aware of it: if the database does not exist, it is rebuilt and the backend is handled read-only; if the database exists and is likely to be good based on a check of its consistency, it is taken as the source of truth and an attempt is made to make the backend conform to it.
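
To illustrate, here is a rough sketch of that decision – not Duplicati’s actual code, and the function and argument names are invented for the example:

# Illustrative sketch of the repair decision described above; all names are made up.
def repair_strategy(database_exists: bool, database_passes_consistency_check: bool) -> str:
    if not database_exists:
        # Recreate path: the database is rebuilt from the remote files;
        # the backend is treated as read-only.
        return "rebuild database from backend (backend untouched)"
    if database_passes_consistency_check:
        # Repair path: the database is the source of truth, so the backend
        # is changed to conform to it (this is the path that fails in this thread).
        return "make backend conform to database (backend may be modified)"
    return "database present but inconsistent: repair cannot proceed cleanly"

print(repair_strategy(database_exists=True, database_passes_consistency_check=True))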

Generally I agree; however, there are a couple of places where the local database can get some fixes:

Speaking of blocklist hashes, they might be involved in

and after that I got lost in SQL, but it made me wonder if some other internal records were misaligned.
A formatted view of the GetBlocklists query being run was more readable, but still stretching my skills:

SELECT "A"."Hash"
    ,"C"."Hash"
FROM (
    SELECT "BlocklistHash"."BlocksetID"
        ,"Block"."Hash"
        ,*
    FROM "BlocklistHash"
        ,"Block"
    WHERE "BlocklistHash"."Hash" = "Block"."Hash"
        AND "Block"."VolumeID" = 2
    ) A
    ,"BlocksetEntry" B
    ,"Block" C
WHERE "B"."BlocksetID" = "A"."BlocksetID"
    AND "B"."Index" >= ("A"."Index" * 3200)
    AND "B"."Index" < (("A"."Index" + 1) * 3200)
    AND "C"."ID" = "B"."BlockID"
ORDER BY "A"."BlocksetID"
    ,"B"."Index"

If you get into this, please check my doing. I just grabbed a likely-looking query from my profiling log…
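
As a side note, the ‘generated index block has wrong hash’ message from the repair log is comparing a recomputed blocklist hash against the stored one. My understanding – which should be checked against the source – is that a blocklist is the concatenation of the raw 32-byte block hashes, hashed again with SHA-256 and Base64-encoded, roughly like this:

import base64
import hashlib

def blocklist_hash(block_hashes_base64):
    # Assumption: a blocklist is the concatenation of the decoded block hashes,
    # and its own hash is the SHA-256 of that concatenation, Base64-encoded.
    blob = b"".join(base64.b64decode(h) for h in block_hashes_base64)
    return base64.b64encode(hashlib.sha256(blob).digest()).decode()

# Hypothetical example with two made-up block hashes:
example = [base64.b64encode(b"\x00" * 32).decode(), base64.b64encode(b"\x01" * 32).decode()]
print(blocklist_hash(example))

With the default 100KiB block size, one blocklist holds 102400 / 32 = 3200 hashes, which is presumably where the * 3200 in the query above comes from.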

Sent you a link via private message.

Regarding the backup itself:

  • I’ll wait a bit before doing anything in case you have some insight based on the bugreport data.
  • Then I’m thinking of doing the following:
    – Backing up somewhere the current database for this backup (just the single .sqlite file)
    – Then I assume I can just hit this button in the UI?
    [screenshot]
    – Is this a good way to go about it?

Separately but relatedly – how difficult is it to build/compile Duplicati for someone with zero .NET/C# experience? I do Java professionally, but a few years ago I remember trying to figure out how to build Duplicati (I was considering making my own fixes) and getting thoroughly stumped (although I don’t remember the details now).

Personally, I’ve tried a few times with various versions of Windows and Visual Studio Community Edition.
The experience was smoother with newer versions. I think you need to know to get .NET Framework 4.7.1.
Building all the packages might have stumped me, but I usually install directly from the .zip files.

I can offer a Python script that will try to predict if you’ll have a smooth repair before you try it.

You’d need to gather your dlist and dindex files into a folder and decrypt them. I made this more as a tester of the continued health of backups, without the trouble of actually doing a recreate.

I don’t know if your missing dindex would make it declare missing data. That requires a dlist referencing a block not listed in any dindex, but possibly your backup doesn’t have such dlists.

It’s fine if you don’t want insight that badly. I just thought I’d mention it as another analysis tool.
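
Not the actual script, but a rough sketch of the kind of cross-check it performs, assuming the dlist and dindex zips have already been decrypted into one folder, and assuming the usual member layout (a filelist.json inside each dlist, vol/* entries inside each dindex) – verify those names against your own files before trusting it:

import glob
import json
import os
import sys
import zipfile

folder = sys.argv[1]  # folder containing decrypted duplicati-*.dlist.zip and *.dindex.zip files

# Collect every block hash that some dindex volume claims to hold.
indexed = set()
for path in glob.glob(os.path.join(folder, "*.dindex.zip")):
    with zipfile.ZipFile(path) as z:
        for name in z.namelist():
            if name.startswith("vol/"):
                volume = json.loads(z.read(name))
                indexed.update(b["hash"] for b in volume.get("blocks", []))

# Check that every blocklist hash referenced by a dlist is indexed somewhere.
missing = set()
for path in glob.glob(os.path.join(folder, "*.dlist.zip")):
    with zipfile.ZipFile(path) as z:
        entries = json.loads(z.read("filelist.json"))
    for entry in entries:
        for h in entry.get("blocklists", []):
            if h not in indexed:
                missing.add(h)

print(f"{len(missing)} referenced blocklist hashes not found in any dindex")

This only looks at blocklist hashes; the per-file data block hashes would need a similar pass, which is presumably part of what the real script covers.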

It was pretty much: download Visual Studio (Community) with C# and the right framework version. The main trouble I had when I started was that the NuGet packages, which should install automatically, were broken.

I wrote this Wiki section with some information if you also run into that.

Once that is working, you can just build the solution and press the Run button to start it. If you run it alongside a main Duplicati install, that will also work and not interfere. The config and databases are in the build directory, and it will use port 8300 for the web interface.

I have taken a look at this; it’s a complicated problem. Two things I noticed:

  • quite a lot of deleted blocks, so there is a lot of purging in this system
  • the damaged block is a ‘block of blocks’, that is, a block containing hashes of other blocks – this happens for big files, and it concerns a particular file whose exact name I can’t tell you (obviously), but it looks like this:

sqlite> select path, length from fixedfile join blockset on (blockset.id = fixedfile.blocksetid) where blocksetid in (1857582,1959476,2374472,2387775,2396715,2400383,2415547,2423463,2430949,2436523,2442908,2450029,2456499);
X:\106\506440.bin|1920401408
X:\106\506440.bin|1920401408
X:\106\506440.bin|1920401408
X:\106\506440.bin|1920401408
X:\106\506440.bin|1920401408
X:\106\506440.bin|1920401408
X:\106\506440.bin|1920401408
X:\106\506440.bin|1920401408
X:\106\506440.bin|1920401408
X:\106\506440.bin|1920401408
X:\106\506440.bin|1920401408
X:\106\506440.bin|1920401408
X:\106\506440.bin|1920401408

and the dates vary from 2023-08-20 04:51:35 to 2023-10-26 04:54:35.
I don’t know if you can identify the real file from the size, the dates and the apparent location.
If yes, maybe deleting it from the backup could fix it.

Thanks, very much appreciate this!

I opened the original DB and ran a query against it (I had to figure out what ‘fixedfile’ stands for; it looks like it’s FileLookup).

select path, length from FileLookup join blockset on (blockset.id = FileLookup.blocksetid) where blocksetid in (1857582,1959476,2374472,2387775,2396715,2400383,2415547,2423463,2430949,2436523,2442908,2450029,2456499)
and length=1920401408

That gave me the file name (although not the path) and I was able to figure out what it is – it’s an Android emulator image storage file (FYI).

This is definitely not the file I care about in terms of history or anything like that.

So what do you mean by ‘deleting it from the backup’? How would I go about that?

Also, unless it’s complicated, could you share how you figured out which exact file was the problem? I know SQL decently well and I might be able to figure out such problems myself in the future if I have an idea of where/how to look.

Thanks, and I totally understand if that’s too complicated to explain / you don’t want to / etc. – don’t worry about this in that case.