Restore downloads dlist files twice?

I’m in the process of testing my Duplicati backups to be sure they can be restored. I’m using rsync.net (SFTP) to store the backups.

Using “Direct restore from backup files” and specifying “--no-local-blocks=true”, I noticed something that seems very odd while looking at the log data from the server. First, Duplicati downloads all the dlist files (195 of them in this case, which takes about 15 minutes) to build the list of files that can be restored. Then, once I make a selection (in this case, the entire last backup) and continue, it begins downloading… dlist files. I don’t think it’s downloading all of them again, but certainly enough to take quite a bit of time.

Is this expected — re-downloading files it already downloaded once?

Ugh… and then my Internet connection dropped while these downloads were in progress. By the time I noticed it and could pause Duplicati (I had been writing this message), the count had already reached 5 of 5 retries and it marked the download of the file it was working on as failed. So I’ll need to start this test all over again… shouldn’t Duplicati know the difference between a missing file (especially one it already downloaded less than fifteen minutes before) and a dropped Internet connection?

Welcome to the forum @Coises

Ordinarily missing-file checks are frequent and rely on database records. In the case you are testing, there are none, because the regular database was left on the original system (e.g. when testing disaster recovery).

Error messages in the ordinary case are quite explicit when destination files do not match database data.

It looks like that’s the case. The first download is to populate the restore selector with specific information about what backup versions were found, including whether they are full or partial (stopped early) backups.

The second set of downloads announces itself (I believe) as building a partial temporary database, which only gets used for that particular direct restore. For a full restore, the bulk of the download will be blocks in dblock files. It is best to make a “direct restore” large enough to be worth the setup overhead (which is mostly downloading dlist and dindex files).

Other ways to reduce setup time include keeping versions trimmed with a suitable retention option, which can even “thin out” versions as they age. For larger backups (maybe above 100 GB), consider raising the blocksize to reduce the amount of block tracking that is done. A rough target limit might be 1 million blocks. Unfortunately it’s not possible to change the blocksize on an existing backup; a fresh start would be required.
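
To make the block-count arithmetic concrete, here is a rough back-of-the-envelope calculation. I believe 100 KB is the default blocksize, and the backup sizes are made up for illustration:

```
100 GB backup / 100 KB default blocksize ≈ 1,000,000 blocks   (around the rough target limit)
500 GB backup / 100 KB default blocksize ≈ 5,000,000 blocks   (a lot of block tracking)
500 GB backup / 500 KB blocksize         ≈ 1,000,000 blocks   (back near the target)
```

So a backup a few times larger than 100 GB is roughly where raising the blocksize (before the first backup is made) starts to pay off.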

number-of-retries and retry-delay can help survive short drops in ordinary backups, but “direct restore” is more work to set up because it’s based on manual entry of information, and there’s not a nice option GUI. Many options that one might set on the backup (e.g. blocksize) are picked up automatically, but the retry configuration is not.
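
If it helps, one way to get retry settings into a disaster-recovery-style restore is to run it from the command line, where every option is typed explicitly. This is only a sketch: the URL, passphrase, and restore path are placeholders, the command is split across lines just for readability, and the option names and values should be checked against the help output of your Duplicati version:

```
Duplicati.CommandLine.exe restore "ssh://user@server.rsync.net/backups/myjob" "*" \
  --passphrase="my-secret" \
  --restore-path="/tmp/restore-test" \
  --no-local-blocks=true \
  --number-of-retries=10 \
  --retry-delay=30s
```

Restoring "*" pulls every file from the selected version; a narrower filename pattern (or the version option) limits what gets fetched.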

Regardless, safely saving an Export of the backup configuration will make it easier to restore the configuration in a disaster recovery scenario.

Thank you so much for your help.

I think I might see my misunderstanding. I had thought that the only difference between a direct restore and restore from a backup configuration was that restoring from the backup configuration would not make a good test of backup integrity, because it would use information already on my computer.

Now I’m thinking that “direct restore” and restoring from a backup configuration are fundamentally different operations, even if I imported the backup configuration onto an otherwise empty machine.

If that’s true, then using direct restore as a test is giving me a skewed view of how long and fragile a recovery would be. For example, if an ordinary restore from a backup configuration fails halfway through due to an Internet outage, perhaps it’s possible to resume the restore later without losing all of the progress already made, whereas it looks like there’s nothing that can be done with an interrupted direct restore besides starting over.

If I export a backup configuration, then import it under a new name, change the target directory to a new, empty directory, and turn off automatic scheduling, will restoring from that be a fair test? Or would Duplicati still be able to pick up some information from the existing installation? If so, perhaps the only real way to test is to run the test on an entirely different machine.

There are actually at least two other ways to restore. In day-to-day life one would restore from a configured backup where there would already be a database with information, so it’s not a good disaster recovery test.

Additionally, Duplicati will try to obtain blocks from local source files because that’s fast; the no-local-blocks option stops that.

[screenshot: the “Restore from configuration” option]

“Restore from configuration” is similar to direct restore, except that you don’t need to type the specifics in manually; however, it has some bugs.

I’m not certain what you mean by “restore from a backup configuration”, but I think you mean an installed configuration rather than an exported configuration file (as in the image above). A restore from an ongoing backup will also examine the target files for blocks that are already in place, and not bother restoring those again.

I suspect an interrupted direct restore will do the same continuation of partial files, but it needs the upfront work of preparing the partial temporary database before it can look at the target location and see what state its files are in.

It won’t work unless you have a database. Also, don’t point to the same destination. One backup per folder.

The best disaster recovery test is a different machine, but if you just want to try a full restore without using existing source blocks, you can set the no-local-blocks option, restore to an empty folder, and look at that.

Checking that the database recreate works smoothly is also a good occasional test. You can rename the database temporarily (which will disable the Recreate button, but the Repair button will do the recreation). Assuming things go well, either the old database or the new one will work going forward. The new one won’t have the old logs.
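
If you’d rather exercise the recreate from the command line than rename the database and click Repair, something along these lines should be roughly equivalent. It’s a sketch, not a recipe: the URL and paths are placeholders, and while I believe repair rebuilds the database when the file named by --dbpath does not exist yet, verify that against the documentation for your version first:

```
Duplicati.CommandLine.exe repair "ssh://user@server.rsync.net/backups/myjob" \
  --passphrase="my-secret" \
  --dbpath="/tmp/recreate-test/myjob.sqlite"
```

Because this writes to a scratch --dbpath, the job’s real database is left alone.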

So even if I’m creating the new backup configuration only to test restoring (since there is no such thing as a “restore configuration”), and I’ll never use it to make a backup, it’s still unsafe to point it to a remote folder that’s being used now for backups? I need to clone the backup folder on the server and attach the new configuration to the copy to be safe? Seems odd, but I think I have space to do that.

I presume this would also apply if I were doing the test restore on a different machine — I can’t see why that would be any different.

I understand that I’ll need to recreate the database. What I was thinking was that a restore driven by a configuration with a database built from the remote (which I think would be the normal “disaster recovery” case) might be more efficient and robust than “direct restore” (including avoiding the multiple downloads that led me to start this thread, and being able to be restarted after a connection drop without starting from scratch). Unless it’s already known that it isn’t, I plan to test that soon.

I’m still not confident that I understand how all the pieces interact, though. I really like what Duplicati does — completely private end-to-end encryption, block-level deduplication, indefinite versions, point-in-time restore, a wide choice of remote storage protocols — but there is a little too much “magic” going on for my taste. Magic has a tendency to bite you in the ass when something unusual happens.

It’s safe if you can guarantee that it’s only for restore. Don’t ever let it modify files on the destination.
It’s sort of an accident-waiting-to-happen.

There is the “Restore from configuration” option previously shown, with the mentioned potential for an options issue.
This (or “Direct restore from backup files”) is the safe way to test that disaster recovery works.
There is no need to clone the destination, but a restore in the middle of an active backup run will be confused.

There is no fully imported configuration restricted to restores. It makes little sense, as the database must
remain in sync with the destination that it describes, so periodic testing would need a Recreate anyway; otherwise it would be full of complaints about mismatches, and running Repair might damage the destination.

The main advantage of a different machine is that it’s less likely to have the source files (so you won’t need no-local-blocks
to get the best test without that optimization). One disadvantage is that it’s possible for the restore to bump into a backup; testing on the original machine would probably wait or error if a restore were attempted during a backup.
Beyond that, in both cases the destination must not be altered from two machines, as things will get out of sync.

You can test it. I have not, but I expect the main advantage is that you don’t need to type in the setup manually when you “Restore from configuration”, so you might have your favorite retry configuration in place against network drops (provided it doesn’t choke on Advanced options; you can add more data points on that choke if you like).

Some guides from the manual:

Database management
Restoring files if your Duplicati installation is lost
Disaster Recovery
How the backup process works
How the restore process works

Older information from GitHub:

How does it work the basics
Developer documentation