Best restore strategy - recording metadata

Hi,
I am restoring about 500 GB to a fresh Windows 11 PC after the original PC was unfortunately stolen. I am running Duplicati 2.1.0.5_stable_2025-03-04.

At 4:08 a.m. the restore stalls (the verbose logs stop at that time). Since it happened two nights in a row at exactly the same time, it is surely caused by the forced 24-hour internet reconnect by my ISP. That breaks the TCP connections, and apparently there is no recovery from it.

Question: would setting --no-connection-reuse=true help it recover?

My initial restore was “from configuration”, but without recreating the backup set first. I restored to C:\ instead of Z:\. The restore stalled with 160 GB left to restore, and all time stamps of the restored files were set to the current (restore) time.

I then recreated the backup set (without ever running it) and restored from there. It found the existing files and said it would restore the remaining ~2,200 files / 169 GB. The verbose log recorded only entries like " Recording metadata from remote data: C:\Daten...". During the night the restore stalled again. I did find some files with older time stamps, but after ~20 hours it had not yet started restoring any files. I then tried restoring only a subfolder (9 GB, with 1 GB left to restore), but that has already been running for a couple of hours and is still recording metadata.

EDIT: the restore of the subfolder just finished after roughly 2 hours, with all time stamps successfully restored.

Question: I need to restore all the data quickly, irrespective of metadata. Can I run a restore with --skip-metadata=true and still recover the time stamps / metadata later by re-running the restore with --skip-metadata=false?
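To make this concrete, the first pass I have in mind would be roughly the following on the command line (just a sketch: the storage URL, passphrase and path filter are placeholders for my real values, and the executable name/path may differ per install; I assume the same options are available as “Advanced options” in the restore dialog):

```
:: hypothetical first pass: get the file contents back quickly, skipping time stamps and other metadata
Duplicati.CommandLine.exe restore <storage-URL> "<path-filter>" --passphrase=<passphrase> --overwrite=true --skip-metadata=true
```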

EDIT: it appears that --skip-metadata=false has no influence on the restore strategy. Since the metadata is already stored in the remote data set, it apparently always gets restored. It would be nice if a restore strategy could prioritize getting missing files first, before fixing metadata on already existing files!

Thanks a million for any help! I need to get my dad up and running again; maybe other “family admins” can relate :slight_smile:

Cheers
Peter

Duplicati is fabulous, thank you all for your work!

Hi @pfroehlich, welcome to the forum :waving_hand:

It should not. Duplicati will reuse a “backend instance” for multiple operations for performance reasons. This is important for things like Google Drive, where it needs to fetch the target folder ID, which can be a bit slow. If there is any failure on a connection, Duplicati will discard that backend instance and create a new one, to make sure it does not end up in a loop where a cached value is wrong.

The option --no-connection-reuse simply toggles always discarding a backend instance, no matter if there are errors or not.
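If you still want to experiment with it, it can be added as an advanced option in the UI, or appended to a command-line run, roughly like this (the storage URL and the other options are placeholders, not anything specific to your setup):

```
:: hypothetical: always discard the backend instance after each remote operation
Duplicati.CommandLine.exe restore <storage-URL> --no-connection-reuse=true <other-options>
```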

In your case, it seems like there should be a timeout, but it does not trigger and the operation hangs instead.
For the canary builds, we have rebuilt timeout detection in a number of ways, so I think this will be more resilient against this type of outage.

The canary builds also have a new restore algorithm that saves quite a bit of time.

I have not tested that, but it should work. When you restore the second time, make sure you choose “overwrite” so it does not create multiple copies.

Duplicati will then check that the file already exists and not restore it again. It should detect that the metadata is incorrect and apply only the metadata. You can perhaps try with a small sample to verify, as this is a bit of an edge case that we have not built for specifically.
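If you run it from the command line, the second pass would be roughly the same restore command with metadata enabled again (only a sketch with placeholder values; the relevant parts are --overwrite=true and --skip-metadata=false, which is also the default):

```
:: hypothetical second pass: re-run over the already restored files so that,
:: ideally, only the missing time stamps / metadata get applied
Duplicati.CommandLine.exe restore <storage-URL> "<path-filter>" --passphrase=<passphrase> --overwrite=true --skip-metadata=false
```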

Thanks for the praise!

Hi @kenkendk, thank you for your kind and thorough answer.

I have successfully restored the ~450 GB (from OneDrive) and recovered all my data.

I’ll summarize my learnings:

  1. I first restored from the configuration, but without creating the backup set first; I believe it would have been better to recreate the backup set as the first step. This restore got suspended about halfway through, when my ISP did the forced dis- and reconnect. Most (or all?) files had the current date as file date.

  2. My second run was again on the full set, but with the backup set created first and the restore then started from that backup set. The majority, if not all, of the ~20 hours until the next disconnect (when the process stalled again) was spent “recording metadata”. Afterwards I had both files with the original date and files with the current date.

  3. After that I chose smaller portions of data, selected by subfolder(s), and each of those restores completed flawlessly. There, too, considerable time was spent on “recording metadata”.

  4. When I had finished restoring all parts, I went back to restoring the full data set, just to make sure I had not missed anything. This, however, I had to cancel, because it would again “record metadata”, seemingly for every file in the set. I had expected it to finish quickly, as all metadata had been recorded in a previous run, but apparently all files are scanned again.

  5. I changed the drive letter in the backup set, as it had changed on the new PC, and ran a subsequent backup. That took about 2 hours and went smoothly, although it claimed to change the creation date from something like 1.1.1901 to the true file creation date.

It is great that the dev releases promise a faster restore. For me speed is not really critical, but recovery after losing connectivity would have made a major difference in this case. I still don’t understand the metadata recording; my hope was that on subsequent runs it would not scan the files again.

Note: I always chose “overwrite”. I tried with --skip-metadata = false, but that had no effect on the restore process; it would still “record metadata” for each file.

I am happy that my dad’s files were all recovered. I had set him up with automated and unattended Duplicati backups and achieved 100% recovery when it was needed.

Thanks!
Peter
