Does Restore Only Fetch Changed Files?

Why I’m asking:

I’m mirroring one server to another server. I can copy the source server’s files over FTP quickly since they are on the same local network at the moment. But after this I would rather just restore Duplicati’s backup to get updated/changed files.

My Question:

If you do a restore to a location that already has files and you choose “override target files” does Duplicati check the local files to see if they are the same as the remote files before downloading from the backup storage?

In other words, my source is a large amount of data, say 100 GB, and I back that up daily. At some point I want to “restore” that data to the target server. But I’m hoping that Duplicati would inspect the local files, compare them to the hashes in the backup’s file lists, and then determine whether it needs to download anything from the backup storage. In effect, rather than downloading all 100 GB every time I restore, it would only download the “dblock” files needed to restore the mismatched files.
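
To illustrate what I’m hoping happens under the hood, here’s a minimal Python sketch of the idea (the concept only, not Duplicati’s actual code; the manifest format and the choice of SHA-256 are just assumptions for illustration):

```python
import hashlib
import os

def file_hash(path, algo="sha256", chunk_size=1 << 20):
    """Hash a local file in chunks so large files don't get loaded into memory."""
    h = hashlib.new(algo)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def files_needing_download(manifest, restore_root):
    """manifest: dict of relative path -> expected hash (a made-up format).
    Returns only the files whose local copy is missing or mismatched,
    i.e. the files whose dblocks would actually need to be downloaded."""
    needed = []
    for rel_path, expected in manifest.items():
        local = os.path.join(restore_root, rel_path)
        if not os.path.exists(local) or file_hash(local) != expected:
            needed.append(rel_path)
    return needed
```

If something like this ran first, only the dblocks belonging to mismatched files would ever need to be fetched.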

Hope I’m making sense.

I just did a quick test and yes, it looks like Duplicati will still download dblocks even when it doesn’t have to.

If the existing file’s metadata is the same as the one you are trying to restore, Duplicati won’t overwrite the local file (even if you have the overwrite option set). But it still downloads the dblock(s). Seems odd to me. The check could be done earlier so dblock(s) don’t have to be retrieved, I would think.

Edit to add: Maybe Duplicati doesn’t actually know the full metadata until it downloads the dblock containing the metadata info. That could be what’s going on. Will need to look more closely…
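
For anyone who wants to repeat the test, this is roughly how I scripted it (a sketch only: the backend URL, paths and passphrase are placeholders, and I’m assuming the `--restore-path`, `--overwrite` and `--console-log-level` options from the CLI docs, so check `duplicati-cli help restore` on your version):

```python
import subprocess

# Sketch: run the restore with verbose logging to watch which remote
# volumes (dblocks) actually get downloaded. Backend URL, paths and
# passphrase are placeholders.
cmd = [
    "duplicati-cli", "restore",
    "ftp://backup-host/duplicati-target",   # backend URL (placeholder)
    "*",                                    # restore everything in the latest version
    "--restore-path=/srv/restore-test",
    "--overwrite=true",
    "--passphrase=REPLACE_ME",
    "--console-log-level=verbose",          # surfaces the Get operation per remote volume
]
subprocess.run(cmd, check=True)
```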

Thank you for the prompt check. I also thought I should probably test and answer my own question! But I’m building a bunch of new servers at the moment so my time is limited.

It would be very beneficial if it doesn’t need to download all the dblock files; this would save a ton of bandwidth if you are restoring to a “partially” complete fileset. In other words, the target was fully backed up in the past but now has some missing/corrupt files. Instead of restoring each one separately through a manual pick process, you could just tell Duplicati to restore anything that doesn’t match the backup set, and it would do the legwork.

I’m also thinking of using this as a simple replication process: by constantly restoring the current backup set on a schedule, you can keep an up-to-date copy of your data at a remote location. Then, when you suddenly need to fail over to the remote location, you can just do a quick last-minute restore, which would be fast since most of the data is already in the remote copy.
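
Roughly what I have in mind for the schedule is something like this (an untested sketch; the B2 URL, paths and option names are assumptions on my part, and a cron job calling the CLI directly would do the same job):

```python
import subprocess
import time

# Sketch of "replication by repeated restore": pull the latest backup
# version onto the standby box on a fixed interval. URL, paths and
# option names are placeholders/assumptions - verify them against
# "duplicati-cli help restore" for your version.
RESTORE_CMD = [
    "duplicati-cli", "restore",
    "b2://bucket-name/prefix",            # backend URL (placeholder)
    "*",
    "--restore-path=/data/standby-copy",
    "--overwrite=true",
    "--passphrase=REPLACE_ME",
]
INTERVAL_SECONDS = 6 * 60 * 60  # e.g. every six hours, offset from the backup window

while True:
    subprocess.run(RESTORE_CMD, check=False)  # don't kill the loop on one failed run
    time.sleep(INTERVAL_SECONDS)
```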

Note that unless you have a current local database (which is true on the original source system, and can be true on the second system if you sync the database somehow), you can only do a Direct restore from backup files, which can take a while to build a partial temporary database. Ideally that’s a small download though.

If you constantly have two systems with the same destination, be absolutely certain to back up only from one until it fails in disaster, otherwise you’ll put a DB out of sync with its destination and have to recreate it.
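
If you do manage to sync the database over, pointing the restore at it would look roughly like this (just a sketch; `--dbpath` is the option I have in mind, but confirm it against your version’s help output, and every path and filename here is a placeholder):

```python
import shutil
import subprocess

# Sketch: copy the source machine's local Duplicati database to the standby
# (a mounted share here; rsync/scp would do equally well), then point the
# restore at it with --dbpath so no partial temporary database has to be built.
shutil.copy2("/mnt/source-share/duplicati/ABCDEF1234.sqlite",
             "/var/lib/duplicati/ABCDEF1234.sqlite")

subprocess.run([
    "duplicati-cli", "restore",
    "b2://bucket-name/prefix",            # backend URL (placeholder)
    "*",
    "--restore-path=/data/standby-copy",
    "--overwrite=true",
    "--passphrase=REPLACE_ME",
    "--dbpath=/var/lib/duplicati/ABCDEF1234.sqlite",
], check=True)
```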

Thanks for the extra info. I’ve not tried a “direct restore from backup files” with a large backup, so I also think time would be a consideration. There may also be solutions that are more fit for purpose for this.

In my case, I had in mind building a local bare-metal server that constantly downloads newer backups. Since it’s not a real DR/failover solution, it’s merely a way to get at the files faster. Speed would not matter much, as it could take hours, days or weeks; that would just determine how “often” it pulls the data and goes to work.

Also, this is a solution that doesn’t rely on the production server being involved in the process (unlike a direct file sync to this local server, which is more DR/failover-like).

To expand on this, I’m looking into using Backblaze B2 as the backup target, but they charge a fee for downloads and the speed is slow compared to local transfers. I can see myself sitting in a position where my only copy of the data is safe but slow to retrieve, which would lead to a long restoration process. If I did have a constantly updating local system, I could technically ship its HDD to the data centre and connect it to the local network there for a faster transfer of the bulk of the files. But this is only useful if I can use Duplicati as my “assurance” that, when I do this, it will restore any files that are outdated, missing or different.

Overview says:

Duplicati is not:

  • A file synchronization program.

but you do make a good point that if one can sort of get that, it avoids extra work on the production server.

It will probably get confused if the restore takes long enough that the primary changes the backup underneath it. Ideally you should schedule these so that they are not running at the same time, so the destination stays stable.

Which HDD? I don’t know what’s where, and I don’t understand why the HDD isn’t newer than the B2 backup. There’s at least one report in the forum of someone who did half of your plan, keeping a local copy of the backup at the standby, which at least removes all of the download time and some of the restore time.

If you don’t do anything fancier, you should at least test that in a DR scenario where source isn’t present.