Is verification necessary when resuming a backup upload?

I’m trying to upload 5TB to Gdrive on a slow upload, and had to stop the backup for various reasons. I notice that whenever I resume the backup, it seems to be downloading the whole archive (and checking it against local data?). Is there a way to disable this? Then check it once at the end?

I don’t particularly want to test my ISP’s limits by downloading 15TB of Gdrive files in a month!


What makes you think it’s downloading things?

If it’s the progress bar counting ALL your files, that’s just the file scanner looking for changes since the last backup.

If it’s something other than the progress bar, then read on. :slight_smile:

A normal backup process starts with a database <-> destination verification where it gets a list of files at the destination and makes sure they match what the database thinks should be there.

If that passes, the backup happens such that only changed file blocks are compressed, encrypted, and uploaded to the destination. Note that if this is the (or a continuation of the) FIRST backup, this process can take quite a while as the local database is being built up.
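The "only changed blocks" idea can be sketched in a few lines of Python. This is a conceptual illustration, not Duplicati's actual code; the function name is made up, and the SHA-256 choice is an assumption (only the 100KB default block size comes from the docs):

```python
import hashlib

BLOCK_SIZE = 100 * 1024  # Duplicati's default block size is 100KB


def new_blocks(path, known_hashes):
    """Yield (hash, data) only for blocks not already known at the destination."""
    with open(path, "rb") as f:
        while block := f.read(BLOCK_SIZE):
            h = hashlib.sha256(block).hexdigest()  # hash choice assumed for illustration
            if h not in known_hashes:
                known_hashes.add(h)
                yield h, block  # only these get compressed, encrypted, uploaded
```

Nothing here ever reads from the destination - the `known_hashes` set plays the role of the local database, which is why an ordinary backup shouldn't need to download your data.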

Once the backup step is complete, THEN a verification step kicks in which (by default) downloads ONE dblock file and checks it. The default dblock (Upload volume) size is 50MB, so you shouldn’t see any more than about that much download going on during validation.

Lastly, sparse / small file monitoring MAY determine that a compaction is needed in which case Duplicati will download MULTIPLE dblock files and re-compress them into fewer files (leaving out the block data that’s no longer needed). THIS step can potentially use a lot of download bandwidth - but it doesn’t happen until after a backup is completed.
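Compaction can be pictured like this - a deliberately simplified sketch (toy data structures; the real thresholds, file formats, and names are Duplicati's own):

```python
def compact(volumes, live_hashes, volume_size=4):
    """Re-pack still-needed blocks from sparse volumes into fewer, fuller ones.

    volumes: dict of volume name -> list of (hash, block) pairs (the downloads)
    live_hashes: hashes still referenced by at least one backup version
    """
    survivors = [
        (h, b)
        for blocks in volumes.values()
        for (h, b) in blocks
        if h in live_hashes  # drop blocks no backup version needs anymore
    ]
    # Write survivors back out into new volumes of up to volume_size blocks each
    return [survivors[i:i + volume_size]
            for i in range(0, len(survivors), volume_size)]
```

The "lot of download bandwidth" comes from the first parameter: every sparse volume being compacted has to be fetched in full before its live blocks can be re-packed and re-uploaded.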

I guess this is occurring because I don’t have one “complete” backup yet, I’m about 75% of the way there (on 15Mb/s internet).

It definitely is downloading (but not saving) data, so I suppose it is downloading/comparing each and every file? Hopefully I don’t have to start over before this completes again; I’ve run through 2.75TB in incoming data today alone just verifying what’s on Google Drive.

Hmmm…assuming your source files are local and the destination is Google drive, I’m not sure what it would be downloading.

One of the reasons Duplicati keeps a local database of block hashes is so that it doesn’t have to download stuff from the destination to check for differences.

Is it possible you’ve included a Google Drive File Stream folder in your backup source? If that’s the case then yes - Duplicati probably is downloading your Google Drive files - not for comparison, but so it can hash them into blocks to be encrypted & compressed, then uploaded back to your Google Drive!

If that’s what’s happening, hopefully your Duplicati Destination folder isn’t inside that Google File Stream path or things get REALLY messy. :slight_smile:

I guess I was wrong. I was basing this off of my terminal network monitor

The ‘download rate’ is my internet speed, so I assumed it was pulling down all files it was checking. But upon further inspection, my WAN is showing negligible activity, so that must be the network activity between docker and the computer. So all local files are being compared to hashes, I suppose?

That would make sense if Docker is communicating via network protocols.

By default already backed up files are only re-hashed (chopped into 100kb blocks on which a hash algorithm is run) if their timestamp has changed.
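The scanner pass can be sketched like this (assumed logic for illustration, not Duplicati's source; it shows why an unchanged timestamp means the expensive block hashing is skipped):

```python
import os


def needs_rehash(path, db):
    """Only re-read a file's content if its timestamp differs from the database."""
    mtime = os.path.getmtime(path)
    if db.get(path) == mtime:
        return False   # timestamp unchanged: skip chopping/hashing the content
    db[path] = mtime   # record the new timestamp; caller re-hashes the blocks
    return True
```

This is why the progress bar can race through millions of already-backed-up files: a timestamp lookup is vastly cheaper than reading and hashing the file data.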

Of course if you have a 4GB video file that you added a description to, then the whole file must be re-hashed to find that 1 different 100kb block.

Similarly, a file or folder name change looks like all new content to Duplicati, so it will have to re-hash all the content. But in the case of only a name / path change it should find all the hashes already in the database, so it will only upload the new file / path names, not the file content.
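The rename case can be illustrated with a toy block store (hypothetical names and a SHA-256/100KB assumption - Duplicati's real database schema differs):

```python
import hashlib

BLOCK = 100 * 1024     # 100KB blocks, per the default mentioned above
block_store = set()    # hashes of blocks already uploaded to the destination


def backup_file(name, data):
    """Return how many blocks actually need uploading for this file."""
    uploads = 0
    for i in range(0, len(data), BLOCK):
        h = hashlib.sha256(data[i:i + BLOCK]).hexdigest()
        if h not in block_store:
            block_store.add(h)
            uploads += 1
    return uploads
```

After backing up `video.mp4`, renaming it to `movie.mp4` forces a full re-hash - but every hash is already in the store, so only the new path record goes up, not the content.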