What's going on in between "Backup Started at ..." and "Checking remote backup ..."?

OS: Windows 10 Pro
Duplicati v2.0.4.23_beta_2019-07-14

I have a rather large (1.2 TB) backup set. I'm trying to update the backup, but it spends a LOT of time at the "Backup started at [time]" message. It takes about 30 minutes before it moves on to "Checking remote backup ...". Once it gets to "Checking remote backup ..." the backup proceeds very quickly, even with the large number and size of the files.

I've tried adding several options (you can see them in the command lines below) to see if I could speed things up, but it doesn't help.

I’ve noticed an odd thing. Once the backup has completed and I try running it again, it’s very fast (a few minutes). It’s like something is getting cached in RAM. If I reboot the PC (Windows 10) and try again, it’s back to the 30+ minute wait.

For this backup, "local" is a shared network drive (\\Pippin\Media\Video\Television) and "remote" is a USB drive connected to the local computer (F:\Television\Backup\Files). I would suspect the network as the culprit, but I see the same performance issue when "local" is an internal SSD and "remote" is a USB drive on the same machine.

What’s going on between “Backup Started at…” and “Checking remote backup …” and is there anything I can do to speed it up or eliminate it?

First run after booting computer:

F:\Television>\\sauron\System\UtilitySoftware\Duplicati\Duplicati.CommandLine.exe BACKUP Backup\Files \\PIPPIN\Media\Video\Television --backup-name=Television --dbpath="Backup\DB\Television.sqlite" --skip-file-hash-checks=true --no-backend-verification=true --check-filetime-only=true --encryption-module=aes --compression-module=zip --dblock-size=1GB --keep-time=2M --disable-module=console-password-input --no-encryption=true --exclude="System Volume Information" --exclude="$RECYCLE.BIN" --exclude="lrdata" --exclude="[!]" --exclude="Backup\DB"
Backup started at 1/12/2020 11:00:42 AM
Checking remote backup …
Scanning local files …
8182 files need to be examined (1.16 TB)
8127 files need to be examined (1.16 TB)
Duration of backup: 00:32:06
Remote files: 2496
Remote size: 1.22 TB
Total remote quota: 3.64 TB
Available remote quota: 1.67 TB
Files added: 0
Files deleted: 0
Files changed: 0
Data uploaded: 0 bytes
Data downloaded: 0 bytes
Backup completed successfully!

Second run after waiting 30+ minutes for the first run:
F:\Television>\\sauron\System\UtilitySoftware\Duplicati\Duplicati.CommandLine.exe BACKUP Backup\Files \\PIPPIN\Media\Video\Television --backup-name=Television --dbpath="Backup\DB\Television.sqlite" --skip-file-hash-checks=true --no-backend-verification=true --check-filetime-only=true --encryption-module=aes --compression-module=zip --dblock-size=1GB --keep-time=2M --disable-module=console-password-input --no-encryption=true --exclude="System Volume Information" --exclude="$RECYCLE.BIN" --exclude="lrdata" --exclude="[!]" --exclude="Backup\DB"
Backup started at 1/12/2020 11:38:32 AM
Checking remote backup …
Scanning local files …
8182 files need to be examined (1.16 TB)
8127 files need to be examined (1.16 TB)
Duration of backup: 00:00:52
Remote files: 2496
Remote size: 1.22 TB
Total remote quota: 3.64 TB
Available remote quota: 1.67 TB
Files added: 0
Files deleted: 0
Files changed: 0
Data uploaded: 0 bytes
Data downloaded: 0 bytes
Backup completed successfully!
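
One thing I may try next is raising the log verbosity to see exactly what runs during that gap. If I'm reading the 2.0.4.x options right, adding something like this to the command should print each internal operation and its timing to the console:

--console-log-level=Profiling

or, to keep the output, --log-file=backup-profiling.log --log-file-log-level=Profiling (the log file name is just an example).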

I'm surprised nobody has replied to this one yet. I'm trying something now, based on a post I read about how many blocks get created for very large backups. Since my files are video files that run several hundred MB or even 1-2 GB each, I changed --blocksize to 200MB. That drastically reduces the size of the database file, which I'm hoping is the speed bottleneck. Based on my calculations I'll go from about 12 million block hashes down to about 6 thousand.
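
Rough math, assuming the old setup used the default 100KB blocksize: 1.16 TB / 100 KB is roughly 12 million blocks (and block hashes in the database), while 1.16 TB / 200 MB is roughly 6 thousand. The change itself is just one more option on the backup command:

--blocksize=200MB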

Given the size of these files, and the fact that any change will probably affect the whole file anyway, this seems like a good plan.

I’ll update this thread if that solves the problem.

Yup. It appears that was the problem. The backup was also much quicker this time (5 hours instead of more than 24). It's something to consider in future backup setups, and it will complicate some of them, since I may have to split a backup between "normal" files and "large" files (like photographs) to optimize performance.
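
As a rough sketch of what that split might look like (the paths here are made up and most options are trimmed), it would just be two jobs with their own databases and blocksizes:

\\sauron\System\UtilitySoftware\Duplicati\Duplicati.CommandLine.exe BACKUP F:\Backup\Docs \\PIPPIN\Media\Documents --dbpath="F:\Backup\DB\Docs.sqlite" --blocksize=100KB
\\sauron\System\UtilitySoftware\Duplicati\Duplicati.CommandLine.exe BACKUP F:\Backup\Video \\PIPPIN\Media\Video --dbpath="F:\Backup\DB\Video.sqlite" --blocksize=200MB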

BTW, the database went from over 2GB to about 8MB.

There are some database performance improvements in the upcoming Beta (expected to be very near, if not identical, to the current Experimental) that help the SQL queries run more efficiently on larger databases.

Run PRAGMA optimize upon closing database connection #3745
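
(PRAGMA optimize itself is plain SQLite, so if you have the sqlite3 command-line shell installed you could try running it against the local database by hand before the new Beta arrives; the path below is just your --dbpath from the commands above, resolved from F:\Television:)

sqlite3 "F:\Television\Backup\DB\Television.sqlite" "PRAGMA optimize;"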

In terms of DB size (as opposed to performance from smarter SQL query plans), improvements exist as well.

You currently have the --retention-policy option to do progressive version thinning with age, if you want. Some DB sanity-verification tests at the start of a backup run per version, so they take longer the more versions you keep; I don't know how many you have.
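
For example (the values are only an illustration), --retention-policy="1W:1D,4W:1W,12M:1M" keeps one version per day for the last week, one per week for the last four weeks, one per month for the last twelve months, and deletes anything older.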

I don't know whether all of that will be enough to avoid splitting, but splitting can also be good for integrity and recovery.