I need some help understanding how to lower my backup time.
Right now I have a large (for me) dataset, about 1 TB in size, with 220k files. For simplicity’s sake we can assume each file is of equal size (about 4.45 MB). Over time the mean file size will grow, since the older files are smaller.
This dataset is backed up to both FTP and Dropbox, and each job takes about 25 minutes.
On the same machine, I have other backup jobs that need to run on a tighter schedule (e.g. a database every 15 minutes), but I cannot do so, as Duplicati is often tied up with the two big backups.
Most of the time is spent on “verifying backend data”, and by “most” I really do mean nearly all of it.
I already set auto-compact-interval to 1 week, and it has helped in the past.
I have seen the “check-filetime-only” option, but have not used it yet, as I am not sure it would be beneficial.
Block size is 200 MB. My reasoning was that the dataset is mostly additive, so having bigger block sizes should not be an issue. I have 1 Gbit down / 300 Mbit up, so networking should not be an issue.
In that case, you can disable or reduce the amount of verification being done.
First step would be to change:
--backup-test-percentage=0
This will make it fall back to --backup-test-samples=1, meaning it will validate a single triplet (one dlist, dindex, and dblock file) after each backup. You can further limit this by setting:
--backup-test-samples=0
This will then only list the remote files and compare the listing to the local database. If even the listing is too slow, you can enable:
--no-backend-verification=true
Which will just blindly assume that everything is in order.
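
Put together, the escalation looks something like this. This is only a sketch: the storage URL and source path are placeholders, duplicati-cli is the command-line entry point (Duplicati.CommandLine.exe on Windows), and the same options can just as well be added as Advanced options on the job in the GUI:

```
# Sketch with placeholder URL/path; the options are the point here.
duplicati-cli backup "ftp://ftp.example.com/backup" /data \
  --backup-test-percentage=0         # step 1: fall back to 1 sample per run

# step 2: skip test downloads entirely, keep the remote file listing check
#   --backup-test-samples=0

# step 3 (last resort): skip backend verification altogether
#   --no-backend-verification=true
```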
For the FTP server, if you can run Python on it, you can use the option:
--upload-verification-file=true
Which will place a small JSON file with the expected file hashes and sizes.
There is a Python script here that will check that all files are correct. You can run it on the server, so you do not interfere with the backup but still have some kind of check that things are working.
This is not really an option for Dropbox, as you cannot run anything on the Dropbox server.
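
If it helps to see what that server-side check boils down to, here is a minimal sketch in the same spirit as the bundled script (it is not DuplicatiVerify.py itself). It assumes the uploaded verification file is JSON containing one entry per remote file with a name, a size, and a Base64 SHA-256 hash; the exact file name and field names below are recalled from memory, so check the real script for the actual format:

```
#!/usr/bin/env python3
# Minimal sketch of a server-side check, in the spirit of the bundled
# DuplicatiVerify.py (not a copy of it). ASSUMPTION: the verification file
# is a JSON list of entries with "Name", "Size" and a Base64 SHA-256 "Hash";
# the real script documents the exact format.
import base64
import hashlib
import json
import os
import sys


def sha256_b64(path):
    """Return the Base64-encoded SHA-256 digest of a file."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(1024 * 1024), b""):
            digest.update(chunk)
    return base64.b64encode(digest.digest()).decode("ascii")


def verify(folder):
    """Compare the files in `folder` against the uploaded verification file."""
    with open(os.path.join(folder, "duplicati-verification.json")) as handle:
        entries = json.load(handle)
    errors = 0
    for entry in entries:
        path = os.path.join(folder, entry["Name"])
        if not os.path.exists(path):
            print("MISSING:   ", entry["Name"])
            errors += 1
        elif os.path.getsize(path) != entry["Size"]:
            print("WRONG SIZE:", entry["Name"])
            errors += 1
        elif sha256_b64(path) != entry["Hash"]:
            print("WRONG HASH:", entry["Name"])
            errors += 1
    print("%d of %d file(s) failed verification" % (errors, len(entries)))
    return errors


if __name__ == "__main__":
    sys.exit(1 if verify(sys.argv[1]) else 0)
```

Running something like this from cron on the FTP server keeps the integrity check completely off the backup machine and outside the backup window.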
In our tests, reducing the compression level also has a significant impact on speed, but it does not seem that compressing new data is the bottleneck for you.
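
(For anyone who does want to trade compression for speed: assuming the default zip compression module, the relevant advanced option should be --zip-compression-level, with 0 meaning store-only and 9 the slowest, strongest compression. I am going from memory here, so verify the name against the option list.)

```
--zip-compression-level=1
```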
I’m not so sure it’s a download-sample issue, but that can be tested by reducing the sample count.
You can also watch About → Show log → Live → Information for the time of the first download.
If that’s slow to arrive, it means something else is taking the time, and one guess is below.
That happens before the backup really gets going, I think. It’s probably similar to Verify files.
How big is the Destination, and how many versions? Both numbers are on the home screen.
Every version gets verified by SQL, and a version with lots of blocks can take some time.
Did you set a custom blocksize option? Do you know which Duplicati version first did the backup?
Older Duplicati versions had a 100 KB default blocksize, which gets pretty slow for large backups…
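
To give a feel for why blocksize matters here: the per-version verification walks the block records in the local database, so the time roughly tracks the number of blocks. A quick back-of-the-envelope calculation (plain arithmetic, nothing Duplicati-specific; note that this --blocksize is a different setting from the remote volume / dblock size, which is probably what the 200 MB above refers to):

```
# Rough block counts for a 1 TB source at different deduplication blocksizes.
# Plain arithmetic, just to show the scale the local database has to track.
SOURCE_BYTES = 1_000_000_000_000  # ~1 TB

for label, blocksize in [("100 KB", 100 * 1024),
                         ("1 MB", 1024 ** 2),
                         ("5 MB", 5 * 1024 ** 2)]:
    blocks = SOURCE_BYTES / blocksize
    print(f"{label:>7} blocksize -> ~{blocks / 1e6:.1f} million blocks")

# Output:
#  100 KB blocksize -> ~9.8 million blocks
#    1 MB blocksize -> ~1.0 million blocks
#    5 MB blocksize -> ~0.2 million blocks
```

Keep in mind that, as far as I know, the blocksize cannot be changed on an existing backup, so moving to a larger one means starting the backup over.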
EDIT:
About → Show log → Live → Profiling during the slow period might also catch some SQL. Something like:
AND "A"."FilesetID" = 949
) G
LEFT OUTER JOIN
"BlocklistHash" H
ON
"H"."BlocksetID" = "G"."MetaBlocksetID"
ORDER BY
"G"."Path", "H"."Index"
))
might show up at the end of a large chunk of SQL, and you can see the FilesetID going up on each query.
There’s also an execution time on the previous similar query if you scroll back in time a little.