I’m running some tests to try to get an idea of what affects Duplicati performance. Is there data already available on this forum or elsewhere, or does somebody have data hidden in their personal archive that they would like to share?
I’m curious about things that affect the backup experience, such as:
DB recreate speed, compared to local database size (or some other relevant metric: source set size, block size, etc.). I have discovered this is a VERY important parameter, since we don’t want a restore to take a year (I found a thread where a user with a 60 GB local DB was restoring about 1% a day, which is crazy, especially since recreate gets slower towards the end; probably an HDD-based restore, by my guess).
Backup speed with different block sizes.
Backup speed with an empty local DB, versus backup speed when there are already tons of blocks to search through. The latter gives some insight into the long-term cost of a small block size.
I’m contemplating collecting data like this and writing a much more thorough guide on what new users with large backup sets (TB sizes) should check.
I’m already collecting data from my own tests, but I’m sure there’s tons out there. I’m especially interested in A/B tests, where two otherwise identical backups/restores have been run on the same hardware, so we can compare the effect of settings and local DB size.
Probably the biggest thing is your deduplication block size. The default of 100KB is too small if your backup is large. The rule of thumb some of us on the forum use is that the block size should be about 1/1,000,000 of the size of the data you are protecting. In other words, an 800GB backup should probably have a block size of 800KB (I’d probably just round up to 1MB), and a 2TB backup should probably use a 2MB block size. A larger dedup block size reduces the number of blocks that have to be tracked, which speeds up database operations, but it reduces deduplication efficiency somewhat.
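To make that arithmetic concrete, here is a rough sketch of the rule of thumb. The rounding steps and helper name are my own illustrative assumptions, not anything built into Duplicati:

```python
# Sketch of the ~1/1,000,000 rule of thumb discussed above (decimal units).
# The ROUND_UP_KB steps are assumed "convenient" sizes, not official guidance.
ROUND_UP_KB = [100, 250, 500, 1024, 2048, 5120, 10240]

def suggested_blocksize_kb(source_gb: float) -> int:
    """1/1,000,000 of the source size is roughly 1 KB per GB of source data."""
    raw_kb = source_gb  # e.g. 800 GB source -> ~800 KB raw suggestion
    return next((kb for kb in ROUND_UP_KB if raw_kb <= kb), ROUND_UP_KB[-1])

for gb in (500, 800, 2000):
    print(f"{gb} GB source -> round up to about {suggested_blocksize_kb(gb)} KB blocks")
```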
The second biggest factor IMO is the number of backup versions. Pruning versions helps keep the database leaner, also speeding up database operations.
If this is unbearably slow, it’s probably because Duplicati is downloading dblocks. It should not need to do this for a recreate, but due to a bug in some older versions it may happen. My theory is that the dindex files were written incorrectly (by those dastardly old versions), and when you do a recreate those incorrect dindex files make Duplicati think it needs to check the dblock files.
The good news is that (in my experience) you can solve this by using the latest version and having Duplicati regenerate those dindex files from your local database. Without bad dindex files, the recreate process should not take very long.
Those block-size recommendations, were they gathered with the SQLite DB running on SSDs or HDDs? It seems a very safe recommendation (and safety is the most important aspect), but my current data suggests we could safely use a somewhat smaller block size. I have 500 GB of files on OneDrive, with a local DB of 1.6 GiB, and it takes 2.5 hours on an SSD to recreate the database. I would prefer a faster recreate, so I would probably go with a 250 KB block size if I were to redo the backup, but that is also pretty close to the ‘general recommendation’ of 1/1,000,000. Then again, if I also assume, say, double the space used by multiple versions, we actually get to the 500 KB block size… Plus backup source sets tend to grow over time, and we cannot change the block size afterwards. Interesting that we arrive at roughly the same conclusion.
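To illustrate the trade-off I’m weighing, here is a rough block-count estimate for my 500 GB set at a few block sizes. It is a simplified sketch under my own assumptions (it ignores compression, deduplication and multiple versions), but it shows how the number of block records the database has to track scales inversely with block size:

```python
# Rough block-count estimate for a ~500 GB source at different block sizes.
# Simplifying assumptions: no compression, no dedup savings, single version.
SOURCE_BYTES = 500 * 10**9  # the ~500 GB OneDrive example above

for blocksize_kb in (100, 250, 500, 1000):
    blocks = SOURCE_BYTES // (blocksize_kb * 1000)
    print(f"{blocksize_kb:>5} KB blocks -> ~{blocks:,} block records to track")
```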
Also, I was surprised to notice that the bottleneck was the OneDrive upload speed, by a factor of about 4x. I did a backup of a 16GB file (which Duplicati does not try to compress). The following two tests both took the same time (excluding the 11 minutes it took to get the remote filelist for the Pre/AfterBackupVerify):
100 KB block size on the old backup set with a 1.6 GiB local DB and 500 GB of data
10 MB block size on a new, empty backup setup (i.e. empty local DB, no previous blocks)
The two above took an hour to run, whereas backing up to a LAN target took 12–14 minutes depending on block size. I have not yet tested LAN speed with a backup set that has a large DB.
I could also see that it was upload-bottlenecked, because the tmp dir held 8 full files waiting the whole time during the backup (running with 4 concurrent compressors).
I’ve only been using Duplicati for a year or so, and I think I’ve always been on the latest beta version. But that dblock info does give me some insight into those older posts about slow restores.
Is it downloading dblocks when it does the recreate? You can monitor this while the recreate is running by going to About → Show Log → Live → Verbose.
Dedupe block size probably won’t affect upload speed much. Remote volume size (default is 50MB) may, but in my limited experience it doesn’t really affect upload performance that much either.