Some notes on performance

Just collecting some notes on performance based on my experience with Duplicati over the last few years, in case they’re useful to others. These relate to various versions, up to (and mainly on) 2.0.5.1_beta_2020-01-18, with a terabyte or two of data backed up to Backblaze B2. This backup now runs in 3 to 5 hours, with Duplicati hosted on a mechanical hard disc.

Caveat: I’m just a user: there may be wrong information in here — corrections gratefully received!

  • To set the scene, remember that Duplicati puts data in three places: (1) your backup destination; (2) its local database, which indexes the backup and can (slowly) be rebuilt from scratch by scanning the backup destination if needed; and (3) its temporary directory. For very large backups you’ll need to make sure that the temporary directory has quite a lot of free space, and you may get some benefit from having it on a separate drive to the local database. To move it, set the TMPDIR environment variable rather than relying on Duplicati’s --tempdir switch, which sqlite3 doesn’t respond to.
  • So, the first real performance note is that if you’re doing anything intensive and once-off on the Duplicati local database, then if possible do it on an SSD. Jobs like sqlite3’s “pragma quick_check” run twenty times faster on SSD than on mechanical hard disc, so the effort of moving your files to an SSD and back to do some maintenance is well worth it. I’ve had database repairs run for over four weeks on mechanical disc (that’s a 26GB database indexing 2TB of backups, mind).
  • If your backups are taking longer and longer to run, do check your retention policy: it’s nice to keep everything forever, but Duplicati isn’t really designed to do that. By default, the metadata for each backup fileset (the result of one backup run) is checked for correctness before each backup run, which means that an archive with a year’s nightly backups in it (365 filesets) will take about 52 times longer to run this check than an archive with only a week’s backups (7 filesets). Your two options here are to throw away some of your older backups by enabling a retention policy, or to disable the filelist consistency check. The latter is not recommended, but if you do disable it on the nightly backups, make sure to run it manually every now and then (Advanced → Verify files in the GUI).
  • During normal operation, backup times tend to be bimodal (mostly quick but sometimes a lot longer) because of compaction checks, during which Duplicati looks for opportunities to save space by merging file blocks in the backup destination. A run that includes one can take anywhere from 50% longer to many times longer than a typical backup run, depending on your retention policy (see previous point). You can manage this somewhat by reducing the frequency of compaction checks (set auto-compact-interval to ensure that they happen no more frequently than, say, once a week), or you can disable them and compact manually instead.
  • Another occasional bookkeeping task is database vacuuming, which can take a long time (a day or two on my 26GB database) but can save you even more. The first time I ran a vacuum, it reduced local database size by 16% and reduced backup runtime by 75%, so it’s at least worth doing periodically! You can however safely set auto-vacuum-interval to a fairly large value (I do it monthly). I believe that by default vacuuming is not enabled.
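
For anyone who wants to try the SSD trick for the integrity check and the vacuum by hand, here is a rough Python sketch using only the standard sqlite3 module. It is not how Duplicati itself does these jobs. I'm assuming that you have stopped Duplicati first and that you point the script at a copy of the local database sitting on the SSD. The helper names and the path on the command line are just mine, for illustration.

```python
import os
import sqlite3
import sys
import time


def quick_check(db_path):
    """Run SQLite's PRAGMA quick_check; return (messages, elapsed_seconds)."""
    start = time.perf_counter()
    conn = sqlite3.connect(db_path)
    try:
        rows = conn.execute("PRAGMA quick_check").fetchall()
    finally:
        conn.close()
    # A healthy database yields the single message "ok".
    return [row[0] for row in rows], time.perf_counter() - start


def vacuum(db_path):
    """VACUUM the database file; return (bytes_before, bytes_after)."""
    before = os.path.getsize(db_path)
    # isolation_level=None gives autocommit mode, so VACUUM is never
    # issued inside an open transaction (where SQLite would refuse it).
    conn = sqlite3.connect(db_path, isolation_level=None)
    try:
        conn.execute("VACUUM")
    finally:
        conn.close()
    return before, os.path.getsize(db_path)


if __name__ == "__main__":
    # Hypothetical usage: python dbcare.py /path/to/copy-of-database.sqlite
    path = sys.argv[1]
    messages, seconds = quick_check(path)
    print(f"quick_check: {messages[:5]} in {seconds:.1f}s")
    if messages == ["ok"]:
        before, after = vacuum(path)
        print(f"vacuum: {before} -> {after} bytes")
```

The script only vacuums when quick_check comes back clean, since rewriting a damaged file didn't seem like a good idea to me. Bear in mind that VACUUM rebuilds the whole file, so it needs plenty of free space while it runs.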