Backup Slows Down In Process

Just installed Duplicati today and running my first backups.
This is a stand-alone machine backing up to a 5TB USB3 Seagate drive.
Windows 10 Pro (22H2)
HP-Z440 64GB RAM

The first backup set went well and completed in a reasonable amount of time: 20.82GB in 00:15:04.

My second set is going much slower; it is significantly larger: 1.32TB, 73,172 files, in 1,969 folders. What is troubling is that the throughput is degrading steadily through the process. It was transferring 50MB/s at first, and with 57,000 files remaining it is down to 19.45 MB/s after 7 hours of running. With only 22% of the files completed and throughput at less than half of where it started, I’m wondering if this will complete. I’ll leave it running overnight and hopefully it will.

  • No encryption
  • Volume Size: 100MB

Thanks in advance!

::next day…
18 hours into the backup. Throughput seems to have settled at 17 MB/s. 15K files remain.
Will a different Volume Size affect this?
Is there a way to commit more RAM to the process? And would that help?
I see some conversation about Block Size; should that be adjusted? File sizes are mostly around 25MB each.

I wanted to run these to my NAS as a parallel backup, but that looks like it could take days to get established, and the ability to restore from there seems dubious.

Hi @mbehrens, welcome to the forum :waving_hand:

Most likely not. This is just the size of the generated zip files.

We (well, mostly @carljohnsen) have looked into performance in this area, and it looks like there is a point where inserts into the SQLite database become slower and slower. It makes sense that you only see this with the larger backup, because the problem does not appear until the database reaches a certain size.

What you can do is try a larger --blocksize, which is the internal unit of data that Duplicati processes. If you set this to something like --blocksize=10mb, it will make the number of hash entries in the database 10x smaller than with the current default (provided most files are 10MB or larger).
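To put rough numbers on that (a back-of-the-envelope sketch only, assuming the 1 MiB default blocksize implied by the 10x figure; real counts also depend on file boundaries and deduplication):

#include <stdio.h>

/* Order-of-magnitude estimate of block/hash entries for a 1.32 TB
 * source set at a 1 MiB blocksize versus the suggested 10 MiB.
 * File boundaries and deduplication are ignored, so treat this as
 * a rough sketch of how the table sizes scale, nothing more. */
int main(void) {
    double total_bytes = 1.32e12;
    double blocksizes_mib[] = { 1.0, 10.0 };
    for (int i = 0; i < 2; i++) {
        double blocks = total_bytes / (blocksizes_mib[i] * 1024 * 1024);
        printf("blocksize %4.0f MiB -> ~%.2f million block entries\n",
               blocksizes_mib[i], blocks / 1e6);
    }
    return 0;
}

That works out to roughly 1.26 million block entries at 1 MiB versus about 126,000 at 10 MiB, which is the 10x reduction mentioned above.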

I keep pointing out (based on Process Monitor activity) that Duplicati seems to thrash the SQLite page cache, which defaults to a mere 2 MB of cache for maybe a 4 GB database.

CUSTOMSQLITEOPTIONS_DUPLICATI=cache_size=-<cache-in-KB> (note the minus sign there) can increase this and reduce the paging activity. I hope someone is testing this angle, preferably not on something that hides the problem by being too fast. I had my database on my own 5TB Seagate USB drive for a while, thinking two spindles (C: is the other) could load-share.
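For anyone puzzled by the minus sign: that is standard SQLite PRAGMA cache_size behaviour, where a negative value is a size in KiB rather than a page count. A minimal standalone check (plain C against the SQLite library; nothing here is Duplicati-specific, it only demonstrates the pragma semantics):

/* build: gcc cache_demo.c -lsqlite3 */
#include <stdio.h>
#include <sqlite3.h>

static int print_value(void *unused, int ncols, char **vals, char **names) {
    (void)unused; (void)names;
    if (ncols > 0)
        printf("cache_size = %s\n", vals[0] ? vals[0] : "NULL");
    return 0;
}

int main(void) {
    sqlite3 *db;
    if (sqlite3_open(":memory:", &db) != SQLITE_OK)
        return 1;

    /* Default is typically -2000, i.e. about 2 MB of page cache. */
    sqlite3_exec(db, "PRAGMA cache_size;", print_value, NULL, NULL);

    /* Negative value = KiB, so -100000 asks for roughly 100 MB. */
    sqlite3_exec(db, "PRAGMA cache_size=-100000;", NULL, NULL, NULL);
    sqlite3_exec(db, "PRAGMA cache_size;", print_value, NULL, NULL);

    sqlite3_close(db);
    return 0;
}

On a typical recent SQLite build this prints -2000 first (the roughly 2 MB default mentioned above) and -100000 after the pragma; a positive value would instead be counted in pages.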

Actually, the portable drive kind of busies out, and it’s not from the relatively rare destination file writes. I’m not sure what cache setting I had then, but it might have been the default, since environment variables for a Windows service are not quick to change. Duplicati has now added:

Preload settings, which can nicely make the desired changes via Environment variables - env. However, adding cache can backfire on a memory-limited system. Mine’s 32 GB – and full.

My last test had 100 MB of cache using this preload.json file put in the database folder.

{
  "env": {
    "*": {
    },
    "tray": {
    },
    "server": {
      "CUSTOMSQLITEOPTIONS_DUPLICATI": "cache_size=-100000"
    }
  },

  "db": {
    "server": {
    }
  },

  "args": {
  }
}

I’m not sure which table most affects the cache. Anything blocksize-oriented is suspect.

For non-initial backups, scan speed is probably also slowed once the file information gets big, since a lot of path names then have to be held in cache. Maybe the prefix table plan helps? I don’t know.

Thanks for the tips. Changing the blocksize will require me to recreate the backup set. Since the initial backup is complete and the incrementals are running in a reasonable amount of time, I’ll keep this in mind for now. If I have to do a restore, I understand that this backup set may take a while to complete.

Good point. This can also improve things and is a fairly simple change to apply. Maybe we should even expose this as a setting?

I see you put in the feature request already, along with a start at an auto-tuning scheme.

Things that need manual tuning add user work and support load. Either way needs a plan.

If someone prefers to test on Linux, strace with -P path may substitute for Process Monitor.

I was just looking for suspicious 4 KB (the default page size) I/Os to the database or journal.

SQLite C Interface

has some SQLITE_DBSTATUS_CACHE_* metrics (and more) to study, if C# can find a way in.
Putting it in the product might be hard, but someone could possibly write a C tool for dev testing.
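Something along these lines might serve as a starting point for that dev-test tool. It is only a sketch in plain C against the SQLite library; the database path and the SQL to run are command-line placeholders, and nothing here is Duplicati-specific:

/* build: gcc dbstatus_demo.c -lsqlite3 */
#include <stdio.h>
#include <sqlite3.h>

/* Dump the per-connection page cache counters after running a query. */
static void dump_cache_stats(sqlite3 *db) {
    struct { int op; const char *name; } ops[] = {
        { SQLITE_DBSTATUS_CACHE_USED,  "cache bytes used" },
        { SQLITE_DBSTATUS_CACHE_HIT,   "cache hits" },
        { SQLITE_DBSTATUS_CACHE_MISS,  "cache misses" },
        { SQLITE_DBSTATUS_CACHE_WRITE, "cache page writes" },
    };
    for (size_t i = 0; i < sizeof ops / sizeof ops[0]; i++) {
        int cur = 0, hiwtr = 0;
        if (sqlite3_db_status(db, ops[i].op, &cur, &hiwtr, 0) == SQLITE_OK)
            printf("%-18s %d\n", ops[i].name, cur);
    }
}

int main(int argc, char **argv) {
    if (argc < 3) {
        fprintf(stderr, "usage: %s <database> <sql>\n", argv[0]);
        return 1;
    }
    sqlite3 *db;
    if (sqlite3_open_v2(argv[1], &db, SQLITE_OPEN_READONLY, NULL) != SQLITE_OK)
        return 1;

    /* Optionally bump the cache first, same value as the preload example. */
    sqlite3_exec(db, "PRAGMA cache_size=-100000;", NULL, NULL, NULL);

    char *err = NULL;
    sqlite3_exec(db, argv[2], NULL, NULL, &err);  /* the query under test */
    if (err) { fprintf(stderr, "%s\n", err); sqlite3_free(err); }

    dump_cache_stats(db);
    sqlite3_close(db);
    return 0;
}

Running it against a copy of the backup database with one of the slow queries, with and without the larger cache_size, would show whether the hit/miss ratio actually improves.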

SQLite allows memory-mapped I/O, but it has some drawbacks compared to ordinary read + write.

Architecture of SQLite

pictorially summarizes the area I’m worrying about: what it calls the Backend, consisting of the B-Tree, Pager, and OS Interface. Someone can check, but my theory is that the B-Tree needs pages; if they’re not in the page cache, it goes to the OS, which has its own cache, and if that also misses, it actually hits a drive. Giving more page cache might also hit a drive, since that cache is probably in virtual memory and so has the potential to busy (for example) the Windows page file.

What to assume for drive speed is a good question. The trend for small systems is SSDs, but those are still costly at large sizes, so it’s a question of what systems Duplicati should support.

I’ll note that Duplicati is pretty slow for me compared to other backup programs on this old desktop.