Greetings,
tl;dr - The lack of visibility and feedback is killing Duplicati and making it hard to use for large-scale backups. The use of sqlite is also problematic, because it doubles down on the lack of visibility into the work actually being done.
Rant++ follows:
I started trying to use Duplicati recently, backing up a low-end dedicated server to Backblaze B2. My first attempt was an unmitigated disaster. I gave it everything it should back up, and it decided there was ~1.6TB when there’s really only ~1.2TB across ~5 million files, which is about right. But it backed up so incredibly slowly that eventually…it…slowed…to…zero…bytes…and finally I had to kill it, because it was running 99.6% of one CPU, and there were no logs in the UI, no data transfers, no updates…nothing.
Fine, I read some folks saying that it can’t handle that, so I split it up into a bunch of smaller jobs. Okay, so the jobs run in the order you created them if they’re scheduled and past-due, and the first job in line was my old one, so it failed again. I’m getting really used to systemctl restart duplicati. Around here is where I learned that I have to tell it not to immediately restart previously running jobs, and to leave it paused for ~2 minutes once it starts up, otherwise it just locks up again.
Okay, I’ve descheduled all the jobs, and made them run-on-demand only. Great! One of the jobs seems to have finished…only…it…slowed…to…0 bytes per second…at…the…end. Again, no feedback, no information, no data transfers happening, just 99.6% cpu.
Okay. Let’s dig some more. I found out that doing analyze on the sqlite3 database might speed things up. Let’s try that. HEY! IT WORKED! That backup completed, albeit only ~13GB of data…but…the…second…job…did the same thing. Okay, so restart duplicati, analyze the database, and…hey, it went much faster!
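For anyone following along, this is roughly what I did. The database filename below is made up; substitute whatever local database path your own job shows, and do it while the backup isn’t running:

systemctl stop duplicati
# the .sqlite name is just an example; use your job’s actual local database path
sqlite3 ~/.config/Duplicati/EXAMPLE1234.sqlite "ANALYZE;"
systemctl start duplicati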
Only now it sits at ‘finishing backup…’ with…you guessed it, no network traffic, no log files, no UI updates. So I bounced it, tried analyze again, and when it started up…
Completing previous backup …
Okay, what’s it doing? No. Idea.
Fine, let’s turn on logging to a file, set it to verbose, turn on profiling (whoops, that turns off verbose!) and see what it’s doing in detail. Only no, it’s not…logging…anything. Six. Hours. Nothing logged, >99% CPU usage on one core, and nothing logged.
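In case anyone wants to reproduce this, these are the advanced options I added to the job for the file logging (option names from memory, so double-check them against the job’s advanced options list):

--log-file=/var/log/duplicati/job.log
--log-file-log-level=Profiling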
We’re talking about one job, 126GB, 201,048 files, and it can’t be arsed to tell me what the hell it’s doing?
Last message I got before the 6 hour delay:
[Information-Duplicati.Library.Main.Operation.FilelistProcessor-KeepIncompleteFile]: keeping protected incomplete remote file listed as Temporary: duplicati-20240416T040635Z.dlist.zip.aes
[Information-Duplicati.Library.Main.Operation.Backup.UploadSyntheticFilelist-PreviousBackupFilelistUpload]: Uploading filelist from previous interrupted backup
Okay, but…there’s no network traffic. It’s not doing anything. It’s not logging anything. I don’t know what’s going wrong, because it’s not logged, sent to the UI, or tracked in any way.
I’ve been fighting with this for days, and I’m exhausted. Running gdb -p {pid} and then thread apply all bt gives me this, among other threads:
Thread 16 (Thread 0x7f13b26dd700 (LWP 9284)):
#0 0x00007f13b2f9a174 in sqlite3VdbeExec () from /lib64/libsqlite3.so.0
#1 0x00007f13b2f9dcff in sqlite3_step () from /lib64/libsqlite3.so.0
#2 0x000000004187d570 in ?? ()
#3 0x00007f13ba5408d8 in ?? ()
#4 0x0000000000007530 in ?? ()
#5 0x00007f13ba540970 in ?? ()
#6 0x00007f137810ebf0 in ?? ()
#7 0x00007f13ba540970 in ?? ()
#8 0x00007f139c000fe0 in ?? ()
#9 0x0000000000000000 in ?? ()
Given that all the other threads are various forms of do_futex_wait and pthread_cond_wait (i.e. waiting on other things to happen), and that this is the only thread obviously attempting to do anything right now, it looks like it’s slammed by some kind of sqlite3 operation. If this were postgresql or something else, I could log in to the shared server and ask what the top queries are, but it’s not.
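Just to show what I mean: with postgres I could simply ask the server what it’s chewing on, something like this (a sketch, assuming psql access to the instance):

# show the currently running statements and how long they’ve been at it
psql -c "SELECT pid, now() - query_start AS runtime, state, query FROM pg_stat_activity WHERE state <> 'idle' ORDER BY runtime DESC;"

There’s no equivalent way to interrogate the sqlite file while Duplicati is sitting on it.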
There’s no visibility. No feedback. The logging and UI are not helpful, because (apparently) sqlite just…locks up, and consumes a single core of CPU. Forever.
What can I do to help fix this? I like the UI, clunky as it is, I like its intentions, and everything else seems so much worse, but it’s just not able to handle large backups, and it needs to be, because the data folks store is only getting bigger.
– Morgan
Edit: For reference, it’s not running out of memory or swapping. The Duplicati.server.exe process is using 0.5% of memory, and I have over 20GB of RAM still unused on the server. It’s just stuck in a truly awful, beyond-terrible SQL query that it somehow managed not to log before running it.