After two weeks of trying to recreate my backup, Duplicati is purging its own remote files. I give up

As I sit here, watching the storage used by my Backblaze B2 bucket dwindle to zero in another tab, I thought I might put this time to good use by typing up my cautionary tale for anyone else who might dare to use this product for large datasets.

Like many of you, I’m sure, I came across Duplicati when searching for open-source backup solutions. I didn’t want to be tied into a proprietary backup format — risking my files being irretrievably locked away at the end of a product lifecycle — and I didn’t want to trust the security of any cloud provider. That search led me to Backblaze’s own client, then to some heated discussions about its lack of true client-side encryption, which finally clued me in to B2 as cost-effective storage if I wanted to roll my own backups. But backups are complicated and easy to get wrong, so I didn’t want to roll my own. Duplicati was often mentioned in those discussions as an open-source alternative to the Backblaze client.

At first, Duplicati seemed like the backup solution of my dreams. A mature web interface, a fire-and-forget scheduling mechanism, client-side encryption, and the ability to target every major cloud storage provider. I used it to back up my desktop, my laptop, even a few servers with the surprisingly flexible command-line interface. Indeed, after setting up Duplicati, it saved my hide for real on at least one occasion, and the test restores I did were a bit slow but otherwise not a problem.

It seemed Duplicati could do no wrong. But all the backups I had successfully tested so far had something in common: they were all relatively small, under a hundred gigs or so. I assumed Duplicati’s speed would scale linearly with larger backup sizes, so I decided to entrust it with my most valuable data store: my NAS. And that’s where all the trouble began.

My NAS has 4TB of data on it. The makeup of the data is varied: a lot of raw video, various documents, pictures, backups of old projects. So I expected that performing backups and restores might take a while, but nothing like what I’ve experienced.

When I added the data to Duplicati piece by piece, a few hundred gigs at a time, everything was fine. Nightly backups went on without a hitch for a good two or three months. Until one day, the Duplicati database spontaneously corrupted itself.

This, by itself, wasn’t all that surprising to me. Duplicati uses a SQLite database, and I’ve had those spontaneously corrupt in the past when using it for my own applications. To SQLite’s credit, it does a good job of defending against these circumstances, but ultimately it’s all sitting in one file and doesn’t offer the robustness of a more scalable SQL database like PostgreSQL. Indeed, with my NAS’s backup database sitting at eight gigabytes, I was surprised it was running as well for as long as it had. No worries, I thought; Duplicati is supposed to be able to restore files even without a database, so it might take a few hours to get back up and running but everything should return to normal in a day or two at most.

First I tried repairing the database through the web UI. It ran through a long process but ultimately failed. I tried re-running the backup, and it failed with the same database error as before. So I connected to the NAS, moved the database to a .bak file, and let Duplicati perform a database recreate.

Only that recreate would never finish. I let it run for two days and was starting to get nervous about running without a backup for so long. I stopped it and restarted it, again from scratch, and kept an eye on the logs as it built. And what did I see? Duplicati choking for more than two hours on SQLite inserts!

At this point I started searching around for answers, and found stories of others who had let their database recreates go on for days, even weeks, and still not gotten a successful database at the end of it. This was horrifying! If something had happened to my NAS, it would have left me with a total inability to restore my files in a reasonable amount of time, if at all.

So, after waiting a few more days, and with a heavy heart, I wiped the entire backup from B2 and started fresh.

This time I decided to treat the SQLite database with as much reverence as the encryption key itself. “If I don’t keep this safe,” I thought to myself, “I’ll never be able to access my data, backup or no backup.” I set up a cron job on the NAS to keep daily versions of the SQLite database for up to a week, and then a cron job on another computer to regularly sync that database off the NAS, in case anything ever happened to it. This time, not even a database corruption would stop me from keeping my data secure.
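For anyone who wants to do the same, the rotation half of it boils down to something like the sketch below. This is a rough Python approximation of the idea rather than my exact script, and the paths are placeholders you’d point at your own job database and snapshot folder:

```python
#!/usr/bin/env python3
# Rough sketch: copy the Duplicati job database to a dated file once a day
# and prune copies older than a week. Paths below are placeholders.
import shutil
import time
from datetime import datetime
from pathlib import Path

DB_PATH = Path("/volume1/duplicati/NAS-backup.sqlite")  # your job database
DEST_DIR = Path("/volume1/db-snapshots")                # where snapshots live
KEEP_DAYS = 7

def main() -> None:
    DEST_DIR.mkdir(parents=True, exist_ok=True)

    # Take today's snapshot (ideally while no backup job is running,
    # so the database file is in a consistent state).
    stamp = datetime.now().strftime("%Y-%m-%d")
    shutil.copy2(DB_PATH, DEST_DIR / f"{DB_PATH.stem}-{stamp}.sqlite")

    # Drop anything older than KEEP_DAYS.
    cutoff = time.time() - KEEP_DAYS * 86400
    for old in DEST_DIR.glob(f"{DB_PATH.stem}-*.sqlite"):
        if old.stat().st_mtime < cutoff:
            old.unlink()

if __name__ == "__main__":
    main()
```

Cron runs it once a day, after the nightly backup window, and the second cron job on the other computer just pulls that snapshot folder off the NAS.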

Interlude one: By this point, I figured I was never going to get a quick resolution to my backup problems, so I ordered a spare 5TB drive to use for local NAS backups. I’m using Synology’s Hyper Backup for that and it has gone without a hitch so far. (I’m trying to avoid using Hyper Backup for B2 because, as I said at the start, I don’t want to rely on proprietary backup formats.) If nothing else, at least I was safe from an errant rm -rf while I worked out my Duplicati strategy.

Interlude two: Just to demonstrate how determined I was to solve this problem, I actually built a Linux workstation to handle the initial backup recreation. I noticed Duplicati was CPU-bound on my NAS, so I purchased a Ryzen 3950X for this purpose (and for other multithreaded workloads I want to run in the future). Unfortunately, even this didn’t solve my performance problems — Duplicati can’t seem to keep sixteen threads fed for encryption or hashing purposes — but that’s another matter. I just want you to understand that, by this point, I was desperate to get my backups into a healthy state again.

Back to trying to recreate the 4TB backup. Once again I went back to inserting my data a few hundred gigs at a time. Duplicati would take a day or two for each chunk of data, finish off the backup, and then at some point within the next 24 hours the SQLite backup scripts would sync the database safely off the NAS. Now if, at any point in the recreation process, the database were to spontaneously corrupt itself again, I would just restore the old database and move on. For the first time in weeks, I felt like I actually had something that might function long-term. If I could just take care of Duplicati’s database, maybe Duplicati would take care of my files, and everything would work out.

Eventually I stopped adding a few hundred gigs at a time. I discovered that if I stopped filtering out paths in Duplicati and instead just stopped the backup occasionally, it would still complete the backup, allow me to copy the SQLite database, and then I could start it up again. In this way, it would give me safe snapshots every few days, and I wouldn’t have to reconfigure Duplicati every time I wanted to add another path.

I guess this is where I went wrong, because after the most recent of these stoppages, Duplicati has taken it upon itself to start erasing every single file on the backend. I don’t know why, and at this point, I’m too tired to troubleshoot. All I know is that, as I finish typing out this story, the 3TB of data I finally managed to get back onto B2 is being erased. I could stop Duplicati, but what’s the point? It will just put the database in a strange state again anyway. And if I restored it, it would doubtless make the same decision at the conclusion of the next backup.

So that’s it. That’s how Duplicati took a month away from me, and how, after two weeks of troubleshooting and babysitting a single backup job to try to get it back to normal, I’m finally broken enough not to continue. I’ll go back to Synology Hyper Backup, point it at my B2 bucket, and actually get some sleep tonight.

Duplicati is still my backup solution of choice for anything under about 100GB, but anything over that? It’s not worth the headache. Even if you can get it to back everything up successfully, you absolutely must keep snapshots of the database yourself; otherwise your data will only be restorable in a useless, theoretical fashion. Don’t be like me; don’t waste your time.

If any of the developers happen across this tale of woe, I implore you to try performing a database repair on a simple 1TB backup yourselves and see what happens. If anyone cares to try it out, I’m on version 2.0.5.1_beta_2020-01-18. But I probably won’t be around to assist in debugging; Duplicati has stolen too much time and mental energy from me as it is.

Thanks for reading, and if anyone else is trying to maintain a multi-terabyte Duplicati backup that hasn’t gone wrong yet, all I can say is, good luck to you.

That’s a new one to me. Do you recall what the status bar said at the time? Did it leave you any logs (or files…)?

“Starting” to erase files is completely normal during the “Compacting files at the backend” phase that runs after a backup. Wiping out the entire backup is not, so I’ll gently ask for whatever you can easily recall, to see if an investigation can get started.

Do you recall whether you used the blocksize option? You should probably increase the default of 100 KB yourself (until it’s increased in the code) to avoid what, anecdotally and from initial testing, seems to be a pretty steep slowdown on larger backups. I know which slow SQL operation I see, but I’m not certain it’s the only one.

To make large-backup testing easier on the 250 GB drive I had at the time, I tested with a tiny blocksize (e.g. 1 KB).
Whether or not that’s a valid stand-in, I can’t say, but it did seem to support the idea that having lots of blocks is slow.

If anybody out there is good with SQL, I think your skills would be very useful in helping Duplicati scale with size. There’s also a chance that some SQLite-specific tuning would help, for example if the page cache is too small.
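If anyone wants to poke at that last idea, the sketch below is roughly the kind of experiment I mean. Work against a copy of a job database, never the live one, and swap the placeholder query (I’m just assuming the Block table here) for whatever slow statement the profiling logs actually show:

```python
#!/usr/bin/env python3
# Sketch of the experiment: open a COPY of a Duplicati job database and
# compare a query with the default page cache vs. a much larger one.
# "Block" is an assumption about the schema; the query is a placeholder.
import sqlite3
import time

DB_COPY = "recreate-test-copy.sqlite"  # a copy, never the live database

def timed(conn: sqlite3.Connection, label: str) -> None:
    start = time.perf_counter()
    conn.execute("SELECT COUNT(*) FROM Block").fetchone()  # placeholder query
    print(f"{label}: {time.perf_counter() - start:.2f}s")

conn = sqlite3.connect(DB_COPY)
print("cache_size:", conn.execute("PRAGMA cache_size").fetchone()[0])
timed(conn, "default cache")

# Negative cache_size values are in KiB, so this asks for roughly 200 MB.
# (Crude comparison: the OS file cache also warms up between the two runs.)
conn.execute("PRAGMA cache_size = -200000")
timed(conn, "200 MB cache")
conn.close()
```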