I’m currently comparing backup software like Duplicati, Kopia, restic and Duplicacy. Duplicati is the one with the most negative reviews (e.g. on Reddit), although personally I find it pleasant to use: the user-friendly GUI, the extensive cloud storage support, and the friendly support community. The most common criticism of Duplicati is corruption of backups. Some observations:
Backup corruption seems to happen mostly on Linux (including Docker); I’m seeing fewer complaints from people running Windows. This could of course be a statistical illusion or confirmation bias, I don’t know, but I do suspect it could have something to do with Duplicati being written in .NET and ported to Linux using Mono. I also can’t help noticing the large number of files in the installation (700–1100), which makes it less elegant than a single-binary solution (like restic or Duplicacy), since any of those files could, in theory, get damaged.
Another hypothesis is the architecture of the software itself, such as the choice to use local databases. My understanding is that this is not the case with other programs such as Kopia and restic, where the database is integrated into the repository. The devs say the local database is good for saving data when backing up to the cloud. Perhaps the database could be disabled for local backups, where data cost isn’t an issue?
Anyway, what do the devs think of this? Just curious, because I’d very much like Duplicati to succeed. In fact, it’s really awesome that we’re even getting such software for free. Thanks!
I have not heard of actual backup corruption. What usually happens is that the local database ends up in a broken state. This does not prevent restoring anything, but Duplicati refuses to make new backups.
These failures are hard to reproduce, but we have actually managed to fix the most common causes in the recent beta release.
With the latest beta there is no more Mono; everything is compiled for the platform and includes the runtime it was tested with. I have not heard of Mono causing corruption, but there have been reports of excessive memory and CPU usage.
Most of the files are small text files, including the html/js files for the UI. We could combine them during the build, but so far there has been no real demand for this.
I personally think the idea of having a single binary is neat, but it has little practical value. In most situations you are most likely better off downloading from the website instead of copying a binary around. For most systems, using a package manager of some sort is also preferable.
Bit corruption could just as easily happen with a single large binary as with many small files; the amount of data at risk is the same.
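To put a rough number on that (the per-bit error rate below is made up, and this is just a sketch of the reasoning, not anything from Duplicati itself): the chance of at least one flipped bit depends on the total amount of data, not on how many files it is split across.

```python
# Toy calculation: probability of at least one corrupted bit for the same
# amount of data, stored either as one big file or as 1000 smaller files.
# The per-bit error rate is an arbitrary, made-up number.
p_bit = 1e-12                  # assumed chance that any single bit flips
total_bits = 10 * 2**30 * 8    # 10 GiB of backup data

one_file = 1 - (1 - p_bit) ** total_bits
per_small_file = 1 - (1 - p_bit) ** (total_bits / 1000)
many_files = 1 - (1 - per_small_file) ** 1000

print(f"one 10 GiB file:    {one_file:.6f}")
print(f"1000 smaller files: {many_files:.6f}")
# Both print the same value (about 0.082): splitting the data across more
# files does not change the overall risk of a bit flip somewhere.
```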
I think that comes down to preference and as such is subjective. I do not have a deep enough understanding of the other systems to give a fair comparison, but I can explain the reasoning behind the local database.
Fundamentally, Duplicati does not trust the storage to work as it should. For that reason it keeps track of what the storage is supposed to look like, and complains loudly when things are not as expected. This is based on my experience with storage providers and storage devices, and is an attempt to give errors before you need to restore.
The rest of the database is essentially a cache file, but using a database allows us to do fast B+Tree lookups with limited memory needs.
For local backups, you still need to keep an eye on the storage to make sure you do not lose files to defective disks, for instance.
If you just view the local database as a (structured) cache file, I don’t see it as so different from other solutions, but the distrust of storage capabilities is not something I have seen from others.
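If it helps to see the idea spelled out, here is a very rough sketch (Python rather than the actual C# implementation, and the table and column names are invented for illustration) of what “keeping track of what the storage is supposed to look like” means: after each upload the local database records what should exist on the remote end, and before doing anything else we list the remote storage and complain about anything missing, changed, or unexpected. SQLite also gives B-tree indexed lookups without holding everything in memory, which is the “structured cache” part.

```python
# Illustrative sketch only -- not Duplicati's real schema or code.
import sqlite3

db = sqlite3.connect("localcache.sqlite")
db.execute("""CREATE TABLE IF NOT EXISTS remotevolume (
                  name TEXT PRIMARY KEY,  -- file name on the remote storage
                  size INTEGER,           -- size we uploaded
                  hash TEXT               -- hash computed before upload
              )""")

def record_upload(name, size, file_hash):
    """After a successful upload, remember what the remote storage should contain."""
    db.execute("INSERT OR REPLACE INTO remotevolume VALUES (?, ?, ?)",
               (name, size, file_hash))
    db.commit()

def verify_remote(remote_listing):
    """Compare the backend's file listing (name -> size) against expectations."""
    problems = []
    expected = {name: size for name, size in
                db.execute("SELECT name, size FROM remotevolume")}
    for name, size in expected.items():
        if name not in remote_listing:
            problems.append(f"missing remote file: {name}")
        elif remote_listing[name] != size:
            problems.append(f"size mismatch for {name}: expected {size}, "
                            f"found {remote_listing[name]}")
    for name in remote_listing:
        if name not in expected:
            problems.append(f"extra file not created by this backup: {name}")
    return problems  # if non-empty, complain loudly and refuse to continue
```

The real checks go further than a simple listing, but the principle is the same: the database is the record of what we expect, and the remote storage is never assumed to match it.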
Thanks for sharing. I think on Reddit particularly, we have not been very active, so that might be some of it. It could also be the issues you mention with the Docker/Mono setup.
I’m not a dev but I volunteer a lot on the forum (help wanted), so I have a few opinions on it.
I’d be happy to remind you of some, but I agree data loss is rare. Noise and work are less so.
The way I usually put it here is that Duplicati keeps careful records in the database of what it expects to see, self-tests quite a lot, and sometimes complains. It’s an early warning system, although there are things it could still do better to avoid surprises at, say, database recreates.
Anything the check complains about is probably seen as corruption and will need some work.
Difficulty varies. Sometimes an obvious path like a Repair works. The next step might be a DB Recreate; however, this sometimes fails, and at that point I would probably call it corruption, although the Duplicati.CommandLine.RecoveryTool tends to forge ahead despite certain corruptions.
Recreate is pretty robust. Ideally it runs at an acceptable speed. Less ideally (when it has to try hard to recover from a problem), it’s slow. Sometimes it still can’t succeed, but that is getting rare.
Beyond the more obvious buttons to push (or advice from the GUI), recovery sometimes goes better with some expert advice, which is why the forum exists. The situation is improving, but it’s not perfect yet.
Unusual cases still occur. The count at https://usage-reporter.duplicati.com/ is 65 million backups per year, which can be compared against (say) the number of forum reports. Far from a flood; maybe down to a trickle.
1. Local database corrupted, recreate possible, restore possible
2. Local database corrupted, recreate fails, restore possible
3. Remote data corrupted, recreate fails, restore not possible
Duplicati is designed to avoid ending up in category 3, so if there are any cases where Duplicati has produced damaged data, we need to prioritize those issues.
There have been issues of types 1 and 2, but we added fixes for those in the latest beta. There are some cases where the internal format contains additional data that gives warnings, but this does not affect correctness. If you are aware of open issues in category 1 or 2 that are not on the list of pending issues, please add them or make me aware of them.
Thank you for the response! It’s awesome to hear that the developers take Duplicati seriously! I’ve stepped up my use of the program recently and I may return with some (minor) suggestions soon!