Backup valid, but still unrestorable?

Just a few things I want to add about Duplicati:

  1. I’ve never complained about hash mismatches.
  2. The biggest annoyance is that the test passes but the restore fails. This definitely needs to be addressed: if the test passes, then as far as I know the restore should work too (!), and that can’t be blamed on durability / bit-rot. (See the restore-drill sketch after this list.)
  3. Duplicati is around 98% of the way to the goal, but these very small issues ruin it.
  4. I also reset all Duplicati backup sets, because some are several years old and some of the issues could have been caused by older versions. Without a reset, the comparison wouldn’t be fair.
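
Since the whole point of item 2 is that a passing test doesn’t prove a restore works, the only real answer is a periodic restore drill: restore the latest snapshot into a scratch directory and hash-compare it against the source. Below is a minimal sketch of that idea, using Restic’s CLI (since that’s what the rest of this post moves to). The source and scratch paths are placeholders, and it assumes `RESTIC_REPOSITORY` / `RESTIC_PASSWORD_FILE` are already exported.

```python
#!/usr/bin/env python3
"""Restore drill: restore the latest snapshot into a scratch directory
and compare file hashes against the live source tree.

Assumptions (not from the post itself): restic is on PATH,
RESTIC_REPOSITORY / RESTIC_PASSWORD_FILE are exported, and SOURCE /
SCRATCH below are placeholder paths.
"""
import hashlib
import pathlib
import subprocess
import sys

SOURCE = pathlib.Path("/srv/data")            # placeholder: the backed-up tree
SCRATCH = pathlib.Path("/tmp/restore-drill")  # placeholder: scratch restore target


def sha256(path: pathlib.Path) -> str:
    h = hashlib.sha256()
    with path.open("rb") as f:
        for block in iter(lambda: f.read(1 << 20), b""):
            h.update(block)
    return h.hexdigest()


def main() -> int:
    # Actually restore, instead of trusting a verification pass alone.
    # restic recreates the original absolute paths under --target.
    subprocess.run(
        ["restic", "restore", "latest", "--target", str(SCRATCH)],
        check=True,
    )

    checked, missing, mismatched = 0, 0, 0
    for src in SOURCE.rglob("*"):
        if not src.is_file():
            continue
        restored = SCRATCH / src.relative_to(src.anchor)
        checked += 1
        if not restored.is_file():
            missing += 1
        elif sha256(src) != sha256(restored):
            mismatched += 1

    print(f"checked={checked} missing={missing} mismatched={mismatched}")
    return 0 if (missing == 0 and mismatched == 0) else 1


if __name__ == "__main__":
    sys.exit(main())
```

Comparing against the live source will of course flag files that changed between the snapshot and the drill, so this is only meaningful for fairly stale data.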

Restic:

  1. I assume Restic will perform a bit better.
  2. I’m worried about Restic’s memory footprint. The environment where I run the backups has large, mostly stale data sets, so Restic could cause memory issues. This could ruin point 1.
  3. Restic is very nicely wrapped in a single binary, which makes it a wonderful thing to just drop onto servers with a script.
  4. In some (rare) cases I expect variable-block-size chunking to improve deduplication a lot, but this applies to just a few very specific data sets.
  5. On the other hand, point 4 could get ruined by the fact that, as I’ve read, large files usually get rechunked again a bit later, which could make the process less efficient. In these cases it’s exactly one very large file being backed up.
  6. I’ll use Restic with rest-server. Data is stored on the file system, just like the Duplicati backups.
  7. Last edit for now: I’ve configured everything and the first backup is done. New backups will be created every 6 hours, and data retention (forget with --prune) is configured to be very similar to Duplicati’s retention, though not exactly the same.
  8. Restic creates a lot of quite small chunks, pretty similar to Duplicati’s default. That’s not a great fit for SMR storage or cloud storage where there can be per-file access latency.
  9. I also miss Duplicati’s threshold option. Now when Restic forgets something, it also prunes the data out of the repository at the same time. Depending on the environment and situation this is either a good or a bad approach. In my case, I don’t see any benefit from an immediate prune, because we’ve got plenty of basically free storage; pruning (purge / compact) just adds extra I/O if it’s done too often. (See the forget/prune split sketch below.)
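
Regarding points 7 and 9, the behaviour I’d prefer (frequent retention, rare compaction, roughly what Duplicati’s threshold gives you) can be approximated by running forget without --prune on every cycle and a standalone prune only occasionally. Here is a rough sketch of that split; the rest-server repository URL, password file and keep-policy numbers are placeholders, not my actual configuration.

```python
#!/usr/bin/env python3
"""Split retention into a frequent 'forget' pass and a rare 'prune' pass,
so the snapshot policy is applied every run but the I/O-heavy repack only
happens occasionally (a rough stand-in for Duplicati's threshold option).

Assumptions (placeholders, not the real configuration): repository URL,
password file, keep policy, and the prune weekday.
"""
import datetime
import os
import subprocess

ENV = {
    **os.environ,
    "RESTIC_REPOSITORY": "rest:http://backup-host:8000/myrepo",  # placeholder
    "RESTIC_PASSWORD_FILE": "/etc/restic/password",              # placeholder
}

# Example keep policy, roughly a "similar but not identical" retention ladder.
KEEP = ["--keep-hourly", "24", "--keep-daily", "14",
        "--keep-weekly", "8", "--keep-monthly", "12"]

PRUNE_WEEKDAY = 6  # only repack on Sunday, to limit extra I/O


def run(args):
    subprocess.run(["restic", *args], env=ENV, check=True)


def main() -> None:
    # Apply the retention policy every run, but without --prune: this only
    # drops snapshot references, it does not rewrite pack files.
    run(["forget", *KEEP])

    # Do the expensive space reclamation rarely.
    if datetime.date.today().weekday() == PRUNE_WEEKDAY:
        run(["prune"])


if __name__ == "__main__":
    main()
```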

But I’ll collect data and configure logging so that I have full logs for the whole 6-month period; if there’s something I need to review, I can dig into the data. (A small logging wrapper sketch follows.)
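
One simple way to keep full logs for the whole period is to wrap each 6-hourly run and store its complete output in a dated file. A minimal sketch, with a placeholder log directory and backup path; it assumes `RESTIC_REPOSITORY` / `RESTIC_PASSWORD_FILE` are exported, and only `restic backup --verbose` itself is a real command.

```python
#!/usr/bin/env python3
"""Run a backup and keep its complete output in a per-run log file,
so six months of runs can be reviewed later.

Assumptions (placeholders): log directory and backed-up path;
RESTIC_REPOSITORY / RESTIC_PASSWORD_FILE are already exported.
"""
import datetime
import pathlib
import subprocess

LOG_DIR = pathlib.Path("/var/log/restic")  # placeholder
BACKUP_PATH = "/srv/data"                  # placeholder


def main() -> None:
    LOG_DIR.mkdir(parents=True, exist_ok=True)
    stamp = datetime.datetime.now().strftime("%Y%m%d-%H%M%S")
    log_file = LOG_DIR / f"backup-{stamp}.log"

    with log_file.open("w") as log:
        # --verbose makes the per-run summary output worth archiving.
        result = subprocess.run(
            ["restic", "backup", "--verbose", BACKUP_PATH],
            stdout=log,
            stderr=subprocess.STDOUT,
        )

    if result.returncode != 0:
        # Keep failures easy to spot when scanning runs later.
        print(f"backup failed with exit code {result.returncode}, see {log_file}")


if __name__ == "__main__":
    main()
```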
