S3 - Glacier backup failed to restore but restored files anyway

This is a bad combination for any complex or hazardous scheme. I’m good at Duplicati but not AWS. The person giving advice in the other thread is good at Duplicati, and likely better than me at AWS.

What about ease and costs of restore, short of a full local disaster? Do you expect small restores?

Cold storage is inherently harder to work with, and Duplicati adds to that by not supporting the Glacier restore API, which is what gets files out of cold storage and into something that can actually be read.
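For a sense of what that missing step looks like, this is roughly how a single object gets a Glacier restore requested outside Duplicati, using the AWS CLI. The bucket name and key here are placeholders, not anything from this thread:

aws s3api restore-object --bucket my-bucket --key duplicati/some-dblock-file.zip.aes --restore-request '{"Days":7,"GlacierJobParameters":{"Tier":"Standard"}}'
# later, check whether the temporary readable copy is ready yet:
aws s3api head-object --bucket my-bucket --key duplicati/some-dblock-file.zip.aes

Every archived object needs a request like that, and the temporarily restored copy only stays readable for the number of days asked for.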

Disaster recovery is probably urgent enough that you restore (unfreeze) all the files, needed or not. A destination file may no longer be needed at all if the entire dblock file holds only blocks from files that are no longer in the backup.
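Unfreezing everything generally means looping over the whole destination prefix. A minimal sketch with the AWS CLI, again with made-up bucket and prefix names:

aws s3api list-objects-v2 --bucket my-bucket --prefix duplicati/ --query 'Contents[].Key' --output text |
tr '\t' '\n' | while read -r key; do
  # Bulk is the cheapest and slowest retrieval tier; objects that aren't archived just return an error here
  aws s3api restore-object --bucket my-bucket --key "$key" --restore-request '{"Days":7,"GlacierJobParameters":{"Tier":"Bulk"}}'
done

Expedited and Standard tiers cost more but finish sooner, which matters if the disaster recovery is urgent.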

Ordinarily that waste builds up as retention removes old versions, and compact then reclaims the space by rewriting dblock files and dropping any blocks no longer used. Cold storage gives this up, because compacting would mean downloading archived dblock files, so storage use grows forever and the local database keeps getting bigger and slower.
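For comparison, on a hot-storage destination this cleanup is routine and needs nothing special. A rough sketch, with the storage URL and database path as placeholders:

# retention and automatic compact run as part of a normal backup to a hot destination:
duplicati-cli backup <storage-url> /data --keep-versions=10 --dbpath=/path/to/job.sqlite
# compact can also be run on its own once wasted space builds up:
duplicati-cli compact <storage-url> --dbpath=/path/to/job.sqlite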

Cold storage is less expensive, but an unbounded amount of it will run the costs high – eventually.

The plan that “this backup will move to GLACIER” might express the desire, but as mentioned before, backup versions generally don’t stand by themselves. They share data, and that is what saves the space.

“little change … from one week to another” means you’ll have a large initial backup, then a weekly backup will probably upload only a little, referencing the older (maybe even initial) backup for data.

People come by the forum asking whether they can delete old backup files, not realizing that the data blocks in them are reused by current files whenever possible. Blocks only turn to waste when no retained version is still using them.

The idea that you can force a restore to use only hot storage works only if you are certain the file’s data has no blocks that already exist in the backup and have been moved to cold storage. There is actually another thing that may help, although the no-local-blocks default may change. It’s currently:

--no-local-blocks = false
Duplicati will attempt to use data from source files to minimize the amount of downloaded data. Use this option to skip this optimization and only use remote data.
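In other words, with the default left alone, a restore pulls any needed block it can find in the current source files instead of downloading it. A hedged sketch of a small test restore, with the storage URL, file path, and database path as placeholders:

# --no-local-blocks=false is the current default, shown only to make the behavior explicit
duplicati-cli restore <storage-url> "/data/projects/report.docx" --restore-path=/tmp/restore-test --dbpath=/path/to/job.sqlite --no-local-blocks=false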

A local-backup-plus-remote-copy approach has the advantage of simplicity and of not having to wonder whether restore data is hot or cold. You can also do normal retention and compacts. Basically, it’s a local Duplicati backup plus a remote copy, possibly just an rclone sync in run-script-after (sketched below). There might be files that get deleted before reaching their minimum retention days, but I don’t “think” that’s a penalty in the way it might land on Wasabi. See:
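For the rclone part, a minimal run-script-after sketch could look like this; the script path, local destination folder, remote name, and bucket are all made up for illustration:

#!/bin/sh
# e.g. /usr/local/bin/sync-backup.sh, pointed at with --run-script-after=/usr/local/bin/sync-backup.sh
# mirror the local Duplicati destination folder to the remote bucket
rclone sync /mnt/backup/duplicati remote-s3:my-bucket/duplicati-copy

rclone sync makes the remote match the local folder, deleting remote files that were removed locally, so the remote copy tracks local retention and compacting.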

As mentioned, Duplicati backup versions interconnect, so your “just in case” might be broken too. There are recovery tools for slightly damaged backups, but for best safety, use multiple backups – and the more independent the better. A local Duplicati backup and remote clone isn’t independent from a backup integrity point of view, but having a remote copy at all guards against local hazards.

If you’re throwing files onto the remote, I suppose you could throw the local database too, though there’s an argument that you can usually (maybe slowly) recreate the database from remote files.
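If you did want to copy it, that could be one more line in the same script. The path below is only an example; the per-job .sqlite file has a randomly generated name, shown on the job’s Database screen:

rclone copy /home/user/.config/Duplicati/ABCDEFGHIJ.sqlite remote-s3:my-bucket/duplicati-db-copy/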

I mention the database because having all the dlist and dindex files in cold storage will prevent an occasional test of whether a direct restore from backup files, or a similar database recreate, actually works.
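That test is basically a database recreate into a throwaway path, and it needs the dlist and dindex files to be readable, which is exactly what cold storage gets in the way of. A rough sketch, with URL, passphrase, and paths as placeholders:

# recreate the job database from the remote files into a throwaway path,
# leaving the real job database alone:
duplicati-cli repair <storage-url> --dbpath=/tmp/recreate-test.sqlite --passphrase=<backup-passphrase>

If that finishes, restoring a file or two from it shows the remote files can stand on their own without the original machine.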

My suggestions are just good practices for well-maintained backups. I don’t know your level of worry.

I don’t know how big this backup is, but it sounds large enough that storage cost is worth some worry. Generally, anything over about 100 GB should increase blocksize in proportion, to avoid SQL slowdowns from the database tracking too many blocks. Never cleaning up versions or waste also keeps the block count growing.
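As a sketch of the scaling idea only: a backup around 1 TB would use roughly ten times the old 100 KB default blocksize, and the setting has to be chosen when the backup is first created, since it can’t be changed afterward. The storage URL and paths below are placeholders:

# blocksize must be set when the backup is first created
duplicati-cli backup <storage-url> /data --blocksize=1MB --dbpath=/path/to/job.sqlite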