Amazon Glacier Best Practices?

I’ve just started using Duplicati in a linuxserver.io Docker container, with the web GUI.

What are the best practices for backing up to Amazon Glacier? I found this page, but it’s from 2013.

As far as I can tell:

  • Set the destination to back up to an S3 bucket
  • Don’t set the storage class to “GLACIER”; it has no effect.
  • In the AWS management console, set up a lifecycle rule for the bucket to move objects with names starting with “duplicati-b” (but not others) to Glacier after a day or two (a sketch of such a rule follows below).
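
For reference, here is a minimal sketch of such a lifecycle rule using boto3 (Python). The bucket name, rule ID, and two-day delay are placeholders to adjust for your own setup; restricting the rule to the “duplicati-b” prefix leaves the dlist and dindex files in Standard storage where Duplicati can still read them.

import boto3

s3 = boto3.client("s3")

# Hypothetical bucket name; replace with your own.
s3.put_bucket_lifecycle_configuration(
    Bucket="my-duplicati-backups",
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "duplicati-dblocks-to-glacier",
                # Only the bulk data (dblock) files are archived.
                "Filter": {"Prefix": "duplicati-b"},
                "Status": "Enabled",
                "Transitions": [
                    {"Days": 2, "StorageClass": "GLACIER"},
                ],
            }
        ]
    },
)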

Is this correct? Does it work well? Are there any other special options I need to add?

To the best of my knowledge, Glacier isn’t ideal because Duplicati needs to read the remote files for verifying, compacting, etc., and Glacier retrieval is too slow to work well with that. There are probably configurations within Duplicati that would allow you to use Glacier, but that is a significantly more involved process.

Well, that’s what I’m asking about. What are those configurations?

I just did this the other day using that old page you found. It worked ok. See Duplicati and S3 Glacier - Features - Duplicati

Thanks. Are there any problems with Duplicati trying to read the glaciered S3 objects?

Glacier doesn’t allow immediate read access to the objects. As such, Duplicati fails when it tries testing the files. I personally advise against using archive tier storage. (You can get hot storage at about the same cost as Glacier from other cloud providers such as Backblaze B2 or Wasabi.) Google does have an archive tier that allows for near immediate access, so that’s another option.

If you really, really want to use Glacier, then you should set these options:

--no-auto-compact = true
--no-backend-verification = true

These should get your backups to run without error.

But note that I don’t know how restores even work, since object availability on archive tier storage is measured in hours. I imagine Duplicati will fail on restores unless you move the objects to a higher tier. You should definitely test this thoroughly.
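
If you do need to restore, one possible approach (an untested sketch using boto3; the bucket name and prefix are assumptions, not anything built into Duplicati) is to issue S3 restore requests for the archived objects and wait for them to become readable before starting the Duplicati restore:

import boto3

s3 = boto3.client("s3")
bucket = "my-duplicati-backups"  # hypothetical bucket name

# Request temporary readable copies of the archived dblock files. Bulk
# retrievals are the cheapest tier but can take many hours to complete,
# and re-running this for an object already being restored raises an error.
paginator = s3.get_paginator("list_objects_v2")
for page in paginator.paginate(Bucket=bucket, Prefix="duplicati-b"):
    for obj in page.get("Contents", []):
        if obj.get("StorageClass") in ("GLACIER", "DEEP_ARCHIVE"):
            s3.restore_object(
                Bucket=bucket,
                Key=obj["Key"],
                RestoreRequest={
                    "Days": 7,  # keep the restored copies readable for a week
                    "GlacierJobParameters": {"Tier": "Bulk"},
                },
            )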

That’s unfortunate. The 2013 article says “The new storage engine for Duplicati 2.0 was designed with support for Amazon Glacier in mind.”, but I guess it turned out to be harder than they thought?

Dunno… I guess it works if you use the two options I mentioned, but I don’t use it myself. I am curious how restores work when object availability can take up to 12 hours. Maybe someone who has done it can share their experience.

Running Duplicati - 2.0.7.1_beta_2023-05-25 and tried the two options above. Now I get two warnings: “The supplied option --–no-backend-verification is not supported and will be ignored” and “The supplied option --–no-auto-compact is not supported and will be ignored”.

Another thread suggested changing --backup-test-samples to zero; I will try that and post an update on whether it resolves the problem with AWS Glacier.

Welcome to the forum @mewa

Options begin with two ordinary dashes (the minus sign on the keyboard). You posted a third dash, which is an en dash. Sometimes the forum does this conversion, so I’m not sure what you really have, but double-check.

backup-test-samples has an effect only if the backend is verified in the first place, and no-backend-verification disables it.

Looks like when I copy-pasted the options, it erroneously introduced a third dash. I removed the options, carefully re-added them, and the error has disappeared.

Thank you!

As a future suggestion, perhaps the error message should read “invalid option” rather than “is not supported and will be ignored”? I assumed the latter meant that it was an option that was supported in the past or on a different platform…

Just wondering: with the no-backend-verification option set, does Duplicati need access only to the files uploaded in the current backup, or are older files also needed?
I’m considering uploading the files to S3 and moving them to deep storage after 30 days.

Without the no-backend-verification option, Duplicati can potentially try to download files of any age.

Backup Test block selection logic explains this, and its title is more relevant to this option:

--backup-test-samples (Integer): The number of samples to test after a backup
After a backup is completed, some (dblock, dindex, dlist) files from the remote backend are selected for verification. Use this option to change how many. If the option --backup-test-percentage is also provided, the number of samples tested is the maximum implied by the two options. If this value is set to 0 or the option --no-backend-verification is set, no remote files are verified.
* default value: 1

I would prefer having this set to 0 (to not read Glacier class files that are inaccessible).

--no-backend-verification (Boolean): Do not query backend at startup
If this option is set, the local database is not compared to the remote filelist on startup. The intended usage for this option is to work correctly in cases where the filelisting is broken or unavailable.
* default value: false

That option seems to me like it goes dangerously far, unless somebody knows that it’s needed for Glacier. Looking over the available files is important for integrity, as opposed to just hoping all is well.

I don’t use Glacier myself, so maybe other topics have comments, or some user can post here.

The --no-backend-verification option disables both list verification and test downloads of files.

It has been a while since I looked deeply at Glacier, but as I recall, you need --no-backend-verification because moving files to Glacier removes them from the S3 bucket view, so when Duplicati lists the bucket it notices files are missing.

To support it better, we would need a special list call to get the archive inventory and combine it with the bucket response. With some more work, we could test only the files that are still in S3, so the verification would work before the files are moved to Glacier.
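
As a rough illustration of that idea (a boto3 sketch, not something Duplicati does today; the bucket name is a placeholder), the regular LIST response already reports each object’s storage class, so a tool could in principle verify only the objects that are still directly readable:

import boto3

s3 = boto3.client("s3")
bucket = "my-duplicati-backups"  # hypothetical bucket name

# Split the bucket listing into objects that are still directly readable
# and objects that have been transitioned to an archive tier.
readable, archived = [], []
paginator = s3.get_paginator("list_objects_v2")
for page in paginator.paginate(Bucket=bucket):
    for obj in page.get("Contents", []):
        if obj.get("StorageClass") in ("GLACIER", "DEEP_ARCHIVE"):
            archived.append(obj["Key"])
        else:
            readable.append(obj["Key"])

print(f"{len(readable)} readable, {len(archived)} archived")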

S3 deleting files in Glacier storage class (Stack Overflow)

says they can be deleted from the S3 bucket, implying they’re visible (not “blind” deletions).

Amazon S3 Glacier Deep Archive in Amazon S3 FAQs has a question that seems to cover this:

Q: Are there minimum storage duration and minimum object storage charges for S3 Glacier Deep Archive?

It talks about an 8 KB data fee at Standard rates per archived object, which is what lets you use the S3 LIST API.

EDIT:

Downloading a Vault Inventory in Amazon S3 Glacier has a big note at its top suggesting it’s applicable only to customers using vaults and the original REST API from 2012. Glancing at documentation for S3 (bucket) use of Glacier, I have yet to find talk of getting vault inventory.
Possibly things are somewhat hidden from you by S3, except of course for the actual data in the files.

Ok, it sounds like S3 has developed the Glacier support a bit further than the original offering. We will look into better support for Glacier-type setups in the near future.