Cold Storage support (service providers with storage tiers)

Sami_Lehtinen · March 1, 2020, 6:42am

I did find the old discussion about Cold Storage options, but it didn’t reach a resolution. Because that discussion is over two years old, I decided to start a new thread.

Old ref: Use Duplicati with a cold cloud storage

Scaleway now provides a cold storage option, and I’m just curious about the benefits of using it.

There’s a free allowance, and on top of that the pricing is €0.002/GB/month. This seems to be pretty good. It’s also worth mentioning that transfers between cold storage and normal storage are free.

Unfortunately I don’t have time right now to properly survey this service and its suitability. That’s why this post is kind of a stub. I just wanted to bring this rock-bottom pricing (?) to the attention of users of this forum. So if this service seems to be a viable option, then it’s absolutely great. Some of the requirements have already been discussed in the previous thread, like no auto-compact. Another question is of course whether the transfer between normal and cold storage can be done automatically, and whether the program could have some kind of support for it.

As an example, backup sets which haven’t been compacted yet would stay in normal storage, but full-sized backups which are retained by the retention policy for over N time units (or indefinitely) could be moved to cold storage, or something like that.

Just dropping an idea here, this isn’t fully thought through. - Thank you

Sami_Lehtinen · March 1, 2020, 7:09am

After thinking a while, this could also be doable using a secondary script and automation, as I’m already doing.

Just wondering if there’s a call which would easily tell, when a restore job is run (with the local database), which remote files are needed for it.

Also when backing up, after Duplicati is finished, a script can easily send the calls to move the data (i and b) files to cold storage.

Of course this is simple if the cold storage is actually a mirror or a backup of the backup. In that case it just becomes standard object I/O operations, and even rclone might already have an option for moving stuff to cold storage. (haven’t checked that yet)

Arno500 · March 8, 2020, 7:12pm

After testing a bit, the backup is OK with the following options:

--backup-test-samples=0
--no-auto-compact=true

But for the restore, Duplicati needs to ask the server to restore the objects to the “STANDARD” class, which it does not do. I think that would be the missing piece of the puzzle. Query the information of the object, check its class, if it’s “GLACIER” then transition it, go to the next one, wait for all the objects to come back by polling periodically, and run the restoration when they’re retrieved.
When everything is restored, just put them back to Glacier.
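
A rough sketch of that loop in Python, in case it helps someone script it outside Duplicati. Assumptions: `s3` is any S3-compatible client exposing `head_object` and `restore_object` with the boto3 signatures (for example `boto3.client("s3", endpoint_url=...)` pointed at the provider), and the provider reports restore progress through the `Restore` field the way Amazon S3 does. The function name `thaw_objects` and its parameters are made up for illustration, and nothing here has been checked against Scaleway itself.

```python
import time


def thaw_objects(s3, bucket, keys, days=7, poll_seconds=60, sleep=time.sleep):
    """Ask for every GLACIER object to be restored, then poll until all are readable.

    Returns the keys for which a restore request was actually sent.
    """
    requested = []
    pending = set()
    for key in keys:
        head = s3.head_object(Bucket=bucket, Key=key)
        if head.get("StorageClass") != "GLACIER":
            continue  # already in a directly readable class
        restore = head.get("Restore", "")
        if 'ongoing-request="false"' in restore:
            continue  # a temporary copy is already available
        if not restore:
            # No restore in progress yet, so request one for `days` days.
            s3.restore_object(Bucket=bucket, Key=key,
                              RestoreRequest={"Days": days})
            requested.append(key)
        pending.add(key)

    # Poll periodically until every pending object has come back.
    while pending:
        sleep(poll_seconds)
        for key in sorted(pending):
            restore = s3.head_object(Bucket=bucket, Key=key).get("Restore", "")
            if 'ongoing-request="false"' in restore:
                pending.discard(key)
    return requested
```

Once this returns, the Duplicati restore can be started as usual. Sending all requests before the first poll means the unfreeze latency is paid once rather than once per file.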

As said earlier, transfers do not incur costs on Scaleway, so this would be a viable option I think (at only 0.002€/GB, that’s 5 times less than their S3 at 0.01€/GB, which is probably the cheapest worldwide).

That would make a 1TB backup 2€/month, unbeatable. And practically doable: the restore from Glacier should be done in under 6 hours (depending on load, in my tests it was around 20 seconds after clicking the button).

Thanks still for making Duplicati, one of the best backup solutions (and open source!)

Wim_Jansen · March 9, 2020, 7:55am

I think rclone transitions the files out of glacier, so if you configure rclone as your backend, it might work.

Arno500 · March 9, 2020, 8:32am

Actually the rclone doc says specifically that it does not request a restore from Glacier (https://rclone.org/s3/#glacier-and-glacier-deep-archive)

So you still need a file list and have to do a manual restore for each file.

Wim_Jansen · March 9, 2020, 1:29pm

You are right… I was thinking: as they can store to Glacier immediately, it would work the other way around as well, which is not the case.

On a side note, I have uploaded data to Scaleway in Glacier, but my storage costs seem to be going up faster than the advertised 0.002 euro/GB/month: 780GB for a week and now 1.06 euro in storage costs.
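
For reference, the back-of-the-envelope check behind that, as a small Python sketch. It assumes a 30-day billing month and ignores any free allowance; the helper name `prorated_cost` is made up for illustration.

```python
def prorated_cost(gb, price_per_gb_month, days, days_in_month=30):
    """Storage cost in EUR for `gb` gigabytes kept for `days` days."""
    return gb * price_per_gb_month * days / days_in_month


cold = prorated_cost(780, 0.002, 7)      # advertised cold storage rate
standard = prorated_cost(780, 0.01, 7)   # standard rate quoted earlier in the thread

print(f"one week at the cold rate:     {cold:.2f} EUR")      # 0.36 EUR
print(f"one week at the standard rate: {standard:.2f} EUR")  # 1.82 EUR
print(f"billed 1.06 EUR is {1.06 / cold:.1f}x the cold estimate")  # 2.9x
```

So the billed amount sits between what the cold rate and the standard rate would give for the same week.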

Sami_Lehtinen · March 9, 2020, 2:34pm

Because there’s unfreeze latency (“unfreeze” to differentiate it from a Duplicati restore), the best approach is anyway to initiate unfreezing of all files in a batch, and then start downloading and restoring. If the approach is more like unfreeze, download, remove, restore, then sure, it consumes less normal cloud storage space, but if every restore step takes an average of 2 minutes, it can lengthen the whole data restoration task a lot.

It’s better to ask for all files to be unfrozen from Glacier in a batch, and then download them as they become available.

Yet adding these API calls directly to Duplicati shouldn’t be too complex, if someone really needs these features. - I personally would just run a small Python script with a reference to the version being restored, and select the files from Duplicati’s SQLite3 database directly.

Start with Fileset and a few joins to end up with Remotevolumes. Then just let the API calls for the restore roll. A script like that is probably around 100 rows in Python, if not using libs other than http.client and sqlite. Definitely doable if required.
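
The joins could look something like the sketch below. Big caveat: the table and column names (`FilesetEntry`, `File`, `BlocksetEntry`, `Block`, `Remotevolume` and their ID columns) are written from memory and should be treated as assumptions, so check them against your own local database before relying on this. It also only follows file data blocks; metadata blocks and the list/index volumes would need similar queries. The function name is made up for illustration.

```python
import sqlite3

# Assumed schema, verify against the real Duplicati database first.
QUERY = """
SELECT DISTINCT rv.Name
FROM FilesetEntry AS fe
JOIN File          AS f  ON f.ID = fe.FileID
JOIN BlocksetEntry AS be ON be.BlocksetID = f.BlocksetID
JOIN Block         AS b  ON b.ID = be.BlockID
JOIN Remotevolume  AS rv ON rv.ID = b.VolumeID
WHERE fe.FilesetID = ?
ORDER BY rv.Name
"""


def remote_volumes_for_fileset(conn, fileset_id):
    """Return the names of remote volumes holding data blocks of one backup version."""
    return [row[0] for row in conn.execute(QUERY, (fileset_id,))]


# Usage (path is a placeholder; work on a copy of the database, opened read-only):
# conn = sqlite3.connect("file:/path/to/copy.sqlite?mode=ro", uri=True)
# names = remote_volumes_for_fileset(conn, fileset_id=1)
```

The resulting names are the object keys to feed into the unfreeze calls.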

Arno500 · March 9, 2020, 9:07pm

Hum, they may have moved your data after some time. Are you sure everything is in Glacier?
They’re not especially “moneygrabbers” when it comes to billing, though there may be some conditions on it.
You can check the pricing details here: Object Storage - C14 Cold Storage Class - Scaleway

Arno500 · March 9, 2020, 9:10pm

Of course, batching would be way better. Also, you can set an expiry date after which the object returns to Glacier by itself (that may be interesting to prevent an inconsistent state where some objects were not put back).
It is easy to get the status of the restoration of an object: the Amazon API provides the “restore” header with that information (documented here: Object Storage - C14 Cold Storage Class - Scaleway). Integrating all of this into Duplicati would make it more “seamless” and easier to integrate into small environments (we were thinking of backing up a Synology NAS).
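
For anyone scripting this, here is a small sketch of reading that header in Python. It assumes the value follows the Amazon S3 `x-amz-restore` format, e.g. `ongoing-request="false", expiry-date="Fri, 21 Dec 2012 00:00:00 GMT"`, and that the provider mirrors it; the function name `parse_restore_header` is made up for illustration.

```python
import re
from email.utils import parsedate_to_datetime


def parse_restore_header(value):
    """Split an S3-style restore header into its fields.

    `value` is the raw header string, or None/"" when no restore was requested.
    """
    if not value:
        return {"requested": False, "ongoing": False, "expiry": None}
    # The header is a comma-separated list of key="value" pairs.
    fields = dict(re.findall(r'([\w-]+)="([^"]*)"', value))
    expiry = fields.get("expiry-date")
    return {
        "requested": True,
        "ongoing": fields.get("ongoing-request") == "true",
        "expiry": parsedate_to_datetime(expiry) if expiry else None,
    }
```

An object is downloadable once `requested` is true and `ongoing` is false; `expiry` then tells when the temporary copy goes away again.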