Does Duplicati Overwrite .ZIP Files in Storage

Hi all,

I am trying to compare storage costs between my current provider (Backblaze B2) and the AWS S3 tiers. A question came up about how Duplicati manages files on the backend. When changes need to be made to the backend .ZIP files, does Duplicati modify the existing .ZIP files with new data, or does it just create new .ZIP files with the new/updated backup data and invalidate the old ones in the database (and/or delete the old .ZIP files)? This becomes important when analyzing storage costs in S3:

S3 Standard-IA and S3 One Zone-IA storage: … Objects that are deleted, overwritten, or transitioned to a different storage class before 30 days will incur the normal storage usage charge plus a pro-rated charge for the remainder of the 30-day minimum…

So, essentially, for S3’s long-term (reduced-cost) storage classes, whether a file gets modified or deleted early can affect how it is billed (i.e., ++$$). Any insights the Duplicati gurus can provide on this issue?
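
To make the concern concrete, here is a rough sketch of how I read that pro-rated charge (the price below is only an illustrative placeholder, not AWS’s published rate):

```python
# Rough sketch of the Standard-IA early-deletion charge (illustrative only).
# Assumption: storage is billed per GB-month, and an object removed before
# MIN_DAYS is charged for the remaining days, pro-rated.

GB = 1024 ** 3
MIN_DAYS = 30                  # minimum storage duration for Standard-IA
PRICE_PER_GB_MONTH = 0.0125    # placeholder $/GB-month, not AWS's real rate

def early_delete_charge(size_bytes, days_stored):
    """Extra charge for deleting/overwriting an object before MIN_DAYS."""
    if days_stored >= MIN_DAYS:
        return 0.0
    remaining = MIN_DAYS - days_stored
    return (size_bytes / GB) * (remaining / 30) * PRICE_PER_GB_MONTH

# Example: a 50 MB backup volume replaced after 10 days
print(round(early_delete_charge(50 * 1024**2, 10), 6))
```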

[Edited] Topic is similar to Most cost efficient S3 storage class for Smart Backup Retention and Different/multiple AWS S3 storage classes, but seeking a bit more clarification on the mechanics of how Duplicati works on the back end. Thx.

Hello

No guru here, but quoting from the holy text itself, page 2 (https://duplicati.com/assets/Block-basedstorageformat.pdf):

Since Duplicati is based on this concept of a “dumb” storage backend, Duplicati has to cope with some restrictions. For instance, it is not possible to modify existing files which makes it impossible to update an old existing backup. To update old data, Duplicati has to upload new data and delete the old data.


Serves me right for not being more familiar with Scripture. :slight_smile: Thanks @gpatel-fr!

That nice broad answer even covers unusual things like upload retries and The PURGE command. Typically, the frequent activities are backups, deletes per retention, and compact of remaining data.

There are some subtle points which one can glean from the author’s doc, but also from the manual.

Old data is not deleted immediately, though, because typically some (or most) of it is still referenced, either by the new backup (as unchanged files) or by a previous backup version that is still around due to the chosen retention.
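
A toy sketch of that idea, if it helps picture it (the block/dblock layout below is a simplified assumption, not Duplicati’s actual database schema):

```python
# Toy model: a dblock file on the backend can only be deleted once no
# remaining backup version references any block inside it.
# (Simplified on purpose -- not Duplicati's real database schema.)

versions = {
    "2024-01-01": {"blockA", "blockB", "blockC"},   # first backup
    "2024-01-02": {"blockA", "blockB", "blockD"},   # next day, one file changed
}
dblocks = {
    "dblock-1.zip": {"blockA", "blockB", "blockC"},  # uploaded with version 1
    "dblock-2.zip": {"blockD"},                      # uploaded with version 2
}

def deletable_dblocks(versions, dblocks):
    still_used = set().union(*versions.values()) if versions else set()
    return [name for name, blocks in dblocks.items() if not blocks & still_used]

versions.pop("2024-01-01")                 # retention removes the first version
print(deletable_dblocks(versions, dblocks))
# -> []  dblock-1.zip still holds blockA/blockB; only blockC became waste
```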

Compacting files at the backend

When a predefined percentage of a volume is used by obsolete backups, the volume is downloaded, old blocks are removed and blocks that are still in use are recompressed and re-encrypted.

The COMPACT command

Old data is not deleted immediately as in most cases only small parts of a dblock file are old data. When the amount of old data in a dblock file grows it might be worth to replace it.

When compact collects partly filled dblocks, the repacked file gets a new name and the old ones get deleted.
If your provider charges for download or upload, factor that in as well; the math gets complex.
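
As a rough illustration of why the math gets complex, here is a sketch of what one compact pass could cost under made-up prices (every rate below is a placeholder, and the live-data fraction is just an assumed input):

```python
# Sketch: what one compact pass could cost, with placeholder prices.
# Compact downloads partly filled dblocks, uploads the still-live data
# repacked under a new name, then deletes the old files.

GB = 1024 ** 3
PRICE_DOWNLOAD = 0.09      # $/GB egress, placeholder
PRICE_UPLOAD = 0.0         # many providers don't charge for ingress
PRICE_STORAGE = 0.0125     # $/GB-month, placeholder
MIN_DAYS = 30              # assumed minimum billable storage duration

def compact_pass_cost(old_dblocks, live_fraction):
    """old_dblocks: list of (size_bytes, age_days) for files being rewritten.
    live_fraction: share of their data still in use, i.e. re-uploaded."""
    download_gb = sum(size for size, _ in old_dblocks) / GB
    upload_gb = download_gb * live_fraction
    early_delete = sum(
        (size / GB) * ((MIN_DAYS - age) / 30) * PRICE_STORAGE
        for size, age in old_dblocks if age < MIN_DAYS
    )
    return download_gb * PRICE_DOWNLOAD + upload_gb * PRICE_UPLOAD + early_delete

# Three 50 MB dblocks, one only 12 days old, 70% of their data still live
dblocks = [(50 * 1024**2, 40), (50 * 1024**2, 90), (50 * 1024**2, 12)]
print(round(compact_pass_cost(dblocks, 0.7), 4))
```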

Deleting files - excessive download said a lot about how compact works and about trying to optimize it.
Cost optimisation on wasabi had some other thoughts on dealing with a retention charge policy.
That one was potentially a bit worse than a simple minimum duration, if it interacted with other policies.

You might even be able to script (maybe Python) a historical analysis using data similar to that seen in <job> → Show log → Remote, which shows destination put, get, and delete operations with times and sizes. Working from the table directly is easier, though, and File → Export looks like it can export the table as CSV to put into your analysis software.

If this is too much, look at BackendStatistics in the Complete log of a job log, but that won’t say how long individual files remained. You’d have to match file names in order to factor in a minimum-duration charge.
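
If you do go the CSV route, here is a minimal sketch of that put/delete matching (the column names are my guess at what the export would contain, so adjust them to the actual file):

```python
# Sketch: pair put/delete operations per remote file from an exported CSV of
# the Remote log, and flag files that lived less than the minimum duration.
# The column names ("Operation", "Path", "Timestamp", "Size") are guesses --
# adjust them to whatever the export actually contains.

import csv
from datetime import datetime, timedelta

MIN_AGE = timedelta(days=30)

def short_lived_files(csv_path):
    puts, flagged = {}, []
    with open(csv_path, newline="") as f:
        for row in csv.DictReader(f):
            name, op = row["Path"], row["Operation"].lower()
            when = datetime.fromisoformat(row["Timestamp"])
            if op == "put":
                puts[name] = (when, int(row["Size"]))
            elif op == "delete" and name in puts:
                uploaded, size = puts.pop(name)
                if when - uploaded < MIN_AGE:
                    flagged.append((name, size, (when - uploaded).days))
    return flagged

for name, size, days in short_lived_files("remote-log.csv"):
    print(f"{name}: {size} bytes, deleted after {days} days")
```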