S3 buckets, "smart" retention, and old data files

This will destroy your backup, as noted. Don’t do it.

Features

Incremental backups
Duplicati performs a full backup initially. Afterwards, Duplicati updates the initial backup by adding only the changed data. That means that if only tiny parts of a huge file have changed, only those tiny parts are added to the backup. This saves time and space, and the backup size usually grows slowly.

This means that those old files still contain substantial amounts of old data, perhaps your original backup, which later backups build on as a base. Wasted space is eventually cleaned up (unless you disabled that).
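If it helps to see the block idea in code, here is a minimal Python sketch of block-based incremental backup. It is not Duplicati's actual code: `backup_file`, `known_blocks`, and `new_volume` are made-up names, the 100 KiB block size just stands in for whatever `--blocksize` is set to, and SHA-256 stands in for the block hash.

```python
import hashlib

BLOCK_SIZE = 100 * 1024  # fixed block size; in Duplicati this is set by --blocksize

def backup_file(path, known_blocks, new_volume):
    """Split a file into fixed-size blocks and keep only blocks not seen before.

    known_blocks is a set of block hashes already present in earlier dblock files;
    new_volume collects the blocks that actually need uploading in this run.
    Returns the list of block hashes that describes this version of the file.
    """
    file_hashes = []
    with open(path, "rb") as f:
        while True:
            block = f.read(BLOCK_SIZE)
            if not block:
                break
            h = hashlib.sha256(block).hexdigest()
            file_hashes.append(h)
            if h not in known_blocks:      # unchanged blocks are not uploaded again
                new_volume[h] = block
                known_blocks.add(h)
    return file_hashes
```

Run that twice over a huge file where only one block changed and only that one block lands in the new volume, which is why the old volumes keep holding the original data that later versions still reference.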

Compacting files at the backend

Upload volumes (files at the backend) likely contain blocks that belong only to old backups, as well as blocks that are used by newer backups. Because the contents of these volumes are partly needed, they cannot be deleted, resulting in unnecessarily allocated storage capacity.

The compacting process takes care of this. When a predefined percentage of a volume is used by obsolete backups, the volume is downloaded, old blocks are removed and blocks that are still in use are recompressed and re-encrypted. The smaller volume without obsolete contents is uploaded and the original volume is deleted, freeing up storage capacity at the backend.
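A rough sketch of that decision, again not Duplicati's implementation: `maybe_compact` and `live_hashes` are hypothetical names, and the `threshold` parameter plays the role of the configurable `--threshold` option.

```python
def maybe_compact(volume, live_hashes, threshold=0.25):
    """Repack a backend volume if too much of it belongs only to deleted versions.

    volume maps block hash -> block bytes (a stand-in for one dblock file);
    live_hashes is the set of hashes still referenced by any remaining version.
    Returns a smaller volume, or the original if compacting isn't worth the work.
    """
    total = sum(len(b) for b in volume.values())
    wasted = sum(len(b) for h, b in volume.items() if h not in live_hashes)
    if total == 0 or wasted / total < threshold:
        return volume                      # not enough waste; leave the volume alone
    # Otherwise: download, keep only the live blocks, upload the smaller volume,
    # then delete the original, as described in the quote above.
    return {h: b for h, b in volume.items() if h in live_hashes}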

EDIT:

Yes. A version deletion deletes its dlist file. Compact removes wasted space, rewriting dblock and dindex files.

Because it can be very time- and bandwidth-consuming to try to compact the entire backup every time.
The COMPACT command does give you some controls if you really dislike the default compact thresholds.

How the backup process works gets technical, but it covers some of the details of the deduplication method.

Finally, the file C:\data\extra\samevideo.mp4 is processed. Duplicati will treat each block individually, but it will figure out that it has already made a backup of this block and will not emit it to the dblock file. After all 3 blocks are computed, it will then create a new block to store these 3 hashes, but will also find that such a block is already stored.

This approach is also known as deduplication, ensuring that each “chunk” of data is stored only once. With this approach, duplicate files are detected regardless of their names or locations.

Here, “already made a backup of this block” refers to it being in a previous (maybe very old) dblock file.
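A toy illustration of that “block of hashes” step, under the same caveats as before (made-up names, a plain dict standing in for everything already sitting in old dblock files):

```python
import hashlib

def dedup_store(blocks, block_store):
    """Store a file's blocks plus the 'block of hashes' describing it, skipping
    anything already present, as in the samevideo.mp4 example above.

    blocks is the file's content already split into blocks;
    block_store maps hash -> bytes and represents all previously stored blocks.
    Returns the hash of the blocklist block, which identifies the whole file.
    """
    hashes = []
    for block in blocks:
        h = hashlib.sha256(block).hexdigest()
        hashes.append(h)
        if h not in block_store:           # seen in an earlier backup? don't store again
            block_store[h] = block

    # The list of hashes is itself stored as a block (the "block of hashes").
    blocklist = "".join(hashes).encode()
    blh = hashlib.sha256(blocklist).hexdigest()
    if blh not in block_store:             # an identical file was already backed up
        block_store[blh] = blocklist
    return blh
```

A second copy of the same video, under any name or path, produces the same block hashes and the same blocklist hash, so nothing new gets written. That is what “duplicate files are detected regardless of their names or locations” means in practice.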
