Retention question

ts678 · September 18, 2018, 2:19am

Hello @mshillam and welcome to the forum!

When a backup version is deleted, that “view” of all the files (at some point in time) is no longer available for restore; however, more recent views (and their files) remain. Duplicati’s Block-based storage engine doesn’t do merges, because any version of a file is represented as a list of blocks, and that list may change from version to version. When a block is no longer used by any remaining version, its storage at the destination is reclaimed by automatic compaction.
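To make the “no merges” point concrete, here is a minimal sketch in Python. It is a simplified model, not Duplicati’s actual code or database schema: the tiny `BLOCK_SIZE` and the helper names (`make_version`, `delete_version`, etc.) are hypothetical. Each version maps file paths to lists of block hashes, so deleting an old version leaves newer ones fully restorable, and any block no longer referenced becomes space that compaction could reclaim.

```python
import hashlib

BLOCK_SIZE = 4  # hypothetical tiny block size, for illustration only

def split_blocks(data: bytes, size: int = BLOCK_SIZE) -> list[str]:
    """Return the SHA-256 hex digest of each fixed-size block of data."""
    return [hashlib.sha256(data[i:i + size]).hexdigest()
            for i in range(0, len(data), size)]

def make_version(files: dict[str, bytes]) -> dict[str, list[str]]:
    """A version (hypothetical model): each file path mapped to its block hashes."""
    return {path: split_blocks(content) for path, content in files.items()}

def referenced_blocks(versions: dict[int, dict[str, list[str]]]) -> set[str]:
    """All block hashes still used by at least one remaining version."""
    return {h for version in versions.values()
            for blocks in version.values()
            for h in blocks}

def delete_version(versions: dict[int, dict[str, list[str]]],
                   version_id: int, stored_blocks: set[str]) -> set[str]:
    """Drop one version and return the stored blocks no remaining version needs.

    Nothing is merged: the remaining versions already hold complete block lists.
    """
    del versions[version_id]
    return stored_blocks - referenced_blocks(versions)

# Two versions of the same file; the second changes only the last block.
versions = {
    1: make_version({"notes.txt": b"aaaabbbbcccc"}),
    2: make_version({"notes.txt": b"aaaabbbbdddd"}),
}
stored = referenced_blocks(versions)           # 4 distinct blocks at the destination
reclaimable = delete_version(versions, 1, stored)
print(len(stored), len(reclaimable))           # prints "4 1"
```

Only the `cccc` block becomes unreferenced. In this sketch it is merely identified; real compaction works at the level of whole dblock volumes, repackaging them once enough of their space is wasted.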

Basically, you’re uploading file deltas, but you might not even upload a block if it’s already there from some other file.
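A rough sketch of that upload decision, assuming a simplified model where a dict keyed by block hash stands in for what is already at the destination (the names and the tiny block size are hypothetical, not Duplicati’s API): a block is uploaded only if its hash hasn’t been seen before, whichever file it came from.

```python
import hashlib

BLOCK_SIZE = 4  # hypothetical tiny block size, for illustration only

def blocks_of(data: bytes, size: int = BLOCK_SIZE) -> list[tuple[str, bytes]]:
    """Split data into fixed-size blocks, each paired with its SHA-256 hash."""
    return [(hashlib.sha256(data[i:i + size]).hexdigest(), data[i:i + size])
            for i in range(0, len(data), size)]

def backup_file(data: bytes, known: dict[str, bytes]) -> tuple[list[str], int]:
    """Record a file as its block-hash list, uploading only unseen blocks.

    `known` stands in for the blocks already at the destination; returns the
    file's block list and how many new blocks had to be uploaded.
    """
    block_list, uploaded = [], 0
    for digest, block in blocks_of(data):
        if digest not in known:
            known[digest] = block  # "upload" the new block
            uploaded += 1
        block_list.append(digest)
    return block_list, uploaded

destination: dict[str, bytes] = {}
_, first = backup_file(b"aaaabbbbcccc", destination)   # initial backup
_, edited = backup_file(b"aaaabbbbdddd", destination)  # last block changed
_, copy = backup_file(b"ccccaaaa", destination)        # other file, reused blocks
print(first, edited, copy)                             # prints "3 1 0"
```

Because this sketch uses fixed-size blocks, inserting a single byte near the start of a file shifts every later block boundary, so nearly every block hashes differently and must be uploaded again.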

A version is basically a list of all your files, plus, for every file, a list of all its blocks (which are known by their hash). “How the backup process works” discusses this; see also “How the restore process works” and “Choosing sizes in Duplicati”. Files that change massively tend to defeat both block-based deduplication and the delta-upload plan, whereas files that change repeatedly can end up with their blocks scattered across many dblock files, so restoring such a file may require downloading many dblock files (possibly containing mostly irrelevant blocks) just to collect its blocks.
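Here is a small sketch of why scattered blocks make restores expensive. It assumes a simplified model in which each dblock volume is just a set of block hashes; the volume names and hash placeholders (`vol-1`, `h1`, `x1`, …) are hypothetical. Every volume holding even one needed block must be downloaded whole.

```python
def volumes_to_download(file_blocks: list[str],
                        volumes: dict[str, set[str]]) -> list[str]:
    """Return the (hypothetical) dblock volumes holding any block of the file."""
    needed = set(file_blocks)
    return sorted(name for name, contents in volumes.items()
                  if contents & needed)

# A file edited repeatedly: its four blocks landed in three different volumes,
# alongside blocks ("x…") belonging to other files.
volumes = {
    "vol-1": {"h1", "h2", "x1", "x2"},
    "vol-2": {"x3", "h3", "x4", "x5"},
    "vol-3": {"x6", "x7", "x8", "h4"},
}
file_blocks = ["h1", "h2", "h3", "h4"]
to_fetch = volumes_to_download(file_blocks, volumes)
fetched = sum(len(volumes[v]) for v in to_fetch)
print(to_fetch, f"{len(file_blocks)} of {fetched} fetched blocks are useful")
# prints "['vol-1', 'vol-2', 'vol-3'] 4 of 12 fetched blocks are useful"
```

Had the same four blocks been stored together in one volume, the restore would touch a single dblock file instead of three.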

How huge are the datasets? Duplicati can be slow on huge datasets because (the theory goes) the local SQLite database that tracks all the pieces slows down (e.g. on inserts) as its tables grow. Losing the database (e.g. to a disk crash) can mean a lengthy rebuild from the information at the destination.

Hearing “TB of data” makes me a bit nervous, although you can read the topic “Best practice for large data set (18TB)?”