Understanding Deduplication

frood · November 6, 2018, 1:14pm

The fact sheet states:

Duplicati analyzes the content of files and stores data blocks. Due to that, Duplicati will find duplicate files and similar content and store this only once in the backup.

Does deduplication occur at the block level or at the file level? If I have 2 very large files that only differ by a few bytes at the end, will block level dedup save space?

ts678 · November 6, 2018, 2:02pm

Deduplication is at the block level, so a local change anywhere in a file (including at its end) only causes the changed blocks to be stored, which saves space. However, as blocks go out of use (when old backups age out), the unused space isn't reclaimed immediately. It waits until compacting runs: by default an automatic version of The COMPACT command takes care of this.
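To make the block-level idea and the delayed space reclamation concrete, here is a toy sketch in Python. It is not Duplicati's code: the `BlockStore` class, its method names, and the tiny 4-byte block size are hypothetical choices for illustration. Each distinct block is stored once under its SHA-256 hash, a backup version is just a list of block hashes, deleting a version leaves its no-longer-referenced blocks in storage, and `compact()` removes them. Real Duplicati packs blocks into larger remote volume files and compacts whole volumes, which this sketch skips.

```python
import hashlib

BLOCK_SIZE = 4  # hypothetical tiny block size so the example is easy to follow


def split_blocks(data: bytes, size: int = BLOCK_SIZE) -> list:
    """Cut data into fixed-size blocks; the last block may be shorter."""
    return [data[i:i + size] for i in range(0, len(data), size)]


class BlockStore:
    """Toy content-addressed store: each distinct block is kept once, keyed by its hash."""

    def __init__(self) -> None:
        self.blocks = {}    # block hash -> block bytes (stands in for remote storage)
        self.versions = {}  # version id -> ordered list of block hashes

    def backup(self, version: int, data: bytes) -> int:
        """Record a version and return how many new blocks had to be stored."""
        hashes = []
        new = 0
        for block in split_blocks(data):
            h = hashlib.sha256(block).hexdigest()
            if h not in self.blocks:  # only unseen blocks cost space
                self.blocks[h] = block
                new += 1
            hashes.append(h)
        self.versions[version] = hashes
        return new

    def delete_version(self, version: int) -> None:
        """Forget a version; its blocks stay in storage until compact() runs."""
        del self.versions[version]

    def wasted_blocks(self) -> set:
        """Blocks still stored but referenced by no remaining version."""
        in_use = {h for hashes in self.versions.values() for h in hashes}
        return set(self.blocks) - in_use

    def compact(self) -> int:
        """Drop unreferenced blocks and return how many were removed."""
        unused = self.wasted_blocks()
        for h in unused:
            del self.blocks[h]
        return len(unused)

    def restore(self, version: int) -> bytes:
        """Reassemble a version from its block hashes."""
        return b"".join(self.blocks[h] for h in self.versions[version])


store = BlockStore()
print(store.backup(1, b"AAAABBBBCCCC"))  # 3: all blocks are new
print(store.backup(2, b"AAAABBBBDDDD"))  # 1: only DDDD is new
store.delete_version(1)
print(len(store.wasted_blocks()))  # 1: CCCC is orphaned but still stored
print(store.compact())  # 1: compacting reclaims it
```

Version 2 adds only one new block (`DDDD`), and after version 1 is deleted the orphaned `CCCC` block keeps occupying space until `compact()` drops it.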

If you mean two static files that differ only at the end, then when the second file is backed up, Duplicati would find that its initial blocks already exist (stored when the first file was backed up) and would reference those blocks instead of storing them again.
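The two-file case can be checked with a similar sketch, assuming fixed-size blocks identified by SHA-256 hashes. The 100 KiB block size and the `backup_files` helper are hypothetical, chosen only for illustration. Two roughly 1 MB files share all their content except the last five bytes, so only their final block differs.

```python
import hashlib
import os

BLOCK_SIZE = 100 * 1024  # hypothetical 100 KiB block size for this sketch


def block_hashes(data: bytes, size: int = BLOCK_SIZE) -> list:
    """Hash each fixed-size block of data (the last block may be shorter)."""
    return [hashlib.sha256(data[i:i + size]).hexdigest()
            for i in range(0, len(data), size)]


def backup_files(files: dict):
    """Back up files in order into one shared block set.

    Returns (new blocks stored per file, total distinct blocks stored).
    """
    stored = set()
    new_per_file = {}
    for name, data in files.items():
        new = 0
        for h in block_hashes(data):
            if h not in stored:  # a block already seen is referenced, not stored again
                stored.add(h)
                new += 1
        new_per_file[name] = new
    return new_per_file, len(stored)


base = os.urandom(1_000_000)  # shared content: 9 full blocks plus a partial one
file_a = base + b"end-A"      # 1,000,005 bytes -> 10 blocks
file_b = base + b"end-B"      # same first 9 blocks, different last block

counts, total = backup_files({"file_a": file_a, "file_b": file_b})
print(counts)  # {'file_a': 10, 'file_b': 1}
print(total)   # 11
```

The second file costs one block rather than ten. With fixed-size blocks this works neatly because the shared content is a common prefix; bytes inserted near the start of a file would shift every later block boundary, so the blocks after the insertion would no longer match.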

If you want more technical detail, click on the ARTICLES link on the fact sheet. Good ones might include:

Block-based storage engine

How the backup process works

How the restore process works