Hello! I have a question about my use case that I’d like some clarification on.
I will use Duplicati to perform remote backups of my local document folder, with encryption enabled. On occasion I will reorganize some documents by moving them to a different subfolder.
I would like to know if this action will cause a duplicate copy of the document to be stored on the backup server, or if Duplicati can somehow track a file’s path separately from its binary data so that I am not wasting server storage on duplicate file copies.
No, this will not result in a duplicate copy of your file data on the remote side (assuming the old and new locations of that file are within the same Duplicati backup job definition). This is thanks to the deduplication engine in Duplicati.
I originally missed this, but for anyone else:
Duplicati analyzes the content of files and stores data blocks. Due to that, Duplicati will find duplicate files and similar content and store this only once in the backup. As Duplicati analyzes the content of files it can handle situations very well if files and folders are moved or renamed. As the content does not change, the next backup will be tiny.
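To make the quoted explanation concrete, here is a minimal sketch of content-based block deduplication. It is not Duplicati's actual code: the ToyBackup class, the tiny 4-byte block size, and the file paths are all made up for illustration. The idea it shows is that block content is stored under its hash, while each backup version only records which hashes belong to which path, so a moved file adds a new path entry but no new blocks.

```python
import hashlib

BLOCK_SIZE = 4  # tiny illustrative block size; NOT Duplicati's default


def split_blocks(data: bytes, size: int = BLOCK_SIZE) -> list[bytes]:
    """Cut data into fixed-size blocks (the last one may be shorter)."""
    return [data[i:i + size] for i in range(0, len(data), size)]


class ToyBackup:
    """Hypothetical stand-in for a backup job; names are assumptions."""

    def __init__(self) -> None:
        self.blocks: dict[str, bytes] = {}  # hash -> block content, stored once
        self.versions: list[dict[str, list[str]]] = []  # per run: path -> hashes

    def run(self, files: dict[str, bytes]) -> int:
        """Back up a snapshot of path -> content; return how many new blocks were stored."""
        new_blocks = 0
        listing: dict[str, list[str]] = {}
        for path, data in files.items():
            hashes = []
            for block in split_blocks(data):
                digest = hashlib.sha256(block).hexdigest()
                if digest not in self.blocks:  # content not seen before
                    self.blocks[digest] = block
                    new_blocks += 1
                hashes.append(digest)  # known content is only referenced
            listing[path] = hashes
        self.versions.append(listing)
        return new_blocks


backup = ToyBackup()
first = backup.run({"docs/report.txt": b"quarterly numbers"})
# The same file, moved to a different subfolder before the next run.
second = backup.run({"docs/2023/report.txt": b"quarterly numbers"})
print(first, second)
```

In this sketch the second run stores no new blocks, because every block of the moved file hashes to something already in the store; only the path listing for that version changes.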
Follow-up question to this thread. When does the deduplication occur? Does it happen before the file is backed up (so no data transfer needs to happen) or after the file is copied to the backup site and compared with what’s already there?
It happens locally before the data is sent to the backup target. There’s no Duplicati code running at the backup target; your instance of Duplicati just reads/writes files from/to there. I believe the hashes used for the deduplication are stored in the SQLite database, which is why it can get so large depending on the block size you choose for your deduplication granularity.
@Lerrissirrel has it exactly right (including the size concern – a large backup can make a big database, which we encourage keeping below a few million blocks, and sometimes the blocksize needs increasing).
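As a rough back-of-the-envelope illustration of why the blocksize affects database size, the sketch below estimates how many blocks a given amount of source data turns into. The function name and the sizes used are assumptions chosen for the example, not Duplicati defaults, and the estimate ignores deduplication savings, per-file partial blocks, and metadata.

```python
def estimate_block_count(source_bytes: int, block_bytes: int) -> int:
    """Rough estimate: one block per full or partial chunk of source data."""
    return -(-source_bytes // block_bytes)  # ceiling division on integers


KIB = 1024
MIB = 1024 * KIB
TIB = 1024 * 1024 * MIB

# Same 1 TiB of source data with two illustrative block sizes.
small_blocks = estimate_block_count(1 * TIB, 100 * KIB)
large_blocks = estimate_block_count(1 * TIB, 1 * MIB)
print(small_blocks, large_blocks)
```

With these example numbers, the smaller block size yields roughly ten times as many blocks to track, which is the kind of growth that increasing the blocksize is meant to keep in check.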
"How the backup process works" talks about deduplication. Duplicati scans the source and may see reused blocks. These don’t get uploaded again; the old block with the same content is referenced instead, so there is no need for another copy.