Purge and dedup logic question

mobamoba · October 6, 2018, 12:58pm

I’m trying to understand the underlying logic of the purge command but am unclear how it interacts with Duplicati’s deduping.

Imagine this circumstance:
I have a file on drive C: called C:\File1.txt.
For whatever reason, I have the exact same file on drive D: (D:\File1.txt).
I tell Duplicati to backup both drives C: and D:
I then do a purge of C:\File1.txt

What happens?

Since Duplicati natively dedupes, it would have pointers to the same data from both C: and D:. But I only purged the file from one drive. Does Duplicati merely remove the C: reference from the database and leave the actual data in the backup (since D: still needs it)? Or does it purge the data from the backup and leave the D: reference in the database, pointing to data that no longer exists? Or something else?

Put a different way, if I have the exact same file on 2 different drives and purge the file from only one of those drives, what does Duplicati end up doing?

To extend this question a little: if I delete C:\File1.txt and then also purge it from Duplicati, so that it exists neither on C: nor in Duplicati, does Duplicati “re-copy” D:\File1.txt to the backup on the next run, as if it were a new file, since it no longer sees it?

Thanks.

Pectojin · October 6, 2018, 2:40pm

When you purge a file, you only delete the reference saying “this file, on this exact path, consists of these blocks”. It doesn’t delete the blocks themselves, so the purge itself removes no data from the backend.

What does remove blocks is the cleanup task, which deletes blocks that are no longer referenced by any file.

Since the other copy of the file stays in the backup, no blocks are deleted in this case.
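To make the reference/block split above concrete, here is a toy model in Python. It is a sketch under stated assumptions, not Duplicati’s code: the class `ToyDedupStore`, its methods, and the 4-byte block size are all hypothetical. Blocks are stored once by hash, each path maps to a list of block hashes, `purge` drops only the path entry, and `cleanup` deletes blocks that no remaining path references.

```python
import hashlib

BLOCK_SIZE = 4  # hypothetical toy block size; real backups use far larger blocks


class ToyDedupStore:
    """Toy content-addressed store: each unique block is kept once; files reference blocks by hash."""

    def __init__(self):
        self.blocks = {}  # block hash -> block bytes (stand-in for data on the backend)
        self.files = {}   # path -> list of block hashes ("this path consists of these blocks")

    def backup(self, path, data):
        hashes = []
        for i in range(0, len(data), BLOCK_SIZE):
            chunk = data[i:i + BLOCK_SIZE]
            h = hashlib.sha256(chunk).hexdigest()
            self.blocks.setdefault(h, chunk)  # an already-stored block is not stored again
            hashes.append(h)
        self.files[path] = hashes

    def purge(self, path):
        # Only the file entry is removed; no block data is touched here.
        self.files.pop(path, None)

    def cleanup(self):
        # Delete blocks that no remaining file references; return how many were deleted.
        referenced = {h for hashes in self.files.values() for h in hashes}
        unreferenced = [h for h in self.blocks if h not in referenced]
        for h in unreferenced:
            del self.blocks[h]
        return len(unreferenced)


store = ToyDedupStore()
content = b"same bytes on both drives"
store.backup(r"C:\File1.txt", content)
store.backup(r"D:\File1.txt", content)
stored_before = len(store.blocks)

store.purge(r"C:\File1.txt")
removed = store.cleanup()
print(removed, len(store.blocks) == stored_before)  # 0 True

store.purge(r"D:\File1.txt")
removed_after_both = store.cleanup()
print(removed_after_both)  # 7
```

In this toy model, purging C:\File1.txt frees nothing because D:\File1.txt still references every block; only after both paths are purged does cleanup delete the shared blocks.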

mobamoba · October 6, 2018, 3:08pm

When I run the purge command from the command line with verbose output, it lists exactly how many bytes it’s removing from each zip. Are you saying that output is incorrect, and it’s not actually removing any bytes even though it says it is?

Pectojin · October 6, 2018, 5:09pm

There’s some good documentation on the purge method here: Using Duplicati from the Command Line - Duplicati 2 User’s Manual

As it states, purge removes the entry from the dlist and lets compact handle the cleanup.

My guess is that the previewed sizes are computed correctly for the file itself but don’t consider whether anything else relies on the same blocks, so they may be wrong if other files use some or all of the same blocks.
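The guess above can be illustrated with a small Python sketch. This is a toy model, not how Duplicati computes its purge messages: `split_blocks` and `purge_sizes` are hypothetical helpers, and the 4-byte block size is arbitrary. It contrasts a naive figure (all bytes in the purged file’s blocks) with the bytes that would actually become unreferenced once other files are taken into account.

```python
import hashlib


def split_blocks(data, block_size=4):
    """Map each chunk's hash to its length; a toy stand-in for real block hashing."""
    blocks = {}
    for i in range(0, len(data), block_size):
        chunk = data[i:i + block_size]
        blocks[hashlib.sha256(chunk).hexdigest()] = len(chunk)
    return blocks


def purge_sizes(files, path):
    """Return (naive, freeable): bytes in the purged file's blocks vs. bytes no other file still uses."""
    target = files[path]
    still_used = {h for p, blocks in files.items() if p != path for h in blocks}
    naive = sum(target.values())
    freeable = sum(size for h, size in target.items() if h not in still_used)
    return naive, freeable


content = b"same bytes on both drives"
files = {
    r"C:\File1.txt": split_blocks(content),
    r"D:\File1.txt": split_blocks(content),
}
print(purge_sizes(files, r"C:\File1.txt"))  # (25, 0)
```

If a size report only looked at the purged file’s own blocks, it would show the naive figure even when, as here, none of those bytes can actually be freed.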

mobamoba · October 6, 2018, 8:10pm

I’m not sure what you mean by the “previewed sizes”. When I run the purge command, it goes through all the zips that contain pieces of a particular file and eventually prints something to the console telling me, for example, that 1.68 GB has been removed from file XXXX.zip, along with a list of the files in that zip that it’s purging. I’m assuming this happens during the compacting/cleanup process and not the process where it simply removes the entries from the database.

So are you saying that purge isn’t actually removing 1.68 GB from XXXX.zip if, somewhere in the process, it detected that the file’s blocks were needed elsewhere? Because if that’s the case, why would purge tell me it’s removing a certain number of gigabytes from a zip when it really isn’t?

Pectojin · October 6, 2018, 10:49pm

That sounds odd, considering it can’t actually remove all of the data, since it’s a duplicate.

mobamoba · October 7, 2018, 10:54am

Right, that’s why I’m trying to figure out what’s happening under the hood.