How is source size calculated?

Hello everyone! I just finished backing up a set of files worth about 900GB with Duplicati. I knew that a lot of huge files in that set were duplicates, so I was expecting a big difference between the “Source” and “Backup” size. Yet, I was surprised to find a huge difference between the “Source” size and the actual size of the files being backed up on disk:

Size on disk: ~900GB
Source size, reported by Duplicati: 357GB
Backup size: 315GB

How does Duplicati actually calculate these sizes? Given the disparity, I thought about two different hypotheses:

  1. Something went wrong with the backup (but so far, I have no indication that it did).
  2. Duplicati already excludes duplicates from the “Source” size calculation, and the difference between “Source” size and “Backup” size is mainly due to compression.

Perhaps I was a little hasty in posting. After testing with a mock file set and backup configuration to figure out exactly how Duplicati calculates sizes, I concluded that Duplicati does not exclude duplicates from the “Source” size calculation.
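For anyone wanting to reproduce that test, here is a minimal sketch of the distinction. It assumes a simplified model in which "Source" is the plain sum of file sizes (duplicates counted every time) and deduplication works on fixed-size blocks identified by a hash. The block size, the SHA-256 choice, and the function names are assumptions for illustration, not Duplicati's actual code, and compression is not modelled at all.

```python
import hashlib
from pathlib import Path

# Assumed block size for illustration only; not necessarily Duplicati's setting.
BLOCK_SIZE = 100 * 1024


def source_size(paths):
    """Sum of file sizes, counting duplicate files every time they appear."""
    return sum(Path(p).stat().st_size for p in paths)


def unique_block_bytes(paths, block_size=BLOCK_SIZE):
    """Bytes remaining after fixed-size block deduplication (before compression)."""
    seen = set()
    total = 0
    for p in paths:
        with open(p, "rb") as f:
            # Read the file in fixed-size blocks; only never-seen blocks add bytes.
            while chunk := f.read(block_size):
                digest = hashlib.sha256(chunk).digest()
                if digest not in seen:
                    seen.add(digest)
                    total += len(chunk)
    return total
```

With two identical files, `source_size` reports twice the data while `unique_block_bytes` reports it once, which matches what I saw: duplicates inflate "Source" and only shrink the "Backup" side.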

Further inspection of the logs revealed the problem: amongst the thousands of warnings I routinely receive because of the OneDrive “on demand files” feature were two warnings reporting access failures on two folders. Those skipped folders were the source of the discrepancy.
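If the log is too noisy to spot such failures, one rough cross-check is to walk the source tree yourself, total what the OS reports, and list anything that could not be read. The sketch below uses only the standard library; `scan_tree` is a hypothetical helper, and it does not reproduce Duplicati's filters, attribute handling, or how it treats OneDrive placeholder files, so treat its total as a ballpark figure to compare against "Source".

```python
import os


def scan_tree(root):
    """Walk root, summing file sizes and recording paths that could not be read."""
    errors = []
    total = 0

    def on_error(err):
        # os.walk calls this with the OSError raised when listing a directory fails.
        errors.append(err.filename)

    for dirpath, _dirnames, filenames in os.walk(root, onerror=on_error):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                # Symlinks are counted as links, not as the files they point to.
                total += os.stat(path, follow_symlinks=False).st_size
            except OSError:
                errors.append(path)
    return total, errors
```

A non-empty `errors` list points at folders or files a backup running under the same account would likely also fail to read.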

I’ll leave it to the mods to decide if leaving this thread up is useful or if it should be binned…