Hi everyone,
I am running Duplicati on a Windows host and I also want to back up data from 2 Linux machines. To keep everything centralized and to run only one instance of Duplicati, I have written a pre-script for Duplicati on the Windows machine that logs in via SSH to those two Linux machines, creates a gzip tarball of all the data to back up there, and copies it over to the Windows machine. Those two copied tarballs are then part of the regular Duplicati backup. A rough sketch of what the pre-script does is below.
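This is just a minimal Python sketch to illustrate the setup, not my actual script; the hostnames, paths, and the use of the OpenSSH `ssh`/`scp` clients on the Windows host are assumptions:

```python
# Sketch of the pre-backup script run on the Windows host before Duplicati starts.
# Assumes ssh/scp are on PATH and key-based login to the Linux machines is set up.
import os
import subprocess

HOSTS = ["linux1.example.com", "linux2.example.com"]  # hypothetical hostnames
REMOTE_SRC = "/srv/data"                              # hypothetical directory to back up
REMOTE_TARBALL = "/tmp/backup.tar.gz"                 # temporary tarball on the Linux side
LOCAL_DIR = r"C:\Backups\linux-tarballs"              # folder included in the Duplicati source set

for host in HOSTS:
    # Create a gzip'ed tarball of the source directory on the Linux machine.
    subprocess.run(
        ["ssh", host, f"tar -czf {REMOTE_TARBALL} -C {REMOTE_SRC} ."],
        check=True,
    )
    # Copy the tarball to the Windows host; Duplicati then backs up this local copy.
    dest = os.path.join(LOCAL_DIR, f"{host}.tar.gz")
    subprocess.run(["scp", f"{host}:{REMOTE_TARBALL}", dest], check=True)
```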
While I am quite happy with that approach, I realized that it might prevent Duplicati from efficiently reusing blocks in the source data across consecutive backup snapshots, which could result in larger storage usage on the backend.
I am afraid that even small changes in the source data on the Linux machines might lead to binary-wise very different tarballs. And when the tarballs differ, Duplicati might have trouble finding common binary blocks, so the opportunities to reuse data would be missed.
So, the question is: is this concern justified?
And if yes, what would be a better approach? I thought about extracting the tarballs on the Windows machine so that Duplicati actually sees the raw source files and can more easily find reusable blocks that did not change compared to the last snapshot.
But the problem is that I would lose the Linux file permission information by extracting the tarballs to the Windows NTFS file system, which is not acceptable.
Another idea would be to ditch the single-instance approach and install Duplicati directly on those two Linux machines as well, so I would end up running 3 Duplicati instances, one on each machine. Every instance would then have direct access to the source files (no tarballs involved).
Doesn't sound too stupid either, right?
So, what is your opinion on this? Thanks guys!