So, I’ve got a production system I’m testing Duplicati on.
Essentially, the client has a very large MySQL database. The application itself handles the backup step, properly dumping the MySQL databases for us/Duplicati to handle. [So we don’t have to deal with backing up in-use DB files, etc.]
So, we’re just backing up the “backups” the application makes.
The files we get are a mix of tar.gz and plain (uncompressed) tar files.
The “problem” is deduplication.
The size of the “backup” before we hand it to Duplicati is ~100G. A daily delta is about 50G - so dedup is doing something, but not nearly as much as I’d expect.
So, some questions about dedup.
Is tar going to impact deduplication substantially?
I assume tar.gz may well do so, since as the data changes, the compression will produce [I think] substantially different output files, and that will make dedup a lot harder. [And Duplicati, IIRC, uses a fairly simple fixed-block dedup process that likely won’t “catch” this.]
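Here’s a rough sketch of what I mean (just an illustration in Python, not Duplicati’s actual code; I’m assuming Duplicati hashes fixed-size blocks at fixed offsets within each file, with the default ~100KB blocksize, which is how I understand its dedup to work):

```python
import gzip
import hashlib
import random

BLOCK = 100 * 1024  # assumption: fixed-size blocks, roughly Duplicati's default 100KB blocksize


def blocks(data: bytes) -> set:
    """Hash fixed-size blocks at fixed offsets, the way a simple dedup engine would."""
    return {hashlib.sha256(data[i:i + BLOCK]).hexdigest()
            for i in range(0, len(data), BLOCK)}


def dedup_ratio(old: bytes, new: bytes) -> float:
    """Fraction of the new file's blocks that already exist in the old file."""
    old_blocks, new_blocks = blocks(old), blocks(new)
    return len(old_blocks & new_blocks) / len(new_blocks)


rng = random.Random(42)
# Stand-in for an ~18MB SQL dump: distinct but compressible text lines.
day1 = "".join(
    f"INSERT INTO t VALUES ({i}, '{rng.randrange(10**9)}');\n"
    for i in range(400_000)
).encode()

pos = 64 * 1024  # a small change early in the dump
changed = day1[:pos] + b"X" * 4096 + day1[pos + 4096:]  # 4KB rewritten in place
grown = day1[:pos] + b"Y" * 4096 + day1[pos:]            # 4KB inserted, everything after it shifts

print("plain dump, 4KB changed in place:", f"{dedup_ratio(day1, changed):6.1%}")
print("plain dump, 4KB inserted        :", f"{dedup_ratio(day1, grown):6.1%}")
print("gzipped,    4KB changed in place:",
      f"{dedup_ratio(gzip.compress(day1), gzip.compress(changed)):6.1%}")
```

If that mental model is right, I’d expect the in-place change on the plain file to dedup almost completely, the inserted-data case to dedup poorly (the fixed blocks no longer line up once data shifts), and the gzipped case to dedup almost not at all, since the compressed bytes downstream of even a tiny change come out different.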
So, any suggestions on how to maximize deduplication would be helpful.