Like a lot of others, I’m looking for a replacement for CrashPlan. For my Linux servers, using Duplicati with BackBlaze B2 as remote storage is my leading candidate right now. One thing that I haven’t figured out from the docs yet is if I can point all of my servers at the same B2 bucket and remote folder path, or if I should use a different remote folder path for each server. Naturally, I’d like to maximize the amount of deduplication to minimize storage costs so I was hoping that by pointing all of my servers at the same remote storage I could deduplicate between servers.
I came here specifically to ask this too. My main PC, my wife’s laptop and my beater laptop all share some media files that have quite a bit of overlap. I can’t tell whether it would do me any good to point a bunch of sources at the same B2 “bucket” or whether it would be ultimately destructive.
In Duplicati, deduplication does not work across multiple backup jobs, so uploading backups to the same folder will not help to save storage.
Generally, it is not recommended to upload multiple backups to the same location. Backup job 1 will detect unrecognized files created by backup job 2, and vice versa.
If, for whatever reason, you still want to put multiple backup jobs in the same folder, give the uploaded volumes of each backup job a unique prefix by supplying the `--prefix` option to every backup job.
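A minimal sketch of what that might look like with the command-line client (bucket name, folder, source paths, and credential parameters are placeholders here; check the Duplicati docs for the exact B2 URL format on your version):

```shell
# Sketch only — "my-bucket", "backups", B2_KEY_ID, and B2_APP_KEY are placeholders.
# Server 1: uploaded volumes get names starting with "server1-"
duplicati-cli backup \
  "b2://my-bucket/backups?auth-username=B2_KEY_ID&auth-password=B2_APP_KEY" \
  /home /etc \
  --prefix=server1

# Server 2: same bucket and folder, but its own prefix keeps its volumes
# distinguishable from server 1's — the jobs still do not share any blocks.
duplicati-cli backup \
  "b2://my-bucket/backups?auth-username=B2_KEY_ID&auth-password=B2_APP_KEY" \
  /home /etc \
  --prefix=server2
```

Note that the prefix only prevents the jobs from tripping over each other's files; it does not enable any deduplication between them.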
So, if Duplicati doesn’t dedupe across backup jobs, would it be reasonable to run one backup job on one host that points at source directories on different computers? For example, if I had some storage target (NAS or cloud) but ran a backup job on computer 1 with source directories of “C:\data*”, “\computer2\c$\data”, and “\computer3\c$\data”, would that be an acceptable workaround to take advantage of deduplicating redundant data across all hosts? Obviously, the caveat is that the machine running the backup would be responsible for all backups, but that could even be considered a positive, since it would mean NOT running Duplicati on more than one machine, and having one central config to manage.
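As a rough sketch of that workaround (hypothetical bucket and share names; UNC paths written with the usual double backslash), a single job on computer 1 covering all three machines might look like this. Because it is one job with one block store, identical files on the different hosts would be stored only once:

```shell
:: Sketch only — run on computer 1; bucket and credentials are placeholders.
:: One job, several sources: blocks that repeat across the shares are uploaded once.
Duplicati.CommandLine.exe backup ^
  "b2://my-bucket/central-backup?auth-username=B2_KEY_ID&auth-password=B2_APP_KEY" ^
  "C:\data" "\\computer2\c$\data" "\\computer3\c$\data"
```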
I run a similar setup: six laptops/desktops plus a small server (a USB disk on an Atom-based box). All shared files are synced between the machines and the server using Syncthing https://syncthing.net/
I only back up from the server. I’m in the process of switching to Duplicati after being dumped by CrashPlan. Over time I switched from backing up all machines to running backups only on the server. It removed a lot of hassle and freed resources on every machine. I only do a small system configuration (i.e. /etc) backup from each machine onto the server itself, which is then backed up to the cloud.
Since all regular user data files belong to the same backup job (even files synced with only a single machine or user), it also benefits from deduplication.
After several years, I have found restoring and looking up file versions to be consistently easier and more reliable from the server than from the individual machines (as long as syncing works well).
Syncthing also provides some file versioning; the old versions are kept on the server.
I trust that Duplicati will be as good as, if not better than, CrashPlan in the long run, and it is open source.
I hope this helps.
That could be a good solution if you have many identical files on your computers. In that case, deduplication will save a significant amount of storage space. A drawback is that if, for whatever weird reason, your backup gets lost, you lose the backups of all computers at once. Separate backup jobs spread the risk.
I guess shadow copies will not work on shared folders on remote hosts. So if you want to make use of them, you can only do so on the host where Duplicati is installed.
Just a hint regarding formatting on this forum: it is common practice to use so-called pre-formatted text for file names, commands, etc. You can achieve this by using the `</>` button in the editor, or simply by putting backticks (\`) before and after the text. So, for example, this:

source directories of \`C:\data*\` and \`\computer2\c$\data\` and \`\computer3\c$\data\`

will be rendered like this:

source directories of `C:\data*` and `\computer2\c$\data` and `\computer3\c$\data`
The whitepaper here indicates that shared backup dedupe is being considered. From page 6:
“Shared deduplication. By using index files, it becomes possible for multiple backups to utilize
the same block volumes. This can enable deduplication across multiple backups, potentially
from distributed locations.”
Is this something that is planned? Given the high cost of cloud storage this could be very beneficial.
I do not have plans to develop this, but yes, it can be done.
Apologies for reviving an old thread, but is there any chance that your intentions regarding cross-backup dedupe have changed? While this would generally be a valuable feature, am I also right in thinking it would help solve the issue of cross-platform backup seeding, which is basically the crux of this (I think)…
If deduplication across sources is an important feature for you, Duplicacy is the way to go. I’m not sure whether any other backup software can do it.
By offering a bounty for the feature.
Just to clarify, I believe @kenkendk meant that conceptually it could be coded, not that it can be done with the current code base.
And for those that might be interested - other than developing it yourself, the best way to help make it happen is find (or create) a post on GitHub (probably call it something like “cross source deduplication”) with a bounty to encourage developers to work on it.
Yes, that’s how I understood it, but I was wondering just how many changes it would require and how it would be done conceptually in Duplicati, assuming that @kenkendk would want to keep the local database and possibly also the combining of multiple chunks into zipped volumes. Duplicacy uses neither of these, which is how it is able to achieve cross-source deduplication…