I am considering which cloud storage to use and, as noted already, services like Amazon S3 and Backblaze B2 charge not only for the amount of data stored but also for retrieval (as far as I can see, uploads to the cloud are not charged).
I can make a fair estimate of how much data I would need to store in the cloud, but how do I go about estimating how much download bandwidth I would use (assuming a perfect scenario where I never need to restore data from the cloud)? Does Duplicati perform any downloads during its normal backup operations?
I gather B2 can report checksums of files in the cloud. Does this mean that Duplicati can check the integrity of the cloud data without needing to download it?
Because Duplicati has almost no requirements for the backend (the only required operations are get, put, list, and delete), the only way to test the integrity of backed-up volumes is to re-download some random files, check that the contents can be read, and verify that everything looks as expected.
You can set the number of files to download after a backup with --backup-test-samples=n, where n is the number of files you want to download for verification. Use the option --no-backend-verification=true to disable downloading test samples completely.
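As an illustration, here is a minimal sketch of passing those options to a backup run from Python, assuming the duplicati-cli entry point that the Linux packages install; the destination URL, source path, and passphrase are placeholders, and only the two verification options come from the answer above:

```python
import subprocess

# Hypothetical backup invocation; only the verification options below
# are taken from this thread -- everything else is a placeholder.
cmd = [
    "duplicati-cli", "backup",
    "b2://my-bucket/backups",           # placeholder destination URL
    "/home/user/documents",             # placeholder source folder
    "--passphrase=CHANGE-ME",           # placeholder encryption passphrase
    "--backup-test-samples=3",          # download 3 files for verification after each backup
    # "--no-backend-verification=true", # uncomment to skip test downloads entirely
]
subprocess.run(cmd, check=True)
```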
Also researching this topic - I'm looking at a backup of approx. 200 GB, and Backblaze does look cheapest for storage in that respect.
From the above it seems that Duplicati will check one block (50 MB by default) each time. So if it's set to run daily, it should stay well under Backblaze's free download allowance (1 GB per day).
So I just have to worry about storage cost. Did I get that right?
It's worth pointing out that you can do a lot more than one backup per day, but not hourly if the machine runs 24 hours a day (24 backups × 50 MB is more than the 1 GB daily allowance). So for a NAS, it would probably be a good idea to reduce the volume size to 40 or 35 MB in order to be able to do hourly backups.
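A quick back-of-the-envelope check of both schedules (the 50 MB default volume size and the 1 GB/day allowance are the figures quoted above; one downloaded test sample per backup run is assumed):

```python
# Rough verification download per day vs. B2's 1 GB/day free download allowance.
# Assumes one downloaded test volume per backup run (--backup-test-samples=1).
FREE_ALLOWANCE_MB = 1024

scenarios = [
    ("daily, 50 MB volumes",  1,  50),
    ("hourly, 50 MB volumes", 24, 50),
    ("hourly, 40 MB volumes", 24, 40),
    ("hourly, 35 MB volumes", 24, 35),
]

for label, backups_per_day, volume_mb in scenarios:
    download_mb = backups_per_day * volume_mb
    fits = "within" if download_mb <= FREE_ALLOWANCE_MB else "over"
    print(f"{label}: ~{download_mb} MB/day ({fits} the free allowance)")
```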
It'd be worth noting here that B2 stores the SHA1 hashes for individual files, so if Duplicati implements a handler for that at some point (hopefully soon), there would be the potential to reduce downloads during backups to almost nothing and still check a bunch of files when running backup jobs. Or, preferably, a hybrid approach (with configurability), where Duplicati could download-and-check one file but hash-check 5 files, etc.
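For reference, B2 does report a per-file SHA1 in its file listings. Here is a minimal sketch of reading those hashes with the b2sdk Python library; the key ID, application key, and bucket name are placeholders, and this only shows the B2 side of the idea, not anything Duplicati does today:

```python
from b2sdk.v2 import B2Api, InMemoryAccountInfo

# Placeholders -- substitute your own B2 key ID, application key, and bucket name.
KEY_ID, APP_KEY, BUCKET = "your-key-id", "your-app-key", "my-duplicati-bucket"

api = B2Api(InMemoryAccountInfo())
api.authorize_account("production", KEY_ID, APP_KEY)
bucket = api.get_bucket_by_name(BUCKET)

# List the latest version of every file along with the SHA1 that B2 recorded at upload.
# (Files uploaded via B2's large-file API report "none" here.)
for file_version, _folder in bucket.ls(latest_only=True, recursive=True):
    print(file_version.file_name, file_version.content_sha1)
```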
@dave, I'm new to this as well, but I think you also need to consider your versioning/retention period, since deleting 'old' versions (eventually) requires downloading multiple existing backup files (dblocks), which are then locally decompressed, re-compressed into fewer files (keeping only the retained versions), and re-uploaded.
So the total bandwidth requirements could get a fair bit higher with maintenance (unless you're going for unlimited retention, in which case you shouldn't see anything beyond the validation transfers already discussed).
Alternatively, you can set --threshold=100 to never consider partially unused data and only delete volumes whose contents are entirely unused. This prevents downloading dblock files, except for small files.
The options --small-file-size and --small-file-max-count can be used to further control when Duplicati will download small files and merge them into a large volume.
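As a sketch, these compaction-related options could be added to the same kind of invocation shown earlier; --threshold, --small-file-size, and --small-file-max-count come from the posts above, while the specific values, destination, and paths are placeholders (check the option documentation for the exact units each one expects):

```python
import subprocess

# Hypothetical backup invocation focused on limiting compaction downloads.
cmd = [
    "duplicati-cli", "backup",
    "b2://my-bucket/backups",        # placeholder destination URL
    "/home/user/documents",          # placeholder source folder
    "--passphrase=CHANGE-ME",        # placeholder encryption passphrase
    "--threshold=100",               # only remove volumes that are entirely unused
    "--small-file-size=10mb",        # placeholder value; volumes below this count as "small"
    "--small-file-max-count=20",     # placeholder value; merge once this many small volumes exist
]
subprocess.run(cmd, check=True)
```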
As an aside, and mainly out of curiosity, why does 'small file max count' have a byte/Kbyte/Mbyte/Gbyte/Tbyte selector? I assumed this selector was more of a tally, not necessarily related to file size. And if it is a file size, I can't think of what that actually means.
@kenkendk, --small-file-size and --small-file-max-count look like awesome options, thanks!
Just out of curiosity (as this doesn't apply to my needs): is there any aggregated bandwidth usage reporting over a given timeframe available for those working with bandwidth-limited connections (such as cellular hotspots) or destination costs (such as the above-mentioned B2 usage past 1 GB)?
Thanks, that makes a lot more sense (and was what I was expecting). Since I'm new to your system (and pretty new to GitHub), when should I expect this and other incremental fixes to appear in a release version? Would I need to jump over to the 'canary' update track to see it anytime soon?