I am considering which cloud storage service to use, and as noted already, services like Amazon S3 and Backblaze B2 charge not only for the amount of data stored but also for retrieval (as far as I can see, uploads to the cloud are not charged).
I can make a fair estimate of how much data I would need to store in the cloud, but how do I go about estimating how much download bandwidth I would use (assuming a perfect scenario where I never need to restore data from the cloud)? Does Duplicati perform any downloads during its normal backup operations?
I gather B2 can report checksums of files stored in the cloud. Does this mean that Duplicati can check the integrity of the cloud data without needing to download it?
Because Duplicati has almost no requirements for the backend (it only needs get, put, list, and delete operations), the only way to test the integrity of backed-up volumes is to re-download some random files and verify that their contents can be read and look as expected.
You can set the number of files to download after a backup with --backup-test-samples=n, replacing n with the number of files you want to download for verification. Use --no-backend-verification=true to disable downloading test samples completely.
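To answer the original bandwidth question, here's a rough back-of-the-envelope sketch (not Duplicati internals: it simply assumes each test sample downloads about one full remote volume, at the default 50 MB volume size):

```python
# Rough estimate of download bandwidth used by post-backup verification.
# Assumption (not how Duplicati accounts internally): each test sample
# downloads roughly one full remote volume; default volume size is 50 MB.

def verification_download_mb(test_samples=1, volume_size_mb=50,
                             backups_per_day=1, days=30):
    """Approximate MB downloaded for verification over a period."""
    return test_samples * volume_size_mb * backups_per_day * days

# One sample per backup, one backup a day, over a month:
print(verification_download_mb())  # 1500

# Five samples per backup with hourly backups:
print(verification_download_mb(test_samples=5, backups_per_day=24))  # 180000
```

So with default settings the verification downloads are modest, but raising --backup-test-samples and the backup frequency multiplies them quickly.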
It’s worth pointing out that you can do a lot more than one backup per day, though perhaps not hourly (if you have the machine running 24 hours a day). So for a NAS, it would probably be a good idea to reduce the volume size to 35 or 40 MB to make hourly backups feasible.
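To see why a smaller volume helps, here's a quick arithmetic sketch of per-volume transfer time (the 10 Mbit/s uplink is a hypothetical example, not anything Duplicati-specific):

```python
# Back-of-the-envelope: how long does transferring one remote volume take?
# upload_mbps is a hypothetical connection speed, not a Duplicati setting.

def volume_upload_seconds(volume_size_mb, upload_mbps):
    """Seconds to transfer one volume of volume_size_mb megabytes
    over a link of upload_mbps megabits per second."""
    return volume_size_mb * 8 / upload_mbps

# On a 10 Mbit/s uplink, a 50 MB volume takes ~40 s, a 35 MB one ~28 s:
print(volume_upload_seconds(50, 10))  # 40.0
print(volume_upload_seconds(35, 10))  # 28.0
```

Shaving seconds off each volume matters when an hourly job has to upload (and verify) many volumes inside its one-hour window.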
It’d be worth noting here that B2 stores the SHA1 hash of each individual file, so if Duplicati implements a handler for that at some point (soon, hopefully), it could reduce downloads during backups to almost nothing while still checking a number of files on each backup job. Or, preferably, a configurable hybrid approach, where Duplicati downloads and checks one file but hash-checks five files, etc.
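The idea could be sketched as comparing the SHA1 recorded at upload time against the SHA1 the backend reports, with no download needed. This is purely illustrative (the function, file names, and hash values below are made up, not an existing Duplicati API):

```python
# Sketch of hash-only verification: instead of downloading a dblock file,
# compare the SHA1 recorded when the volume was uploaded against the SHA1
# the backend (e.g. B2) reports for it. All names/values are illustrative.

def verify_without_download(local_records, remote_hashes):
    """Return names of remote files whose backend-reported SHA1
    does not match the hash recorded at upload time."""
    return [name for name, sha1 in local_records.items()
            if remote_hashes.get(name, "").lower() != sha1.lower()]

uploaded = {"duplicati-b1.dblock.zip": "da39a3ee5e6b4b0d3255bfef95601890afd80709"}
reported = {"duplicati-b1.dblock.zip": "DA39A3EE5E6B4B0D3255BFEF95601890AFD80709"}
print(verify_without_download(uploaded, reported))  # [] -> everything matches
```

A check like this confirms the backend still holds the bytes that were uploaded, though it can't catch local corruption that happened before upload, which is why a hybrid with occasional real downloads would still be valuable.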
@dave, I’m new to this as well, but I think you also need to consider your versioning/retention period, since deleting “old” versions eventually requires downloading multiple existing backup files (dblocks), which are decompressed locally, re-compressed into fewer files (keeping only the retained versions), and re-uploaded.
So the total bandwidth requirements could get a fair bit higher with maintenance (unless you’re keeping unlimited versions, in which case you shouldn’t see anything beyond the validation transfers already discussed).
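A toy model of that maintenance traffic might look like this (the volume count and wasted-space percentage are made-up inputs, not Duplicati's actual compaction policy):

```python
# Toy model of compaction traffic: when some of the data in a set of
# volumes is no longer referenced, those volumes are downloaded, repacked,
# and the still-needed data re-uploaded. All parameters are illustrative.

def compaction_transfer_mb(volumes_compacted, volume_size_mb=50,
                           wasted_percent=30):
    """Return (download_mb, upload_mb) for one compaction pass."""
    download = volumes_compacted * volume_size_mb
    upload = download * (100 - wasted_percent) // 100  # only kept data goes back up
    return download, upload

# Repacking 10 volumes that are ~30% stale:
print(compaction_transfer_mb(10))  # (500, 350)
```

The point being: a shorter retention period means compaction fires more often, so both download and upload totals creep above what the backup data alone would suggest.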
As an aside, and mainly out of curiosity, why does “small file max count” have a byte/KByte/MByte/GByte/TByte selector? I assumed this option was a count, not something related to file size. And if it is a file size, I can’t think of what that would actually mean.
@kenkendk, --small-file-size and --small-file-max-count look like awesome options, thanks!
Just out of curiosity (as this doesn’t apply to my needs): is there any aggregated bandwidth-usage reporting over a given timeframe, for those on bandwidth-limited connections (such as cellular hotspots) or paying destination costs (such as the B2 usage past 1 GB mentioned above)?
Thanks, that makes a lot more sense (and is what I was expecting). Since I’m new to your system (and pretty new to GitHub), when should I expect this and other incremental fixes to appear in a release version? Would I need to jump over to the ‘canary’ update track to see them anytime soon?