Using Backblaze as a backend, I can choose to back up every backup job to a different bucket or simply use the same bucket for all backups. Any reasons for doing one or the other?
But different directories within the bucket?
As kahomono said, as long as you're using different directories within a bucket there's no functional difference on the Duplicati side. If you're using a unique-per-backup prefix and putting them all in one folder you might run into performance or "drive format" issues related to large file counts in a single folder (in other words, multiple folders are better than relying on a prefix).
I don't know much about B2, but perhaps they have some bucket maintenance (such as moving infrequently used content to cold storage) that might make a difference if using individual buckets.
I have 5 computers backing up into my B2 account and have a separate bucket for each. I agree that it would be functionally identical to have one bucket and 5 folders, though from the "bucket overview" screen it might be a tick easier to monitor sizes between all of them with separate buckets.
Yes, let's assume that for any further discussion.
So that would be a concrete argument pro multiple buckets.
Anything else? What if I transfer some large files from one backup job to another (i.e. from a source folder handled by one job to a source folder handled by another job)? Might it be an advantage if both jobs are backing up to the same container? I believe not, since there is no deduplication across jobs/machines, right?
Although that is just wild speculation, it would ultimately turn into an argument for multiple buckets, right?
So does anyone have any arguments for a single bucket?
Laziness?
Actually, I was just being lazy myself in not looking up this Azure Blob Archive Access Tier topic, which mentions a cold tier that requires specific blob requests to get content moved back to an accessible hot tier.
So I'd settle for "mild" speculation.
I saw this in another topic:
Although you were referring to minio/S3 buckets, I guess the same logic applies with B2. Any specific reasons for using separate buckets?
Mostly human readability. It seemed "clean" to me to use separate buckets per job. I named my buckets COMPUTER - backup - JOB NAME just for myself. It does make it easy to see whether sizes match up. I also did some data moving at one point and corrupted things; having separate buckets helped keep the corruption / rebuild limited to one backup set in my self-hosted environment.
The one thing I know about S3 (is B2 S3-compatible?) is that "folders" in S3 buckets are more like metadata. If you look behind the scenes, all the stored files live in the same flat namespace, with some internal S3 magic to make it look like there are folders. I can confirm that, at least on a minio S3 server, each bucket is a true filesystem folder on the storage disk.
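(For what it's worth, B2 does also offer an S3-compatible API.) Here is a minimal sketch of that flat-namespace behavior using Python's boto3 against an S3-compatible endpoint; the bucket name, endpoint, key names, and credentials below are only placeholders for illustration:

```python
# Sketch: object keys in S3-style storage are flat; "folders" are just
# key prefixes. Bucket name, endpoint, and credentials are placeholders.
import boto3

s3 = boto3.client(
    "s3",
    endpoint_url="https://s3.us-west-002.backblazeb2.com",  # example endpoint
    aws_access_key_id="KEY_ID",
    aws_secret_access_key="APPLICATION_KEY",
)

# These two objects live in the same flat namespace; "job-a/" and "job-b/"
# are only parts of the key, not real directories on the backend.
s3.put_object(Bucket="my-backups", Key="job-a/example.dblock.zip.aes", Body=b"")
s3.put_object(Bucket="my-backups", Key="job-b/example.dblock.zip.aes", Body=b"")

# Listing with Prefix + Delimiter is what makes it *look* like folders exist.
resp = s3.list_objects_v2(Bucket="my-backups", Prefix="job-a/", Delimiter="/")
for obj in resp.get("Contents", []):
    print(obj["Key"])
```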
So if there are concerns about having too many files in a single folder for an OS, then buckets have the advantage over folders. I'm not sure how much this matters on a hosted solution vs. a self-hosted one.
Also, this is just what I read in the specs; if anyone has experience that contradicts it, I'd listen to them.
Functionally no, there are no reasons to use different buckets.
I find it more logical to store related backups in separate buckets. When I later decide I don't need the backup, I can just nuke the bucket. Removing the folder could also be an option, but this way I know there is no collateral damage.
Please don't store multiple backups in the same folder. Because of the lack of separation and the potential confusion as to what is what, I recommend storing multiple backups in a single folder only as a last resort.
Not sure what your point is here. Isn't that what we said?
I'm not super familiar with B2 (or S3) - can multiple logins be created for a single account, but each assigned to a single bucket?
If so, then this could reduce the damage. For example, if one of your source credentials gets loose in the wild, only that source's bucket would be at risk - and only that source would need new credentials created (rather than having to re-auth EVERY source going into your account).
Did that make sense, or am I wildly speculating again?
Not logins, but each bucket has its own key.
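For illustration, a rough sketch of creating such a bucket-restricted application key with the b2sdk Python package might look like this; the bucket name, key name, and capability list are placeholders, and the exact create_key call may differ between b2sdk versions:

```python
# Sketch, assuming the b2sdk Python package: create an application key that is
# valid only for a single bucket. Bucket name, key name, and capabilities are
# placeholders; check the b2sdk docs for the exact create_key signature.
from b2sdk.v2 import B2Api, InMemoryAccountInfo

api = B2Api(InMemoryAccountInfo())
api.authorize_account("production", "MASTER_KEY_ID", "MASTER_APPLICATION_KEY")

bucket = api.get_bucket_by_name("computer1-backup-documents")

# Scoped to one bucket: it can list, read, write, and delete files there,
# but cannot touch any other bucket in the account.
restricted_key = api.create_key(
    capabilities=["listBuckets", "listFiles", "readFiles", "writeFiles", "deleteFiles"],
    key_name="duplicati-computer1-documents",
    bucket_id=bucket.id_,
)
# The returned object carries the new key's id and secret, which is what
# you would plug into the Duplicati backup job for that one computer.
```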
So separate buckets aren't any better than folders in terms of "uh oh - somebody has my account login", but they're perfectly valid as a way to keep individual sources from potentially seeing each other's backup files.
Of course, it shouldn't matter that they see files from other sources' backups, since Duplicati (at least with default settings) doesn't put anything identifiable or unencrypted in the destination. Pretty much the only thing source A could determine about source B is how big its destination data set is and maybe how often changed files are backed up.
Is "you should use buckets instead of folders 'cause I say so" a valid answer?
I don't know of any way to do separate authentication for each bucket. On each of the 5 computers I have backing up to separate buckets, I used the same account key and auth key - the only difference is that each gets a unique bucket name.
FWIW, I have just asked Backblaze if they can provide some way for logins to be append-only (so no changing/deleting of files possible) to guard against trojans stealing login data and messing with the data. That (in conjunction with long encryption keys) would probably solve this problem too.
Well, keep in mind that changing/deleting is necessary both for compacting and for removing unneeded old versions, so I'm not sure such a mode would be suitable for the average user's use case… but I suppose it would be nice to have as an option for "paranoid mode".
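Continuing the earlier b2sdk sketch (same api and bucket objects, all names still placeholders), an "append-only-ish" key would simply drop deleteFiles from the capability list; keep the compacting/retention caveat above in mind:

```python
# Variant of the earlier b2sdk sketch: no deleteFiles capability, so the key
# can add and read files but never remove them. Duplicati's compacting and
# version pruning would have to be disabled (or run with a separate, more
# privileged key) for this to be workable in practice.
append_only_key = api.create_key(
    capabilities=["listBuckets", "listFiles", "readFiles", "writeFiles"],
    key_name="duplicati-computer1-documents-append-only",
    bucket_id=bucket.id_,
)
```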
It's the price I will pay for running client-based backups to "dumb" storage (besides, I cannot really say the idea of having an eternal log does not excite me a little).
The only other option would be my own servers, and seeing that I managed servers for almost a decade (in what was still a relatively benign environment compared to today), I do not really see doing it again just for personal backups. Even ignoring the work involved, there is just no way I can provide reasonable redundancy for anything close to what cloud providers charge for a few TB.
Yes. Re-reading it, I am not sure how I read it the first time.