Using Backblaze as a backend, I can choose to back up every backup job to a different bucket or simply use the same bucket for all backups. Any reasons for doing one or the other?
But different directories within the bucket?
As kahomono said, as long as you're using different directories within a bucket there's no functional difference on the Duplicati side. If you're using a unique-per-backup prefix and putting them all in one folder you might run into performance or "drive format" issues related to large file counts in a single folder (in other words, multiple folders are better than relying on a prefix).
I don't know much about B2, but perhaps they have some bucket maintenance (such as moving infrequently used content to cold storage) that might make a difference if using individual buckets.
I have 5 computers backing up into my B2 account and have a separate bucket for each. I agree that it would be functionally identical to have one bucket and 5 folders, though from the "bucket overview" screen it might be a tick easier to monitor sizes between all of them with separate buckets.
Yes, let's assume that for any further discussion.
So that would be a concrete argument pro multiple buckets.
Anything else? What if I transfer some large files from one backup job to another (i.e. from a source folder handled by one job to a source folder handled by another job)? Might it be an advantage if both jobs are backing up to the same container? I believe not, since there is no deduplication across jobs/machines, right?
Although that is just wild speculation, it would ultimately turn into an argument for multiple buckets, right?
So does anyone have any arguments for a single bucket?
Laziness?
Actually, I was just being lazy myself in not looking up this Azure Blob Archive Access Tier topic, which mentions a cold tier that requires specific blob requests to get content moved back to an accessible hot tier.
So I'd settle for "mild" speculation.
I saw this in another topic:
Although you were referring to minio/S3 buckets, I guess the same logic applies with B2. Any specific reasons for using separate buckets?
Mostly human readability. It seemed "clean" to me to use separate buckets per job. I named my buckets COMPUTER - backup - JOB NAME just for myself. It does make it easy to see whether sizes match up. I also did some data moving at one point and corrupted things; having separate buckets helped keep the corruption / rebuild limited to one backup set in my self-hosted environment.
The one thing I know about S3 (is B2 S3-compatible?) is that "folders" in S3 buckets are more like metadata. If you look behind the scenes, all the stored files live in the same flat namespace, with some internal S3 magic to make it look like there are folders. I can confirm that, at least on a minio S3 server, each bucket is a true filesystem folder on the storage disk.
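(For what it's worth, B2 does also offer an S3-compatible API.) Here is a minimal sketch of that flat-namespace behavior using Python's boto3 against an S3-compatible endpoint; the bucket name, endpoint, key names, and credentials below are only placeholders for illustration:

```python
# Sketch: object keys in S3-style storage are flat; "folders" are just
# key prefixes. Bucket name, endpoint, and credentials are placeholders.
import boto3

s3 = boto3.client(
    "s3",
    endpoint_url="https://s3.us-west-002.backblazeb2.com",  # example endpoint
    aws_access_key_id="KEY_ID",
    aws_secret_access_key="APPLICATION_KEY",
)

# These two objects live in the same flat namespace; "job-a/" and "job-b/"
# are only parts of the key, not real directories on the backend.
s3.put_object(Bucket="my-backups", Key="job-a/example.dblock.zip.aes", Body=b"")
s3.put_object(Bucket="my-backups", Key="job-b/example.dblock.zip.aes", Body=b"")

# Listing with Prefix + Delimiter is what makes it *look* like folders exist.
resp = s3.list_objects_v2(Bucket="my-backups", Prefix="job-a/", Delimiter="/")
for obj in resp.get("Contents", []):
    print(obj["Key"])
```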
So if there are concerns about having too many files in a single folder for an OS, then buckets have the advantage over folders. I'm not sure how much this matters on a hosted solution vs. a self-hosted one.
Also, this is just what I read in the specs; if anyone has experience that contradicts it, I'd listen to them.
Functionally no, there are no reasons to use different buckets.
I find it more logical to store related backups in separate buckets. When I later decide I don't need the backup, I can just nuke the bucket. Removing the folder could also be an option, but this way I know there is no collateral damage.
Please don't store multiple backups in the same folder. Because of the lack of separation and the potential confusion as to what is what, I recommend storing multiple backups in a single folder only as a last resort.
Not sure what your point is here. Isn't that what we said?
I'm not super familiar with B2 (or S3) - can multiple logins be created for a single account, but each assigned to a single bucket?
If so, then this could reduce the damage. For example, if one of your source credentials gets loose in the wild, only that source's bucket would be at risk - and only that source would need new credentials created (rather than having to re-auth EVERY source going into your account).
Did that make sense, or am I wildly speculating again?
Not logins, but each bucket has its own key.
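For illustration, a rough sketch of creating such a bucket-restricted application key with the b2sdk Python package might look like this; the bucket name, key name, and capability list are placeholders, and the exact create_key call may differ between b2sdk versions:

```python
# Sketch, assuming the b2sdk Python package: create an application key that is
# valid only for a single bucket. Bucket name, key name, and capabilities are
# placeholders; check the b2sdk docs for the exact create_key signature.
from b2sdk.v2 import B2Api, InMemoryAccountInfo

api = B2Api(InMemoryAccountInfo())
api.authorize_account("production", "MASTER_KEY_ID", "MASTER_APPLICATION_KEY")

bucket = api.get_bucket_by_name("computer1-backup-documents")

# Scoped to one bucket: it can list, read, write, and delete files there,
# but cannot touch any other bucket in the account.
restricted_key = api.create_key(
    capabilities=["listBuckets", "listFiles", "readFiles", "writeFiles", "deleteFiles"],
    key_name="duplicati-computer1-documents",
    bucket_id=bucket.id_,
)
# The returned object carries the new key's id and secret, which is what
# you would plug into the Duplicati backup job for that one computer.
```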
So separate buckets aren't any better than folders in terms of "uh oh - somebody has my account login", but they're perfectly valid as a way to keep individual sources from potentially seeing each other's backup files.
Of course, it shouldn't matter that they see files from other sources' backups, since Duplicati (at least with default settings) doesn't put anything identifiable or unencrypted in the destination. Pretty much the only thing source A could determine about source B is how big its destination data set is and maybe how often changed files are backed up.
Is "you should use buckets instead of folders 'cause I say so" a valid answer?
I don't know of any way to do separate authentication for each bucket. On each of the 5 computers I have backing up to separate buckets, I used the same account key and auth key - the only difference is that each gets a unique bucket name.
FWIW, I have just asked Backblaze if they can provide some way for logins to be append-only (so no changing/deleting of files possible) to guard against trojans stealing login data and messing with the data. That (in conjunction with long encryption keys) would probably solve this problem too.
Well, keep in mind that changing/deleting is necessary both for compacting and for removing unneeded old versions, so I'm not sure such a mode would be suitable for the average user's use case… but I suppose it would be nice to have as an option for "paranoid mode".
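Continuing the earlier b2sdk sketch (same api and bucket objects, all names still placeholders), an "append-only-ish" key would simply drop deleteFiles from the capability list; keep the compacting/retention caveat above in mind:

```python
# Variant of the earlier b2sdk sketch: no deleteFiles capability, so the key
# can add and read files but never remove them. Duplicati's compacting and
# version pruning would have to be disabled (or run with a separate, more
# privileged key) for this to be workable in practice.
append_only_key = api.create_key(
    capabilities=["listBuckets", "listFiles", "readFiles", "writeFiles"],
    key_name="duplicati-computer1-documents-append-only",
    bucket_id=bucket.id_,
)
```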
It's the price I will pay for running client-based backups to "dumb" storage (besides, I cannot really say the idea of having an eternal log does not excite me a little).
The only other option would be my own servers, and seeing that I managed servers for almost a decade (in what was still a relatively benign environment compared to today), I do not really see doing it again just for personal backups. Even ignoring the work involved, there is just no way I can provide reasonable redundancy for anything close to what cloud providers charge for a few TB.
Yes. Re-reading it, I am not sure how I read it the first time.