Recommended settings for Google Team Drive

I use a Team Drive in my Google account as the backup destination. What would be the best settings for this use case? Block size, etc.?

I back up my personal stuff, like photos, docs, etc., 1-3 TB all together. Ideally, I will never need to restore.

I tend to go with larger file sizes, like 100 MB or maybe even 500 MB? Team Drive doesn't have a size limit, but it does have a 400,000-file limit - so it would be good to keep the number of files low, correct?

Any other settings I should look at?

Thanks!

A rule of thumb, which the advice below aims to satisfy, is to limit a backup to a few million blocks.
For 1-3 TB, that means a 1 MB --blocksize, up from the default 100 KB. More on that value here.
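As a rough sanity check of that rule (my own arithmetic, not an official figure):

    # 3 TB is about 3,000,000,000 KB
    echo $((3 * 1000 * 1000 * 1000 / 100))    # default 100 KB blocksize -> ~30,000,000 blocks
    echo $((3 * 1000 * 1000 * 1000 / 1000))   # 1 MB blocksize           ->  ~3,000,000 blocks

So the larger blocksize brings a 3 TB backup back inside the few-million-block guideline.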

Choosing sizes in Duplicati offers some other guidance. If your link is quick, you can afford to increase what the Options screen calls “Remote volume size”, which is actually --dblock-size under another name.
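For reference, here is roughly how those two settings would look on a command-line run. This is only a sketch: the googledrive:// folder, AuthID, source paths and passphrase are placeholders, and it's worth double-checking the option spellings against your version's help output.

    Duplicati.CommandLine.exe backup \
      "googledrive://Backups/personal?authid=<your-authid>" \
      /home/user/photos /home/user/docs \
      --blocksize=1MB \
      --dblock-size=100MB \
      --passphrase=<your-passphrase>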

The slowness that can arise is that many volumes may need to be downloaded to restore just a few files, if the needed blocks are scattered across them. However, if this is purely a disaster-recovery backup (but why not get some other use out of it meanwhile?), you're going to download everything anyway, so again larger dblock volumes may have few downsides.

As for "best": probably nobody knows. There has been no systematic benchmarking of all backup configurations on all destinations. There's definitely a job opening if you're interested in exploring your own question about "best".

Potentially a large number of files would slow down file listing, but that generally happens only at backup start and end (e.g. to sanity-check that the destination files look as expected). Even at the rather small default 50 MB remote volume size, the 400,000-file limit gives you about 10 TB (half the files are dindex files), so it would suffice.
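The arithmetic behind that 10 TB figure, assuming roughly half of the 400,000 allowed files end up as dblock volumes:

    # 400,000-file limit, ~200,000 dblock volumes at 50 MB each
    echo $((400000 / 2 * 50 / 1000 / 1000))   # prints 10, i.e. about 10 TB of volume data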

Your proposed sizes (100 MB or even 500 MB) would probably be fine too. This number (unlike blocksize) can be changed later, but it affects only newly created volumes; increasing it can trigger a compact, which may work awhile to give you what you asked for. Lowering the number is less effective, so you may be holding large files for a long time.
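If you raise it later and would rather not wait for an automatic compact, the CLI also has a compact command you can run by hand; a sketch only, with placeholder destination URL, database path and passphrase:

    Duplicati.CommandLine.exe compact \
      "googledrive://Backups/personal?authid=<your-authid>" \
      --dbpath=/path/to/job-database.sqlite \
      --passphrase=<your-passphrase>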

If you have time and interest, exploring how settings changes affect various sorts of performance is an area much in need of work and documentation, as is following forum topics to assist people with that.

As a community effort, Duplicati hopes that everybody can contribute something to its collective progress.

Omg there is a file count limit on gdrive?

First of all, make sure not to go with the god-mode token; it will be a real hurdle to migrate (basically you have to re-upload) once Google takes away that everything-goes token possibility:

You will get Google Drive 403 errors if you use the latest beta like I do,
so I recommend --number-of-retries=10 --retry-delay=20
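In CLI form (the destination URL and source path are placeholders; in the GUI these go under the job's Advanced options):

    Duplicati.CommandLine.exe backup \
      "googledrive://Backups/personal?authid=<your-authid>" \
      /volume1/photos \
      --number-of-retries=10 \
      --retry-delay=20s   # 20 seconds, per the recommendation above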

You wrote that you never want to restore, but I'll still share my experience and decisions. It is not just the restore time that needs to be taken into consideration, but also the database you have to work with.

I have ~5.3 TB backed up to Google Drive.
I did more testing before doing my full backup because I had read horror stories about restoration. Yes, 1-2 days is totally OK for me, but I saw 10-day horrors. I had also read about the horrible database performance hit after a certain number of items, and that the more blocks there are, the bigger the database gets.
The database size itself (although proportional to the amount of backed-up data) isn't a great contributor to the problem, but the restoration is.

BTW, if it comes to that, I don't recommend restoring directly from the remote Google Drive backend. The restore process is a two-pass thing: first the files, then the metadata. The metadata is supposed to be veeeery small, yet that pass consults all the remote blocks, so in the end it will download every block twice.
=> strategically, I will download the backend to local storage before restoration, so I only care about backup performance and the internal blocksize
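A sketch of that strategy, assuming rclone (or any sync tool) for the download step; the remote name, paths and passphrase are placeholders:

    # 1. Pull the entire backend down to local disk first
    rclone copy gdrive:duplicati-backend /volume1/restore-staging --progress

    # 2. Restore from the local copy so no block is fetched from Google Drive twice
    Duplicati.CommandLine.exe restore \
      "file:///volume1/restore-staging" \
      "*" \
      --restore-path=/volume1/restored-files \
      --passphrase=<your-passphrase>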

The other trap is not backing up the database, or not accounting for its size. It isn't catastrophic (if you only have your blocks, you can recreate the database), but it will take significant time. Also, the bigger the database, the more "jumps" (sorry for the word, I'm no db expert).
=> blocksize strategy: striving for a smaller database means a bigger blocksize
=> Duplicati runtime strategy: keep the database on non-spinning disks; I invested in NVMe for an SSD cache, as my environment is a NAS
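And the runtime side of that, as a sketch (paths are placeholders; --dbpath just tells the job where to keep its local database):

    # Keep the job's local database on the SSD/NVMe volume rather than the spinning pool
    Duplicati.CommandLine.exe backup \
      "googledrive://Backups/personal?authid=<your-authid>" \
      /volume1/photos \
      --dbpath=/volume_ssd/duplicati/photos-job.sqlite \
      --blocksize=5MB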

So I wanted to test how problematic it is if I use a blocksize of 5 MB, and whether it penalizes restoration. You can see in the table I attached that, basically, having a bigger blocksize helps.
Notable differences:

  • the db size with a 10 KB blocksize was 852 MB; with 5 MB it dropped to 54 MB.
  • introducing the SSD cache reduced the db repair time (at a fixed blocksize) to 66%.
  • changing the blocksize from 10 KB to 5 MB reduced the db repair time to 50%.
  • blocksize does not affect the “Target file is patched with some local data” part of the restore or the “restore integrity” part, but the SSD cache does.
  • metadata recording (the second phase of restore) performs equally badly regardless of SSD cache or blocksize. It is still better when restoring from a local backend than from Google Drive, although I didn't save the Google Drive restore log back then.

Also, after deciding on the final parameters, I had an 8-10 day initial backup of 5 TB. The backup is 3.38 TB (my wife duplicates like mad), and the DB is 469 MB.


Amazing response, and thanks for the benchmarks. This was timely because I was posting to @tarianjed

Please change default blocksize to at least 1MB #4629

yesterday wondering what things the new indexes fix at the default blocksize and what they can't (DB size).
If we can figure out a way to get these fixes into a release, then maybe more help is available to measure.