Duplicati creates too many backup files

Hi,

I have used Duplicati for more than a year now and never had a problem like the following: I installed Duplicati 2.0.3.3 beta and have since updated to version 2.0.4.5 beta. I had hoped that my issue would be resolved in the new version.

I created a backup with these arguments:

--backup-name="Backup"
--encryption-module=aes
--compression-module=zip
--dblock-size=500MB
--keep-time=3M
--passphrase="xxxxxxxxxxxx"
--all-versions=true
--blocksize=500MB
--number-of-retries=5
--zip-compression-level=9
--zip-compression-method=Deflate
--exclude-files-attributes="temporary,system"

The files to back up take 14.6 GB on my HDD. Now, after only 500 MB of the data has been backed up, Duplicati has already created more than 1700 files in the backup location, and more may follow. How can I reduce the number of files? Once the backup consists of more than 5000 files in one directory, I can no longer access the directory.

Thank you in advance for your help.
Regards,
Bjoern

Hello @Bjoern and welcome to the forum!

Choosing sizes in Duplicati explains the two different size parameters and how one might want to select values. The defaults are --blocksize=100KB and --dblock-size=50MB, which guarantees that multiple full-size hash blocks can fit into a dblock file. In testing, I found one 137 byte block (likely metadata such as time stamps) occupying its own small dblock file. I suspect Duplicati considered that dblock file filled because another full-size block would not fit.
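To make the size relationship concrete, here is a rough back-of-envelope sketch (my own illustration; it ignores compression, encryption, and zip overhead):

```python
# How many full-size hash blocks fit into one dblock volume?
# (rough arithmetic only, not Duplicati internals)
KB, MB = 1024, 1024 * 1024

def blocks_per_dblock(blocksize_bytes, dblock_bytes):
    return dblock_bytes // blocksize_bytes

print(blocks_per_dblock(100 * KB, 50 * MB))    # defaults: 512 blocks per volume
print(blocks_per_dblock(500 * MB, 500 * MB))   # blocksize == dblock-size: 1 block per volume
```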

Please try stopping the backup, cleaning up the database and destination, and setting some smaller --blocksize.


@ts678 is correct. If your block size (the size of the chunks your files are split into) is the same as your dblock size (the "volume upload" size, i.e. the size of the compressed files that get uploaded), you're effectively saying that any file SMALLER than the block size takes up an entire dblock file by itself.

So if you have 1,000 files smaller than 500MB, you’ll end up with 1,000 dblock files uploaded.
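As a hypothetical illustration of that effect (a simplified model, not Duplicati's actual packing code), compare the remote dblock count for 1,000 small files under the two settings:

```python
MB = 1024 * 1024

def volumes_one_small_file_each(file_sizes, blocksize):
    # with blocksize == dblock-size, each file smaller than the block size
    # ends up as a single block in its own dblock volume
    return sum(1 for s in file_sizes if s < blocksize)

def volumes_when_packed(file_sizes, dblock_size):
    # idealized packing with the defaults: blocks fill dblock-sized volumes
    total = sum(file_sizes)
    return -(-total // dblock_size)  # ceiling division

sizes = [10 * MB] * 1000                              # 1,000 files of 10 MB each
print(volumes_one_small_file_each(sizes, 500 * MB))   # 1000 dblock files
print(volumes_when_packed(sizes, 50 * MB))            # 200 dblock files
```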

Unfortunately, you can’t change the block size of an existing backup so you’ll have to delete this one and start over.

Yeah, I would suggest just using the default values.

I think the original intent was to increase file sizes in order to keep the file count below 5000. At 5000 files (perhaps enough for 100 GB of source), something bad happens, described as "I cannot access the directory anymore". I wonder if this is a Microsoft destination showing the infamous 5000-item list view limit discussed in 'No files were found at the remote location' using Onedrive for business, which may be fixed in the Graph backends (a.k.a. "v2") that should be moved to anyway.

@Bjoern what Destination Storage Type is breaking at 5000 files? Can you change that access method to v2? You'll probably need to set up a new AuthID, but maybe the 5000 limit will go away (based on the testing linked above).

Ah ok, well in that case he should change only the --dblock-size option… perhaps the 500MB setting is fine. But leave --blocksize at the default.
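For example, the argument list posted above would become something like this (a sketch: the only change is dropping the --blocksize=500MB line so the 100KB default applies):

--backup-name="Backup"
--encryption-module=aes
--compression-module=zip
--dblock-size=500MB
--keep-time=3M
--passphrase="xxxxxxxxxxxx"
--all-versions=true
--number-of-retries=5
--zip-compression-level=9
--zip-compression-method=Deflate
--exclude-files-attributes="temporary,system"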


Thank you. I think this is all I need to change. Now only about 600 files have been created in the backup location.

My original thought was that if I chose a higher value for the blocksize, more data could be combined into one file chunk.

Reasonable thought, maybe tripped up by implementation details. A fully filled block could certainly be large.

The key word there might be could. File chunks are sometimes small, and if a small block gets into the dblock, my theory is that your dblock was considered full either right then or when a block of full dblock size arrived (there is a toy sketch of this idea after the quote below). A file also has a small block of metadata, which on Windows is mainly file times. From the ending of How the backup process works:

Many details were omitted from the above example run, some of those details can be summarized as:

* Metadata is treated like a normal block of data
* When a `dblock` size is too big, a new one is created 
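
As mentioned above, here is a toy model of my theory (my speculation only, not Duplicati's actual code): a volume is treated as full as soon as another full-size block would no longer fit.

```python
def pack_blocks(block_sizes, blocksize, dblock_size):
    # toy packing: close the current volume once a further full-size
    # block could not fit alongside what is already in it
    volumes, current = [], 0
    for b in block_sizes:
        if current and current + blocksize > dblock_size:
            volumes.append(current)
            current = 0
        current += b
    if current:
        volumes.append(current)
    return volumes

MB = 1024 * 1024
# one tiny 137-byte metadata block followed by one full-size block,
# with blocksize == dblock-size == 500MB
print(pack_blocks([137, 500 * MB], 500 * MB, 500 * MB))   # -> [137, 524288000]
```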

I tried testing my theory before posting it by using --dblock-size=500MB with --blocksize=100MB, hoping that dblock files would be written partly filled, but not almost empty like yours, and that seemed to be the case:

11/28/2018  02:47 PM         1,617,661 duplicati-20181128T191256Z.dlist.zip.aes
11/28/2018  02:14 PM       433,512,925 duplicati-b0dea11c17768423f8a7f0318f0c039c7.dblock.zip.aes
11/28/2018  02:17 PM       514,909,101 duplicati-b18abd26beee649edb84c3fffd852399b.dblock.zip.aes
11/28/2018  02:24 PM       462,481,805 duplicati-b3a67e0f277e84ee7819757761a07cdbb.dblock.zip.aes
11/28/2018  02:14 PM       424,773,229 duplicati-b481e0318646c401399e94894ff2905e3.dblock.zip.aes
11/28/2018  02:46 PM       424,953,949 duplicati-b85a4e13191424dd5854f959702596779.dblock.zip.aes
11/28/2018  02:47 PM        73,875,021 duplicati-be17f55c4122b437d88787873593279bd.dblock.zip.aes
11/28/2018  02:14 PM             5,725 duplicati-i24839d66428448e7be5b5e0913fac486.dindex.zip.aes
11/28/2018  02:24 PM           135,389 duplicati-i8519202d11fb471cbe80db97cbbc050d.dindex.zip.aes
11/28/2018  02:17 PM            10,349 duplicati-i9b469356746643de9313a7a2dd9eb971.dindex.zip.aes
11/28/2018  02:14 PM             5,677 duplicati-iba7f0e5aefac431089e7835eab131fc8.dindex.zip.aes
11/28/2018  02:46 PM           899,757 duplicati-if988d21898cf4131ac61af020148874c.dindex.zip.aes
11/28/2018  02:47 PM            45,437 duplicati-ifecfa9717b684b9b8b0cb3e8b1be55f0.dindex.zip.aes

I’m glad you’re down to a more reasonable number of files, and with --blocksize=100KB (unlike my intentional 100MB just to see if I could see partial filling), I suspect your stream of dblocks is more fully filled than above.
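
To put rough numbers on "partly filled" (my own arithmetic on the listing above; the .zip.aes sizes include compression and encryption overhead, so this is only approximate):

```python
MB = 1024 * 1024
dblock_limit = 500 * MB
observed = [433_512_925, 514_909_101, 462_481_805,
            424_773_229, 424_953_949, 73_875_021]   # dblock sizes from the listing
for size in observed:
    print(f"{size / dblock_limit:.0%} of the 500MB volume limit")
```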

Just wanted to share that I personally like --blocksize=1MB; it seems to make repairs faster.

That’s likely because more blocks means more database records which generally means slower database access.

We’re hoping to optimize the block-related SQL at some point, but until then, I suppose 1MB blocks would cut block-related database records to roughly a tenth of what the 100KB default produces.
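
For a rough feel of the difference (my own arithmetic, assuming roughly one block record per block and a hypothetical 100 GB source):

```python
KB, MB, GB = 1024, 1024**2, 1024**3
source = 100 * GB
print(source // (100 * KB))   # ~1,048,576 blocks at the 100KB default
print(source // (1 * MB))     # ~102,400 blocks at --blocksize=1MB, about a tenth as many
```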

No doubt. But it will also make deduplication less effective.

Because this seems to be getting some discussion, see the similar one in Duplicacy vs Duplicati in Duplicacy’s issues, which pointed to Duplicati’s Duplicati 2 vs. Duplicacy 2 article before going into performance and dedup.

FWIW: some time ago I created a storage calculator in Google Docs that gives a rough estimate of how much storage, how many backend files, etc. are created with various (d)block sizes.
