Does Duplicati Not Work Well with Too Many Files/Files with Strange Names?

Based on what I have been seeing with my backups recently, it seems to me that Duplicati does not like datasets consisting of over 1 million files and/or files with strange names, such as extremely long strings of randomised letters and numbers that happen to look like the names Duplicati gives the files it creates on the remote backend.

I had exactly such datasets, and as a result Duplicati either ran slowly or crashed outright; we are talking about error messages that exceed the character limit of any post here! I decided to zip up some of the folders and Duplicati then worked smoothly (though I lost some of its deduplication benefits, which is compensated for by the upfront space savings from zipping the folders).

Are you all having similar issues with such datasets? Or, conversely, are you having issues even though your datasets are not like the ones described?

Are you using the recent --usn-policy option? It got some fixes recently, but those were about total path length.

Issue 3311 PathTooLongException when using USNJournal #3456

https://forum.duplicati.com/t/release-2-0-4-3-canary-2018-11-13/5279

PathTooLongException when using USNJournal #3311

Regarding file limits, there aren’t many reports (maybe your invitation will get some) but I did find this one:

Capabilities and limitations? near 1 million files

If you can select shorter pieces out of your error messages, perhaps posting them will get some input too.

I am running Duplicati on a Debian machine, so I can't use the --usn-policy option.

As for the error messages, I can post them on Pastebin if that is OK, since I am unsure which part to select.

I’m not sure what the comment about “ok” means. I don’t know their rules but I’ve seen people post.
Public posting is often good enough for anybody interested enough to go look to see what’s wrong.
Redacting private information is good. Most is obvious, but I’ve seen people have accidents before.

Generally the most useful part of the message is at the top, with the details of the situation below it.

Thanks, here are the Pastebin links for said error messages:

https://pastebin.com/3gXhhF56
https://pastebin.com/bb7aGnhm
https://pastebin.com/FSy2tcZS
https://pastebin.com/aZ3SWpkP

I still stand by my position that zipping up potentially problematic folders helps, as my backup jobs are now running smoothly.

Was there a filename in there that’s relevant? I couldn’t spot one, but two of those were indeed extremely large.

Maybe you can describe “containing strange names such as extremely long strings of randomised numbers and letters” further. There’s an OS limit (what OS is this?) on both the length between slashes and the total length. If a reproducible test can be found, it can be looked at. Also, does the failure happen with or without the GUI Commandline?
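If this is Linux, getconf is a quick way to see those limits for a given path (a sketch; 255 and 4096 are the typical values on ext4):

$ getconf NAME_MAX /home
255
$ getconf PATH_MAX /home
4096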

Duplicati is running on Debian

Examples of filenames/directory names:
8a73e2f8c59c836faa7823f4e3d2d14636e29471\0c\0c0e011b918fbb66238be38052496ccb5d85db09

The issue persists with or without the GUI Commandline.

What’s with the backslashes? That whole string is a single-level Linux filename without any directory information.

$ touch '8a73e2f8c59c836faa7823f4e3d2d14636e29471.7z\8a73e2f8c59c836faa7823f4e3d2d14636e29471\0c\0c0e011b918fbb66238be38052496ccb5d85db09'
$ ls -ln *9
-rw-r--r-- 1 1000 1000 0 Jan 17 16:57 8a73e2f8c59c836faa7823f4e3d2d14636e29471.7z\8a73e2f8c59c836faa7823f4e3d2d14636e29471\0c\0c0e011b918fbb66238be38052496ccb5d85db09
$ 

Sorry, I forgot to remove something from that filename. I think that’s the source of the confusion: that line is actually the full path of the file, and each backslash represents a folder. To break it down:

Folder: 8a73e2f8c59c836faa7823f4e3d2d14636e29471
Subfolder: 0c
Filename: 0c0e011b918fbb66238be38052496ccb5d85db09

Working fine here on 2.0.4.5. It seems like something’s going on that’s more than just the strange names.

Listing contents 0 (1/18/2019 6:54:05 AM):
/home/xxx/backup_source/ 
/home/xxx/backup_source/8a73e2f8c59c836faa7823f4e3d2d14636e29471/ 
/home/xxx/backup_source/8a73e2f8c59c836faa7823f4e3d2d14636e29471/0c/ 
/home/xxx/backup_source/8a73e2f8c59c836faa7823f4e3d2d14636e29471/0c/0c0e011b918fbb66238be38052496ccb5d85db09 (8 bytes)
Return code: 0
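For reference, the test tree was created roughly like this (a sketch; the file content is just arbitrary eight-byte filler):

$ mkdir -p ~/backup_source/8a73e2f8c59c836faa7823f4e3d2d14636e29471/0c
$ printf 'testdata' > ~/backup_source/8a73e2f8c59c836faa7823f4e3d2d14636e29471/0c/0c0e011b918fbb66238be38052496ccb5d85db09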

Thanks for testing! This is quite the conundrum. I am unsure what else could be the trigger, other than possibly a lack of temporary storage, but I can’t recall whether I set an HDD/SSD as the temp storage instead of using /tmp/
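A quick check on free space there would be something like this (a sketch; it assumes the default /tmp unless the --tempdir advanced option points somewhere else):

$ df -h /tmp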

In addition to disk space, are you monitoring for memory exhaustion? For example, you could run top.
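Something like this (a sketch; on Linux the Duplicati server normally shows up as a mono process, though the name may differ on your install):

$ top -o %MEM                      # interactive view sorted by memory use
$ ps -C mono -o pid,rss,vsz,cmd    # one-shot look at the mono process(es)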

You could possibly see if fewer files with strange names work better. My testing used only a single file.

Duplicati keeps every file path of every backup version in its database. This can slow things over time. Removing older versions using a retention policy is one way to keep the database from growing forever.
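For example, any of these backup options can cap version growth (a sketch; the values shown are only illustrations, so adjust to taste):

--keep-versions=10 (keep only the 10 most recent backups)
--keep-time=3M (or: delete backups older than three months)
--retention-policy="1W:1D,4W:1W,12M:1M" (or: thin out backups as they age)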

Depending on how old your Debian is, your mono might be quite old, and old versions sometimes have bugs. https://www.mono-project.com/download/stable/#download-lin-debian can be tried, in the hope that it helps.
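To see which version is currently installed (a sketch; the first line of output shows the Mono JIT compiler version):

$ mono --version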

Sorry, there’s nothing definite I can say from the information so far, but I hope you find a way out.

No problem! For the time being, I am going through all these files to see what I don’t need anyway, which will hopefully shrink the database to something more manageable. I was backing them up to G Suite via Duplicati first, in case I accidentally deleted something that I needed.

In the meantime, zipping up these files is a good strategy, seeing how it helped me work around this issue and also reduced disk usage on both the server and client side.
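For reference, this is roughly the kind of pre-zip step I mean (a sketch; the folder name is just the earlier example, and your archiver of choice may differ):

$ 7z a 8a73e2f8c59c836faa7823f4e3d2d14636e29471.7z 8a73e2f8c59c836faa7823f4e3d2d14636e29471/
$ 7z t 8a73e2f8c59c836faa7823f4e3d2d14636e29471.7z    # verify the archive before removing the original folder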