Millions of files on NFS

Yariv_Hazan · November 15, 2022, 2:33pm

Hello,
I'm using Duplicati to back up millions of small files on NFS, and each backup takes days to complete.
CPU, memory, and network usage on the Duplicati Linux server are very low, so they do not seem to be the bottleneck.
The files do not change much, but scanning them each time takes a long time. Is there any fine-tuning I can do at the Duplicati level to improve performance?
Thanks,
Yariv

drwtsn32 · November 16, 2022, 12:25am

If this were a Windows machine, I’d suggest using the --usn-policy option.

Unfortunately I don't think an equivalent exists for Linux, let alone for remote file sources like NFS. I'm not sure there is any option other than for Duplicati to walk the filesystem at the start of each backup.
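To illustrate why walking the filesystem is unavoidable without something like a change journal, here is a minimal sketch of a metadata-based change scan. It assumes change detection by modification time and size; it is not Duplicati's actual code, and `scan_tree` and `changed_paths` are hypothetical names. Even when nothing has changed, it still has to fetch attributes for every file, which on NFS can mean a server round trip per file unless those attributes are cached.

```python
import os


def scan_tree(root):
    """Walk `root` iteratively and return {path: (mtime_ns, size)} for regular files.

    The entry.stat() call is where a network filesystem may need a round trip
    to the server for each file, unless the attributes are already cached.
    """
    snapshot = {}
    stack = [root]
    while stack:
        current = stack.pop()
        with os.scandir(current) as entries:
            for entry in entries:
                if entry.is_dir(follow_symlinks=False):
                    stack.append(entry.path)
                elif entry.is_file(follow_symlinks=False):
                    st = entry.stat(follow_symlinks=False)
                    snapshot[entry.path] = (st.st_mtime_ns, st.st_size)
    return snapshot


def changed_paths(previous, current):
    """Return paths that are new, or whose (mtime, size) differ from `previous`."""
    return sorted(p for p, meta in current.items() if previous.get(p) != meta)
```

For example, `changed_paths(old_snapshot, scan_tree("/mnt/nfs/data"))` lists what a metadata-only scan would consider modified; the cost is dominated by the scan itself, not by the comparison.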

Maybe someone else has some ideas…

Yariv_Hazan · November 16, 2022, 10:22am

Unfortunately it's Linux, so I guess not much can be done…

gpatel-fr · November 16, 2022, 11:00am

Hello

While Duplicati can back up remote sources, it's designed more as a tool for backing up the local computer, so a workaround could be to set up Duplicati on the NFS server itself.
Also, there seem to be problems with .NET and NFS:

github.com/dotnet/runtime

Directory.GetFileSystemEntries is very slow on linux with NFS share

opened 12:18PM - 01 Oct 21 UTC

nmoreaud

area-System.IO tenet-performance

`Directory.GetFiles`, `Directory.EnumerateFiles`, `DirectoryInfo.GetFiles` and `DirectoryInfo.EnumerateFiles` are pretty slow at listing the files in my NFS-shared directory. I guess that these functions internally retrieve file metadata (access time, etc.), which is not needed and slows the process down considerably. Is there any workaround?

For comparison, with a directory containing 50,000 files:

```
time (/bin/ls -1 /tmp/project/share/path/to/.directory >/dev/null)
0.05s
------------------------------------------------------------------
time (/bin/ls -la /tmp/project/share/path/to/.directory >/dev/null)
9.36s
------------------------------------------------------------------
import glob
print(glob.glob("/tmp/project/share/path/to/.directory/*"))

time (python3 script.py >/dev/null)
0.11s
------------------------------------------------------------------
import java.io.File;

public class Main {
    public static void main(String[] args) {
        for (File file : new File("/tmp/project/share/path/to/.directory").listFiles()) {
            System.out.println(file.getName());
        }
    }
}

javac Main.java
time (java Main >/dev/null)
0.23s
------------------------------------------------------------------
using System;
using System.IO;

foreach (string file in Directory.EnumerateFiles("/tmp/project/share/path/to/.directory"))
{
    Console.WriteLine(file);
}

time (dotnet run --no-build >/dev/null)
28s
```

Environment: dotnet 5.0.207, docker 20.10, debian container, volume mounted from the host NFS share.
NFS options: nolock,lookupcache=none,nodiratime,noatime,noacl,nordirplus
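The gap between `ls -1` and `ls -la` in the issue suggests that per-file metadata lookups, not the directory listing itself, are the expensive part. A rough way to reproduce that comparison on your own mount is sketched below in Python, like the issue's `glob` test. The helper names are hypothetical, and results will depend heavily on mount options (the issue's mount used `nordirplus`, which stops attributes from being returned together with the listing), server latency, and whether the client cache is already warm.

```python
import os
import time


def list_names(path):
    """Names only: directory reads, no per-file stat (similar to `ls -1`)."""
    return os.listdir(path)


def list_with_metadata(path):
    """Names plus one lstat() per entry (roughly what `ls -la` does)."""
    result = []
    for name in os.listdir(path):
        st = os.lstat(os.path.join(path, name))
        result.append((name, st.st_size, st.st_mtime))
    return result


def time_listing(func, path):
    """Return (entry_count, elapsed_seconds) for a single call of func(path)."""
    start = time.perf_counter()
    entries = func(path)
    return len(entries), time.perf_counter() - start
```

Usage: `for f in (list_names, list_with_metadata): print(f.__name__, time_listing(f, "/mnt/nfs/some/large/dir"))`. Run each variant on a cold cache if you want numbers comparable to the first pass of a backup, since a second run may be served from cached attributes.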

Yariv_Hazan · November 17, 2022, 10:42am

Thank you, but I don't have access to install anything on the server side.
Maybe there are NFS mount tweaks that are more suitable for Duplicati?

gpatel-fr · November 17, 2022, 11:15am

Maybe, but there is no information about NFS tweaks in the Duplicati documentation, so I'd advise you to read the GitHub issue I linked in my previous post and see what you can take from it.

ts678 · November 17, 2022, 3:04pm

Use whatever you can find that helps with looking through lots of files and efficiently checking the time attributes on each one.
The nfs man page has hints. In particular, make sure that you're taking advantage of NFS caching.
nfsstat, nfsiostat, Optimizing NFS Performance, and other information on the Internet might be helpful to you.
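As a starting point for checking caching, here is a sketch that reads `/proc/mounts`-style text and flags NFS options that disable or weaken client-side caching. The option names come from the nfs(5) man page, but the list is an assumption and not exhaustive, and the kernel may report options differently from how they were passed to `mount` (for example, `actimeo` may appear expanded into `acregmax`/`acdirmax` values). Note that the mount in the linked GitHub issue used both `lookupcache=none` and `nordirplus`. `nfs_cache_warnings` is a hypothetical helper.

```python
# Options (as they may appear in /proc/mounts or fstab) that reduce NFS
# client caching. This mapping is an illustrative assumption based on nfs(5).
CACHE_UNFRIENDLY = {
    "noac": "attribute caching disabled",
    "lookupcache=none": "lookup (directory entry) caching disabled",
    "nordirplus": "READDIRPLUS disabled; attributes are not returned with listings",
    "actimeo=0": "attribute cache timeout set to zero",
    "acregmax=0": "file attribute cache timeout set to zero",
    "acdirmax=0": "directory attribute cache timeout set to zero",
}


def nfs_cache_warnings(mounts_text):
    """Parse /proc/mounts-style lines and flag cache-unfriendly NFS options.

    Returns {mount_point: [warning, ...]} for nfs and nfs4 mounts only.
    """
    warnings = {}
    for line in mounts_text.splitlines():
        fields = line.split()
        # Fields: device, mount point, fs type, options, dump, pass
        if len(fields) < 4 or fields[2] not in ("nfs", "nfs4"):
            continue
        mount_point, options = fields[1], fields[3].split(",")
        warnings[mount_point] = [CACHE_UNFRIENDLY[o] for o in options if o in CACHE_UNFRIENDLY]
    return warnings
```

Usage: `with open("/proc/mounts") as f: print(nfs_cache_warnings(f.read()))`. An empty list for your backup mount does not prove caching is effective; it only rules out the most obvious options that turn it off.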

My question is not about resource usage but about latency: for example, does the backup hit the NFS server with millions of queries?
Actually gathering statistics might answer that (along with help from the Internet); the challenge then is avoiding it.
Duplicati would almost certainly run faster and more reliably on local files, but that’s not your situation.
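One way to answer the "millions of queries" question is to snapshot the client's per-operation NFS counters before and after a backup run. On Linux these are exposed in `/proc/self/mountstats`, which is what nfsiostat reads. The parser below is a minimal sketch that assumes per-op lines follow the field order used by the nfs-utils mountstats tool (operation count first, cumulative RTT in milliseconds seventh); `parse_mountstats` and `op_delta` are hypothetical names.

```python
import re

# Per-op lines look like "     GETATTR: 100 100 0 12000 11200 5 80 95".
# Assumed field order: ops, transmissions, major timeouts, bytes sent,
# bytes received, cumulative queue ms, cumulative RTT ms, cumulative execute ms.
_OP_LINE = re.compile(r"^\s*([A-Z][A-Z0-9_]*):\s+(\d+(?:\s+\d+)*)\s*$")


def parse_mountstats(text):
    """Return {mount_point: {op_name: (ops, cumulative_rtt_ms)}} for NFS mounts."""
    stats = {}
    mount_point = None
    in_per_op = False
    for line in text.splitlines():
        if line.startswith("device "):
            # "device SRC mounted on MOUNT_POINT with fstype TYPE ..."
            parts = line.split()
            fstype = parts[parts.index("fstype") + 1] if "fstype" in parts[:-1] else ""
            mount_point = parts[4] if fstype.startswith("nfs") and len(parts) > 4 else None
            in_per_op = False
            if mount_point:
                stats[mount_point] = {}
            continue
        if mount_point is None:
            continue
        if "per-op statistics" in line:
            in_per_op = True
            continue
        match = _OP_LINE.match(line) if in_per_op else None
        if match:
            fields = [int(x) for x in match.group(2).split()]
            if len(fields) >= 7:
                stats[mount_point][match.group(1)] = (fields[0], fields[6])
    return stats


def op_delta(before, after, mount_point):
    """Return {op: (count, mean_rtt_ms)} for ops issued between two snapshots."""
    result = {}
    for op, (ops_after, rtt_after) in after.get(mount_point, {}).items():
        ops_before, rtt_before = before.get(mount_point, {}).get(op, (0, 0))
        count = ops_after - ops_before
        if count > 0:
            result[op] = (count, (rtt_after - rtt_before) / count)
    return result
```

Read `/proc/self/mountstats` before and after a backup, parse both, and call `op_delta(before, after, "/mnt/nfs")`. If GETATTR or LOOKUP counts are in the millions, count multiplied by mean RTT gives a rough idea of how long the scan spends waiting on the server when those calls are issued one at a time.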