Slow backup with small files

Hi iall,

I’m experiencing Duplicati (2.0.5.1_beta_2020-01-18 and I try the latest canary v2.0.5.111-2.0.5.111_canary_2020-09-26). now for my backup solution. I have ~200K files to backup. Most of my files are source code file (.c/.cpp.h) etc. I have a a good gear, Dell laptop all nvme drives, 32 gig of RAM with I7 10th gen. The backup take about a bit more than 1 day to backup those files. I’m on Windows 10 last version 20H2.

From my task manager I see that Duplicati is running on 4 (I have 12 core) of my cores. My CPU usage if about 30%. My nvme disk is always about 1-2% used. networks is about 8KB/s. I’m backuping to an sftp server. My internet is 1Gb and the server is on the 1Gb link too.

It’s look like I have CPU bottleneck, but I’m wondering why ? I have all default option set except check-filetime-only = ON.

Is there a way to diagnostic this ? I’m looking for advise to help my getting this backup running faster.

Regards,

Jonathan

Welcome to the forum @dumarjo

It seems like it should vary depending on how much you changed unless somehow walking files is slow.
Show log for your backup should give a lot of statistics. An example of a big-picture summary would be

image

and the Complete log link has lots more, including the amount of data uploaded (or go look at the files).
You can, if you like, set log-file=<path> and log-file-log-level=verbose to see the status of every file seen, however the backup of the files isn’t detailed. Still, if few files are mentioned, few files should be updated.

usn-policy can avoid having to scan files. Canary should do better. Windows changed, and slowed Beta. You’d have to run Duplicati with Administrators privileges actually in effect in order to use USN though…

If somehow your SFTP server is getting behind, you should see asynchronous-upload-limit (default 4) in asynchronous-upload-folder waiting to go up in FIFO order. The default size of such files is about 50 MB.

On the other hand, if Duplicati can’t find enough changed data fast enough, you’d probably get an upload speed burst when it finally fills up a 50 MB, so a constant 8 KB/S (if that’s what you see) seems strange.
You could certainly cut the network and SFTP server out by backing up to a local folder for speed testing.

Log level has other values. Retry is good for seeing the rhythm of uploads. Profiling (huge) shows speed information, and also SQL statement execution times, which are more a factor for big backups. Is yours?

But back to the original comment, if you do back-to-back backups with no source file changes, and it still takes the same long time, then I guess it’s the file walk. Does check-filetime-only actually speed up any?

Hi @ts678

Here a picture of the summary for my backup

I see the not many file changes and get uploaded. So this should not be the transfert to the sftp for shure.

For the complete log, I don’t see much interesting info. infos.

The check-filetime-only don’t look to have increase the speed. I will try the usn-policy to see if it’s help. I will also increase the log level to verbose and check for the SQL (probably a big potential).
do you think this option should be set also ? use-block-cache

Thanx for your help !

Unless the SFTP server is having some weird problem. You need to look at the logs to see transfers.

I guess that’s good, but you also need to look at the log file you make yourself to see what’s it’s doing.
If that’s too much work, you can get a look at About --> Show log --> Live and easily try various levels.

Note it only works on local (NTFS) files. If these are not local, that’s a whole other set of possible slow.

I think the early experiment found it didn’t help very much, otherwise it would probably be set by default.

Hi @ts678

I started new backup this morning with all the changes, and I see no speed increase. I checked the live log at different level and for now all look fine. I only see file checking in the log file. I see no error, warning or anything else. From my looking I see that an average of 4 or 5 files checked per seconds. Should I try something else ? I can test the canary version to see if it’s help on the usn-policy, but i’m not too confident.

If that refers to “Checking file for changes” lines, you should see more than that, but maybe you missed it.
Below are two backups. The 12:23 file was not changed, while the 12:28 file was (and so it announces it).

2020-11-30 12:23:44 -05 - [Verbose-Duplicati.Library.Main.Operation.Backup.FileEnumerationProcess-IncludingPath]: Including path as no filters matched: C:\PortableApps\Notepad++Portable\App\Notepad++\backup\webpages.txt@2020-11-24_103359
2020-11-30 12:23:51 -05 - [Verbose-Duplicati.Library.Main.Operation.Backup.FileEnumerationProcess-IncludingPath]: Including path as no filters matched: C:\PortableApps\Notepad++Portable\App\Notepad++\backup\webpages.txt@2020-11-24_103359
2020-11-30 12:23:51 -05 - [Verbose-Duplicati.Library.Main.Operation.Backup.FilePreFilterProcess.FileEntry-SkipCheckNoMetadataChange]: Skipped checking file, because no metadata was updated C:\PortableApps\Notepad++Portable\App\Notepad++\backup\webpages.txt@2020-11-24_103359
2020-11-30 12:28:35 -05 - [Verbose-Duplicati.Library.Main.Operation.Backup.FileEnumerationProcess-IncludingPath]: Including path as no filters matched: C:\PortableApps\Notepad++Portable\App\Notepad++\backup\webpages.txt@2020-11-24_103359

2020-11-30 12:28:47 -05 - [Verbose-Duplicati.Library.Main.Operation.Backup.FileEnumerationProcess-IncludingPath]: Including path as no filters matched: C:\PortableApps\Notepad++Portable\App\Notepad++\backup\webpages.txt@2020-11-24_103359
2020-11-30 12:28:47 -05 - [Verbose-Duplicati.Library.Main.Operation.Backup.FilePreFilterProcess.FileEntry-CheckFileForChanges]: Checking file for changes C:\PortableApps\Notepad++Portable\App\Notepad++\backup\webpages.txt@2020-11-24_103359, new: False, timestamp changed: True, size changed: True, metadatachanged: True, 11/30/2020 5:28:08 PM vs 11/30/2020 1:41:53 PM
2020-11-30 12:28:48 -05 - [Verbose-Duplicati.Library.Main.Operation.Backup.FileBlockProcessor.FileEntry-ChangedFile]: File has changed C:\PortableApps\Notepad++Portable\App\Notepad++\backup\webpages.txt@2020-11-24_103359

I’m not clear which messages you saw, but if it’s “Checking file for changes”, which changes is it saying? When that line happens, the file is likely to be read completely, looking to find what portions of it changed.

You can see the reference to filters, and these can slow a filesystem scan. Any filters on Source screen? Expand both Filters and Exclude. I haven’t heard of filters causing your high level of slowdown though.

It’s a one-way trip at the moment (can’t go back to 2.0.5.1 easily). I use it, but it has its risks (and its fixes). Probably it would be best to first look for the slowness. You’re only just starting on looking at the log output.

You’ve still got plenty to try from previous advice, e.g. in logs. If you are willing to get very deep into logging, Sysinternals Process Monitor can give detailed information (hard to read) on lots of things, such as details of file requests (whether or not they actually produce disk activity – file requests may just involve memory).

Hi @ts678

I have many filters. mostly are regex expressions. Here my command line
–encryption-module=
–compression-module=zip
–dblock-size=50mb
–retention-policy=1W:1D,4W:1W,12M:1M
–skip-files-larger-than=10MB
–no-encryption=true
–check-filetime-only=true
–usn-policy=Required
–verbose=true
–exclude-files-attributes=system,temporary
–disable-module=console-password-input
–exclude=[.(?i).(.bak|.sym|.opj|.vmdk|.map|.mk|.d|.o|.lst|.~|.vmem|.obj|.obk|.log|.elf|.iso|.mp4|.lck|.bin|.hex|.xz|.a|.gz|.bz2|.rar|.zip|.7z|.cache|.dll|.so|.so.)]
–exclude=[(?i).
\Tools\]
–exclude=[(?i).\arm-none-eabi-gcc\]
–exclude=[(?i).
\build\]
–exclude=[(?i).\.metadata\]
–exclude=[(?i).
\.svn\]
–exclude=[(?i).\.git\]
–exclude=[(?i).
\pycache\]

2020-11-30 14-13-00.zip (186.6 KB)

I upload a small video to expose you the slownes of the scan file.

Here a small portion of the log file

  • Nov 30, 2020 2:15 PM: Checking file for changes D:\Drivers\CMSIS\Documentation\General\html\printComponentTabs.js, new: True, timestamp changed: True, size changed: True, metadatachanged: True, 8/31/2020 8:14:35 PM vs 1/1/0001 12:00:00 AM

  • Nov 30, 2020 2:15 PM: Including path as no filters matched: D:\Drivers\CMSIS\Documentation\General\html\printComponentTabs.js

  • Nov 30, 2020 2:15 PM: Including path as no filters matched: D:\Drivers\CMSIS\Documentation\General\html\printComponentTabs.js

  • Nov 30, 2020 2:15 PM: New file D:\Drivers\CMSIS\Documentation\General\html\pages.html

  • Nov 30, 2020 2:15 PM: Checking file for changes D:\Drivers\CMSIS\Documentation\General\html\pages.html, new: True, timestamp changed: True, size changed: True, metadatachanged: True, 8/31/2020 8:14:35 PM vs 1/1/0001 12:00:00 AM

  • Nov 30, 2020 2:15 PM: Including path as no filters matched: D:\Drivers\CMSIS\Documentation\General\html\pages.html

  • Nov 30, 2020 2:15 PM: Including path as no filters matched: D:\Drivers\CMSIS\Documentation\General\html\pages.html

  • Nov 30, 2020 2:15 PM: New file D:\Drivers\CMSIS\Documentation\General\html\open.png

  • Nov 30, 2020 2:15 PM: Checking file for changes D:\Drivers\CMSIS\Documentation\General\html\open.png, new: True, timestamp changed: True, size changed: True, metadatachanged: True, 8/31/2020 8:14:35 PM vs 1/1/0001 12:00:00 AM

  • Nov 30, 2020 2:15 PM: Including path as no filters matched: D:\Drivers\CMSIS\Documentation\General\html\open.png

  • Nov 30, 2020 2:15 PM: Including path as no filters matched: D:\Drivers\CMSIS\Documentation\General\html\open.png

  • Nov 30, 2020 2:15 PM: New fileD:\Drivers\CMSIS\Documentation\General\html\nav_h.png

  • Nov 30, 2020 2:15 PM: Checking file for changes D:\Drivers\CMSIS\Documentation\General\html\nav_h.png, new: True, timestamp changed: True, size changed: True, metadatachanged: True, 8/31/2020 8:14:35 PM vs 1/1/0001 12:00:00 AM

  • Nov 30, 2020 2:15 PM: Including path as no filters matched: D:\Drivers\CMSIS\Documentation\General\html\nav_h.png

  • Nov 30, 2020 2:15 PM: Including path as no filters matched: D:\Drivers\CMSIS\Documentation\General\html\nav_h.png

I don’t know if this can help a bit

What I’d suggest is either a test backup of some area that can run without filters (to see if it scans fast) or avoid the backup part completely while (presumably) keeping the filters in The TEST-FILTERS command.

Exclude list for windows talked about the slowdown from that list (though your slowdown might be worse)
Filter groups code is extremely slow #3395 talked about issues. You’re not using OS-specific filters, but a large number of do-it-yourself filters might see the same issue. I am not a filter expert, and definitely not a performance expert regarding the Duplicati filters, so first thing to do is to check if your slowness is filters.

I’m not sure how much parallel work is done. If none, your I7 might max out one logical core, at 12% CPU. Adding in the overhead might raise it to 30%. CPU use running test-filters might be interesting to see.

I think this is the issue with your performance. One of the things I see is that it seems like you have a bit too many “.” classes. In this case it matches any character, so unless I’m misunderstanding what’s going on there, duplicati has to check every single letter of every single string to see if it matches the rest of your patters. And then combine it with the whole bunch of “OR” “|” pipes there and so it spends all that time matching.
Perhaps you wanted to prefix some of those . with \ to escape them?
Or maybe rewrite it as something like

-[(?i).*\.(bak|sym|opj|vmdk|map....)]

Hi @Mxx and @ts678

I will test your point today. I will tests without filter, and with the corrected exclude from @Mxx. I’m realy new to duplicati and filters.

I’ll keep you updated in the day.

Regards

Hi,

I got beter result now when fixing the reges mentionned by @Mxx. For my whole folder (~200K files) took 12 minutes. The problem now is that the filters not working well. I’ll have a guest that something is wrong on the UI that generate the correct command line. If I test my regex directly on the console with the test-filters command, all is working as expected. From the UI, is not exclude anything that have a dot on the folder name.

here the command line I use to test:

"C:\Program Files\Duplicati 2\Duplicati.CommandLine.exe" test-filters d:\projects2 "--exclude=[(?i).*\.(bak|sym|opj|vmdk|map|mk|d|o|lst|~|vmem|obj|obk|log|elf|iso|mp4|lck|bin|hex|xz|a|gz|bz2|rar|zip|7z|cache|dll|so\.[0-9]|so)]" --exclude=[(?i).*\\.svn\\] --exclude=[(?i).*\\.git\\] --exclude=[(?i).*\\__pycache__\\] --exclude=[(?i).*\\Tools\\] --exclude=[(?i).*\\arm-none-eabi-gcc\\] --exclude=[(?i).*\\build\\] --exclude=[(?i).*\\.metadata\\]

Here the output on a sample folder

Including source path: d:\projects2\
Excluding path due to filter: d:\projects2\.git\ => ([((((((((?i).*\.(bak|sym|opj|vmdk|map|mk|d|o|lst|~|vmem|obj|obk|log|elf|iso|mp4|lck|bin|hex|xz|a|gz|bz2|rar|zip|7z|cache|dll|so\.[0-9]|so))|((?i).*\\.svn\\))|((?i).*\\.git\\))|((?i).*\\__pycache__\\))|((?i).*\\Tools\\))|((?i).*\\arm-none-eabi-gcc\\))|((?i).*\\build\\))|((?i).*\\.metadata\\)])
Excluding path due to filter: d:\projects2\__pycache__\ => ([((((((((?i).*\.(bak|sym|opj|vmdk|map|mk|d|o|lst|~|vmem|obj|obk|log|elf|iso|mp4|lck|bin|hex|xz|a|gz|bz2|rar|zip|7z|cache|dll|so\.[0-9]|so))|((?i).*\\.svn\\))|((?i).*\\.git\\))|((?i).*\\__pycache__\\))|((?i).*\\Tools\\))|((?i).*\\arm-none-eabi-gcc\\))|((?i).*\\build\\))|((?i).*\\.metadata\\)])
Including path as no filters matched: d:\projects2\test.c
Excluding path due to filter: d:\projects2\test.o => ([((((((((?i).*\.(bak|sym|opj|vmdk|map|mk|d|o|lst|~|vmem|obj|obk|log|elf|iso|mp4|lck|bin|hex|xz|a|gz|bz2|rar|zip|7z|cache|dll|so\.[0-9]|so))|((?i).*\\.svn\\))|((?i).*\\.git\\))|((?i).*\\__pycache__\\))|((?i).*\\Tools\\))|((?i).*\\arm-none-eabi-gcc\\))|((?i).*\\build\\))|((?i).*\\.metadata\\)])
Excluding path due to filter: d:\projects2\test.so.6 => ([((((((((?i).*\.(bak|sym|opj|vmdk|map|mk|d|o|lst|~|vmem|obj|obk|log|elf|iso|mp4|lck|bin|hex|xz|a|gz|bz2|rar|zip|7z|cache|dll|so\.[0-9]|so))|((?i).*\\.svn\\))|((?i).*\\.git\\))|((?i).*\\__pycache__\\))|((?i).*\\Tools\\))|((?i).*\\arm-none-eabi-gcc\\))|((?i).*\\build\\))|((?i).*\\.metadata\\)])
Matched 1 files (0 bytes)

But When used with the UI, noting is filtered out.

How should I write the filters on the UI ?

Regards

This is about the same speed as my desktop backup sets run. ~80k files, <10GB in size takes about 3-15min depending on how much changes and if any maintenance needs to be done.

Maybe check the backup config file to make sure that it looks correct and doesn’t have any unnecessary or erroneous filters?