How to find the reason for a slow backup despite "check-filetime-only"

Hello,

I have a backup of about 1 TB. The source has 157k files in 9,000 folders.

I have set --check-filetime-only=true, but the backup is very, very slow WITHOUT ANY FILES CHANGED. The "… files need to be examined" counter is visible for HOURS without any file having changed. I started procmon (Sysinternals suite) and looked at what Duplicati is doing, and I see this (I filtered out the query operations and so on, to see just the ReadFile calls):

(procmon screenshot of the ReadFile operations)

So Duplicati is reading this file in 100 KB blocks.

As I see in the procmon output, EVERY SINGLE FILE is read completely! At the moment this backup has been running for 35 minutes and has read about 120 GB from the source file system (a different local Windows machine with a share) through port 445 (seen in the tool NetLimiter). That is a realistic speed of about 50-60 MB/sec.

The file timestamp for the example file, in CMD:

(screenshot)

and in Windows:

(screenshot)

This file was already included in the first version of the backup and has not changed size or date since the first backup some days ago.

My Question: How can I find out why Duplicati is reading all the files and not just checking the timestamp?

This would be easier in the GUI (About → Show log → Live → Verbose), but this is CLI, correct?

Here’s an example of the kind of line that explains the reason why Duplicati is processing a file:

Jan 10, 2021 5:33 PM: Checking file for changes C:\ProgramData\Duplicati\duplicati-2.0.5.111_canary_2020-09-26\RUN\server.2.0.5.111_canary_2020-09-26.log, new: False, timestamp changed: True, size changed: True, metadatachanged: True, 1/10/2021 10:33:57 PM vs 1/10/2021 8:32:57 PM

One unfortunate omission is that times are shown to the second, but might differ sub-second.
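
To make those fields concrete, here is a minimal sketch in Python (purely illustrative, not Duplicati's actual code; the function and field names are mine) of the kind of decision such a scan implies: a new path, or a stored timestamp or size that differs from the current one, forces a full read; otherwise the file can be skipped on metadata alone. A stored timestamp that differs only by fractions of a second still counts as changed, which is why the sub-second display gap can be confusing.

from datetime import datetime
from pathlib import Path
from typing import Optional, Tuple

def needs_full_read(path: Path, stored: Optional[dict]) -> Tuple[bool, str]:
    # "stored" is the previously recorded state for this path, e.g.
    # {"timestamp": datetime(...), "size": 12345}, or None for a never-before-seen path.
    if stored is None:
        return True, f"Checking file for changes {path}, new: True"
    st = path.stat()
    current_ts = datetime.fromtimestamp(st.st_mtime)
    timestamp_changed = current_ts != stored["timestamp"]  # sub-second differences count too
    size_changed = st.st_size != stored["size"]
    reason = (f"Checking file for changes {path}, new: False, "
              f"timestamp changed: {timestamp_changed}, size changed: {size_changed}, "
              f"{current_ts} vs {stored['timestamp']}")
    return timestamp_changed or size_changed, reason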

Was the backup previously run over SMB, but with changed files, and not slow until this change?

I’m trying to understand the history of the backup, to learn when things worked and then did not.
Clarification of prior configuration, practice, problems, and repairs might give clues about the situation.

There’s actually a known hard-to-fix gap where this causes timestamp updates to not be saved.
The situation would start with a database Recreate or maybe an environmental change, which is why I asked…
That problem would self-correct once a change was backed up, which typically happens quite soon.
You’ve got at least one file that didn’t change, but did anything change? If not, why do a backup?

Is this backup also defined in the GUI? That would be a way to look at its job log, after it finishes.
Alternatively, maybe you have this job set to report summary results by email or http somehow?
With no change, you might not even get a backup version without --upload-unchanged-backups.

Looking at it closely while it's still running would probably take manual work, browsing the database.
Fortunately it's reading pretty fast. Based on your speed statement, maybe it'd read for about 5 hours (1 TB at 50-60 MB/sec).
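
If you do want to poke at the local job database, a minimal sketch in Python would be to open a copy of the job's .sqlite file and dump the tables and columns, then explore from there. The path below is a placeholder, and I deliberately don't assume any table names here, since the schema can differ between Duplicati versions:

import sqlite3

# Work on a COPY of the local job database; this path is a placeholder.
DB = r"C:\temp\copy-of-job-database.sqlite"

con = sqlite3.connect(DB)
# Print each table with its column names, to find where paths and timestamps live.
for (table,) in con.execute(
        "SELECT name FROM sqlite_master WHERE type='table' ORDER BY name"):
    cols = [row[1] for row in con.execute(f"PRAGMA table_info({table})")]
    print(table, cols)
con.close()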

Is the Duplicati.CommandLine.exe output going to a file? I’m not sure how fast the console runs.
You can specify --console-log-level=verbose, or --log-file=<path> and --log-file-log-level=verbose.
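
For example, the CLI invocation could look something like this (the destination URL, source path, and log path are placeholders; substitute your own values):

Duplicati.CommandLine.exe backup "file://D:\Backups\Target" "\\OTHERPC\Share\Data" --check-filetime-only=true --log-file="C:\temp\duplicati-backup.log" --log-file-log-level=verbose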


Thank you. I added a log file at "Verbose" level and now I see the reason:

True, timestamp changed: True, size changed: True, metadatachanged: True, 16.09.2019 12:41:12 vs 01.01.0001 00:00:00

and this for nearly all files, but not for all?!?

I don't know why? This is a relatively new backup. Maybe there was a problem on the first run? I have deleted the backup and the job and will start again with a new job in a new, empty destination. Hopefully this time the problem will not occur again.

Thank you @ts687

Follow-up: I backed up again (testing with only 1/3 of the amount), reran it overnight again and again and again, and there was no "re-read" of already-read data!

So I don't know what the problem was with the first backup, but I will now add the remaining 2/3 of the data, and I think a "backup without new data" will then finish in about 15-20 minutes like I would expect, and not take half a day.

Maybe the first run was not successful? I read in some threads here that this can cause follow-up problems.

Too late to look, after deleting the backup and the job (unless you saved the database from the original).

You would also see exactly the kind of messages you got if you had a new path, e.g. if a mount changed.
A never-before-seen path is considered a new file, although its content would be deduplicated as a repeat…
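
A side note on the 01.01.0001 00:00:00 in your log lines: that is the minimum/default value of a .NET DateTime, so my reading (an assumption on my part, not something the log states) is that no previous timestamp was recorded for those paths, which would fit both the Recreate gap and the new-path case. Python's datetime has the same floor, for comparison:

from datetime import datetime

# 0001-01-01 00:00:00 is the smallest representable timestamp, i.e. a
# "no previous value" default rather than a real modification time.
print(datetime.min)  # prints: 0001-01-01 00:00:00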

Did you get a chance to look at the destination files before deletion? Was it as big as the first time, or far less?