Time to complete backup job has trippled in the last week

Ok, so the compact evaluation isn’t the culprit. Have you ever vacuumed your database? Maybe you can try a vacuum operation to see if it helps.

Excuse the ignorance - but how do I do this?

In the main web UI, click the backup job, and click the blue “Commandline …” link. Pick “vacuum” in the Command field and clear the “commandline arguments” field. Scroll to the bottom and click the “run vacuum command” button.

Thanks
I have now run that. The next backup is due at 10pm my time
I will update here tomorrow with if it worked or not

image

Hi drwtsn32,
It looks like the vacuum helped (bringing it down to 1 hour). Its a significant improvement, however still not back to the 40 minutes it was previously

You have almost 800,000 files included in your backup source. I bet you could improve performance even more by using the USN option. This lets Duplicati use the NTFS change journal to find new/changed files instead of enumerating the filesystem.

Try setting:

--usn-policy=auto

I’m not certain, but it may not work the first time you run a backup after setting the option. You may have to wait for the second run to see if it helped.

Also, this option requires that Duplicati be running elevated. Since you’re using Windows Server, I am going to guess you’re running Duplicati as a service so it probably has admin rights.

Thanks, I’ll give that a go and report back
Will test on job 1 and if it doesn’t improve I’ll let it run a second job the next day and then come back

You might want to set it to ‘on’ instead of ‘auto’ so it’s more obvious if Duplicati wasn’t able to use the USN.

I decided to run an out of schedule backup to test this - but it failed

Sat on “Starting Backup” for about 45-60 mins
After many refreshes i decided to close and re-open the browser and then it changed to backing up… after a few mins it then failed where i got the following error:

image

So I went into About > Log > Live Log
Where the following was listed

Was the bottom of that the end of the whole thing? The channel is retired means an earlier error happened.
Sometimes the stack shows it further down. An example of a longer one that permitted an analysis is here.

That was it - nothing more before or after it

However i have just restarted it now and am watching the log and it appears to be doing a lot more this time so will see how i go

Okay so the use of --usn-policy=auto has not resulted in any performance gains, and seems to have reverted back to the 2 hours it was taking before doing the database vacuum

I am about to change the config to --usn-policy=on as you mentioned - just wanted to give auto enough of a chance to run properly to be sure it wasn’t working

What’s a typical amount of sitting time? I think the next status bar update is Verifying backend data.
For any spots that appear slow, you can also use About → Show log → Live → Information (or greater).
A similar on-disk view can be had with log-file=<path> plus log-file-log-level=<something>, for example:

2021-08-25 09:18:28 -04 - [Information-Duplicati.Library.Main.Controller-StartingOperation]: The operation Backup has started
2021-08-25 09:19:07 -04 - [Information-Duplicati.Library.Main.BasicResults-BackendEvent]: Backend event: List - Started: ()
2021-08-25 09:19:17 -04 - [Information-Duplicati.Library.Main.BasicResults-BackendEvent]: Backend event: List - Completed: (854 bytes)
2021-08-25 09:19:18 -04 - [Verbose-Duplicati.Library.Main.Operation.Backup.FileEnumerationProcess-IncludingSourcePath]: Including source path: C:\PortableApps\Notepad++Portable\

The gap between first list can be full of SQL, but only profiling level shows that (you can try it in live log).

As an additional remark whenever there’s a performance issue, have you run Task Manager or Resource Monitor (or fancier things like Sysinternals Process Explorer) to see how system and process are doing?

It appears that this is the standard time - or at least the standard time now. I am unsure how long it would take before the speed reduction. However intuitively it feels as though starting backup should be a relatively quick process

I have looked at this while it is running the backup and it does appear to freeze/hang on some files. Interestingly there are 31 files which error on every backup (this goes back even before the performance issues began). These files correlate to a hang/pause in the Live log

Yes, the server is showing minimal signs of use while performing a backup, even when in a hanging/paused state. Nothing of concern pops out here.
Worth noting: The server is well and truley over resourced. CPU does not hit more than 10%, RAM sits at 12%, and Disk (running 8 SSD’s in RAID) barely notices the utilisation

The amount of data and number of new files appears to have no impact to the duration of the backup - see below screenshot.
In the two outlined examples we can see that 4 days ago we processed 4GB of new data and 145mb of modified data - taking 2 hours and 2 minutes
Where as 4 hours ago we processed only 704mb of new data and 101mb of modified data - and this backup took 2 hours and 3 minutes. 1 minute longer than the much larger backup

As it turns out - I am not
I had every intention of changing the install from a user process to a service after initial testing was complete, however it appears that I did not get back to this.
I will make the change and see if it helps

This might not be the case. There can be a large amount of self-checking, mostly using SQL.
You can get a peek at what’s going on at startup using About → Show log → Live → Profiling.

What does live log say is the issue with the files (might vary)? You might need to click the line.
You could also turn down the amount of output by logging at Warning level (also shows Error).

My guess is that it’s the sanity checking done on the database at the start of the backup job. A 1.64TB backup with default dedupe factor of 100KB is going to result in a LOT of blocks being tracked by the database.

If that’s near a Task Manager CPU graph, please right-click it to check a Logical processors view.
One processor completely utilized by a single thread could only push Overall utilization to 6%.

If it is truly busy in SQLite, that’s single-threaded. Please look at profiling logs anyway for activity, e.g.

2021-03-10 09:40:52 -05 - [Profiling-Timer.Finished-Duplicati.Library.Main.Database.ExtensionMethods-ExecuteScalarInt64]: ExecuteScalarInt64: SELECT COUNT(*) FROM (SELECT DISTINCT "Path" FROM (
SELECT
    "L"."Path", 
    "L"."Lastmodified", 
    "L"."Filelength", 
    "L"."Filehash", 
    "L"."Metahash", 
    "L"."Metalength",
    "L"."BlocklistHash", 
    "L"."FirstBlockHash",
    "L"."FirstBlockSize",
    "L"."FirstMetaBlockHash",
    "L"."FirstMetaBlockSize",
    "M"."Hash" AS "MetaBlocklistHash"
FROM
    (
    SELECT 
        "J"."Path", 
        "J"."Lastmodified", 
        "J"."Filelength", 
        "J"."Filehash", 
        "J"."Metahash", 
        "J"."Metalength",
        "K"."Hash" AS "BlocklistHash", 
        "J"."FirstBlockHash",
        "J"."FirstBlockSize",
        "J"."FirstMetaBlockHash",
        "J"."FirstMetaBlockSize",
        "J"."MetablocksetID"
    FROM 
        (
        SELECT 
	        "A"."Path" AS "Path", 
	        "D"."Lastmodified" AS "Lastmodified", 
	        "B"."Length" AS "Filelength", 
	        "B"."FullHash" AS "Filehash", 
	        "E"."FullHash" AS "Metahash", 
	        "E"."Length" AS "Metalength",
	        "A"."BlocksetID" AS "BlocksetID",
	        "F"."Hash" AS "FirstBlockHash",
	        "F"."Size" AS "FirstBlockSize",
	        "H"."Hash" AS "FirstMetaBlockHash",
	        "H"."Size" AS "FirstMetaBlockSize",
	        "C"."BlocksetID" AS "MetablocksetID"
        FROM 
	        "File" A	
        LEFT JOIN "Blockset" B
          ON "A"."BlocksetID" = "B"."ID" 
        LEFT JOIN "Metadataset" C  
          ON "A"."MetadataID" = "C"."ID"
        LEFT JOIN "FilesetEntry" D
          ON "A"."ID" = "D"."FileID"
        LEFT JOIN "Blockset" E
          ON "E"."ID" = "C"."BlocksetID"
        LEFT JOIN "BlocksetEntry" G
          ON "B"."ID" = "G"."BlocksetID"
        LEFT JOIN "Block" F 
          ON "G"."BlockID" = "F"."ID"  
        LEFT JOIN "BlocksetEntry" I
          ON "E"."ID" = "I"."BlocksetID"
        LEFT JOIN "Block" H 
          ON "I"."BlockID" = "H"."ID"
        WHERE 
          "A"."BlocksetId" >= 0 AND
          "D"."FilesetID" = 1 AND
          ("I"."Index" = 0 OR "I"."Index" IS NULL) AND  
          ("G"."Index" = 0 OR "G"."Index" IS NULL)
        ) J
    LEFT OUTER JOIN 
        "BlocklistHash" K 
    ON 
        "K"."BlocksetID" = "J"."BlocksetID" 
    ORDER BY "J"."Path", "K"."Index"
    ) L

LEFT OUTER JOIN
    "BlocklistHash" M
ON
    "M"."BlocksetID" = "L"."MetablocksetID"
) UNION SELECT DISTINCT "Path" FROM (
SELECT
    "G"."BlocksetID",
    "G"."ID",
    "G"."Path",
    "G"."Length",
    "G"."FullHash",
    "G"."Lastmodified",
    "G"."FirstMetaBlockHash",
    "H"."Hash" AS "MetablocklistHash"
FROM
    (
    SELECT
        "B"."BlocksetID",
        "B"."ID",
        "B"."Path",
        "D"."Length",
        "D"."FullHash",
        "A"."Lastmodified",
        "F"."Hash" AS "FirstMetaBlockHash",
        "C"."BlocksetID" AS "MetaBlocksetID"
    FROM
        "FilesetEntry" A, 
        "File" B, 
        "Metadataset" C, 
        "Blockset" D,
        "BlocksetEntry" E,
        "Block" F
    WHERE 
        "A"."FileID" = "B"."ID" 
        AND "B"."MetadataID" = "C"."ID" 
        AND "C"."BlocksetID" = "D"."ID" 
        AND "E"."BlocksetID" = "C"."BlocksetID"
        AND "E"."BlockID" = "F"."ID"
        AND "E"."Index" = 0
        AND ("B"."BlocksetID" = -100 OR "B"."BlocksetID" = -200) 
        AND "A"."FilesetID" = 1
    ) G
LEFT OUTER JOIN
   "BlocklistHash" H
ON
   "H"."BlocksetID" = "G"."MetaBlocksetID"
ORDER BY
   "G"."Path", "H"."Index"

)) took 0:00:00:00.165

where your time (last line) is probably longer, and your count runs from "FilesetID" = 1 to last version. How many versions are there? @drwtsn32 commented on size. Maybe versions need some trimming?

There are also the 31 complaining files to look at, but those are probably not during the initial startup time.

Hi all,
The ssue is now resolved
In my testing @drwtsn32 mentioned an assumption that Duplicati was running as a Windows service - however this was not the case. I had set it up in user mode for testing and had not yet converted it to a service.
Upon converting from a user process to a service (and waiting the 6 hours for it to rebuild the database) it appears to have resolved the issue.
I now get more warnings (from 34 to 141) of the below nature, however for the last 6 days the backups have run taking only 38 minutes

[Warning-Duplicati.Library.Main.Operation.Backup.FileEnumerationProcess-FileAccessError]: Error reported while accessing file:

I am unclear if it was the change from user process to system service that fixed it, or if it was the rebuilding of the database. But I suggest time will tell. If it was indeed the change to process (assumably from permissions issues) then this issue should be permanently resolved, however if it was he database it will very likely return again after some time

Thanks to @drwtsn32 and @ts678 for your assistance in troubleshooting and should the issue come back I will update this thread further

1 Like

I hope I’m wrong, but I don’t think this will permanently resolve your issue with runtime.

Converting from user mode to a service isn’t actually related directly to being able to use snapshots or USN. I run Duplicati in user mode (non-service) on most of my machines but make sure they start elevated. As such I can utilize snapshots and USN. Also when you make this transition you can actually move/copy your database in order to save time (you don’t have to do the recreate procedure, although doing so is kind of a good test!)

I suspect you are right - time will tell!

And yeah, I found out about the ability to move the database while I was researching why it was taking so long start for the first time LOL