Backups suddenly taking much longer to complete

amloessb · September 4, 2021, 10:54am

I have been experiencing a problem over the past few days eerily similar to the one m.arthur reported recently. Out of nowhere, my backups seem to be taking a much longer time to complete than they normally do–going from a typical 18-23 minute runtime to taking anywhere from 1-2 hours to complete. Interestingly, this is only happening (to this degree) to one of my backup sets. The other is (relatively) unaffected.

Vitals

To start with, some basic information on my setup: I’m running Windows 8.1 Pro (64-bit) and Duplicati - 2.0.6.3_beta_2021-06-17.

Backup sets

	Anomalous set	Well-behaved set
Name	Backup to FTP	Backup to Azure
Source size	644.20 GB	523.61 GB
Backup size	710.72 GB	561.81 GB
Versions	221	96
Oldest version	October, 2018	October, 2018
Frequency	hourly	daily
Normal duration	18-23 minutes	12-20 minutes
Duration since 9/1	19-120 minutes	21-36 minutes
Auto-compact interval	weekly	weekly
Rate of growth (backup)	530 MiB/week	220 MiB/week
Chunk (block) size	100 kB (default)	100 kB (default)
Volume (dblock) size	100 MB	50 MB
Database size	7.40 GiB	3.36 GiB

Timeline

Looking at my duplicati-monitoring logs, this appears to have started with the 11:00 run of my “Backup to FTP” set on 9/1, which took 94 minutes to complete. Since then, the backups of this set have regularly been taking more than an hour to complete, and up to 2 hours in the worst case.

A full listing of all backups for this set, along with their durations, can be viewed below.

backup list (screenshots)

Observations

Time spent in different backup phases

I keep a separate, local log file for each backup set (logging at the Retry level). For the purposes of this issue, there are two distinct periods of time for each backup run:

Time-to-list: the elapsed time from when the backup operation starts to the first logged backend List event
List-to-end: the elapsed time from the first logged backend List event to the last completed backend Get event (as part of the post-backup verification)

So, for example, in the following backup run, the time-to-list would be 11 minutes, and the list-to-end time would be 9 minutes, for a total backup duration of 20 minutes. (This is typical of backups for the “Backup to FTP” set, prior to the manifestation of the problem.)

sample nominal backup log

2021-08-30 00:00:02 -07 - [Information-Duplicati.Library.Main.Controller-StartingOperation]: The operation Backup has started
2021-08-30 00:11:15 -07 - [Information-Duplicati.Library.Main.BasicResults-BackendEvent]: Backend event: List - Started:  ()
2021-08-30 00:11:20 -07 - [Information-Duplicati.Library.Main.BasicResults-BackendEvent]: Backend event: List - Completed:  (15.03 KB)
2021-08-30 00:16:48 -07 - [Information-Duplicati.Library.Main.BasicResults-BackendEvent]: Backend event: Put - Started: duplicati-b719b8c04fee145b494c501e06126eecd.dblock.zip.aes (14.48 MB)
2021-08-30 00:16:53 -07 - [Information-Duplicati.Library.Main.BasicResults-BackendEvent]: Backend event: Put - Completed: duplicati-b719b8c04fee145b494c501e06126eecd.dblock.zip.aes (14.48 MB)
2021-08-30 00:17:08 -07 - [Information-Duplicati.Library.Main.BasicResults-BackendEvent]: Backend event: Put - Started: duplicati-i16728579942248dfb2e0faf42380e997.dindex.zip.aes (100.58 KB)
2021-08-30 00:17:08 -07 - [Information-Duplicati.Library.Main.BasicResults-BackendEvent]: Backend event: Put - Completed: duplicati-i16728579942248dfb2e0faf42380e997.dindex.zip.aes (100.58 KB)
2021-08-30 00:17:13 -07 - [Information-Duplicati.Library.Main.BasicResults-BackendEvent]: Backend event: Put - Started: duplicati-20210830T070002Z.dlist.zip.aes (29.74 MB)
2021-08-30 00:17:16 -07 - [Information-Duplicati.Library.Main.BasicResults-BackendEvent]: Backend event: Put - Completed: duplicati-20210830T070002Z.dlist.zip.aes (29.74 MB)
2021-08-30 00:17:16 -07 - [Information-Duplicati.Library.Main.Operation.DeleteHandler:RetentionPolicy-StartCheck]: Start checking if backups can be removed
2021-08-30 00:17:16 -07 - [Information-Duplicati.Library.Main.Operation.DeleteHandler:RetentionPolicy-FramesAndIntervals]: Time frames and intervals pairs: 1.00:00:00 / Keep all, 7.00:00:00 / 01:00:00, 90.00:00:00 / 1.00:00:00, 365.00:00:00 / 7.00:00:00, Unlimited / 31.00:00:00
2021-08-30 00:17:16 -07 - [Information-Duplicati.Library.Main.Operation.DeleteHandler:RetentionPolicy-BackupList]: Backups to consider: 2021-08-29 23:00:01, 2021-08-29 22:00:01, 2021-08-29 21:00:01, <snip>
2021-08-30 00:17:16 -07 - [Information-Duplicati.Library.Main.Operation.DeleteHandler:RetentionPolicy-BackupsToDelete]: Backups outside of all time frames and thus getting deleted: 
2021-08-30 00:17:16 -07 - [Information-Duplicati.Library.Main.Operation.DeleteHandler:RetentionPolicy-AllBackupsToDelete]: All backups to delete: 2021-08-23 00:00:02
2021-08-30 00:17:16 -07 - [Information-Duplicati.Library.Main.Operation.DeleteHandler-DeleteRemoteFileset]: Deleting 1 remote fileset(s) ...
2021-08-30 00:19:49 -07 - [Information-Duplicati.Library.Main.BasicResults-BackendEvent]: Backend event: Delete - Started: duplicati-20210823T070002Z.dlist.zip.aes (29.71 MB)
2021-08-30 00:19:49 -07 - [Information-Duplicati.Library.Main.BasicResults-BackendEvent]: Backend event: Delete - Completed: duplicati-20210823T070002Z.dlist.zip.aes (29.71 MB)
2021-08-30 00:19:50 -07 - [Information-Duplicati.Library.Main.Operation.DeleteHandler-DeleteResults]: Deleted 1 remote fileset(s)
2021-08-30 00:19:50 -07 - [Information-Duplicati.Library.Main.BasicResults-BackendEvent]: Backend event: List - Started:  ()
2021-08-30 00:19:55 -07 - [Information-Duplicati.Library.Main.BasicResults-BackendEvent]: Backend event: List - Completed:  (15.03 KB)
2021-08-30 00:19:55 -07 - [Information-Duplicati.Library.Main.BasicResults-BackendEvent]: Backend event: Get - Started: duplicati-20210830T040001Z.dlist.zip.aes (29.74 MB)
2021-08-30 00:19:59 -07 - [Information-Duplicati.Library.Main.BasicResults-BackendEvent]: Backend event: Get - Completed: duplicati-20210830T040001Z.dlist.zip.aes (29.74 MB)
2021-08-30 00:19:59 -07 - [Information-Duplicati.Library.Main.BasicResults-BackendEvent]: Backend event: Get - Started: duplicati-ib2129f08e61c483ca60605041c07c229.dindex.zip.aes (108.37 KB)
2021-08-30 00:19:59 -07 - [Information-Duplicati.Library.Main.BasicResults-BackendEvent]: Backend event: Get - Completed: duplicati-ib2129f08e61c483ca60605041c07c229.dindex.zip.aes (108.37 KB)
2021-08-30 00:20:00 -07 - [Information-Duplicati.Library.Main.BasicResults-BackendEvent]: Backend event: Get - Started: duplicati-bdbb197ed037e469ca6caf3cbe56b5d38.dblock.zip.aes (67.08 MB)
2021-08-30 00:20:08 -07 - [Information-Duplicati.Library.Main.BasicResults-BackendEvent]: Backend event: Get - Completed: duplicati-bdbb197ed037e469ca6caf3cbe56b5d38.dblock.zip.aes (67.08 MB)

Between the start of the backup operation and the first List command being issued against the backend (that is, the time-to-list period), Duplicati appears to be warming up and twiddling the database, as can be observed by viewing the live Profiling log:

live log screenshot

The interesting thing about the backups which are taking much longer is that, while there is only a modest increase in list-to-end time (e.g., to 16 minutes), the time-to-list has greatly increased (e.g., to 50+ minutes), and seems to be responsible for the lion’s share of the backup duration.

sample long-running backup log

2021-09-02 15:42:22 -07 - [Information-Duplicati.Library.Main.Controller-StartingOperation]: The operation Backup has started
2021-09-02 16:35:55 -07 - [Information-Duplicati.Library.Main.BasicResults-BackendEvent]: Backend event: List - Started:  ()
2021-09-02 16:35:59 -07 - [Information-Duplicati.Library.Main.BasicResults-BackendEvent]: Backend event: List - Completed:  (15.15 KB)
2021-09-02 16:46:18 -07 - [Information-Duplicati.Library.Main.BasicResults-BackendEvent]: Backend event: Put - Started: duplicati-bed9c3b61772442ddab4a44cc757d53c3.dblock.zip.aes (15.31 MB)
2021-09-02 16:46:46 -07 - [Information-Duplicati.Library.Main.BasicResults-BackendEvent]: Backend event: Put - Completed: duplicati-bed9c3b61772442ddab4a44cc757d53c3.dblock.zip.aes (15.31 MB)
2021-09-02 16:47:09 -07 - [Information-Duplicati.Library.Main.BasicResults-BackendEvent]: Backend event: Put - Started: duplicati-ib0ac698cc45c4fd08a512c995f540072.dindex.zip.aes (117.61 KB)
2021-09-02 16:47:10 -07 - [Information-Duplicati.Library.Main.BasicResults-BackendEvent]: Backend event: Put - Completed: duplicati-ib0ac698cc45c4fd08a512c995f540072.dindex.zip.aes (117.61 KB)
2021-09-02 16:47:17 -07 - [Information-Duplicati.Library.Main.BasicResults-BackendEvent]: Backend event: Put - Started: duplicati-20210902T224222Z.dlist.zip.aes (29.75 MB)
2021-09-02 16:47:20 -07 - [Information-Duplicati.Library.Main.BasicResults-BackendEvent]: Backend event: Put - Completed: duplicati-20210902T224222Z.dlist.zip.aes (29.75 MB)
2021-09-02 16:47:21 -07 - [Information-Duplicati.Library.Main.Operation.DeleteHandler:RetentionPolicy-StartCheck]: Start checking if backups can be removed
2021-09-02 16:47:21 -07 - [Information-Duplicati.Library.Main.Operation.DeleteHandler:RetentionPolicy-FramesAndIntervals]: Time frames and intervals pairs: 1.00:00:00 / Keep all, 7.00:00:00 / 01:00:00, 90.00:00:00 / 1.00:00:00, 365.00:00:00 / 7.00:00:00, Unlimited / 31.00:00:00
2021-09-02 16:47:21 -07 - [Information-Duplicati.Library.Main.Operation.DeleteHandler:RetentionPolicy-BackupList]: Backups to consider: 2021-09-02 14:34:17, 2021-09-02 13:00:03, 2021-09-02 12:00:03, <snip>
2021-09-02 16:47:21 -07 - [Information-Duplicati.Library.Main.Operation.DeleteHandler:RetentionPolicy-BackupsToDelete]: Backups outside of all time frames and thus getting deleted: 
2021-09-02 16:47:21 -07 - [Information-Duplicati.Library.Main.Operation.DeleteHandler:RetentionPolicy-AllBackupsToDelete]: All backups to delete: 2021-08-26 16:12:49
2021-09-02 16:47:21 -07 - [Information-Duplicati.Library.Main.Operation.DeleteHandler-DeleteRemoteFileset]: Deleting 1 remote fileset(s) ...
2021-09-02 16:52:27 -07 - [Information-Duplicati.Library.Main.BasicResults-BackendEvent]: Backend event: Delete - Started: duplicati-20210826T231249Z.dlist.zip.aes (29.80 MB)
2021-09-02 16:52:27 -07 - [Information-Duplicati.Library.Main.BasicResults-BackendEvent]: Backend event: Delete - Completed: duplicati-20210826T231249Z.dlist.zip.aes (29.80 MB)
2021-09-02 16:52:33 -07 - [Information-Duplicati.Library.Main.Operation.DeleteHandler-DeleteResults]: Deleted 1 remote fileset(s)
2021-09-02 16:52:33 -07 - [Information-Duplicati.Library.Main.BasicResults-BackendEvent]: Backend event: List - Started:  ()
2021-09-02 16:52:37 -07 - [Information-Duplicati.Library.Main.BasicResults-BackendEvent]: Backend event: List - Completed:  (15.15 KB)
2021-09-02 16:52:38 -07 - [Information-Duplicati.Library.Main.BasicResults-BackendEvent]: Backend event: Get - Started: duplicati-20210902T224222Z.dlist.zip.aes (29.75 MB)
2021-09-02 16:52:42 -07 - [Information-Duplicati.Library.Main.BasicResults-BackendEvent]: Backend event: Get - Completed: duplicati-20210902T224222Z.dlist.zip.aes (29.75 MB)
2021-09-02 16:52:42 -07 - [Information-Duplicati.Library.Main.BasicResults-BackendEvent]: Backend event: Get - Started: duplicati-i5c6cf9206d664b6494dab38f3e1aaeee.dindex.zip.aes (125.58 KB)
2021-09-02 16:52:42 -07 - [Information-Duplicati.Library.Main.BasicResults-BackendEvent]: Backend event: Get - Completed: duplicati-i5c6cf9206d664b6494dab38f3e1aaeee.dindex.zip.aes (125.58 KB)
2021-09-02 16:52:42 -07 - [Information-Duplicati.Library.Main.BasicResults-BackendEvent]: Backend event: Get - Started: duplicati-b749c6c78581e405685449c068e5237ad.dblock.zip.aes (14.83 MB)
2021-09-02 16:52:44 -07 - [Information-Duplicati.Library.Main.BasicResults-BackendEvent]: Backend event: Get - Completed: duplicati-b749c6c78581e405685449c068e5237ad.dblock.zip.aes (14.83 MB)

I don’t have time to do so right now, but I plan to whip up a script tomorrow to analyze the backup logs and list the time-to-list, list-to-end, and total duration for every backup, so we can get a better idea of the actual breakdown across backups.

Time-of-day variation

Looking back at the duplicati-monitoring screenshots, it’s pretty evident that backup durations are better in the mornings, before deteriorating in the afternoon. Since this problem has only manifested since Wednesday, I still don’t know whether “time of day” or “time since boot” is the true driver here. I’ll report back once I know more.

Attempts to remediate

`vaccum`-ing the database

I remembered reading something on the forums a while ago talking about performance gains from vaccum-ing a backup set’s database, so I gave it a shot. While it decreased the size of the DB by 500 MB or so, there was no observable increase in performance as a result.

Defragmenting the database drive

The Duplicati databases for both of my backup sets live on the same SSD (with nothing else except some Icaros icon DBs), so I wondered if the constant writing to the sqlite database files might have ended up severely fragmenting them (before you say anything, yes, your SSD does sometimes need to be defragmented). Windows’ defrag.exe showed 4% fragmentation on the drive, so I had it defragment just in case. Sadly, it did not help.

Next steps

There’s still a lot of poking and prodding I want to do to see if I can’t tease out what’s causing this, but I wanted to get this post written up first, just in case anyone has seen an issue like this before, or has any specific details they want to request from me.

Mostly to remind myself, I plan to work on the following and report back my results:

Analyze the logs to get the time-to-list, list-to-end, and total duration for every backup for the past week
Figure out if time of day really affects backup duration
Manually retrim the SSD the databases live on (?)
Check for new programs installed around when the problem started

If you have any ideas or would like me to report back any other specific information, please feel free to chime in!

drwtsn32 · September 4, 2021, 1:49pm

First of all let me thank you for providing such a great level of detail!

It looks like it’s spending almost an hour at the start of the job before it even checks the back end. In this initial phase one of the things being done is a sanity check on the database. My GUESS is you are having the same issue as the other post you linked to in your first sentence. The size of your database, size of the backup data, number of versions, and 100KB block size is resulting in a lot of blocks to track. This results in the sanity check taking more and more time.

One thing I don’t see mentioned is how many files you are backing up in the source. If it’s a huge number, then enabling USN (assuming you are on Windows) should speed up the file scan phase. We had the poster in that other thread do this but it didn’t help in his case.

If you are backing up tens or hundreds of thousands of files, try the USN option. May take a couple backups before you see the difference. If that doesn’t work, consider reducing your retention to reduce versions if possible.

For your size backup I would have recommend a 1MB dedupe block size (chunk size) but unfortunately you cannot change that setting without starting over. A larger chunk size helps keep the database efficient as there are less blocks to track, at the expense of reduced deduplication efficiency.

ts678 · September 4, 2021, 3:11pm

Very nice writeup, and I agree with the @drwtsn32 requests and comments. You could also temporarily suspend your hourly backups (sometimes I just uncheck the current day, but remember to set it up later).

Using the backup’s Verify files button will (I think) let you time the default pre-backup verification alone.

There is a somewhat risky way of turning off automatic verification with an option, but please see warning:

  --disable-filelist-consistency-checks (Boolean): Disable filelist
    consistency checks
    In backups with a large number of filesets, the verification can take up a
    large part of the backup time. If you disable the checks, make sure you
    run regular check commands to ensure that everything is working as
    expected.
    * default value: false

github.com

duplicati/duplicati/blob/b9aad24deb4a4c88bb284bce3a6ee64e1939ce9e/Duplicati/Library/Main/Database/LocalDatabase.cs#L797-L816


      
          if (verifyfilelists)

          {

              using (var cmd2 = m_connection.CreateCommand(transaction))

                  foreach (var filesetid in cmd.ExecuteReaderEnumerable(@"SELECT ""ID"" FROM ""Fileset"" ").Select(x => x.ConvertValueToInt64(0, -1)))

                  {

                      var expandedCmd = string.Format(@"SELECT COUNT(*) FROM (SELECT DISTINCT ""Path"" FROM ({0}) UNION SELECT DISTINCT ""Path"" FROM ({1}))", LocalDatabase.LIST_FILESETS, LocalDatabase.LIST_FOLDERS_AND_SYMLINKS);

                      var expandedlist = cmd2.ExecuteScalarInt64(expandedCmd, 0, filesetid, FOLDER_BLOCKSET_ID, SYMLINK_BLOCKSET_ID, filesetid);

                      //var storedfilelist = cmd2.ExecuteScalarInt64(string.Format(@"SELECT COUNT(*) FROM ""FilesetEntry"", ""FileLookup"" WHERE ""FilesetEntry"".""FilesetID"" = ? AND ""FileLookup"".""ID"" = ""FilesetEntry"".""FileID"" AND ""FileLookup"".""BlocksetID"" != ? AND ""FileLookup"".""BlocksetID"" != ?"), 0, filesetid, FOLDER_BLOCKSET_ID, SYMLINK_BLOCKSET_ID);

                      var storedlist = cmd2.ExecuteScalarInt64(@"SELECT COUNT(*) FROM ""FilesetEntry"" WHERE ""FilesetEntry"".""FilesetID"" = ?", 0, filesetid);

          

                      if (expandedlist != storedlist)

                      {

                          var filesetname = filesetid.ToString();

                          var fileset = FilesetTimes.Zip(Enumerable.Range(0, FilesetTimes.Count()), (a, b) => new Tuple<long, long, DateTime>(b, a.Key, a.Value)).FirstOrDefault(x => x.Item2 == filesetid);

                          if (fileset != null)

                              filesetname = string.Format("version {0}: {1} (database id: {2})", fileset.Item1, fileset.Item3, fileset.Item2);

                          throw new Interface.UserInformationException(string.Format("Unexpected difference in fileset {0}, found {1} entries, but expected {2}", filesetname, expandedlist, storedlist), "FilesetDifferences");

                      }

                  }

          }

If this is truly the slow place, I don’t know what would cause SQLite to suddenly get slower, however the query language is sort of an abstract way of saying what’s desired, and the database picks how to do it.

The SQLite Query Optimizer Overview suggests to me (any DB expert out there?) that maybe changed plan sometimes changes the run times. Your number of backup versions isn’t likely moving swiftly, but a test of a single version may see different times . Your sample profiling log looks like it doesn’t show time, because the operation is currently running. You can scroll down for earlier times, or set up a log-file with log-file-log-level=profiling to ease compares (might need something to handle big log files – I use glogg).

EDIT:

You could also install Sysinternals Process Monitor to either watch what Duplicati is doing during query. Activity on the main Database is quite likely, but also see if any files are accessed with etilqs in name.
These are SQLite temporary files, and I’m not sure when it decides to use them, but it may slow things.

m.arthur · September 5, 2021, 7:27pm

@amloessb Out of curiousity, is your Duplicati instance running as a Windows process or service?
If process, is it running with elevated/administrator permissions?

My issues resolved themselves upon converting from Windows process to Windows Service, but i was unable to identify if the solution was permissions related or if it was indeed Database related

If your’s runs as user process with non-elevated permissions, perhaps you could run as Administrator and see if the issue is now resolved. This should identify which of the two theories is causing the issues without having to rebuild the database.
Then if you later change from a dedupe block size of 100kb to 1MB as recommended by @drwtsn32 at which point it will recreate the database

drwtsn32 · September 5, 2021, 8:06pm

Unfortunately more is needed to change the dedupe block size. You have to start over by clearing the back end data and the local database. The existing back end data cannot be used when changing the dedupe block size.

If you want to keep the backup history, you could just leave the old job as-is and turn off the scheduler. That way it’s available to do restores from. Then set up a new scheduled job with the new dedupe block size, target a new folder on the back end, and start your backups from scratch.

By the way: my rule of thumb is to have the dedupe block size to be about 1/1,000,000th of the expected source data size, perhaps rounding up for growth. So if you have around 1TB of source data, 1MB block size should work pretty well. People backing up 5TB might want to use 5MB, and so on…

ts678 · September 5, 2021, 9:39pm

Thanks for sharing what resolved the issue. In case that’s unclear to anyone, see the resolution:

One doesn’t have to recreate the database when migrating to a service. One can move it instead.
One can also recreate the database without migrating, and it might be a simpler experiment to try.

But there are a number of experiments suggested, and it would be good to gain an understanding.
One possible way to get both might be to time the Verify files, move database aside, Repair,
then time the Verify files again. Checking Profiling log for SQL execution times may also help.

A recreated database will likely be smaller than even the original after vacuum, because it will lose
some history such as the job logs and remote operations done. log-retention keeps them trimmed,
and I can’t think of any data-oriented tables that grow to the sky. Still, it might be fast after recreate.

amloessb · September 6, 2021, 11:58am

I’m currently running Duplicati as a regular process, and un-elevated. However, as was mentioned in the other thread, I’m also skeptical that this was the resolution of your issue, and suspect that the re-creation of the database was the true catalyst.

Since my backups are sometimes completing in a normal amount of time and sometimes taking way longer, I want to see if I can figure out if there’s a difference in how Duplicati is accessing the database between these two scenarios. @ts678’s suggestion to use procmon to monitor resource access got me thinking, so I’m going to try using ProcessActivityView instead (since it provides an easier-to-read summary of activity), along with Profiling-level logging, to see if I can’t suss out what the difference is.

I do, however, agree with @drwtsn32 that one of the roots of this issue is the too-small block size, which is causing the database to have to keep track of too many individual blocks to be performant. I’ve been putting off doing something about it, since, as they mentioned, it would involve creating an entirely new backup from scratch, but if I can’t nail down what’s happening here, I may have to do just that.

m.arthur · September 6, 2021, 8:17pm

I’m also skeptical that this was the resolution of your issue, and suspect that the re-creation of the database was the true catalyst.

I appreciate (and agree you are probably right) that you don’t think that the premissions are the cause - however the best scientific approach (same as troubleshooting approach) to solving problems is not to think - it is to form some hypotheses, order them in quickest and easiest to test at the top then systematically go down the list and test them one at a time
My point is: To trial run the backup in elevated permissions is increadibly quick and easy - plus is a change that can be isolated from all other changes.
Meaning it can be proven or disproven very quickly and easily. Additionally it can be trialled over and over allowing you to reproduce the issue by running it without elevated permissions again - confirming this as the solution, or not

I originally suspected permissions as when I was watching the live log, I witnessed Duplicati hanging on files which (after it had finished hanging) showed up as “Access denied” in the log.
Also, the fact that the only 2 instances of this issue having been reported recently were both running as user processes, as opposed to elevated permissions or service. To me this is a little suspect

amloessb · September 11, 2021, 8:36am

Well, I’ve been fiddling with my backups and gathering data for the past week, and the conclusion that I’ve reached is that, most likely, the only solution to this issue is to do a full database rebuild as a temporary measure, and to re-create the backup completely (with a larger block size) as a more permanent solution.

I still don’t have a good answer as to why my backups sometimes complete in a reasonable amount of time and sometimes go on for over an hour, but my best guess is that some combination of properties between my database and my system is causing SQLite’s query planner to flip between more and less time-efficient plans, thus causing the varied durations.

Detailed observations follow.

Database query profiling

I logged at the Profiling level for a little over 24 hours, mostly to see if I could catch some variability in the time taken to complete some sample query across multiple backups. I randomly selected the query below to check.

The "FilesetID 16677" query

SELECT COUNT(*) FROM (SELECT DISTINCT "Path" FROM (
SELECT
    "L"."Path", 
    "L"."Lastmodified", 
    "L"."Filelength", 
    "L"."Filehash", 
    "L"."Metahash", 
    "L"."Metalength",
    "L"."BlocklistHash", 
    "L"."FirstBlockHash",
    "L"."FirstBlockSize",
    "L"."FirstMetaBlockHash",
    "L"."FirstMetaBlockSize",
    "M"."Hash" AS "MetaBlocklistHash"
FROM
    (
    SELECT 
        "J"."Path", 
        "J"."Lastmodified", 
        "J"."Filelength", 
        "J"."Filehash", 
        "J"."Metahash", 
        "J"."Metalength",
        "K"."Hash" AS "BlocklistHash", 
        "J"."FirstBlockHash",
        "J"."FirstBlockSize",
        "J"."FirstMetaBlockHash",
        "J"."FirstMetaBlockSize",
        "J"."MetablocksetID"
    FROM 
        (
        SELECT 
	        "A"."Path" AS "Path", 
	        "D"."Lastmodified" AS "Lastmodified", 
	        "B"."Length" AS "Filelength", 
	        "B"."FullHash" AS "Filehash", 
	        "E"."FullHash" AS "Metahash", 
	        "E"."Length" AS "Metalength",
	        "A"."BlocksetID" AS "BlocksetID",
	        "F"."Hash" AS "FirstBlockHash",
	        "F"."Size" AS "FirstBlockSize",
	        "H"."Hash" AS "FirstMetaBlockHash",
	        "H"."Size" AS "FirstMetaBlockSize",
	        "C"."BlocksetID" AS "MetablocksetID"
        FROM 
	        "File" A	
        LEFT JOIN "Blockset" B
          ON "A"."BlocksetID" = "B"."ID" 
        LEFT JOIN "Metadataset" C  
          ON "A"."MetadataID" = "C"."ID"
        LEFT JOIN "FilesetEntry" D
          ON "A"."ID" = "D"."FileID"
        LEFT JOIN "Blockset" E
          ON "E"."ID" = "C"."BlocksetID"
        LEFT JOIN "BlocksetEntry" G
          ON "B"."ID" = "G"."BlocksetID"
        LEFT JOIN "Block" F 
          ON "G"."BlockID" = "F"."ID"  
        LEFT JOIN "BlocksetEntry" I
          ON "E"."ID" = "I"."BlocksetID"
        LEFT JOIN "Block" H 
          ON "I"."BlockID" = "H"."ID"
        WHERE 
          "A"."BlocksetId" >= 0 AND
          "D"."FilesetID" = 16677 AND
          ("I"."Index" = 0 OR "I"."Index" IS NULL) AND  
          ("G"."Index" = 0 OR "G"."Index" IS NULL)
        ) J
    LEFT OUTER JOIN 
        "BlocklistHash" K 
    ON 
        "K"."BlocksetID" = "J"."BlocksetID" 
    ORDER BY "J"."Path", "K"."Index"
    ) L

LEFT OUTER JOIN
    "BlocklistHash" M
ON
    "M"."BlocksetID" = "L"."MetablocksetID"
) UNION SELECT DISTINCT "Path" FROM (
SELECT
    "G"."BlocksetID",
    "G"."ID",
    "G"."Path",
    "G"."Length",
    "G"."FullHash",
    "G"."Lastmodified",
    "G"."FirstMetaBlockHash",
    "H"."Hash" AS "MetablocklistHash"
FROM
    (
    SELECT
        "B"."BlocksetID",
        "B"."ID",
        "B"."Path",
        "D"."Length",
        "D"."FullHash",
        "A"."Lastmodified",
        "F"."Hash" AS "FirstMetaBlockHash",
        "C"."BlocksetID" AS "MetaBlocksetID"
    FROM
        "FilesetEntry" A, 
        "File" B, 
        "Metadataset" C, 
        "Blockset" D,
        "BlocksetEntry" E,
        "Block" F
    WHERE 
        "A"."FileID" = "B"."ID" 
        AND "B"."MetadataID" = "C"."ID" 
        AND "C"."BlocksetID" = "D"."ID" 
        AND "E"."BlocksetID" = "C"."BlocksetID"
        AND "E"."BlockID" = "F"."ID"
        AND "E"."Index" = 0
        AND ("B"."BlocksetID" = -100 OR "B"."BlocksetID" = -200) 
        AND "A"."FilesetID" = 16677
    ) G
LEFT OUTER JOIN
   "BlocklistHash" H
ON
   "H"."BlocksetID" = "G"."MetaBlocksetID"
ORDER BY
   "G"."Path", "H"."Index"

))

This query appears near the start of each backup run. However, the time the query takes varies wildly, from as short as 4.9 seconds to as long as 1 minute and 6 seconds (with clusters around 5 seconds, 23 seconds, and 40 seconds).

SQLite temporary file usage

I attempted to capture the file system activity of both a “short” and a “long” run, in order to see if I could find some difference in the way they utilize temporary files during database operations. Unfortunately, without fail, every time I used ProcessActivityView or Process Monitor to try to monitor Duplicati, it always resulted in a “long” backup.

In case you’re interested, you can take a look at the “long” runs I did monitor here: PAV_Duplicati.zip (38.1 KB)

Running as admin

For fun, I did run Duplicati as an elevated process and turn on USN tracking. This did not resolve the issue.

ts678 · September 11, 2021, 2:56pm

Thanks for the heavy pursuit of the problem. I think it’s getting at least some leads that might be used.

I guess that’s what has been called a Heisenbug. Maybe there’s a less disruptive way to watch activity.

I’m still wondering if temporary files are a culprit. I see lots of files in your logs that begin with etilqs…
Temporary Files Used By SQLite says less than I’d like about telling why these are being created, but I expect one can sneak a peek (e.g. with the dir command) in their directory to get a rough tally without throwing off the behavior. There’s one section in SQLite’s page that sounds maybe size/load sensitive:

Other Temporary File Optimizations

This means that for many common cases where the temporary tables and indices are small (small enough to fit into the page cache) no temporary files are created and no disk I/O occurs. Only when the temporary data becomes too large to fit in RAM does the information spill to disk.

I’m no SQLite expert (any out there?), but it sounds like changing the number of cached pages is tough, when one wants to try a spur-of-the-moment last-shot experiment before discarding current backup, but changing the page size might be possible without code change using DB Browser for SQLite to use the Edit Pragmas tab to raise Page Size from 4096 to 8192, then use Save button at window bottom right.

PRAGMA page_size seems to say it’s not immediately effective, but you can at least see if it sticks over close and reopen of the job Database. Eventually, you probably need a vacuum, either by Duplicati or by Execute SQL tab with vacuum in it. After that, see if it helps speed. It might destroy backup, but that may happen anyway if you need speed and plan to start over. If you have space, you could copy backup files (probably quite a lot of space) or the database (small, but also in theory can be rebuilt from backup files).

The experiment I’m proposing would certainly be worth trying before the complete start-over of the backup because even if it blows up there’s not really any additional loss. Any damage seems less likely to backup remote files than the database, so it might also be tried before database Recreate test. You could even try several things in parallel if you can spare the storage, e.g. get larger block size one going, then play more.

In support of the something-overloaded theory, I’ll mention that some performance seems to go into sharp slowdown past a certain number of blocks, and in one test I think I saw etilqs files getting hyperactive…

github.com

duplicati/duplicati/blob/017911eb3debc39ab6026370b8fcfafc67ef6937/Tools/Commandline/RestoreFromPython/restore_from_python.py#L197-L198


      
          cursor.execute("PRAGMA page_size = 16384")
          cursor.execute("PRAGMA cache_size = 1000")

seems like it bumps both tunings, but I’m not sure you can get that to change and stick in regular backup.

amloessb · September 12, 2021, 1:57am

I’ll give that a shot before I rebuild. Will let you know how it goes.

Eni_Ki · September 16, 2021, 2:05pm

just for the record: I reported a similar problem before, still waiting for it to occur again so I can do more investigations: Job suddenly takes 10x longer than usual

ts678 · September 16, 2021, 5:47pm

Thanks. This is the kind of help that’s needed from people who run into issues but can help examine them. Development frequently lacks time and equipment configurations, so hard-to-reproduce bugs are … tough.

This isn’t the only area where performance could be examined and (ideally) slowdowns identified and fixed.
Volunteers would be valuable in this (and all) areas, and database skills might be especially valuable to get.

amloessb · September 16, 2021, 6:57pm

It looks like changing the page_size pragma from 4096 to 8192 (followed by a vacuum) not only sticks, but has a dramatic effect on backup duration. Ever since I made the change on the 12th, all of my backups have had durations of less than one hour.

backup list (screenshot)

Curiously, although some of my backups are now taking much less time than they were prior to encountering this issue (as short as 12 minutes), they still sometimes take 30-45 minutes (not due to compacting). This indicates to me that changing the page size acted as an across-the-board performance improvement, but doesn’t address the problem of different backup runs for the same set taking wildly different amounts of time to complete.

With this improvement, I don’t really have a need to rebuild the database for speed purposes, but if you’d still like me to do so for science, let me know.

drwtsn32 · September 16, 2021, 7:28pm

Interesting… I’m going to try it on one of my databases to see if I can see the same thing.

ts678 · September 17, 2021, 7:18pm

If anybody is up to looking at etilqs temporary file activity difference, that might explain the speedup.
My original guess was that the fixed-size page cache overflowed into temporary files more at 4096…

ts678 · September 21, 2021, 4:54pm

Maybe check load (CPU, disk, etc.) with Task Manager, Resource Monitor, Performance Monitor.

Task Manager is very simple, but you have to look live. Performance Monitor can keep history in
log files (seems a bit tricky to set up), or in a “live” graph with customizable interval and duration.

If you would rather have Duplicati log conditions with a script when a run occurs, you can find the
performance counter name in Performance Monitor then use it in typeperf or maybe Get-Counter.

There might be other tools that are easier, yet keep enough history. I couldn’t quickly find anything.

amloessb · September 22, 2021, 11:41pm

Here’s two ProcessActivityView snapshots (18.9 KB) I took during backups with the larger page size. The number of etilqs files being utilized doesn’t seem to have changed from before I modified the page size, but feel free to do some more analysis on them if you’d like.

I didn’t really go into this in my previous summaries, but I did suspect that some form of resource starvation might be to blame for the increased execution time. To test, during a time where backups were taking an extremely long time to complete, I closed almost everything else I had running on my PC, then let Duplicati run a backup. Unfortunately, it still took forever to complete, so it’s at least not only lack of resources playing a factor.