How long to compact, and why no logging? Is it stuck?

Agreed, but there must be some other variable at play, as many have adjusted retention policies without issue. Would be interesting to see if we could reproduce this on demand.

Out of curiosity, have you customized your deduplication block size (the default is 100KB)? Your backups are quite large, and the default is probably too small for them. That results in a large local database and slow database operations.
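For reference, the relevant advanced option is --blocksize. It can only be chosen when a backup is first created (Duplicati will not let you change it on an existing backup), so the value below is only a sketch of what a larger setting might look like for a future backup of this size:

  --blocksize=1MB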

Re: deduplication block size - no, I have not changed this

This morning the problem has reappeared. On both systems the last entries are similar to:

  • Jul 29, 2021 2:59 AM: RemoteOperationDelete took 0:00:00:00.154
  • Jul 29, 2021 2:59 AM: Backend event: Delete - Completed: duplicati-ie95c20328b0a41e8bbcd9658354aa4a0.dindex.zip.aes (448.65 KB)

and the mono-sgen process running Duplicati.Server.exe is consuming 100%+ CPU.

What version of mono are you using? Also, how large are your job sqlite databases?

Might as well take note of the modification and last access times (ls -lu --full-time) while you're at it.
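For example, run in the folder that holds the job databases:

  ls -l --full-time *.sqlite     # modification times
  ls -lu --full-time *.sqlite    # last access times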

What version of mono are you using?

Installed Packages
mono-complete.x86_64 6.8.0-8.fc33 @updates
mono-core.x86_64 6.8.0-8.fc33 @updates
mono-data.x86_64 6.8.0-8.fc33 @updates
mono-data-oracle.x86_64 6.8.0-8.fc33 @updates
mono-data-sqlite.x86_64 6.8.0-8.fc33 @updates
mono-devel.x86_64 6.8.0-8.fc33 @updates
mono-extras.x86_64 6.8.0-8.fc33 @updates
mono-locale-extras.x86_64 6.8.0-8.fc33 @updates
mono-mvc.x86_64 6.8.0-8.fc33 @updates
mono-reactive.x86_64 6.8.0-8.fc33 @updates
mono-wcf.x86_64 6.8.0-8.fc33 @updates
mono-web.x86_64 6.8.0-8.fc33 @updates
mono-winforms.x86_64 6.8.0-8.fc33 @updates
mono-winfx.x86_64 6.8.0-8.fc33 @updates
monodoc.x86_64 6.8.0-8.fc33 @updates

These are the current versions for Fedora 33 (dnf does not show any available updates).

Also, how large are your job sqlite databases?

System 1:

ls -lu --full-time *.sqlite

-rw-------. 1 root root 233472 2021-07-29 10:20:00.149070786 -0400 Duplicati-server.sqlite
-rw-------. 1 root root 18098118656 2021-07-29 02:34:19.850664721 -0400 MYFGVUULA.sqlite

System 2:

ls -lu --full-time *.sqlite

-rw-------. 1 root root 233472 2021-07-29 10:20:00.149070786 -0400 Duplicati-server.sqlite
-rw-------. 1 root root 18098118656 2021-07-29 02:34:19.850664721 -0400 MYFGVUULA.sqlite

On the off chance that it is relevant, in the past I had to move the Duplicati directory to a different filesystem and replaced /root/.config/Duplicati with a symlink. The systems have been running with this configuration for several weeks. Also, due to disk space issues during a rebuild, I have TMPDIR set in /usr/bin/duplicati-server to point to a temp directory on a filesystem with more space - that is a more recent change.
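For anyone following along, the TMPDIR change is just a couple of lines in the wrapper script; a minimal sketch (the path is a placeholder, and the real /usr/bin/duplicati-server on Fedora may look somewhat different):

  export TMPDIR=/bigdisk/duplicati-tmp
  # ...followed by the existing line that launches Duplicati.Server.exe under mono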

On one of the systems I did a database “Delete and Recreate”. Took all day, but it completed with the following warnings/errors:

Recreate Database Phase Warnings 2

2021-07-30 21:07:13 -04 - [Warning-Duplicati.Library.Main.Database.LocalRecreateDatabase-MissingVolumesDetected]: Found 1 missing volumes; attempting to replace blocks from existing volumes
2021-07-30 21:40:02 -04 - [Warning-Duplicati.Library.Main.Database.LocalRecreateDatabase-MissingVolumesDetected]: Found 1 missing volumes; attempting to replace blocks from existing volumes

Errors 1

2021-07-30 12:19:06 -04 - [Error-Duplicati.Library.Main.Operation.RecreateDatabaseHandler-MissingFileDetected]: Remote file referenced as duplicati-b96665dae446b4aa4aea59d9e4824919e.dblock.zip.aes by duplicati-i4645a522a80b4bbca434120271ca8c4f.dindex.zip.aes, but not found in list, registering a missing remote file

Will now go ahead and attempt a backup (it would attempt one at 2am anyway).

If you’re unsure whether compaction is stuck, it’s very easy to check the temp folder and see if there’s “high traffic”. There should be files coming and going all the time, and network traffic should be a bit more inbound than outbound (depending, of course, on the volumes and data distribution you’re deleting). That’s the rough rule, and it’s what I’ve used a few times to quickly check whether it’s stuck or not.
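A quick sketch of that kind of check, assuming Duplicati's usual dup-* temp file naming and the default temp location (adjust the path if you've redirected TMPDIR):

  watch -n 10 'ls -lt /tmp/dup-* 2>/dev/null | head'

If the newest timestamps keep advancing, it's still moving; if nothing changes for a long stretch, that's a hint it may be stuck.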

Re: checking the temp folder - there was no activity in the temp folder during the times I suspected things were stuck (I checked). Also, given the more recent hangs, I am now thinking that it was not doing compaction at the time.

Thanks for the suggestion!

The backup I manually started completed, as did the regularly scheduled 2am backup. However, neither backup appears to have deleted any volumes, and that seems to be the pattern so far: backups that don’t delete volumes complete, while backups that do delete volumes get stuck doing the deletions (or maybe after the last delete).

Although it’s not supposed to get stuck anywhere, narrowing down where it gets stuck might help.

log-file=<path> and log-file-log-level=<level> are easier for long logs than trying to watch the live log.
A starter level of Retry might be reasonable. Possibly this will wind up at Profiling, which can be big.
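For example (the path is just a placeholder), added as advanced options on the job:

  --log-file=/var/log/duplicati/job.log
  --log-file-log-level=Retry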

If you have the less command, I think it can handle big logs and even follow a file like tail -f does if you want a live view.
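For instance (same placeholder path as above):

  less +F /var/log/duplicati/job.log    # follow mode like tail -f; Ctrl-C stops following, F resumes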

I have a Profiling log which I’m cutting down to Information level here. It gives some clues about where in the run things happen.

2021-05-22 11:42:49 -04 - [Information-Duplicati.Library.Main.BasicResults-BackendEvent]: Backend event: Put - Completed: duplicati-20210522T154001Z.dlist.zip.aes (51.58 KB)
2021-05-22 11:42:49 -04 - [Information-Duplicati.Library.Main.Operation.DeleteHandler:RetentionPolicy-StartCheck]: Start checking if backups can be removed

is the end of the backup itself (the dlist says what’s in it), followed by the retention decisions, which may delete and compact:

2021-05-22 11:42:54 -04 - [Information-Duplicati.Library.Main.BasicResults-BackendEvent]: Backend event: Delete - Started: duplicati-20210521T154106Z.dlist.zip.aes (51.62 KB)
2021-05-22 11:42:55 -04 - [Information-Duplicati.Library.Main.BasicResults-BackendEvent]: Backend event: Delete - Completed: duplicati-20210521T154106Z.dlist.zip.aes (51.62 KB)
2021-05-22 11:42:55 -04 - [Information-Duplicati.Library.Main.Operation.DeleteHandler-DeleteResults]: Deleted 1 remote fileset(s)
2021-05-22 11:42:55 -04 - [Information-Duplicati.Library.Main.Database.LocalDeleteDatabase-CompactReason]: Compacting because there is 25.47% wasted space and the limit is 25%
2021-05-22 11:42:57 -04 - [Information-Duplicati.Library.Main.BasicResults-BackendEvent]: Backend event: Get - Started: duplicati-b7baa95430b9245e0a18dcb829b5bd62b.dblock.zip.aes (41.35 MB)

A backup might not delete, and a delete might not need a compact (the Compact decision is logged either way).
I show the first Get, but there were more, then a Put of the compacted file, then a Delete of the original files.
Eventually the Compact finished, there is a final List to check the destination, and a sample verify of 1 set:

2021-05-22 11:55:46 -04 - [Information-Duplicati.Library.Main.BasicResults-BackendEvent]: Backend event: List - Started: ()
2021-05-22 11:55:47 -04 - [Information-Duplicati.Library.Main.BasicResults-BackendEvent]: Backend event: List - Completed: (178 bytes)
2021-05-22 11:55:49 -04 - [Information-Duplicati.Library.Main.BasicResults-BackendEvent]: Backend event: Get - Started: duplicati-20210522T154001Z.dlist.zip.aes (51.58 KB)
2021-05-22 11:55:51 -04 - [Information-Duplicati.Library.Main.BasicResults-BackendEvent]: Backend event: Get - Completed: duplicati-20210522T154001Z.dlist.zip.aes (51.58 KB)

So that’s how it’s supposed to look at a fairly light log level. If you did Profiling, it adds mostly SQL, which is sometimes relevant, but we still seem to be looking for the neighborhood of the hang, so a lighter start is easier.

Another way to try to find the neighborhood is to separate the deletes from the compact. no-auto-compact will let the deletes finish. If they finish, the backup should be as done as it gets and should leave you a job log, etc.
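For reference, that’s just a boolean advanced option on the job, and it can be removed again later:

  --no-auto-compact=true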

You can then use the Compact now item on the backup menu to try that after you’ve set up the logs again.

I’m not sure how long this problem will be reproducible. Were you previously limiting backup versions at all? Were you hitting the limit and getting version deletes and compacts? Smart retention is just a different way of deciding which old versions to delete; however, it’s probably more likely to do a larger number at a time, compared to, say, date-based or version-count-based retention, which (once set) might do one at a time.

If you’re willing, you could go to heavy (Profiling) logs from the start, and use the Information messages as guideposts if need be. If it gets stuck, looking at the end of the log and doing a reverse search may help.

Alternatively, start with lighter logs and see if the problem sticks around in case Profiling log is needed later.

I think I’ve eliminated compact as being part of the issue, as I now have no-auto-compact set, and I’m still seeing the issue. System #2 ran into it this morning, and that was after a database delete/recreate (not just a repair).

So far the symptoms that seem to be consistent:

  • hangs after/during “delete files”
  • have to stop/start duplicati
  • next backup fails indicating there are missing files on remote and a repair is required
  • problem occurs on 2 different systems running Fedora after changing the retention schedule
  • IIRC the problem was also occurring on a Windows system where the retention schedule was also changed, but on that system I was comfortable deleting more recent versions (it currently only has 17). That system appears to be backing up without issue.

I will check into the log file settings as suggested.

An error on my part - I had set no-auto-compact at one point, but apparently after the backups appeared to be running again, I removed it. So compact might still be in play.

On system #2 the repair has completed, and I am attempting a backup (with the log-file option set as suggested). If it completes, I may trigger a manual compact to see what happens.

Initial data from manually running a compact…

2021-08-01 10:35:13 -04 - [Information-Duplicati.Library.Main.Controller-StartingOperation]: The operation Compact has started
2021-08-01 10:39:18 -04 - [Verbose-Duplicati.Library.Main.Database.LocalDeleteDatabase-FullyDeletableCount]: Found 0 fully deletable volume(s)
2021-08-01 10:39:18 -04 - [Verbose-Duplicati.Library.Main.Database.LocalDeleteDatabase-SmallVolumeCount]: Found 97 small volumes(s) with a total size of 1.40 GB
2021-08-01 10:39:18 -04 - [Verbose-Duplicati.Library.Main.Database.LocalDeleteDatabase-WastedSpaceVolumes]: Found 38205 volume(s) with a total of 73.89% wasted space (3.61 TB of 4.89 TB)
2021-08-01 10:39:18 -04 - [Information-Duplicati.Library.Main.Database.LocalDeleteDatabase-CompactReason]: Compacting because there is 73.89% wasted space and the limit is 25%
2021-08-01 10:39:19 -04 - [Information-Duplicati.Library.Main.BasicResults-BackendEvent]: Backend event: List - Started: ()
2021-08-01 10:41:46 -04 - [Information-Duplicati.Library.Main.BasicResults-BackendEvent]: Backend event: List - Completed: (109.54 KB)
2021-08-01 11:48:32 -04 - [Information-Duplicati.Library.Main.BasicResults-BackendEvent]: Backend event: Get - Started: duplicati-b0002619f54ea447c8be0c63fb1a9ab34.dblock.zip.aes (49.94 MB)

I assume with 73.89% wasted space the compact is going to take a bit of time… This leads me back to one of the questions I started with - is there any way to get a reasonable estimate as to how long the compact should take?

No - I never set an initial limit of backup versions.

I changed it to use the smart retention policy, but then ran into this hang (assuming it is in fact a hang). My original assumption was that it was just taking a long time to delete the old versions, so I then changed it to keep a limited number of days, reducing it by 40 or 50 at a time. Once it got down to a smaller # of versions, I then switched back to smart retention, and the hang reappeared.

It might. You might have trimmed off quite a few versions. You can check whether the Versions count on the home page shrank.

It depends on things like download and upload speeds (are you bandwidth-limited?), CPU and disk, etc.

I also think (I tried looking once) that the wasted space number is an overall amount that triggers things.
Some dblock files might have very little wasted space, and those aren’t compacted. The worst-case time would be to download, repackage, and re-upload everything, but you’re not quite there. Maybe you could assume you were and start with that to ballpark the time. You won’t get the full speeds an Internet speed tester may show, because testers run lots of threads in parallel, and the compact operation is not parallelized.
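As a very rough sketch of that worst case, using the 4.89 TB / 3.61 TB figures from your log above and assuming (purely for illustration) a sustained 8 MB/s each way:

  download everything:     4.89 TB / 8 MB/s ≈ 611,000 s ≈ 7 days
  re-upload the kept data: 1.28 TB / 8 MB/s ≈ 160,000 s ≈ 2 days

so roughly nine days as an upper bound; the real compact skips volumes with little waste, so it should come in under that.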

Duplicati Profiling-level logs show actual DownloadSpeed and UploadSpeed if you want to look at them. Smaller files go slower (but they’re small…), so transfer time is probably dominated by the big dblock files.
None of this should cause anything to stop, though; it should just do lots of download-upload-delete cycles.

According to the log, download speed is running between 7 and 9 MB/s; upload is all over the map (it seems to be constrained by the size of each file) but should also be able to do 7-9 MB/s if the file is large enough.

CPU is i7. Disk is SSD.

At the moment that is exactly what I am seeing. This is different from the hang/stuck situation where it wasn’t logging anything. Also, top shows mono-sg* using more CPU time for the compact than for the hang/stuck situation - 100-300% for the compact vs. ~150% for the hang/stuck.

Another possible way to estimate completion time in a situation with this much compacting might be to look at destination timestamps on dblock files (or perhaps all files - dindex files change along with their associated dblock), sorted by date, to see how many look freshly produced by the compact versus how many are left to go. That lets you estimate a rate and a completion time. Unfortunately, with backups this big it will take a while, but at least it’s going.
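If the destination is something you can list with ordinary tools (an assumption - adjust for your actual storage), something along these lines gives the rough picture:

  ls -lt --full-time *.dblock.zip.aes | head -20    # newest first: files freshly produced by the compact
  ls *.dblock.zip.aes | wc -l                       # total dblock volumes still present

Comparing how many timestamps postdate the compact start against the total gives a crude rate and remaining-time estimate.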

I’d worry about disaster recovery times if you have to pull down most of these backups for a total restore…

If you ever expect a big compact on a big backup, you can spread the work out by slowly reducing this option:

  --threshold (Integer): The maximum wasted space in percent
    As files are changed, some data stored at the remote destination may not
    be required. This option controls how much wasted space the destination
    can contain before being reclaimed. This value is a percentage used on
    each volume and the total storage.
    * default value: 25
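A sketch of that stepping-down, with arbitrary values, set as the job’s --threshold advanced option and lowered once each pass finishes:

  --threshold=75    # first pass compacts only the most wasteful volumes
  --threshold=50    # next pass picks up more
  --threshold=25    # back to the default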

Alternatively, Backup retention can begin as a custom value a bit less aggressive than the default, which is 1W:1D,4W:1W,12M:1M. I’m not sure how safe it is to kill Duplicati during a compact, but if you dare to do so, this method might get you running backups again a little sooner. A fresh start may too (but it would lose the old versions).

As long as I know that it is compacting and not hung/stuck, I am content to let it run. I am debating whether or not to let it auto-compact going forward (after I get past this issue).