How long to compact, and why no logging? Is it stuck?

Although it’s not supposed to get stuck anywhere, narrowing down where it gets stuck might help.

log-file=<path> and log-file-log-level=<level> are easier for long runs than trying to read the live log.
Retry is a reasonable level to start at. This may eventually wind up at Profiling, which can get big.

If you have the less command, I think it can handle big logs, and it can even follow them like tail -f if you want a live view.
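
For example, a quick way to watch or search it (the log path here is just a placeholder for wherever you point log-file):

  less /path/to/duplicati-job.log

Inside less, G jumps to the end, ? searches backwards, and F follows new lines the way tail -f does.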

I have a Profiling log which I’m cutting down to Information level here. It gives some landmarks for telling where the operation is.

2021-05-22 11:42:49 -04 - [Information-Duplicati.Library.Main.BasicResults-BackendEvent]: Backend event: Put - Completed: duplicati-20210522T154001Z.dlist.zip.aes (51.58 KB)
2021-05-22 11:42:49 -04 - [Information-Duplicati.Library.Main.Operation.DeleteHandler:RetentionPolicy-StartCheck]: Start checking if backups can be removed

is the end of the backup itself (the dlist records what’s in it), followed by the retention decisions that may delete versions and trigger a compact:

2021-05-22 11:42:54 -04 - [Information-Duplicati.Library.Main.BasicResults-BackendEvent]: Backend event: Delete - Started: duplicati-20210521T154106Z.dlist.zip.aes (51.62 KB)
2021-05-22 11:42:55 -04 - [Information-Duplicati.Library.Main.BasicResults-BackendEvent]: Backend event: Delete - Completed: duplicati-20210521T154106Z.dlist.zip.aes (51.62 KB)
2021-05-22 11:42:55 -04 - [Information-Duplicati.Library.Main.Operation.DeleteHandler-DeleteResults]: Deleted 1 remote fileset(s)
2021-05-22 11:42:55 -04 - [Information-Duplicati.Library.Main.Database.LocalDeleteDatabase-CompactReason]: Compacting because there is 25.47% wasted space and the limit is 25%
2021-05-22 11:42:57 -04 - [Information-Duplicati.Library.Main.BasicResults-BackendEvent]: Backend event: Get - Started: duplicati-b7baa95430b9245e0a18dcb829b5bd62b.dblock.zip.aes (41.35 MB)

A backup might not delete anything, and a delete might not need a compact (the compact decision is logged either way).
I only show the first Get; there were more, then a Put of the compacted file, then Deletes of the original files.
Eventually the compact finished, and there is a final List to check the destination, plus a sample verification of one set:

2021-05-22 11:55:46 -04 - [Information-Duplicati.Library.Main.BasicResults-BackendEvent]: Backend event: List - Started: ()
2021-05-22 11:55:47 -04 - [Information-Duplicati.Library.Main.BasicResults-BackendEvent]: Backend event: List - Completed: (178 bytes)
2021-05-22 11:55:49 -04 - [Information-Duplicati.Library.Main.BasicResults-BackendEvent]: Backend event: Get - Started: duplicati-20210522T154001Z.dlist.zip.aes (51.58 KB)
2021-05-22 11:55:51 -04 - [Information-Duplicati.Library.Main.BasicResults-BackendEvent]: Backend event: Get - Completed: duplicati-20210522T154001Z.dlist.zip.aes (51.58 KB)

So that’s how it’s supposed to look at a fairly light log level. Profiling mostly adds SQL, which is sometimes relevant, but since we’re still trying to find the neighborhood of the hang, a lighter level is an easier start.

Another way to find the neighborhood is to separate the deletes from the compact. Setting no-auto-compact will let the deletes finish on their own. If they finish, the backup should be as done as it gets, should leave you a job log, and so on.

You can then use the Compact now item on the backup menu to try that after you’ve set up the logs again.
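
As a rough sketch of that setup, the job’s Advanced options could look something like this (the log path is only an example):

  --no-auto-compact=true
  --log-file=/var/log/duplicati/job.log
  --log-file-log-level=Information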

I’m not sure how long this problem will stay reproducible. Were you previously limiting backup versions at all? Were you hitting that limit and getting version deletes and compacts? Smart retention is just a different way of deciding which old versions to delete; however, it’s probably more likely to delete a larger number at a time than, say, date-based or version-count-based retention, which (once set) might delete one at a time.

If you’re willing, you could go to heavy (Profiling) logs from the start, and use the Information messages as guideposts if need be. If it gets stuck, looking at the end of the log and doing a reverse search may help.

Alternatively, start with lighter logs and see if the problem sticks around, in case a Profiling log is needed later.

I think I’ve eliminated compact as being part of the problem, as I now have no-auto-compact set and I’m still seeing the issue. System #2 ran into it this morning, and that was after a database delete/recreate (not just a repair).

So far the symptoms that seem to be consistent:

  • hangs after/during “delete files”
  • have to stop/start duplicati
  • next backup fails indicating there are missing files on remote and a repair is required
  • problem occurs on 2 different systems running Fedora after changing the retention schedule
  • IIRC the problem was also occurring on a Windows system where the retention schedule was also changed, but on that system I was comfortable deleting more recent versions (it currently only has 17). That system appears to be backing up without issue.

I will check into the log file settings as suggested.

An error on my part - I had set no-auto-compact at one point, but apparently after the backups appeared to be running again, I removed it. So compact might still be in play.

On system #2 the repair has completed, and I am attempting a backup (with the log-file option set as suggested). If it completes, I may trigger a manual compact to see what happens.

Initial data from manually running a compact…

2021-08-01 10:35:13 -04 - [Information-Duplicati.Library.Main.Controller-StartingOperation]: The operation Compact has started
2021-08-01 10:39:18 -04 - [Verbose-Duplicati.Library.Main.Database.LocalDeleteDatabase-FullyDeletableCount]: Found 0 fully deletable volume(s)
2021-08-01 10:39:18 -04 - [Verbose-Duplicati.Library.Main.Database.LocalDeleteDatabase-SmallVolumeCount]: Found 97 small volumes(s) with a total size of 1.40 GB
2021-08-01 10:39:18 -04 - [Verbose-Duplicati.Library.Main.Database.LocalDeleteDatabase-WastedSpaceVolumes]: Found 38205 volume(s) with a total of 73.89% wasted space (3.61 TB of 4.89 TB)
2021-08-01 10:39:18 -04 - [Information-Duplicati.Library.Main.Database.LocalDeleteDatabase-CompactReason]: Compacting because there is 73.89% wasted space and the limit is 25%
2021-08-01 10:39:19 -04 - [Information-Duplicati.Library.Main.BasicResults-BackendEvent]: Backend event: List - Started: ()
2021-08-01 10:41:46 -04 - [Information-Duplicati.Library.Main.BasicResults-BackendEvent]: Backend event: List - Completed: (109.54 KB)
2021-08-01 11:48:32 -04 - [Information-Duplicati.Library.Main.BasicResults-BackendEvent]: Backend event: Get - Started: duplicati-b0002619f54ea447c8be0c63fb1a9ab34.dblock.zip.aes (49.94 MB)

I assume with 73.89% wasted space the compact is going to take a bit of time… This leads me back to one of the questions I started with - is there any way to get a reasonable estimate as to how long the compact should take?

No - I never set an initial limit of backup versions.

I changed it to use the smart retention policy, but then ran into this hang (assuming it is in fact a hang). My original assumption was that it was just taking a long time to delete the old versions, so I then changed it to keep a limited number of days, reducing it by 40 or 50 at a time. Once it got down to a smaller # of versions, I then switched back to smart retention, and the hang reappeared.

It might. You may have trimmed off quite a few versions. You can check whether the Versions count on the home page shrank.

It depends on things like download and upload speeds (are you bandwidth-limited?), CPU and disk, etc.

I also think (I tried looking into this once) that the wasted-space figure that triggers compacting is an overall amount.
Some dblock files might have very little wasted space, and those aren’t compacted. The worst-case time would be to download, repackage, and re-upload everything; you’re not quite there, but you could assume you were and use that to ballpark the time. Note that you won’t get the full speeds an Internet speed tester shows, because those run lots of threads in parallel, and the compact operation is not parallelized.

Duplicati Profiling level logs show actual DownloadSpeed and UploadSpeed if you want to look at them. Smaller files go slower (but they’re small…), so transfer time is probably dominated by big dblock files.
But none of this should cause anything to stop. It should just run lots of download, repackage, upload, and delete cycles.
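
If you do want to pull those speed lines out of a Profiling log, a simple filter should do it, for example (the path is a placeholder):

  grep -E 'DownloadSpeed|UploadSpeed' /path/to/duplicati-job.log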

According to the log, download speed is running between 7 and 9 MB/s; upload is all over the map (it seems to be constrained by file size) but should also be able to do 7-9 MB/s if the file is large enough.

CPU is i7. Disk is SSD.

At the moment that is exactly what I am seeing. This is different from the hang/stuck situation where it wasn’t logging anything. Also, top shows mono-sg* using more CPU time for the compact than for the hang/stuck situation - 100-300% for the compact vs. ~150% for the hang/stuck.

Another possible way to estimate completion time, with this much compacting to do, is to look at destination timestamps on the dblock files (or perhaps all files – dindex files change along with their associated dblock), sorted by date, to see how many look freshly produced by the compact versus how many are left to go. That lets you estimate a rate and a time. Unfortunately, with backups this big it will take a while, but at least it’s making progress.
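
As a very rough ceiling using your own numbers (4.89 TB at the destination, roughly 8 MB/s each way, one transfer at a time), and treating it purely as a ballpark that ignores deletes, per-file overhead, and database time:

  download (worst case, everything):   4.89 TB ≈ 5,130,000 MB / 8 MB/s ≈ 640,000 s ≈ 7.4 days
  re-upload (the ~26% that is kept):   1.28 TB ≈ 1,340,000 MB / 8 MB/s ≈ 170,000 s ≈ 2 days

so under ten days at the absolute worst, and probably well less, since not every volume gets rewritten.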

I’d worry about disaster recovery times if you have to pull down most of these backups for a total restore…

If you ever expect a big compact on a big backup, you can spread the work over time by gradually lowering this option:

  --threshold (Integer): The maximum wasted space in percent
    As files are changed, some data stored at the remote destination may not
    be required. This option controls how much wasted space the destination
    can contain before being reclaimed. This value is a percentage used on
    each volume and the total storage.
    * default value: 25
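
If I’m reading that description right, each pass only rewrites volumes above the current threshold, so stepping it down in stages (the numbers below are only an example) splits the work into smaller sessions:

  --threshold=70   (run a compact, let it finish)
  --threshold=55   (compact again)
  --threshold=40   (and again)
  --threshold=25   (back at the default)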

Alternatively, Backup retention can begin as a custom value a bit less aggressive than the smart-retention default, which is
1W:1D,4W:1W,12M:1M. I’m not sure how safe it is to kill Duplicati during a compact, but if you dare to, this approach might get you running backups again a little sooner. A fresh start would too (but you’d lose the old versions).

As long as I know that it is compacting and not hung/stuck, I am content to let it run. I am debating whether or not to let it auto-compact going forward (after I get past this issue).

If I ever had to do a total restore I can prioritize things - get the more important files back quickly, and then the less important files can take however long it takes.

-rw-------. 1 root root 233472 2021-07-29 10:20:00.149070786 -0400 Duplicati-server.sqlite
-rw-------. 1 root root 18098118656 2021-07-29 02:34:19.850664721 -0400 MYFGVUULA.sqlite

This is quite a large database. You might want to try a VACUUM operation to see if that helps. Make sure that you have at least 2x the database size in free space in your temporary directory before doing so.
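
With Duplicati stopped, one way to run it is directly with the sqlite3 tool, assuming the database is in the usual ~/.config/Duplicati location (adjust the path to yours, and point SQLITE_TMPDIR at a disk with enough free space):

  SQLITE_TMPDIR=/path/with/free/space sqlite3 /root/.config/Duplicati/MYFGVUULA.sqlite "VACUUM;"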

I think it’s large due to the default deduplication block size being used with a multi-terabyte backup.

@jberry02 it may be a big ask, but are you possibly willing to start over? Increasing the deduplication block size on System1 (5TB source) to 5MB will help reduce database size and increase performance of database related operations, at the expense of deduplication efficiency. You could use a similar block size on System2, as well.
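
For reference, that would just be an advanced option chosen when the new backup is created (it cannot be changed afterwards), something like:

  --blocksize=5MB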

That being said, I still don’t think Duplicati should fail/lock up forever with the default 100KiB block size. It should still be successful in all operations, even though they may take longer. So if you want to continue troubleshooting it from that angle I totally understand.

I am not very receptive to that idea :rofl: Would there be a way to “start over” while deduplicating against the data that is already there? Otherwise I would end up doubling the space used on B2 and paying twice for the same data (and as I recall the initial backup took 3+ weeks). I would be more willing to reduce the number of backups - possibly down to 1 - by trimming a few versions at a time.

Another great idea that I’m not too keen on, as I know there is duplication, and I would expect a larger de-duplication block size to have a negative impact - at 5MB it would effectively be no de-duplication (at least within a single backup - multiple versions should still de-dupe nicely I would hope).

At the moment system 1 is still re-creating the database. Based on progress so far, I expect it will take another day or two. I’m assuming a re-created database won’t benefit from a vacuum, but if y’all think it would help, I’m willing to try. After that I think it devolves to deleting versions and/or doing a compact.

An interesting data point. This morning the live log for system 1 had not been updated since sometime last night. But when I re-invoked it, it was in fact still going and logging.

Shot in the dark - rather than an issue with deleting/compacting/whatever, could this just be a bug with logging what it is doing? I.e., maybe it was in fact doing a compact but just not showing anything via the live log?

Now that the log shows activity, maybe you can see if the temp folder is now active, unlike in the previous check.

You can also try the ls -lu --full-time test on the database (and add ls -l --full-time as well) again, and compare those times to the current system time. An earlier post showed system 1 and system 2 as identical, which might just be a posting error. I’m not positive the times get updated before the file is closed, though…
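
Assuming the databases are in the usual ~/.config/Duplicati location (adjust if yours differ), that check could look like:

  ls -lu --full-time ~/.config/Duplicati/*.sqlite    # access times
  ls -l  --full-time ~/.config/Duplicati/*.sqlite    # modification times
  date                                               # current system time, for comparison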

Looking at remote file dates is likely still an option, though it looks like B2 web UI won’t do a sort by date.
You could maybe find a better client. Cyberduck does B2. I don’t know for certain that it can date-sort…
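
If you happen to have a command-line client such as rclone configured for B2 (not something used in this thread, just one option), a date-sorted listing could be as simple as:

  rclone lsl B2remote:your-bucket/your-folder | sort -k2,3 | tail

rclone lsl prints size, date, time, and path, so sorting on the date and time columns puts the newest files last (the remote, bucket, and folder names above are placeholders).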

There is activity in the temp folder as well as sqlite journal file, etc.

So during the hang we have:

  • 100+% CPU utilization for mono-sg*
  • no activity in “live log”
  • no activity in the temp folder
  • I think there was also no activity for the duplicati database folder, but I’m not as sure

System #2 is configured with --log-file, so if it hangs again, I can compare that log to the “live” log. I will be adding --log-file to System #1 as well once it finishes the database recreate.

What is the RAM and CPU configuration on these hosts?

Yeah, I understand… sometimes people don’t care about the history and don’t mind starting over. Thought I’d ask! Unfortunately you cannot change the deduplication block size without starting a new backup. There is no way to reprocess the data that is already on the back end.

System 1 - AMD Ryzen 5 2400G with Radeon Vega Graphics, 12 GB
System 2 - Intel(R) Core™ i7-9750H CPU @ 2.60GHz, 32 GB