How long to compact, and why no logging? Is it stuck?

According to the log, download speed is running between 7-9 MB/s; upload is all over the map (it seems to be constrained by the size of the file) but should also be able to do 7-9 MB/s if the file is large enough.

CPU is i7. Disk is SSD.

At the moment that is exactly what I am seeing. This is different from the hang/stuck situation where it wasn’t logging anything. Also, top shows mono-sg* using more CPU time for the compact than for the hang/stuck situation - 100-300% for the compact vs. ~150% for the hang/stuck.

Another possible way to estimate completion time in a situation with so much compacting might be to look at destination timestamps on dblock files (or perhaps all files - dindex files change along with their associated dblock), sorted by date, to see how many look freshly produced by the compact versus how many are left to go. That lets you estimate the rate and the remaining time. Unfortunately, with backups this big it will take a while, but at least it's going.
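
For example, a rough sketch of that timestamp check, assuming rclone is already configured with a B2 remote; the remote name and bucket path are placeholders:

  # List remote dblock files with their modification times, oldest first,
  # to see how many already look freshly rewritten by the compact.
  rclone lsl b2:my-bucket/backup-folder --include "*.dblock.*" | sort -k2,3

  # Crude rate estimate: count dblock files touched in roughly the last day.
  rclone lsl b2:my-bucket/backup-folder --include "*.dblock.*" \
    | awk -v d="$(date -d '1 day ago' +%Y-%m-%d)" '$2 >= d' | wc -l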

I’d worry about disaster recovery times if you have to pull down most of these backups for a total restore…

If you ever expect a big compact on a big backup, you can spread the delay by slowly reducing this option:

  --threshold (Integer): The maximum wasted space in percent
    As files are changed, some data stored at the remote destination may not
    be required. This option controls how much wasted space the destination
    can contain before being reclaimed. This value is a percentage used on
    each volume and the total storage.
    * default value: 25
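
A rough sketch of that ramp-down using the command-line compact; the storage URL and database path are placeholders you would take from Export → As Command-line:

  URL="<storage-url-from-export>"
  DB="/root/.config/Duplicati/MYFGVUULA.sqlite"   # assumed location

  # Run compact at a high threshold first, then step it down so each
  # pass only has a smaller chunk of wasted space to reclaim.
  for t in 75 50 25; do
    mono Duplicati.CommandLine.exe compact "$URL" --dbpath="$DB" --threshold=$t
  done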

Alternatively, backup retention can begin as a custom value a bit less aggressive than the default, which is
1W:1D,4W:1W,12M:1M. I'm not sure how safe it is to kill Duplicati during a compact, but if you dare to do so, this method might get you running backups again a little sooner. A fresh start would too (but you'd lose the old versions).

As long as I know that it is compacting and not hung/stuck, I am content to let it run. I am debating whether or not to let it auto-compact going forward (after I get past this issue).

If I ever had to do a total restore I can prioritize things - get the more important files back quickly, and then the less important files can take however long it takes.

-rw-------. 1 root root 233472 2021-07-29 10:20:00.149070786 -0400 Duplicati-server.sqlite
-rw-------. 1 root root 18098118656 2021-07-29 02:34:19.850664721 -0400 MYFGVUULA.sqlite

This is quite a large database. You might want to try a VACUUM operation to see if that helps. Make sure that you have at least 2x the database size in free space in your temporary directory before doing so.

https://www.tutorialspoint.com/sqlite/sqlite_vacuum.htm
https://duplicati.readthedocs.io/en/latest/06-advanced-options/#auto-vacuum
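
If you run it outside Duplicati, a minimal sketch with the sqlite3 command-line tool; the database path and temp directory here are assumptions:

  # Stop Duplicati first so nothing else has the database open.
  # SQLITE_TMPDIR points SQLite's temp files at a volume with at least
  # ~2x the database size free.
  export SQLITE_TMPDIR=/var/tmp
  df -h /var/tmp                                 # confirm free space first

  # VACUUM rewrites the whole file, so expect it to take a while on an ~18 GB DB.
  sqlite3 /path/to/MYFGVUULA.sqlite "VACUUM;"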

I think it’s large due to the default deduplication block size being used with a multi-terabyte backup.

@jberry02 it may be a big ask, but are you possibly willing to start over? Increasing the deduplication block size on System1 (5TB source) to 5MB will help reduce database size and increase performance of database related operations, at the expense of deduplication efficiency. You could use a similar block size on System2, as well.

That being said, I still don’t think Duplicati should fail/lock up forever with the default 100KiB block size. It should still be successful in all operations, even though they may take longer. So if you want to continue troubleshooting it from that angle I totally understand.
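
If you did decide to start over, a hedged sketch of what setting the block size on a brand-new job might look like from the command line; the URL, source path, and passphrase are placeholders, and --blocksize can only be chosen when the backup is first created:

  mono Duplicati.CommandLine.exe backup "<new-storage-url>" /data/to/backup \
    --blocksize=5MB \
    --passphrase="<passphrase>"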

I am not very receptive to that idea :rofl: Would there be a way to "start over" while de-duplicating against the data that is already there? Otherwise I would end up doubling the space used on B2 and paying twice for the same data (and as I recall the initial backup took 3+ weeks). I would be more willing to reduce the number of backups - possibly down to 1 - by trimming a few versions at a time.

Another great idea that I'm not too keen on. I know there is duplication, and I would expect a larger de-duplication block size to have a negative impact - at 5MB it would effectively be no de-duplication (at least within a single backup; multiple versions should still de-dupe nicely, I would hope).

At the moment system 1 is still re-creating the database. Based on progress so far, I expect it will take another day or two. I’m assuming a re-created database won’t benefit from a vacuum, but if y’all think it would help, I’m willing to try. After that I think it devolves to deleting versions and/or doing a compact.

An interesting data point. This morning the live log for system 1 had not been updated since sometime last night. But when I re-invoked it, it was in fact still going and logging.

Shot in the dark - rather than an issue with deleting/compacting/whatever, could this just be a bug with logging what it is doing? I.e., maybe it was in fact doing a compact but just not showing anything via the live log?

Now that the log shows activity, maybe you can see if the temp folder is now active, unlike in the previous check:

You can also try the database ls -lu --full-time test (and add ls -l --full-time as well) again, and compare those times to the current system time. An earlier post showed system 1 and system 2 as identical, which might just be a posting error. I'm not positive the times get updated before the file is closed, though…
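
Something like this, run against the job database folder (the directory is an assumed default):

  date                                                  # current system time for comparison
  ls -lu --full-time /root/.config/Duplicati/*.sqlite   # last-access times
  ls -l  --full-time /root/.config/Duplicati/*.sqlite   # last-modified times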

Looking at remote file dates is likely still an option, though it looks like B2 web UI won’t do a sort by date.
You could maybe find a better client. Cyberduck does B2. I don’t know for certain that it can date-sort…

There is activity in the temp folder as well as sqlite journal file, etc.

So during the hang we have:

  • 100+% CPU utilization for mono-sg*
  • no activity in “live log”
  • no activity in the temp folder
  • I think there was also no activity for the duplicati database folder, but I’m not as sure

System #2 is configured with --log-file, so if it hangs again, I can compare that log to the “live” log. I will be adding --log-file to System #1 as well once it finishes the database recreate.

What is the RAM and CPU configuration on these hosts?

Yeah, I understand… sometimes people don’t care about the history and don’t mind starting over. Thought I’d ask! Unfortunately you cannot change the deduplication block size without starting a new backup. No way to process the data already on the back end.

System 1 - AMD Ryzen 5 2400G with Radeon Vega Graphics, 12 GB
System 2 - Intel(R) Core™ i7-9750H CPU @ 2.60GHz, 32 GB

Compact on system 2 just finished…

Last successful backup: Today at 11:07 AM (took 00:26:55)
Next scheduled run: Tomorrow at 2:00 AM
Source: 759.94 GB
Backup: 878.22 GB / 16 Versions

-rw-------. 1 root root 233472 Aug 5 11:14 Duplicati-server.sqlite
-rw-------. 1 root root 17183563776 Aug 5 11:07 UMYFGVUULA.sqlite

System 2 appears to be backing up OK now. And it has been deleting remote file sets.

My path out of the mess appears to have been re-creating the database and running a compact.

System 1 is still chugging along re-creating the database.

I am wondering - when the re-create is done, would it be better to compact, or to first trim the number of versions down to 1 and then compact? I expect the compact to take several days either way, but I was wondering if there is a way to save some time, or maybe even get a backup done without having to wait several weeks.

You can watch progress with About → Show log → Live → Verbose. What sort of files is it doing now?
I hope it’s not downloading dblock files because that’s going to be most of the 5 TB. Slow and costly…
Preferably it’s still processing dlist or dindex files (lots of block data due to the small default blocksize).

The more versions you trim, the longer the compact needs to remove the newly freed-up space. That may have been part of the original problem (which I wouldn't want to happen again). I don't know how far the troubled previous compact got, but it may be safer to finish any current compact before deleting further.

Ramping the compact threshold down slowly was suggested earlier. Or set no-auto-compact for a while.

It sounded like you would wait for the recreate to finish, and I talked about taming/postponing compacts.

If you want something sooner than that, you can maybe script something together to copy files modified after the last Duplicati backup. Is there any way to sync or copy files NEWER than a given date? An rclone forum topic gives one way, though it might need some date math, or you can set up cron to do age-based copying; a sketch follows below.
Having rclone consider some excess files is probably harmless. I think it only copies files that are different.
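
For instance, a rough sketch using rclone's age filter; the source path, remote name, and the 7-day window are all assumptions:

  # Copy only files modified in roughly the last 7 days to a separate
  # "interim" area on B2; rclone skips files that are already identical.
  rclone copy /data/to/protect b2:my-bucket/interim --max-age 7d --progress

  # Or run it daily from cron with a slightly generous window:
  # 0 3 * * * rclone copy /data/to/protect b2:my-bucket/interim --max-age 2d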

Ordinarily I’d worry about having only one version, but is that how you plan to use Duplicati in the future?

It's processing dindex files - “Aug 8, 2021 8:47 AM: Processing indexlist volume 67524 of 261195”

I have no-auto-compact set, so it won’t start a compact until I tell it to do so.

Current plan is:

  • let the re-create finish (not that I have a choice!)
  • perform a backup
  • manually start a compact
  • note that I will not be reducing versions to 1 since it would not help the compact

In the meantime, if I want any interim off-site backups of system 1 I would need to create a tarball of modified files and copy it to one of my other systems (and then Duplicati would back up the tarball).
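
A minimal sketch of that tarball approach, with a hypothetical source path and cutoff date:

  # Collect files modified since the last good backup (date is an example)
  # into a tarball that the other system's Duplicati job can then pick up.
  find /data/to/protect -type f -newermt "2021-08-01" -print0 \
    | tar --null -czf /tmp/system1-interim-$(date +%F).tar.gz --files-from=-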

Looks like the database re-create has stalled - no log entries since yesterday, no disk activity, and no real CPU usage.

The last log entries are:

  • Aug 25, 2021 7:24 AM: Backend event: Get - Started: duplicati-i6cf6a21dd2f84b709d1fdbaad0841d93.dindex.zip.aes (31.75 KB)

  • Aug 25, 2021 7:24 AM: Backend event: Get - Retrying: duplicati-i6cf6a21dd2f84b709d1fdbaad0841d93.dindex.zip.aes (31.75 KB)

  • Aug 25, 2021 7:24 AM: Operation Get with file duplicati-i6cf6a21dd2f84b709d1fdbaad0841d93.dindex.zip.aes attempt 2 of 5 failed with message: The operation has timed out.

If this hasn't progressed by tomorrow, is it safe to restart Duplicati? I.e., will it resume the re-create (I assume I have to tell it to do the re-create), or will it start over from scratch?

The Recreate (delete and repair) button on the Database screen is likely what you would use, and it starts from scratch.
There’s also a Repair button, but I think it’s more for fixing smaller inconsistencies.

I wonder if this has any relationship to the seeming hang that followed it. I tried provoking a B2 problem by unplugging a USB WiFi adapter during a download, but couldn't get this exact error (and also got no hangs).

It would also be interesting to know if these ever happen without going into a hang. This would be in the backup log listing, under Complete log as RetryAttempts, but the stats come from the job database, meaning Recreate will delete them. log-file=<path> with log-file-log-level=retry is a way to keep history.
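
For example, adding something like these to the job's advanced options (the log path is just an example) keeps a retry history that survives a database recreate:

  --log-file=/var/log/duplicati-system2.log
  --log-file-log-level=Retry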

What sort of OS and hardware is this on? There was seemingly some backup history, which is mostly uploading, but download problems aren’t a good thing because that could interfere with needed restore.

I suppose you could run Duplicati.CommandLine.BackendTester.exe against an unused B2 folder for a while, giving it a URL based on Export As Command-line but with the folder changed to one used just for this test.
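
Roughly like this; the URL is a placeholder for the exported one with only the folder changed:

  # The target folder must be empty/unused; the tester uploads, downloads,
  # and deletes its own test files there.
  mono Duplicati.CommandLine.BackendTester.exe "<b2-url-with-an-unused-test-folder>"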

We could consider trying a network packet capture, but it'd be kind of hard to configure and to stop at problem time. If there's ample drive space and not a lot of other HTTPS traffic on the system, that will help.
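
As a sketch, a rotating tcpdump capture along these lines could run unattended; the sizes and path are guesses, and since B2 uses several hostnames a port-only filter is probably safer than filtering on one host:

  # Keep 10 rotating files of ~100 MB each so the disk doesn't fill up.
  tcpdump -i any -C 100 -W 10 -w /var/tmp/duplicati-https.pcap 'tcp port 443'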

There's a chance that using Backblaze B2 through Duplicati's S3 support would work better, but that would mean moving the data to a different bucket, which maybe rclone could do remotely using the B2 copy API.
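
A sketch of that remote-side move with rclone; the bucket names are placeholders, and it's worth verifying that the copy really stays server-side so nothing gets downloaded:

  # rclone can use B2's server-side copy between buckets in the same account,
  # so the backup files shouldn't have to come down to the local machine.
  rclone copy b2:old-duplicati-bucket b2:new-duplicati-bucket --progress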