Yes, right now both conditions have to be satisfied. I think putting a percentage threshold on the total wasted space cannot scale properly as the backup grows over time.
You don’t even need a low threshold for this. A high threshold is even worse, because it delays compacting until a significant portion of the entire backup needs to be compacted.
After every backup operation there is a chance of a small leftover, so even if you were to compact after every backup this would not change much overall.
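To put a number on the scaling point, here is a tiny illustration (plain arithmetic, not Duplicati code, assuming the default 25% threshold applied to total waste): the absolute waste tolerated by a percentage-of-total rule grows with the backup, so compacts become rarer and bigger.

```python
# Illustration only: absolute waste tolerated by a fixed percentage threshold
# applied to the *total* backup size (default threshold assumed to be 25%).
def waste_tolerated_gb(total_backup_gb: float, threshold_percent: float = 25.0) -> float:
    """Wasted space (GB) that can accumulate before the total-waste condition trips."""
    return total_backup_gb * threshold_percent / 100.0

for total in (100, 1_000, 10_000):  # backup size in GB
    print(f"{total:>6} GB backup -> up to {waste_tolerated_gb(total):,.0f} GB of waste before compacting")
```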
Maybe there could also be a smarter selection process, where enough volumes are picked so that the result fits in a whole number of volumes. For example, assuming there are 5 volumes with 25% waste, you could compact 4 of them into 3 new ones and leave the 5th for next time. But that would require a bit more design and testing, as it probably won’t work out so evenly in real cases.
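Roughly what I have in mind, as a sketch (hypothetical names and data shapes, not Duplicati’s actual code): greedily pick the most wasteful volumes, but stop at a subset whose remaining live data fills a whole number of new volumes, leaving the rest for a later pass.

```python
from typing import List, Tuple

def pick_volumes_for_whole_output(
    volumes: List[Tuple[str, int, int]],  # (name, total_bytes, wasted_bytes); hypothetical shape
    volume_size: int,
    tolerance: float = 0.05,              # accept "close enough" to a whole-volume boundary
) -> List[str]:
    """Pick the most wasteful volumes whose combined live data nearly fills whole volumes."""
    # Most wasteful first, so each pass reclaims the most space.
    candidates = sorted(volumes, key=lambda v: v[2] / v[1], reverse=True)
    chosen, live = [], 0
    best = []  # empty means: nothing lines up nicely this round
    for name, total, wasted in candidates:
        chosen.append(name)
        live += total - wasted
        fraction = (live % volume_size) / volume_size
        # Remember the largest selection whose live data ends close to a volume boundary.
        if fraction <= tolerance or fraction >= 1 - tolerance:
            best = list(chosen)
    return best

# Example from above: 5 volumes with 25% waste each -> the live data of 4 of them
# fills exactly 3 new volumes, so the 5th is left for next time.
vols = [(f"vol{i}", 100, 25) for i in range(5)]
print(pick_volumes_for_whole_output(vols, volume_size=100))  # ['vol0', 'vol1', 'vol2', 'vol3']
```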
Also, having a low threshold as @ts678 suggested simply creates an insane amount of backup volume churn. I wouldn’t call that “improving efficiency”. As far as I can see, it just makes the whole process a lot less efficient, wasting every resource except destination storage.
I’ve also asked for a compaction time limit option. Currently the compaction can run for several days. Of course, you can already run compact under a timeout, which simply kills the process after N seconds of compacting.
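For reference, the blunt timeout workaround can look like this (a sketch only; the executable path, storage URL and dbpath below are example placeholders, not values from my setup). It just kills compaction after N seconds, with no clean stop:

```python
import subprocess

# Placeholder command line: adjust the executable path, storage URL and options
# for your own install; this only demonstrates the "kill after N seconds" behaviour.
cmd = [
    r"C:\Duplicati\duplicati-2.1.0.2_beta_2024-11-29-win-x64-gui\Duplicati.CommandLine.exe",
    "compact",
    "file://X:/backup/target",               # example destination URL
    "--dbpath=C:/Duplicati/backup.sqlite",   # example local database path
]

try:
    subprocess.run(cmd, timeout=4 * 3600, check=True)  # hard limit: 4 hours
except subprocess.TimeoutExpired:
    # The process is killed mid-compaction; Duplicati has to clean up on the next run.
    print("Compact exceeded the time limit and was terminated.")
```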
Did you see any other change requests or useful discussions beyond the two forum topics and one issue I cited?
If a low threshold causes churn and a high threshold wastes storage, maybe an efficient way to compact under a time limit with the current algorithm (which applies the threshold to both the total and individual volumes) would be to compact volumes in something like descending order of the space wasted in each volume.
Under a changed algorithm that looks only at per-volume waste, this would be more or less the natural outcome.
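As a sketch of that ordering (hypothetical helpers, not the current implementation): sort by per-volume waste, skip anything under the per-volume threshold, and stop when a time budget runs out, leaving the rest for the next run.

```python
import time
from typing import Callable, List, Tuple

def compact_by_waste(
    volumes: List[Tuple[str, int, int]],   # (name, total_bytes, wasted_bytes)
    compact_one: Callable[[str], None],    # downloads, repacks and re-uploads one volume
    time_limit_s: float,
    volume_threshold: float = 0.25,        # per-volume waste threshold
) -> List[str]:
    """Compact volumes in descending order of wasted space until time runs out."""
    deadline = time.monotonic() + time_limit_s
    ordered = sorted(volumes, key=lambda v: v[2] / v[1], reverse=True)
    done = []
    for name, total, wasted in ordered:
        if wasted / total < volume_threshold:
            break                          # everything after this wastes even less
        if time.monotonic() >= deadline:
            break                          # leave the remainder for the next run
        compact_one(name)
        done.append(name)
    return done
```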
Update: observation under version 2.1.0.2:
I noticed that the waiting time from start until the first log line has disappeared! With the previous version it took about 1 minute until the first action occurred. This means the pure backup runs much faster now!
In the backup batch I switched off compacting, to avoid a long, surprising wait. So today I used the separate compacting command from the GUI. The threshold parameter is a global one, I assume. Next time I’ll use the compact command on the CLI and play around with the threshold parameter.
The log shows no entry between 12:03:02 and 12:25:30. Perhaps the verbosity level is not high enough. During this time I saw in the resource monitor that a 12 GB ZIP file (without an extension) in the TMP folder was being processed and copied.
Between 12:34:29 and 13:12:28 the main activity was work on the SQLite DB. 45 minutes is really long for “only” DB processing.
This compacting operation was the longest since I started using Duplicati: 1:12 h!
I was hoping a bit that the new version would bring performance improvements for the compacting process.
Onurbi
the preliminary work of determining which files to download and compact took up a considerable amount of time
This backup might be misconfigured. What’s the Remote volume size on the Options screen?
Refer to the help link for that in the GUI. The advanced option dblock-size has a similar effect.
2024-12-29 12:00:05 +01 - [Information-Duplicati.Library.Main.Database.LocalDeleteDatabase-CompactReason]: Compacting because there are 29 small volumes and the maximum is 20
2024-12-29 12:00:05 +01 - [Information-Duplicati.Library.Main.BasicResults-BackendEvent]: Backend event: List - Started: ()
2024-12-29 12:00:05 +01 - [Information-Duplicati.Library.Main.BasicResults-BackendEvent]: Backend event: List - Completed: (69 Bytes)
2024-12-29 12:00:05 +01 - [Information-Duplicati.Library.Main.BasicResults-BackendEvent]: Backend event: Get - Started: duplicati-b0b33806d8450427693296219a206d4b9.dblock.zip (12.56 GB)
2024-12-29 12:03:00 +01 - [Information-Duplicati.Library.Main.BasicResults-BackendEvent]: Backend event: Get - Completed: duplicati-b0b33806d8450427693296219a206d4b9.dblock.zip (12.56 GB)
means that at some past time, you had the remote volume size set to about 12 GB or higher.
says you still do. We discussed blocksize above. Is yours still 100 KB? That’s a lot of blocks, probably stressing the SQL and its cache (also discussed above: are you now using a higher value?).
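For a rough sense of scale (illustrative sizes, not your actual backup), the block count is just source size divided by blocksize, and every block adds rows that the SQL has to walk:

```python
def block_count(source_bytes: int, blocksize_bytes: int) -> int:
    # Ceiling division: every started block counts.
    return -(-source_bytes // blocksize_bytes)

TB = 1024**4
KB = 1024
print(block_count(1 * TB, 100 * KB))   # ~10.7 million blocks at 100 KB
print(block_count(1 * TB, 1024 * KB))  # ~1.05 million blocks at 1 MB
```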
I would say so. That’s 2000 times the default, and maybe the cause of the long, infrequent compacts.
I was wondering whether the 12 GB file was among the 29 small files downloaded. Now it seems more plausible:
C:\Duplicati\duplicati-2.1.0.2_beta_2024-11-29-win-x64-gui>Duplicati.CommandLine help small-file-size
--small-file-size (Size): Volume size threshold
When examining the size of a volume in consideration for compacting, a small tolerance value is used, by default 20 percent of the volume size. This ensures that large volumes which may have a few bytes
wasted space are not downloaded and rewritten.
C:\Duplicati\duplicati-2.1.0.2_beta_2024-11-29-win-x64-gui>Duplicati.CommandLine help small-file-max-count
--small-file-max-count (Integer): Maximum number of small volumes
To avoid filling the remote storage with small files, this value can force grouping small files. The small volumes will always be combined when they can fill an entire volume.
* default value: 20
With 100 GB remote volume size, 20 GB is where a file is considered small, per above help.
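Putting the two options together, as I read the help text and the log line above (a sketch, not the actual source code): a volume counts as small below small-file-size, which defaults to 20% of the remote volume size, and compacting kicks in once more than small-file-max-count of them pile up, matching the “29 small volumes and the maximum is 20” message.

```python
def is_small(volume_bytes: int, volume_size: int, small_file_ratio: float = 0.20) -> bool:
    """Default small-file-size is 20 percent of the remote volume size, per the help text."""
    return volume_bytes < volume_size * small_file_ratio

def triggers_small_file_compact(volume_sizes: list, volume_size: int, small_file_max_count: int = 20) -> bool:
    """Compact once more small volumes accumulate than small-file-max-count allows."""
    return sum(1 for b in volume_sizes if is_small(b, volume_size)) > small_file_max_count

GB = 1024**3
# The 12.56 GB dblock from the log is "small" next to a 100 GB remote volume size (< 20 GB),
# and 29 such volumes exceed the default maximum of 20, so compacting starts.
print(is_small(int(12.56 * GB), 100 * GB))                    # True
print(triggers_small_file_compact([13 * GB] * 29, 100 * GB))  # True
```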
Remote volume size is the new manual’s warning about the impact on restore. The old manual has:
Work will probably at least be more spread out. Sometimes SQL speed also degrades more than linearly with size, due both to the algorithms and to overflowing its memory cache, as I described.
A full analysis would need much heavier logging, and possibly detailed analysis as in the linked article. A developer would probably have to lead you through what’s needed, if they wish to pursue it further.
EDIT:
SQL is always handled by the C library, so the .NET 8 speedup may be minimal. The usual way to find slow SQL is to log at profiling level and see whether any individual queries can be sped up, e.g. with indexes.
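If you want to try that, a rough summarizer of a profiling-level log could look like this (assuming the profiling lines end in a “took d:hh:mm:ss.fff” duration; adjust the regex if your version formats durations differently):

```python
import re
from collections import defaultdict

# Assumption: profiling lines end in "... took 0:00:00:01.234"; adjust as needed.
TOOK = re.compile(
    r"\[Profiling-.*?\]:\s*(?P<what>.*?)\s+took\s+"
    r"(?:(?P<d>\d+):)?(?P<h>\d{2}):(?P<m>\d{2}):(?P<s>\d{2}(?:\.\d+)?)"
)

def slowest_queries(log_path: str, top: int = 10):
    """Sum the time reported for each profiled statement and return the worst offenders."""
    totals = defaultdict(float)
    with open(log_path, encoding="utf-8", errors="replace") as f:
        for line in f:
            m = TOOK.search(line)
            if not m:
                continue
            secs = (int(m["d"] or 0) * 86400 + int(m["h"]) * 3600
                    + int(m["m"]) * 60 + float(m["s"]))
            totals[m["what"][:120]] += secs   # truncate long SQL as the grouping key
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)[:top]

for what, secs in slowest_queries("profiling.log"):
    print(f"{secs:10.1f}s  {what}")
```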
What’s more exotic is to look at the reason why queries are slow. Sometimes one can count the program’s file usage (database, database rollback journal, database etilqs temporary file) in Process Explorer, or get the details in Process Monitor. Looking at drive-level activity is not enough, due to Windows caching.