Processing time for Compacting

Yes, right now both conditions have to be satisfied. I think putting a percentage threshold on the total wasted space cannot scale properly as the backup grows over time.

You don’t even need a low threshold for this. A high threshold is even worse, because it delays compacting until a significant portion of the entire backup needs to be compacted.
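To put rough numbers on the scaling concern, here is a minimal sketch of how much waste has to pile up before a percentage threshold on total waste triggers. The function name and the 25% figure are only illustrative assumptions, not taken from the actual code.

```python
def wasted_bytes_before_compact(total_backup_bytes, threshold_percent):
    """Waste that must accumulate before a total-waste percentage threshold is reached."""
    return total_backup_bytes * threshold_percent / 100.0


GIB = 1024 ** 3

# Illustrative backup sizes; the same percentage means very different absolute amounts.
for size_gib in (100, 1000, 10000):
    pending = wasted_bytes_before_compact(size_gib * GIB, 25)
    print(f"{size_gib} GiB backup: compact waits for about {pending / GIB:.0f} GiB of waste")
```

The amount of data a single compact has to process grows in step with the backup, which is the part that does not scale.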

After every backup operation there is a chance of a small leftover, so even if you were to compact after every backup this would not change much overall.

Maybe there could also be a smarter selection process, where enough volumes are picked so that the result fits in a whole number of volumes. For example, assuming there are 5 volumes with 25% waste each, you could compact 4 of them into 3 new ones and leave the 5th for next time. But that would require a bit more design and testing, as this probably won’t work out so evenly in real cases.
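A rough sketch of that selection idea, assuming equal-sized volumes and a known waste fraction per volume. The function name, the `tolerance` parameter, and the most-wasted-first ordering are all my assumptions for illustration; this is not how the current implementation works.

```python
import math


def pick_volumes_for_whole_output(waste_fractions, tolerance=0.05):
    """Pick volumes (most wasted first) whose live data nearly fills whole new volumes.

    waste_fractions: wasted share of each volume (0.0 to 1.0), equal volume sizes assumed.
    Returns (indices_to_compact, new_volume_count), or ([], 0) if nothing fits.
    """
    order = sorted(range(len(waste_fractions)),
                   key=lambda i: waste_fractions[i], reverse=True)
    best = ([], 0)
    live = 0.0  # live data collected so far, measured in volumes
    for count, idx in enumerate(order, start=1):
        live += 1.0 - waste_fractions[idx]
        new_volumes = math.ceil(live - 1e-9)  # guard against float noise
        slack = new_volumes - live            # empty space left in the last new volume
        # Only worthwhile if it saves at least one volume and leaves little slack.
        if new_volumes < count and slack <= tolerance:
            best = (order[:count], new_volumes)
    return best


# The example from the post: 5 volumes, 25% waste each.
print(pick_volumes_for_whole_output([0.25] * 5))
```

With uneven waste, the slack rarely lands near zero, so the tolerance would need tuning, which is the "more design and testing" part.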


Also, having a low threshold, as @ts678 suggested, simply creates an insane amount of backup volume churn. I wouldn’t call that “improving efficiency”. As far as I can see, it just makes the whole process a lot less efficient, wasting every resource except destination storage.

I’ve also asked for a compaction time limit option. Currently, compaction can run for several days. Of course, you can already run compact with a timeout, which simply kills the process after N seconds of compacting.

As in this?

Is that done externally, e.g. with Python subprocess (which is what I used for my timeout kill tester)?
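For reference, an external timeout kill with Python subprocess can be as small as the sketch below. The helper name is hypothetical, and the demo command is just a sleeping Python process standing in for whatever compact command line you actually run.

```python
import subprocess
import sys


def run_with_timeout(command, timeout_seconds):
    """Run command; return its exit code, or None if it was killed at the timeout."""
    try:
        completed = subprocess.run(command, timeout=timeout_seconds)
        return completed.returncode
    except subprocess.TimeoutExpired:
        # subprocess.run has already killed the child and waited for it to exit.
        return None


if __name__ == "__main__":
    # Stand-in for the real compact command (assumption: replace with your own).
    demo = [sys.executable, "-c", "import time; time.sleep(30)"]
    print(run_with_timeout(demo, timeout_seconds=1))
```

This is a hard kill of the direct child process only, with no chance for it to finish the volume it is working on, so it is a blunt substitute for a real time limit option.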

Feature Request: Time Limit for Compaction also asks for this. I also found one GitHub issue asking:

[Feature request] Show progress during compaction #3397

Did you see any other change requests or useful discussions beyond my cited 2 in forum and 1 issue?

If a low threshold causes churn and a high threshold wastes storage, maybe an efficient way to compact under a time limit using the current algorithm (which applies the threshold to both the total and the individual volumes) would be to compact volumes in something like descending order of the space wasted in each volume.

Under a changed algorithm which looks only at volume waste, this would be sort of a natural outcome.
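A minimal sketch of that ordering under a time limit, assuming the waste fraction of each volume is already known. The function, the `compact_one` callback, and the `clock` parameter are hypothetical names for illustration, and the deadline is only checked between volumes, so one slow volume can still overrun it.

```python
import time


def compact_most_wasted_first(volumes, compact_one, time_limit_seconds,
                              volume_threshold=0.25, clock=time.monotonic):
    """Compact volumes in descending order of waste until the time limit is hit.

    volumes: dict mapping volume name to waste fraction (0.0 to 1.0).
    compact_one: callable taking a volume name (stand-in for the real work).
    Returns the names that were compacted, in the order processed.
    """
    deadline = clock() + time_limit_seconds
    candidates = sorted(
        (name for name, waste in volumes.items() if waste >= volume_threshold),
        key=lambda name: volumes[name],
        reverse=True,
    )
    done = []
    for name in candidates:
        if clock() >= deadline:
            break  # out of time; remaining volumes wait for the next run
        compact_one(name)
        done.append(name)
    return done
```

Whatever time is available gets spent where it reclaims the most space per volume rewritten, and the leftovers are the least wasteful volumes.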