Hi Overthere,
I’ve had a storage problem with B2 for some time now. I’m backing up the same source files to my NAS and to B2, and the latter is using much more space than the former (see attachment).
Settings are as follows:
NAS
RemoteVolSize: 100MB
Versions: 3
blocksize: 2MB
I haven’t used the --threshold option myself, but if my understanding of it is correct, setting it to 100 implies that your remote volumes would need to be 100% wasted space before any of it is reclaimed. How much total storage does B2 report?
If you did change 60-80 GB of data between the first and second backup to B2 (with threshold=100), then yes, that seems like expected behaviour. If you did not change that much data between those runs, then something else is going on.
Based on the wording in the manual for threshold, you should be able to just specify a new value and then run the backup job to have it take effect.
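As a sketch only (the storage URL and source path below are placeholders, not your actual settings), on the command line the new value can simply be passed on the next run:

Duplicati.CommandLine.exe backup <storage-URL> "<source-path>" --threshold=25

In the web UI the equivalent would be adding or editing --threshold under the job’s advanced options and saving before the next run.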
Can I ask why you are using the threshold option in the first place? I can only find a few references to people using it and they only seem to be setting it to 0 during --dblock-size changes.
It’s certainly the top suspect, but compacting depends on many things, including how your specific data changes. Because the default 25% wasted-space threshold isn’t hit on every run, you may also see usage fluctuate.
The COMPACT command and no-auto-compact documentation describe the options for compacting files at the backend. Its goal is to reduce space usage at the expense of downloads and uploads, so there can be cost trade-offs. Backblaze B2, I believe, charges for storage and downloads but not for uploads; other services may vary.
A log file, for example at Information level or above, reports why compact does or doesn’t run. If you prefer clicking, viewing the logs of the backup job in the web UI will also do. Look for the Compact section for the data.
EDIT:
You have to go up from Information to Verbose level if you want all the numbers in the log file. Example:
2020-03-03 12:42:05 -05 - [Verbose-Duplicati.Library.Main.Database.LocalDeleteDatabase-FullyDeletableCount]: Found 0 fully deletable volume(s)
2020-03-03 12:42:05 -05 - [Verbose-Duplicati.Library.Main.Database.LocalDeleteDatabase-SmallVolumeCount]: Found 3 small volumes(s) with a total size of 17.44 MB
2020-03-03 12:42:05 -05 - [Verbose-Duplicati.Library.Main.Database.LocalDeleteDatabase-WastedSpaceVolumes]: Found 17 volume(s) with a total of 25.24% wasted space (581.57 MB of 2.25 GB)
2020-03-03 12:42:05 -05 - [Information-Duplicati.Library.Main.Database.LocalDeleteDatabase-CompactReason]: Compacting because there is 25.24% wasted space and the limit is 25%
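For reference, a log file like the one above would come from adding options roughly like these to the job (the path is only an example, pick your own):

--log-file=D:\Duplicati\backup-log.txt
--log-file-log-level=Verbose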
Thanks a lot, and yes, the threshold setting was chosen after the problem with heavy downloading during compacting, and because kenkendk mentioned it in Does duplicati download any data?
If I understand the other thread correctly, another suggestion was to lower the threshold below 25% so compacting would run more often with less downloading each time. Is that correct? Then I’ll give that a try.
You could try that, and perhaps the 1 GB of free daily download will cut costs, but it’s unpredictable, especially without extremely detailed knowledge of your data (and how it changes) and your retention.
Lowering the threshold to an extreme degree can cause constant compacting, and even short of that, threshold also affects (as documented) the per-dblock decision, which could snowball the downloads.
--threshold (Integer): The maximum wasted space in percent
As files are changed, some data stored at the remote destination may not
be required. This option controls how much wasted space the destination
can contain before being reclaimed. This value is a percentage used on
each volume and the total storage.
* default value: 25
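If you do end up experimenting with a lower value, one way to watch the effect directly is a manual compact from the command line. This is only a sketch; the storage URL and database path are placeholders for your own, and the low threshold is just an illustration:

Duplicati.CommandLine.exe compact <storage-URL> --dbpath=<path-to-job-database> --threshold=10 --console-log-level=Verbose

At Verbose level that should print the same FullyDeletableCount / SmallVolumeCount / WastedSpaceVolumes lines shown above, so you can see how far below the limit you actually sit before committing to the change.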
If you’re going to try to optimize this, both the standard log and the log file can probably assist you…