Filesize on Backblaze B2 much bigger than on NAS

Hi Overthere,
I’ve had a storage problem with B2 for some time now. I’m backing up the same source files to my NAS and to B2, and the latter uses much more space than the former (see attachment).

Settings are as follows:
NAS
RemoteVolSize: 100MB
Versions: 3
blocksize: 2MB

B2
RemoteVolSize: 100MB
Versions: 2
threshold: 100

Anybody got a clue about the difference? Can it be the threshold setting? Or can I do some manual cleanup somehow?

Thx in advance
Ralf

I haven’t used the --threshold option myself, but if my understanding of it is correct, setting it to 100 means your remote storage will need to be 100% wasted space before any is reclaimed. How much total storage does B2 report?

If you did change 60-80 GB of data between the first and second backup to B2 (with threshold=100), then yes, that seems like expected behaviour. If you did not change that much data between those runs, then something else is going on.
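To put numbers on that, here is a minimal sketch (names and logic are illustrative, not Duplicati’s actual code) of the check --threshold implies:

# Rough sketch of the wasted-space check implied by --threshold.
# Names are illustrative; this is not Duplicati's actual code.
def should_compact(wasted_bytes, total_bytes, threshold_pct):
    """Reclaim space only once the wasted fraction reaches the threshold."""
    wasted_pct = 100.0 * wasted_bytes / total_bytes
    return wasted_pct >= threshold_pct

# With threshold=100, even 80 GB of waste in 100 GB of storage does
# not trigger a compact, so stale data keeps accumulating on B2:
print(should_compact(80, 100, threshold_pct=100))  # False
print(should_compact(80, 100, threshold_pct=25))   # True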

Based on the wording in the manual for threshold, you should be able to just specify a new value then run the backup job to have it take effect.

Can I ask why you are using the threshold option in the first place? I can only find a few references to people using it and they only seem to be setting it to 0 during --dblock-size changes.


It’s certainly the top suspect, but compacting depends on many things, including how your specific data changes. Because the default 25% wasted-space threshold isn’t hit every run, you may also see usage fluctuate.

The COMPACT command and no-auto-compact show the options for Compacting files at the backend, whose goal is to reduce space usage at the expense of downloads and uploads, so there can be cost tradeoffs; a rough break-even sketch follows the links below. Backblaze B2, I believe, charges for storage and downloads but not for uploads. Other services may vary.

Does duplicati download any data?
BackBlaze is reporting large daily downloads when using Duplicati
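On the cost side, a back-of-the-envelope break-even calculation can show whether a compact run pays for itself. This is only a sketch; every figure is a placeholder to fill in from your actual B2 rate card:

# Compacting downloads the partially-wasted volumes once, then saves
# storage every month afterwards. All prices here are placeholders.
def months_to_break_even(download_gb, reclaimed_gb,
                         price_download_per_gb, price_storage_per_gb_month):
    """Months of storage savings needed to repay the one-time download."""
    one_time_cost = download_gb * price_download_per_gb
    monthly_saving = reclaimed_gb * price_storage_per_gb_month
    return one_time_cost / monthly_saving

# Made-up example: download 50 GB to reclaim 40 GB of storage.
print(months_to_break_even(50, 40, 0.01, 0.005))  # 2.5 months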

A log file, for example at Information level or higher, reports why Compact does or doesn’t run.
Viewing the log files of a backup job works too if you prefer clicking. Look for the Compact section for the data.

EDIT:

You have to go up from Information to Verbose level if you want all the numbers in the log file. Example:

2020-03-03 12:42:05 -05 - [Verbose-Duplicati.Library.Main.Database.LocalDeleteDatabase-FullyDeletableCount]: Found 0 fully deletable volume(s)
2020-03-03 12:42:05 -05 - [Verbose-Duplicati.Library.Main.Database.LocalDeleteDatabase-SmallVolumeCount]: Found 3 small volumes(s) with a total size of 17.44 MB
2020-03-03 12:42:05 -05 - [Verbose-Duplicati.Library.Main.Database.LocalDeleteDatabase-WastedSpaceVolumes]: Found 17 volume(s) with a total of 25.24% wasted space (581.57 MB of 2.25 GB)
2020-03-03 12:42:05 -05 - [Information-Duplicati.Library.Main.Database.LocalDeleteDatabase-CompactReason]: Compacting because there is 25.24% wasted space and the limit is 25%
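If you want to watch those numbers over time, a hypothetical helper like this could pull them out of a Verbose-level log file (the pattern matches the exact message format shown above):

# Hypothetical helper: extract wasted-space numbers from a Duplicati
# log file at Verbose level, matching the WastedSpaceVolumes line above.
import re

PATTERN = re.compile(
    r"WastedSpaceVolumes\]: Found (\d+) volume\(s\) with a total of "
    r"([\d.]+)% wasted space"
)

def wasted_space_entries(path):
    with open(path, encoding="utf-8") as f:
        for line in f:
            m = PATTERN.search(line)
            if m:
                yield int(m.group(1)), float(m.group(2))

for volumes, pct in wasted_space_entries("duplicati.log"):
    print(volumes, "volume(s),", pct, "% wasted")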

Thanks a lot, and yes, the threshold setting was chosen after the problem with heavy downloading when compacting, and because kenkendk mentioned it in Does duplicati download any data?

If I understand the other thread correctly, another suggestion was to lower the threshold below 25% so compacting would run more often with less downloading each time. Is that correct? Then I’ll give that a try.

You could try that, and perhaps the 1 GB of free daily download will cut costs, but it’s unpredictable, especially without extremely detailed knowledge of your data (and how it changes) and your retention.

Lowering the threshold to an extreme degree can cause constant compacting, and even short of that, threshold also affects (as documented) the per-dblock decision, which could cause an avalanche of downloads; see the sketch after the help text below.

  --threshold (Integer): The maximum wasted space in percent
    As files are changed, some data stored at the remote destination may not
    be required. This option controls how much wasted space the destination
    can contain before being reclaimed. This value is a percentage used on
    each volume and the total storage.
    * default value: 25
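Reading that, the percentage seems to play a dual role: it gates the overall compact run and also flags individual volumes for rewriting. A sketch of my interpretation (not Duplicati’s actual source):

# Sketch of the dual role the help text describes: the threshold is
# applied to each volume and to total storage. Interpretation mine,
# not Duplicati's actual implementation.
def compact_triggered(total_waste_pct, threshold):
    """Overall check: compact once total wasted space hits the threshold."""
    return total_waste_pct >= threshold

def volumes_to_rewrite(per_volume_waste_pct, threshold):
    """Per-volume check: a very low threshold flags nearly every volume,
    which is the download avalanche mentioned above."""
    return [i for i, pct in enumerate(per_volume_waste_pct) if pct >= threshold]

print(volumes_to_rewrite([2.0, 30.0, 0.5, 12.0], threshold=25))  # [1]
print(volumes_to_rewrite([2.0, 30.0, 0.5, 12.0], threshold=1))   # [0, 1, 3]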

If you’re going to try to optimize this, both the standard log and the log file can probably assist you…

Is B2 keeping versions?
My understanding is that the recommendation is to set

File Lifecycle: Keep only the last version 

This is done because Duplicati maintains the versions itself (it does not use the versioning within B2).
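For reference, my recollection of the B2 lifecycle rule behind that setting, expressed as a Python dict (an assumption on my part; verify against the current B2 docs):

# My recollection of the lifecycle rule behind "Keep only the last
# version" -- an assumption; check the current B2 documentation.
keep_only_last_version = {
    "fileNamePrefix": "",               # apply to the whole bucket
    "daysFromUploadingToHiding": None,  # never auto-hide live files
    "daysFromHidingToDeleting": 1,      # purge hidden versions after a day
}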

Thanks for the reminder, but yes it’s already set to one version.