Strategy to recover from RAM-corrupted blocks

Hello,

I have been using Duplicati for quite a few years now, and it was previously running on a machine where the RAM modules failed progressively.
Before I noticed, quite a few things got corrupted, and while I could recover most of them, I believe a good number of the dblock/dindex files sent to my B2 bucket are corrupted as well.
Every night, the backup operation downloads a sample of dblock files, and more often than not it fails with this message:

Content has been tampered with, do not trust content: invalid HMAC

Now, I totally understand that this file is “toast”, but I’m wondering about the strategy to apply here, and have come up with the following options:

  1. Full backend verification
  2. Identify every day which file is corrupted, delete it from backend, list-broken-files, purge-broken-files
  3. Destroy both the bucket and the database, restart from scratch

The first option is the safest and easiest, but with 1.3 TB on the backend it’s going to cost quite a bit of money (egress costs) without even being sure of the result.
The second option is quite tedious and does not even guarantee that all corrupted files will be identified.
The last option means that I would lose all history, but it is the fastest approach.

Would you have any other suggestion?

First, did you notice Backblaze’s 2023 egress change (free egress up to three times the amount stored)? I suspect it was a response to competition.

Did you recover with Duplicati? If you did a lot of that, it probably used up some of the free egress.

If the “without even being sure of the result” means verification can’t see all issues, I agree with that. Backups made from a bad system could contain errors at any level, and checking only checks some of them. Your verification message probably reflects a check of only the top-level encryption and file handling.

I’m not clear what “Full backend verification” means. It might be the “all” option of the “test” command, which would be similar to the post-backup verification, except done all at once instead of hoping to notice problems over time.

One can add --full-remote-verification (might be noisy) or --full-block-verification.
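
A minimal sketch, assuming a CLI backup job (the bucket, credentials, source path, and passphrase below are placeholders), of turning on the deeper post-backup checks:

  # bucket, credentials, source path, and passphrase are placeholders
  duplicati-cli backup "b2://bucket/?auth-username=ID&auth-password=KEY" /data \
    --passphrase="SomeWords" \
    --full-remote-verification=true \
    --backup-test-samples=5

--backup-test-samples raises how many sample sets the post-backup verification downloads (the default is 1, if I recall correctly).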

This still leaves open the risk that the bad memory corrupted some source block early on, back when an old backup first stored it.

If the tedious part is the “every day”, the first option avoids that. You still have to do the repairs afterwards.

Losing history seems the wrong move when you might still need to try restoring more files. Whether or not this matters depends on how confident you feel about the recovery efforts thus far.

One option for users who want to get going again fast while some mess still exists is to start a fresh backup and keep the old one until you are reasonably sure it is no longer needed.

Backblaze charges for storage by time, I think, so a brief safeguard may be cheap. If it is going to stick around longer, downloading to some local storage may cost less. That would also let you freely do more testing to see how badly damaged the backup is, whereas repeated remote tests will drive up the egress amount.

Restarting fresh reduces the risk that there are hidden errors. For example, do you occasionally test that the database recreates cleanly, or that Direct restore from backup files works well?

You might also save space and time. Depending on when the backup was first run, its blocksize might be smaller than the current default. Or maybe you set a custom size from the start, which was advised (at least by me) for backups over 100 GB, to reduce block-processing time.
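
If you do go the fresh-start route, a rough sketch (the bucket, source path, and the 1 MB value are only illustrations; pick a blocksize per the sizing advice):

  # bucket, credentials, and source path are placeholders; 1MB is only an example value
  duplicati-cli backup "b2://new-bucket/?auth-username=ID&auth-password=KEY" /data \
    --passphrase="SomeWords" \
    --blocksize=1MB \
    --dblock-size=50MB

Keep in mind that --blocksize only takes effect when the backup is first created; it cannot be changed on an existing backup.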

Going the other way, if you really like your old versions (even ignoring the recent corruption), the test results (which we don’t know yet…) might look good enough not to go with a fresh start.

You can, of course, have both if you’re willing to keep storing the damaged backup: its history may or may not be usable (due to the damage), but if you delete it, it’s definitely gone.

Oh, thanks for the reminder, it somehow escaped my mind.

No, I meant recovering parts of the system that were not backed up by Duplicati.

Yes, that’s it, ask the “verification” to work on all blocks instead of a sample.

That’s where my backup procedure is somewhat lacking, as I don’t have a real “test the remote backup” policy, which is somewhat stupid, I agree. Is there a how-to or a dedicated page in the documentation on the “best practice” with Duplicati regarding this?

My command lines all use the option --dblock-size=50mb, which I added because my backups were huge and I read that suggestion in various posts here.

Considering the new “3 times storage free egress”, I’ll try to do a full remote verification, but first I’ll have to check whether it downloads a corrupted block once or six times. Right now, here is what I have at the end of the failing backup:

Downloading file duplicati-b760e8c40debf4343b8851d3513e47b80.dblock.zip.aes (49,904 MiB) ...
Downloading file duplicati-bfa9991e7a79b41e78bdbdfe5e178be5e.dblock.zip.aes (49,925 MiB) ...
Downloading file duplicati-bfa9991e7a79b41e78bdbdfe5e178be5e.dblock.zip.aes (49,925 MiB) ...
Downloading file duplicati-bfa9991e7a79b41e78bdbdfe5e178be5e.dblock.zip.aes (49,925 MiB) ...
Downloading file duplicati-bfa9991e7a79b41e78bdbdfe5e178be5e.dblock.zip.aes (49,925 MiB) ...
Downloading file duplicati-bfa9991e7a79b41e78bdbdfe5e178be5e.dblock.zip.aes (49,925 MiB) ...
Downloading file duplicati-bfa9991e7a79b41e78bdbdfe5e178be5e.dblock.zip.aes (49,925 MiB) ...
Fatal error => Failed to decrypt the data (invalid passphrase?): Content has been tampered with, do not trust content: invalid HMAC

Maybe it just tries 7 files no matter what, with a fairly high retry count on the failing ones?

Anyway, thanks for your reply, it sheds a new light on the situation.

That’s a different setting; its default of 50 MB for the whole remote volume hasn’t changed.

Backup size parameters explains --blocksize, which is what I was suggesting, but its default grew.

To be pedantic again, it’s checking a dblock file and hasn’t yet peeked at the blocks inside.
As for retries, you can dial the count down, but I’m hoping the test won’t just end the way the backup did.

  --number-of-retries (Integer): Number of times to retry a failed transmission
    If an upload or download fails, Duplicati will retry a number of times before failing. Use this to handle unstable
    network connections better.
    * default value: 5

If you don’t have a network error but an actually damaged file, the retries are not helpful; however, if you change the setting to 0, a network error might prematurely stop the work.

Ideally IMO a test command would report errors and keep going, and you can test that.

I just tried that and I’m a bit surprised by the results. First, I ran this command:

duplicati-cli test --backup-name="Full backup" --dbpath=/var/lib/duplicati/.config/Duplicati/Database.sqlite --server-datafolder=/var/lib/duplicati/.config/Duplicati --encryption-module=aes --compression-module=zip --dblock-size=50mb --passphrase="SomeWords" --disable-module=console-password-input --asynchronous-concurrent-upload-limit=10 --retention-policy="1W:1D,4W:1W,12M:1M" --number-of-retries=1 "b2://abcdef-server/?auth-username=abcdef013&auth-password=abcdef012345678"

This failed with this error message:

The Test operation failed with error: Detected 4 volumes with missing filesets => Detected 4 volumes with missing filesets

ErrorID: DatabaseInconsistency
Detected 4 volumes with missing filesets

I then ran the repair command with the exact same options as in the test command above, and it gave me this:

  Listing remote folder ...
  Deleting file duplicati-b0191d7fb348d4e10b726bc85129fbf09.dblock.zip.aes  ...
  Deleting file duplicati-bfc391d40732745b78d7c82cd83b46d0e.dblock.zip.aes  ...
  Deleting file duplicati-b8c53d3bcd0c443dfb93c8c9e6eec4f5a.dblock.zip.aes  ...
  Deleting file duplicati-b37ee9a75b7154f3892b1d9693e5ae8f2.dblock.zip.aes  ...
  Deleting file duplicati-bc3d3adf84a114263957e872ba60643a9.dblock.zip.aes  ...
  Deleting file duplicati-be2600a314e764f31be363eccb07dab4a.dblock.zip.aes  ...
  Deleting file duplicati-bef88f0fd34504d41988f631c0d0ea57b.dblock.zip.aes  ...
  Deleting file duplicati-b2bdd8cb9290048908a95c816c227615d.dblock.zip.aes  ...
  Deleting file duplicati-b19b5dadcb7594448b42012bcc547b4f4.dblock.zip.aes  ...
  Deleting file duplicati-b311fbf79d781428f9fd2186412c90f33.dblock.zip.aes  ...
  Deleting file duplicati-b6153cc659a594b29b5d6d32f1d61f1d1.dblock.zip.aes  ...
  Deleting file duplicati-b4eb0b8e7bf2f4d4aad19adb5c93d73a9.dblock.zip.aes  ...
The Repair operation failed with error: The remote files are newer (01/05/2025 04:11:03) than the local database (01/05/2025 04:10:59), this is likely because the database is outdated. Consider deleting the local database and run the repair operation again. If this is expected, set the option "--repair-ignore-outdated-database"  => The remote files are newer (01/05/2025 04:11:03) than the local database (01/05/2025 04:10:59), this is likely because the database is outdated. Consider deleting the local database and run the repair operation again. If this is expected, set the option "--repair-ignore-outdated-database"

ErrorID: RemoteFilesNewerThanLocalDatabase
The remote files are newer (01/05/2025 04:11:03) than the local database (01/05/2025 04:10:59), this is likely because the database is outdated. Consider deleting the local database and run the repair operation again. If this is expected, set the option "--repair-ignore-outdated-database"

The “off by 4 seconds” issue is a bit strange, and I was worried that using the --repair-ignore-outdated-database option would mask some other real errors. Would there be a way to specify an “error margin” for times instead of blindly ignoring them all?

Anyway, I ran the test command again with the exact same options as the first time, and it gave me this:

  Listing remote folder ...
  Downloading file duplicati-20250501T021103Z.dlist.zip.aes (171,169 MiB) ...
  Downloading file duplicati-ie8a29475f2e24164b5a445459ebabf1d.dindex.zip.aes (138,747 KiB) ...
  Downloading file duplicati-bbda17d37b0dd4ab69391ef328d6f9e1a.dblock.zip.aes (49,903 MiB) ...
Examined 3 files and found no errors

As this looked nice, I added --full-result=true --full-remote-verification=true to the command line, but it did not change a single thing: it still downloaded and tested only 3 files.
Unsurprisingly, adding --backup-test-percentage=100 did not change anything either.

It’s as if those options are ignored. Would their place in the command line have an impact? I added them right before the B2 URL.

I think it’s a Canary bug which should be fixed properly. It’s good that you came across it.

Does this theory of retries possibly explain your case? Your job Complete log has stats:

"RetryAttempts": 2

is from the test I cited. It might just be a false alarm that can be overridden as shown.

This looks like another new Canary addition from almost two months ago. The issue says:

Added repair code that can handle cases where a fileset or remote dlist entry is missing, and then recreates the contents.

and we’ll see if the dev has anything more to add or ask. It sounds like fixup is automatic.

I wonder if the “off by 4 seconds” message is related to “4 volumes with missing filesets”, which could happen if all the retries (named 1 second apart) somehow got onto the destination.

There’s supposed to be a cleanup of uploads that got errors, deleting the failed ones.
You could check what is actually on your destination around that time (dlist times are in UTC).
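
A quick sketch of one way to look, assuming rclone is set up with a B2 remote named “b2” (the remote and bucket names are placeholders):

  # "b2" is a placeholder rclone remote name, "abcdef-server" the bucket from your URL
  rclone ls b2:abcdef-server | grep "duplicati-20250501T0211"

Retried dlist uploads from that backup would show up as files named a second or two apart.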

You might be misunderstanding The TEST command. You got 1 set (3 files), the default.
Did you try asking for more, or all?

EDIT:

CLI help test:

Usage: Duplicati.CommandLine.exe test <storage-URL> <samples> [<options>]

  Verifies integrity of a backup. A random sample of dlist, dindex, dblock
  files is downloaded, decrypted and the content is checked against
  recorded size values and data hashes. <samples> specifies the number of
  samples to be tested. If "all" is specified, all files in the backup
  will be tested. This is a rolling check, i.e. when executed another time
  different samples are verified than in the first run. A sample consists
  of 1 dlist, 1 dindex, 1 dblock.
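
As a sketch, reusing the placeholders from your earlier command, checking everything with the deeper verification would look something like:

  # URL, dbpath, and passphrase reuse the placeholders from your backup command
  duplicati-cli test "b2://abcdef-server/?auth-username=abcdef013&auth-password=abcdef012345678" all \
    --dbpath=/var/lib/duplicati/.config/Duplicati/Database.sqlite \
    --passphrase="SomeWords" \
    --full-remote-verification=true \
    --number-of-retries=1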

Ah yes, thanks, I missed the <samples> parameter on that command; I got confused with the options used for the backup command.

It’s now running, but it’s excruciatingly slow: it gets through a new dindex file only every two to five seconds, and they are just a few KiB in size.

Wouldn’t there be a way to process them in parallel? Right now it’s not using much bandwidth, and not even a full CPU core.

If “a few” means about 50 KiB, that’s what mine are with the old defaults (100 KB blocksize, 50 MB remote volume / dblock-size), so that’s over 500 database lookups per dindex just to do the

  --full-remote-verification (Enumeration): Activate in-depth
    verification of files
    After a backup is completed, some (dblock, dindex, dlist)
    files from the remote backend are selected for verification.
    Use this option to turn on full verification, which will
    decrypt the files and examine the insides of each volume,
    instead of simply verifying the external hash. If the option
    --no-backend-verification is set, no remote files are
    verified. This option is automatically set when the
    verification is performed directly. ListAndIndexes is like
    True but only dlist and index volumes are handled.
    * values: True, False, ListAndIndexes
    * default value: False

that was requested. The default would have done less verification, so it would run faster. Going heavier, --full-block-verification would probably have checked more and been slower, and still not guaranteed backup integrity, given possible memory corruption early on when blocks were first backed up.

Is the drive a mechanical drive or an SSD? If mechanical, that can sometimes be the bottleneck.

If you are running at the old 100 KB default blocksize (as discussed earlier), you have many blocks in a large database, so there is a chance that giving it more cache might help speed.

The cache situation has been improved in 2.1.0.117 Canary, but what release are you on?

  --sqlite-page-cache (Size): Size of the SQLite page cache
    Use this option to set the size of the SQLite page cache. The
    page cache is used to store the pages of the database in
    memory. Increasing the page cache size may improve
    performance, but will also increase memory usage. If the
    supplied value is the same or less than 2048000 bytes, the
    default SQLite cache value is used.

There has not been a focus on speeding up test with full-remote-verification, AFAIK. Backup got the attention first because it’s run most often, so it has the most designed-in concurrency.

They range from 18 to 190 KiB, with a majority around 50 KiB.

The folders are on both kinds of drives, but there is no disk I/O either, which, to me, is not surprising as it’s doing a backend check.

2.1.0.116_canary_2025-04-17

Now, I restarted the command with just --full-result=true and it’s really slow like this:

Start the command
Wait 10 minutes without any output
Listing remote
Download the dlist files again
Download the dindex files again

It seems “killing” the process via Ctrl-C prevented it from storing its progress, which is quite unfortunate.

Also, while it processes the dindex files very slowly, I get these warnings in syslog:

duplicati-cli[45926]: SQLite warning (284): automatic index on CmpTable-043962D5778BC943922BC55D959B464F(Hash)
duplicati-cli[45926]: SQLite warning (284): automatic index on CmpTable-CBC4246C42EBA64EAE6BC234438EEF24(Name)

The further this goes, the more I feel I should just throw it all away and restart from scratch, as the new default values appear to be much more sensible for backups as large as mine.

I think the encrypted file is downloaded to a temporary file, decrypted to another one, and then the resulting .zip file is compared to a database on disk. There should be activity…

If you were in the GUI, About → Show log → Live → Profiling would show the database activity. The command line can set that kind of log file and log level too, but it must be specified at the start of a run.
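
A sketch of that, again with your placeholders (the log path is arbitrary):

  # log path is arbitrary; URL, dbpath, and passphrase are your placeholders again
  duplicati-cli test "b2://abcdef-server/?auth-username=abcdef013&auth-password=abcdef012345678" all \
    --dbpath=/var/lib/duplicati/.config/Duplicati/Database.sqlite \
    --passphrase="SomeWords" \
    --log-file=/tmp/duplicati-test-profiling.log \
    --log-file-log-level=Profiling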

Then you don’t have the new SQLite cache memory enhancement, although it’s possible to get a similar effect with an environment variable if you really want it.
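
If I recall correctly, the variable is CUSTOMSQLITEOPTIONS_DUPLICATI and it takes SQLite pragma settings, so something like this before the run (the value is just an example):

  # variable name is from memory; check the docs/source if it has no effect
  # negative cache_size is in KiB, so this asks for roughly 200,000 KiB of page cache
  export CUSTOMSQLITEOPTIONS_DUPLICATI="cache_size=-200000"
  # then run the same duplicati-cli test command in this shell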

I think test also does a check on database integrity. At least the one before backup does.

Backup stores progress; few or no other operations do. Test also raises the question of what progress even means: if you ask for a sample of 100, interrupt at 50, and repeat, what happens? Probably a new sample of 100, not the remainder of the random sample of 100 picked before.

For a similar warning (and I don’t know if anything has been done there or can be here)

As mentioned, you can restart from scratch without throwing the damaged backup away; however, if you’re absolutely sure that the history will never be missed, then toss it, I guess.

There’s also another way to test for damage, though one user found the test command was faster.

  --upload-verification-file (Boolean): Determine if verification
    files are uploaded
    Use this option to upload a verification file after changing
    the remote storage. The file is not encrypted and contains
    the size and SHA256 hashes of all the remote files and can
    be used to verify the integrity of the files.
    * default value: false

Because it records just the size and hash, it doesn’t check the internal structure. Another limitation is that it assumes you can reach the destination files directly as local files, so downloading from B2 to some local space is needed, but that would also give you your history if you ever need it. You can do a fast download with rclone or similar, or try the new

Duplicati.CommandLine.SyncTool --help
Description:
  Remote Synchronization Tool

  This tool synchronizes two remote backends. The tool assumes that the intent is
  to have the destination match the source.

  If the destination has files that are not in the source, they will be deleted
  (or renamed if the retention option is set).

  If the destination has files that are also present in the source, but the files
  differ in size, or if the source files have a newer (more recent) timestamp,
  the destination files will be overwritten by the source files. Given that some
  backends do not allow for metadata or timestamp modification, and that the tool
  is run after backup, the destination files should always have a timestamp that
  is newer (or the same if run promptly) compared to the source files.

  If the force option is set, the destination will be overwritten by the source,
  regardless of the state of the files. It will also skip the initial comparison,
  and delete (or rename) all files in the destination.

  If the verify option is set, the files will be downloaded and compared after
  uploading to ensure that the files are correct. Files that already exist in the
  destination will be verified before being overwritten (if they seemingly
  match).


Usage:
  Duplicati.CommandLine.SyncTool <backend_src> <backend_dst> [options]

Arguments:
  <backend_src>  The source backend string
  <backend_dst>  The destination backend string

Options:
  -y, --confirm, --yes               Automatically confirm the operation
                                     [default: False]
  -d, --dry-run                      Do not actually write or delete files. If
                                     not set here, the global options will be
                                     checked [default: False]
  --dst-options <dst-options>        Options for the destination backend. Each
                                     option is a key-value pair separated by an
                                     equals sign, e.g. --dst-options key1=value1
                                     key2=value2 [default: empty] []
  -f, --force                        Force the synchronization [default: False]
  --global-options <global-options>  Global options all backends. May be
                                     overridden by backend specific options
                                     (src-options, dst-options). Each option is
                                     a key-value pair separated by an equals
                                     sign, e.g. --global-options key1=value1
                                     key2=value2 [default: empty] []
  --log-file <log-file>              The log file to write to. If not set here,
                                     global options will be checked [default:
                                     ""] []
  --log-level <log-level>            The log level to use. If not set here,
                                     global options will be checked [default:
                                     Information]
  --parse-arguments-only             Only parse the arguments and then exit
                                     [default: False]
  --progress                         Print progress to STDOUT [default: False]
  --retention                        Toggles whether to keep old files. Any
                                     deletes will be renames instead [default:
                                     False]
  --retry <retry>                    Number of times to retry on errors
                                     [default: 3]
  --src-options <src-options>        Options for the source backend. Each option
                                     is a key-value pair separated by an equals
                                     sign, e.g. --src-options key1=value1
                                     key2=value2 [default: empty] []
  --verify-contents                  Verify the contents of the files to decide
                                     whether the pre-existing destination files
                                     should be overwritten [default: False]
  --verify-get-after-put             Verify the files after uploading them to
                                     ensure that they were uploaded correctly
                                     [default: False]
  --version                          Show version information
  -?, -h, --help                     Show help and usage information
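
A sketch of a one-way copy down to local disk, with the bucket credentials as before and a placeholder local path (dry-run first to see what it would do):

  # credentials as before; the local path is a placeholder (dry-run first)
  Duplicati.CommandLine.SyncTool \
    "b2://abcdef-server/?auth-username=abcdef013&auth-password=abcdef012345678" \
    "file:///mnt/duplicati-copy" \
    --progress --dry-run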

If you do this, the tools that use the duplicati-verification.json file are in the utility-scripts folder.
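
A rough sketch, assuming an rclone remote named “b2” and the Python variant of the script (check the script header for its exact usage), with --upload-verification-file enabled on the backup so the json file exists at the destination:

  # assumes an rclone remote named "b2" and the Python script from the utility-scripts folder
  rclone copy b2:abcdef-server /mnt/duplicati-copy
  python3 utility-scripts/DuplicatiVerify.py /mnt/duplicati-copy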


I just got the update to 2.1.0.117, but it did not improve the time it takes to “repair” that backup.

However, I did make some progress, with repair giving me this new message:

ErrorID: MissingDblockFiles
The backup storage destination is missing data files. You can either enable `--rebuild-missing-dblock-files` or run the purge command to remove these files. The following files are missing: duplicati-bd53dc7ce156e4ffaab63e95b4eb98251.dblock.zip.aes

I then gave it the --rebuild-missing-dblock-files flag and got a new error message:

Failed to perform cleanup for missing file: duplicati-bd53dc7ce156e4ffaab63e95b4eb98251.dblock.zip.aes, message: Repair not possible, missing 628 blocks.
If you want to continue working with the database, you can use the "list-broken-files" and "purge-broken-files" commands to purge the missing data from the database and the remote storage.

And so I tried running the list-broken-files command, and that’s where I believe I’m hitting a roadblock, with this crash stack:

The ListBrokenFiles operation failed with error: The method or operation is not implemented. => The method or operation is not implemented.
The method or operation is not implemented. => The method or operation is not implemented.

System.NotImplementedException: The method or operation is not implemented.
   at Duplicati.Library.Main.Operation.ListBrokenFilesHandler.MockList`1.System.Collections.IEnumerable.GetEnumerator()
   at Newtonsoft.Json.Serialization.JsonSerializerInternalWriter.SerializeList(JsonWriter writer, IEnumerable values, JsonArrayContract contract, JsonProperty member, JsonContainerContract collectionContract, JsonProperty containerProperty)
   at Newtonsoft.Json.Serialization.JsonSerializerInternalWriter.SerializeObject(JsonWriter writer, Object value, JsonObjectContract contract, JsonProperty member, JsonContainerContract collectionContract, JsonProperty containerProperty)
   at Newtonsoft.Json.Serialization.JsonSerializerInternalWriter.SerializeList(JsonWriter writer, IEnumerable values, JsonArrayContract contract, JsonProperty member, JsonContainerContract collectionContract, JsonProperty containerProperty)
   at Newtonsoft.Json.Serialization.JsonSerializerInternalWriter.SerializeObject(JsonWriter writer, Object value, JsonObjectContract contract, JsonProperty member, JsonContainerContract collectionContract, JsonProperty containerProperty)
   at Newtonsoft.Json.Serialization.JsonSerializerInternalWriter.Serialize(JsonWriter jsonWriter, Object value, Type objectType)
   at Newtonsoft.Json.JsonSerializer.SerializeInternal(JsonWriter jsonWriter, Object value, Type objectType)
   at Newtonsoft.Json.JsonConvert.SerializeObjectInternal(Object value, Type type, JsonSerializer jsonSerializer)
   at Newtonsoft.Json.JsonConvert.SerializeObject(Object value, Type type, JsonSerializerSettings settings)
   at Newtonsoft.Json.JsonConvert.SerializeObject(Object value, JsonSerializerSettings settings)
   at Duplicati.Library.Modules.Builtin.ResultSerialization.JsonFormatSerializer.SerializeResults(IBasicResults result)
   at Duplicati.Library.Main.Database.LocalDatabase.WriteResults(IBasicResults result)
   at Duplicati.Library.Main.Controller.RunAction[T](T result, String[]& paths, IFilter& filter, Func`3 method)
   at Duplicati.Library.Main.Controller.RunAction[T](T result, Func`3 method)
   at Duplicati.Library.Main.Controller.ListBrokenFiles(IFilter filter, Func`6 callbackhandler)
   at Duplicati.CommandLine.Commands.ListBrokenFiles(TextWriter outwriter, Action`1 setup, List`1 args, Dictionary`2 options, IFilter filter)
   at Duplicati.CommandLine.Program.ParseCommandLine(TextWriter outwriter, Action`1 setup, Boolean& verboseErrors, String[] args)
   at Duplicati.CommandLine.Program.RunCommandLine(TextWriter outwriter, TextWriter errwriter, Action`1 setup, String[] args)

The more I look at it, the more it hints at throwing everything away.

It just got replaced with 2.1.0.118. I’m not going to quote the whole change description; however, I think you might be in the area that was found broken and was just rewritten:

I don’t know if the dev will come by to comment further on your 2.1.0.117 repair attempt.

Unfortunate error. It is supposed to return a “kind-of-a-list” and this call attempts to report the results, but fails due to the fake list. I have a fix for it, but you should be able to disable monitoring for this call and then it should work.

As @ts678 described, the 2.1.0.118 version has a new repair method that can now deal with cases where it cannot fully recreate a missing dblock file. If you run the repair with that method, it will recover as much as possible, and then you can run purge-broken-files in case there is still some data that could not be recovered.
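
A minimal sketch of that sequence with the CLI, reusing the placeholders from the earlier commands:

  # placeholders reused from the earlier commands; review the broken-files list before purging
  URL="b2://abcdef-server/?auth-username=abcdef013&auth-password=abcdef012345678"
  DB=/var/lib/duplicati/.config/Duplicati/Database.sqlite
  duplicati-cli repair "$URL" --dbpath=$DB --passphrase="SomeWords"
  duplicati-cli list-broken-files "$URL" --dbpath=$DB --passphrase="SomeWords"
  duplicati-cli purge-broken-files "$URL" --dbpath=$DB --passphrase="SomeWords"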