Hash and size differ on dblock

I am running 2.0.6.1_beta_2021-05-03.
I am getting this warning:
2021-05-28 16:09:39 +12 - [Warning-Duplicati.Library.Main.Operation.FilelistProcessor-MissingRemoteHash]: remote file duplicati-b995d8a5b9ef643c5924caaf8c80718fd.dblock.zip.aes is listed as Verified with size 180355072 but should be 1073660477, please verify the sha256 hash "zbR71fdBP+y2mUoZOQnRHb7rMORK1i8PkcqoYFqr6M4="

I have run a verify, a repair, and a database rebuild, but nothing changed. The dblock is in fact 180355072 bytes, and its sha256 differs from the one above.

Is there a routine in Duplicati to rebuild the indexes and database from the backup blocks, so that Duplicati has the correct hash and block size?
Can I run a full backup (17-20 hours) to regain consistency and preserve the 9 versions of data I have?
Ta, Steve

Adding some commas: the 1,073,660,477-byte dblock turned into 180,355,072 bytes, which is a nice even 0xAC00000. That sort of suggests the filesystem truncated it. What is the destination storage type, and what did you configure for Remote volume size on Options screen 5? The default is 50 MB. It's not the total space.
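A quick shell check confirms the suspicious size (just arithmetic, nothing Duplicati-specific):

# Print the truncated size in hex; 180355072 is exactly 172 MiB.
printf '0x%X\n' 180355072
# prints 0xAC00000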

One can sometimes lose a 50 MB dblock without big pain, but how big is 1 GB compared to your total source?

The Recreate and Repair buttons do this, but backup blocks are what you lost, so that won’t directly work.
The database doesn’t have data blocks. The backup does, and it looks like you lost about 1 GB of them…

Disaster Recovery would be worthwhile reading. The LIST-BROKEN-FILES command and The PURGE-BROKEN-FILES command rely (I think) on file presence/absence, so you can rename your damaged file to a name not beginning with duplicati-, hide it some other way, or maybe delete it. Is its date recent?

Using The AFFECTED command on duplicati-b995d8a5b9ef643c5924caaf8c80718fd.dblock.zip.aes can be done without touching the file at all, and is probably a good first step. You can do it in a GUI Commandline, changing the Command to affected and the Commandline arguments box to the name of the file of interest.

There's no telling what files and what versions are affected by this lost file until you run commands to see.
It could be that this lost dblock is a new one, so the loss (unless it's from compact) only affects the latest version.

EDIT:

Regardless, running list-broken-files and purge-broken-files after deleting your damaged dblock should let you continue backups, even though I don't know what will be lost. You should also consider bringing the remote volume size down to a more reasonable level, per Choosing sizes in Duplicati. Huge volumes will cause various future problems, including potentially extremely slow restores if several of them are required…
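For the command-line route, the sequence would be roughly the following sketch; the storage URL, database path, and passphrase are placeholders for your own job settings:

# See what the lost volume breaks, then purge those entries from the backup:
mono Duplicati.CommandLine.exe list-broken-files "file:///mnt/nas/duplicati" --dbpath=/path/to/job.sqlite --passphrase=...
mono Duplicati.CommandLine.exe purge-broken-files "file:///mnt/nas/duplicati" --dbpath=/path/to/job.sqlite --passphrase=...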

Remote file system is an 8TB NAS with btrfs (direct Ethernet cable connection).
Remote volume size is 1GB.
Source: 2.16 TB, Backup: 1.69 TB / 8 Versions

The block mentioned in the warning does exist and lists as size 180,355,072. After running Duplicati for a few weeks I changed retention from keeping all backups to a custom 1W:1D,1Y:1M, and a lot of files were deleted; the backup went from about 1.8TB to the 1.69TB. Backups are still reporting as running, so I don't think I have a disaster, just 2 warnings (1 before, 1 after) every backup run.
Examined 260917 (2.16 TB)
Opened 95504 (793.70 GB)
Added 18 (8.73 MB)
Modified 116 (19.68 GB)
Deleted 4

Running the AFFECTED command in the GUI seems a little unintuitive; all the backup options are filled in by default.
The manual shows Duplicati.CommandLine.exe affected <storage-URL> <remote-filename> [<remote-filenames>] [<options>] with no explanation of options.
I am assuming I would need at least --dbpath=, --encryption-module=aes, --compression-module=zip, and --passphrase=.

Might be reasonable for this large backup, but still seems large. I'd worry more about blocksize, as 100 KB blocks are going to be very numerous for so large a backup (2.16 TB at 100 KB per block is over 20 million blocks to track). Unfortunately you can't change blocksize now. You could if you made a new backup.

Fortunately only a small one, and hardly visible unless you wish to restore (which may hit the damaged file).

Definition of a disaster

Missing or corrupted backup files.

I suspect that for this command, the extras won’t hurt the result. I think this command is database-based, so --dbpath is likely needed, but I’d suspect that the encryption and compression options are not relevant.
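From a terminal, that would look something like this sketch (the storage URL and database path are placeholders; per the above, the encryption and compression options are probably not needed):

# Ask which source files and versions reference the damaged volume:
mono Duplicati.CommandLine.exe affected "file:///mnt/nas/duplicati" \
    duplicati-b995d8a5b9ef643c5924caaf8c80718fd.dblock.zip.aes \
    --dbpath=/path/to/job.sqlite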

The last write to the dblock in the warning was last Saturday (probably when I changed the retention to custom).
I probably have the original dblock in my backup of the Duplicati backup I keep off site.

Also see my prior message, which crossed your latest.

It would be very interesting to see if it was good at some prior date. There is a remote file verification after a backup finishes uploading, but it's not a full download, just a list of file names and sizes that you can view with <job> → Show log → Remote, then clicking a list line, e.g. near the end of a backup. Recreating the DB wipes out such history though…

If you have a good-looking version of the bad file, that might save you the trouble of having to sacrifice affected files from your current backups. If that's the only problem, just put the file back and you "might" be all good.

In a test I ran on another machine, I deleted a few dblocks from the storage destination to see what happened. Duplicati did its thing and is running fine with no error/warning messages. I can probably do without this affected block's data if I save a copy and delete the block.
I have just checked my offsite storage from the week before this affected block was written, and it is not in the offsite storage.
Offsite storage has 4797 files, 2.3TB, 2398 dblocks; it has shrunk a bit since a clean-out of some older files and the move to a custom retention scheme.

Any missing file should have been noticed and made an error before backup:

[screenshot of the missing-file error]

An advanced option can turn verify off, but it’s dangerous to not verify:

no-backend-verification
--no-backend-verification = false
If this flag is set, the local database is not compared to the remote filelist on startup. The intended usage for this option is to work correctly in cases where the filelisting is broken or unavailable.

Finding the actual missing file name is cumbersome, but About → Show log → Live → Warning can show it.
This is the long-log version of missing file at start of backup, taken from a log file at profiling log level:

2021-05-29 16:18:45 -04 - [Profiling-Timer.Begin-Duplicati.Library.Main.Operation.BackupHandler-PreBackupVerify]: Starting - PreBackupVerify
2021-05-29 16:18:45 -04 - [Profiling-Timer.Begin-Duplicati.Library.Main.BackendManager-RemoteOperationList]: Starting - RemoteOperationList
2021-05-29 16:18:45 -04 - [Information-Duplicati.Library.Main.BasicResults-BackendEvent]: Backend event: List - Started:  ()
2021-05-29 16:18:45 -04 - [Information-Duplicati.Library.Main.BasicResults-BackendEvent]: Backend event: List - Completed:  (20 bytes)
2021-05-29 16:18:45 -04 - [Profiling-Timer.Finished-Duplicati.Library.Main.BackendManager-RemoteOperationList]: RemoteOperationList took 0:00:00:00.001
2021-05-29 16:18:45 -04 - [Profiling-Timer.Begin-Duplicati.Library.Main.Database.ExtensionMethods-ExecuteReader]: Starting - ExecuteReader: SELECT "ID", "Timestamp" FROM "Fileset" ORDER BY "Timestamp" DESC
2021-05-29 16:18:45 -04 - [Profiling-Timer.Finished-Duplicati.Library.Main.Database.ExtensionMethods-ExecuteReader]: ExecuteReader: SELECT "ID", "Timestamp" FROM "Fileset" ORDER BY "Timestamp" DESC took 0:00:00:00.000
2021-05-29 16:18:45 -04 - [Profiling-Timer.Begin-Duplicati.Library.Main.Database.ExtensionMethods-ExecuteReader]: Starting - ExecuteReader: SELECT DISTINCT "Name", "State" FROM "Remotevolume" WHERE "Name" IN (SELECT "Name" FROM "Remotevolume" WHERE "State" IN ("Deleted", "Deleting")) AND NOT "State" IN ("Deleted", "Deleting")
2021-05-29 16:18:45 -04 - [Profiling-Timer.Finished-Duplicati.Library.Main.Database.ExtensionMethods-ExecuteReader]: ExecuteReader: SELECT DISTINCT "Name", "State" FROM "Remotevolume" WHERE "Name" IN (SELECT "Name" FROM "Remotevolume" WHERE "State" IN ("Deleted", "Deleting")) AND NOT "State" IN ("Deleted", "Deleting") took 0:00:00:00.000
2021-05-29 16:18:45 -04 - [Warning-Duplicati.Library.Main.Operation.FilelistProcessor-MissingFile]: Missing file: duplicati-b3037370430404f7e99e2ed0ea1c6e181.dblock.zip
2021-05-29 16:18:45 -04 - [Error-Duplicati.Library.Main.Operation.FilelistProcessor-MissingRemoteFiles]: Found 1 files that are missing from the remote storage, please run repair
2021-05-29 16:18:45 -04 - [Profiling-Timer.Finished-Duplicati.Library.Main.Operation.BackupHandler-PreBackupVerify]: PreBackupVerify took 0:00:00:00.048

I strongly advise against leaving a broken dblock situation lingering, because it may affect future backups that assume that it exists and has its data. Blocks that should be there won’t be put into a new dblock file.

As I have a backup of the backup that is acceptable I thought I would play around a bit.
First off, rebuilding the database didn’t take long.
Removing the block initially causing the warning gave me an error referencing the index file for that block.
I removed the index file and am rebuilding the database again, so far this has been running 17 hours with 1 CPU core at 100%.

If this fails, I might try restoring the backup and rebuilding the index.

.sqlite and .sqlite-journal are being updated every 9-10 minutes. I hope it's not pulling every block to verify the index and update the database at the rate of 1 every 10 minutes. That would take 118 days.

Removing dindex files without having first purged broken files (which updates the dlist) can leave the dlist referencing a block that's not in any dindex file (but might have been in the deleted one). This can make the recreate search every dblock (slow). You can watch progress at About → Show log → Live → Verbose; alternatively, the progress bar on the DB recreate getting above 70% means dblock downloads have started, and the 90% - 100% stretch searches all the rest.

The green in the bar is at about 90.5% of the total bar length, and the live log indicates 1 block every 9.5 minutes on average, so a re-index will take months.
I will remove the current backup from the NAS and restore the offsite Duplicati backup files to the NAS and see how it goes.
I just did a quick test: it takes less than 30 seconds to decrypt a 1GB block and ~5 seconds to unzip it, so there must be a lot of something else going on filling in the rest of the 9.5 minutes per block.
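For reference, the test was essentially this sketch, assuming the SharpAESCrypt.exe tool shipped in Duplicati's install directory (check its usage text first); the filename and passphrase are placeholders:

# Time the decrypt and the archive test separately:
time mono SharpAESCrypt.exe d <passphrase> duplicati-bXXXX.dblock.zip.aes block.zip
time unzip -t block.zip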

I restored the off site backup to the NAS, recreated the database (17hrs total), and ran another backup from my PC to the NAS, and all seems fine now. This worked well and I am happy with this recovery solution and time frame. PCs back up to the NAS; the NAS backs up off site.


The same error has recurred.
Examined 271472 (2.16 TB)
Opened 328 (3.31 GB)
Added 183 (46.81 MB)
Modified 145 (3.26 GB)
Deleted 113
dblock.zip.aes is listed as Uploaded with size 171966464 but should be 221559661.

I am starting to wonder if this is related to using a custom retention policy, as it started when I set one.

I will restore my offsite backup again and change from a custom retention policy to test.

You could check the file's time stamp to see if you can figure out anything special about the file.
Sometimes one can match timestamps to backups and estimate when the file was written.
You can also go to <job> → Show log → Remote just after that, click the list line, and find the file.
Or maybe that's where you are. Did this error get detected by the checks at the end of the backup?

log-file=<path> and log-file-log-level=retry could give a better view of individual file activities.
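As advanced options on the job, that would be roughly the following; the log path is hypothetical:

--log-file=/var/log/duplicati-nas.log
--log-file-log-level=retry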

What OS is Duplicati on? There are different sorts of SMB flakiness in Windows and Linux.

OS is OpenSuse Leap 15.2
The backup runs at 6am daily; a message was showing in the GUI when I got up.
The file was created this morning; it is in today's list at the end, not in the list at the start.
This is the put (there was only this one block put):
Jun 3, 2021 6:05 AM: put duplicati-b92c7bef0607e4afc951cd8476d3df821.dblock.zip.aes
{"Size":221559661,"Hash":"w3QYggvh63liqjAA681Mqlzwt8Pme4/Mp/KwEHCzwAE="}
and the list entry, which agrees with an ls -l of duplicati-b92c7bef0607e4afc951cd8476d3df821.dblock.zip.aes:
{"Name":"duplicati-b92c7bef0607e4afc951cd8476d3df821.dblock.zip.aes","LastAccess":"2021-06-03T06:05:30.9410984+12:00","LastModification":"2021-06-03T06:05:30.9410984+12:00","Size":171966464,"IsFolder":false},

4 backups of this set have run successfully prior to this warning.

I can restore my offsite backups again and monitor more. Is there an additional level of logging? Is there any chance compaction or something is going on, leading to a reduction in file size? Could the blocks not be deleted from /tmp after the put, so I could actually check the size?

If I follow this, Duplicati intended to write 221559661 bytes, but after the backup the verification file list showed 171966464, which this time is not an even power-of-2 size. I suppose you could look at the start of the bad NAS file (maybe even on the NAS, in case the client has a different view) to see if it starts with AES, and if that's true, then it might be intact as far as it wrote. Why it stopped is a good question, but CIFS has bugs.
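Checking that header is a one-liner; the AES Crypt format starts with the three bytes "AES":

# Print the first three bytes of the suspect volume (on the NAS if possible):
head -c 3 duplicati-b92c7bef0607e4afc951cd8476d3df821.dblock.zip.aes; echo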

[Warning-Duplicati.Library.Main.Operation.Backup.FileEnumerationProcess-FileAccessError] is one such, where the problem is verified (by test) as fixed in late kernel versions, but not by all vendors that do backports into older kernels in the hope of keeping things the-same-but-better. Enterprise and LTS kernels favor that…

openSUSE Leap 15.3 Bridges Path to Enterprise says it's like May's SUSE Linux Enterprise 15 SP3.
There’s a small chance that a kernel fix helps CIFS. If you intend to update to 15.3 anyway, you would see.

Other than that, debugging options such as strace are possible but very cumbersome on very rare issues.

You might need to test a different way of putting files onto the NAS, e.g. perhaps it supports NFS or SFTP?

Compact never shrinks an existing file. It writes a new one then deletes the old ones that fed the compact.
You can see this in the operations if you watch them. You can prevent compact from running by using the no-auto-compact option, and you can also see the compact decision in a log at or above Information level:

2021-05-22 11:42:55 -04 - [Information-Duplicati.Library.Main.Database.LocalDeleteDatabase-CompactReason]: Compacting because there is 25.47% wasted space and the limit is 25%

You can also look in the old logs to see how often it does compact. It might not be a commonly done thing.

Sometimes when I really want to catch files, I just run some kind of a copy loop of dup-* to another folder.
Because your files are pretty big, you could also just poll a directory listing and guess from time and size.
Although it would be slow and might not add a lot, you could even sha256sum files for a future Base64 comparison.
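For that comparison, note that sha256sum prints hex while Duplicati logs Base64, so a conversion is needed; a sketch assuming the standard xxd and base64 tools:

# Convert sha256sum's hex digest into the Base64 form Duplicati logs:
sha256sum duplicati-b92c7bef0607e4afc951cd8476d3df821.dblock.zip.aes |
    cut -d' ' -f1 | xxd -r -p | base64
# A healthy file from the earlier put would print
# w3QYggvh63liqjAA681Mqlzwt8Pme4/Mp/KwEHCzwAE=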

If you want to do copies but have limited space, you might be able to age off the oldest ones, and try to get a faster Duplicati failure with list-verify-uploads, which doesn't seem much used, but might do a timely verify…

--list-verify-uploads = false
Verify uploads by listing contents.

Another way to limit redundant copying is to copy files only when the source timestamp is newer, e.g. using

   -u, --update
         copy only when the SOURCE file is newer than the destination file or when the destination file is missing
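A minimal polling loop along those lines, with a hypothetical mount point and scratch folder:

# Copy new or updated volumes once a minute until interrupted:
while sleep 60; do
    cp -u /mnt/nas/duplicati/duplicati-* /scratch/dup-copies/
done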

You can log system calls, but it will be a lot of lines. You can see an example here from a CIFS bug chase.

There's no Duplicati-level logging of the file copy. It looks like a single call to the File.Copy method, done here:

Thanks for all your help, it is probably the CIFS bug.
I should have thought to try this before, but I just attempted to decrypt the dblock and it is corrupted.
I will upgrade to Leap 15.3, but not for a few months.
I can use SFTP instead of CIFS, NFS is also available.
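For NFS, the mount would be a one-liner along these lines (the export path and address are hypothetical):

# Mount the NAS export over NFS instead of CIFS:
sudo mount -t nfs 192.168.1.10:/volume1/duplicati /mnt/nas-duplicati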


I have swapped my mounting of the NAS to NFS for Duplicati backups. So far it seems OK.
It looks like the issue with CIFS errors causing problems applies to mono and Go, so I am happy to keep using CIFS for normal file storage, which is shared with Windows, Mac and Android.


Good research. The reason I didn't suggest the rclone storage type as a workaround is that it hits the CIFS bug too.
It's not 100% sure it wouldn't work, but you have non-CIFS options. I'm also not certain the Leap 15.3 kernel is hugely different from a fully updated Leap 15.2 one. They all look like 5.3.18 plus some amount of patches.
It’s not 100% sure it wouldn’t work, but you have non-CIFS options. I’m also not certain Leap 15.3 kernel is hugely different from a fully updated Leap 15.2 one. They all look like 5.3.18 plus some amount of patches.