"Invalid header marker" CryptographicException when trying to delete unwanted files

While trying to delete unwanted files, I get the following error:

Failed: Invalid header marker
Details: System.Security.Cryptography.CryptographicException: Invalid header marker ---> System.IO.InvalidDataException: Invalid header marker
at SharpAESCrypt.SharpAESCrypt.ReadEncryptionHeader(String password, Boolean skipFileSizeCheck)
at SharpAESCrypt.SharpAESCrypt..ctor(String password, Stream stream, OperationMode mode, Boolean skipFileSizeCheck)
at Duplicati.Library.Encryption.AESEncryption.Decrypt(Stream input)
at Duplicati.Library.Encryption.EncryptionBase.Decrypt(Stream input, Stream output)
at Duplicati.Library.Main.BackendManager.<>c__DisplayClass36_0.b__0()
at System.Threading.Tasks.Task.Execute()

purge-broken-files does nothing, as does rebuilding the DB. If I manually delete the file, the 'deleting unwanted files' phase goes through its retry cycle and reports '503 server not found' (which is confusing, since it is just a single dblock file that is not found).

There is definitely a problem with the dblock file, as I get the same invalid-header message when trying to restore one of the files. Oddly, the command-line program to list 'affected' files works fine, and I don't know why.

At this point, I’d just like to know how to remove this obviously broken file, since it is corrupted somehow and I’d like to replace it with a fresh dblock file.

For the moment, I've changed my retention scheme to keep all backups, just so I can get an error-free backup until some resolution is reached on this problem.

Thanks
Rick

I assume that refers to its status message. There are several things in that phase, mainly delete and compact. Probably it's doing downloads for the compact. About → Show log → Live → Retry can show that work.

AES File Format shows the expected format, if you want to compare with what your file actually has.

What Storage Type is the destination? I assume the 503 is from there.

Disaster Recovery covers ways to do that. A sequence may be:

  1. Delete or move/hide the corrupted file.
  2. list-broken-files to show what got lost.
  3. purge-broken-files to purge those files.
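
If you run steps 2 and 3 from a true command line rather than the GUI Commandline screen, a sketch might look like the lines below. The storage URL, database path, and passphrase are placeholders; a GUI Export As Command-line from the job gives the real values.

    Duplicati.CommandLine.exe list-broken-files "webdavs://example.net/remote.php/dav/files/user" --dbpath="C:\ProgramData\Duplicati\data\JOBDB.sqlite" --passphrase=...
    Duplicati.CommandLine.exe purge-broken-files "webdavs://example.net/remote.php/dav/files/user" --dbpath="C:\ProgramData\Duplicati\data\JOBDB.sqlite" --passphrase=...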

Is that with the bad file, or without it? It’s been pointed out that manual file removal may be necessary.

no-auto-compact might also work, as the problem is probably in the downloads done for the compact.

It’s not clear that backup integrity is good, but there’s a chance. Might be better than stopping backups.

The bad file may be interesting if you can characterize it by date, the Duplicati version that made it, and maybe content.
I think Duplicati checks the file size before decrypting. A pre-2.0.5.1 Duplicati could sometimes rearrange contents so that the size would be right but the content misplaced. You'd also need to have been using throttling…

Inline in RED. The computer thought this was too short a reply. Not sure how I should have done it differently, but this 'reply' will let it go through to you, and you can see my replies below in RED.

Thanks

Rick

There’s no reply in red color, just what’s seen above.

If you’re trying to quote, just mouse-highlight and click the “Quote” offer bubble.

delete unwanted files

Reading further down the log, it is running CompactHandler, so I guess it's trying to download a file with a corrupt header.

I assume that refers to its status message. There are several things in that phase, mainly delete and compact. Probably it's doing downloads for the compact. About → Show log → Live → Retry can show that work.

rick55:

Invalid header marker

AES File Format shows the expected format, if you want to compare with what your file actually has.

AES is the only encryption type I use; all files have zip.aes on the end. Not sure how I would check the file itself to see if it was 'marked' as AES in the first couple of bytes of the file.

rick55:

503 server not found

What Storage Type is the destination? I assume the 503 is from there.

WebDAV. Why would 'puts' work but a download for compacting fail against the same device/server?

rick55:

I’d just like to know how to remove this obviously broken file

Disaster Recovery covers ways to do that. A sequence may be:

  1. Delete or move/hide the corrupted file.
  2. list-broken-files to show what got lost.
  3. purge-broken-files to purge those files.

rick55:

purge-broken-files does nothing

Is that with the bad file, or without it? It’s been pointed out that manual file removal may be necessary.

So whether I hide or remove the file, list-broken-files returns nothing and purge-broken-files does nothing.
Repairing the DB doesn't help, rebuilding the DB doesn't help. Also, with a hidden dblock file, the retries kick in, and that's when I see the '503 server not found'.

rick55:

I’ve changed my retention scheme to keep all backups

no-auto-compact might also work, as the problem is probably in the downloads done for the compact.

My retention change didn't change the end results. I'll try no-auto-compact when I can get the user to leave his machine up.

rick55:

just so I can get a error-free backup until some resolution is reached on this problem.

It’s not clear that backup integrity is good, but there’s a chance. Might be better than stopping backups.

The bad file may be interesting if you can characterize it by date, the Duplicati version that made it, and maybe content.
I think Duplicati checks the file size before decrypting. A pre-2.0.5.1 Duplicati could sometimes rearrange contents so that the size would be right but the content misplaced. You'd also need to have been using throttling…

All of my Duplicati backups are 2.0.5.1, all dumping to Nextcloud/WebDAV on a Linux server.

If I unzip one of these ‘header error’ files, that leaves me with an uncompressed AES file, and I have no idea what to do with that. I think I mentioned that running the commandline to get the affected files appeared to work, but since the machine in question is off, I can’t verify that.

What’s the recommended log level for scheduled jobs? Until I track this down, should it be set to ‘retry’? – assuming that’s the setting that will give the most detail. Where on the PC are these logs stored? Can I look at previous logs? I think I might also increase the number of files being verified if you think that would help with the integrity issue.

What OS do you have? On Windows, someone once opened a file in Notepad, which was enough to see the start.
Linux has good built-in tools, e.g. hexdump -C. Windows would maybe need an install of some hex editor.
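
If a hex editor is a hassle, a few lines of Python work on either OS. This is a minimal sketch, assuming only that a good AES Crypt file starts with the ASCII bytes "AES" followed by a one-byte format version:

    import sys

    # Read the first few bytes of the file named on the command line.
    with open(sys.argv[1], "rb") as f:
        header = f.read(4)

    print("first 4 bytes (hex):", header.hex())
    if len(header) >= 4 and header[:3] == b"AES":
        print("looks like an AES Crypt file, format version", header[3])
    else:
        print("does NOT start with the AES Crypt header")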

It’s not clear that your put worked, is it? Have you looked at the destination to see how the file looks there?
Until you check, we don’t know if it went bad before upload, on upload, sitting on remote, on download, etc.

Are you doing this with the GUI Commandline screen, to ensure you’re not pointing to an empty database? Working from a GUI Export As Command-line is another way to make sure of matching dbpath from a job.

You might want to look into specifics for your server. That doesn’t seem like the right thing to set off a 503.

If you want a better look at the action, you can watch About → Show log → Live → Retry for errors/retries.

Duplicati.CommandLine.BackendTool.exe can also get specific files. It needs a URL like Export gives you.
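
I haven't verified the exact argument order here, so treat this as a hedged sketch and check the tool's own usage text; the idea is a LIST to see the remote folder, then a GET of the suspect file (URL and file name are placeholders):

    Duplicati.CommandLine.BackendTool.exe LIST "webdavs://example.net/remote.php/dav/files/user"
    Duplicati.CommandLine.BackendTool.exe GET "webdavs://example.net/remote.php/dav/files/user" duplicati-bXXXX.dblock.zip.aes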

That sequence sounds wrong. If you have a file ending with .zip.aes, first you optionally look at it to see if it actually starts with AES and the rest of the header. You decrypt it, e.g. with AES Crypt, and then you can unzip it. What's inside is a bunch of blocks, generally limited to 100 KiB in size (the default blocksize), and named based on a hash of the block.
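
For the decrypt step, Duplicati ships SharpAESCrypt as a command-line tool; a sketch from memory (verify the argument order against its usage text), with the passphrase and file names as placeholders:

    SharpAESCrypt.exe d <passphrase> duplicati-bXXXX.dblock.zip.aes decrypted.zip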

But you have no need to go that deep. You’re getting stuck on the decrypt, and I think AES Crypt would too.

So you might perhaps be able to get on the server to examine the trouble file's size, contents, and date directly?

and always have been? That will rule out the odd pre-2.0.5.1 bug, but not if the backups began before then.

So I don’t know what happened with the file, but if you can look then maybe we can get a little further with it.
Getting to the bottom might be impossible depending on the time urgency, but preserving info may be best.

Profiling is a hugely detailed log, but --profile-all-database-queries on top of it makes it even bigger. Retry is pretty mild, but enough to get a first look at what seems to be a file that doesn’t come down in good shape.

Wherever your log-file option says, and the level is set by your log-file-log-level option. A live log doesn't use files.
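
As a sketch, with a hypothetical log path, the two Advanced options together would be:

    --log-file=C:\tmp\duplicati-job.log
    --log-file-log-level=Retry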

If you suspect the problem may occur again, that's a good idea. You can set backup-test-samples or

  --backup-test-percentage (Integer): The percentage of samples to test after
    a backup
    After a backup is completed, some (dblock, dindex, dlist) files from the
    remote backend are selected for verification. Use this option to specify
    the percentage (between 0 and 100) of files to test. If the
    backup-test-samples option is also provided, the number of samples tested
    is the maximum implied by the two options. If the no-backend-verification
    option is provided, no remote files are verified.
    * default value: 0

To fill in some of the missing info: most of my client machines are running Windows 10. My WebDAV/Nextcloud server is on Ubuntu 18.04. I have full and complete access to the Linux server.

Had the error pop up on another system's backup. In both cases, --no-auto-compact allowed the job to complete error-free.

I saved the bad files in a separate area and did an od -c on each of them (I read octal more easily than hex). In one case the entire file was filled with NUL bytes (\000).

In the 2nd case, the file 'looked' normal (a bunch of data, not NULs), but the first 3 characters were not AES; they were ~ 307 %

I upped the verification percentage to 10% and set --no-auto-compact=true, and both jobs verified and finished with no errors.

What worries me is that with auto-compact off, that file corruption might go unnoticed (unless I set verification to 100%), so I'd like to leave auto-compact on. That will give me the file name that throws the error, but then I need a way to purge that bad file.

I've tried every variation of move/hide the file, repair DB, rebuild DB, list and purge broken files, to no avail. I even tried hex-editing each of the files to replace the first 3 characters with 'AES'. No change: list and purge broken files return 'finished 0'.

Is there any way to drill down in either the list or purge broken commands to see why it's ignoring the corrupted file?

Thanks!

Rick

You might want to check out the health of its drive, e.g. with smartctl or error logs.
I think sector spares might go in as NUL, but it seems unlikely to kill a whole file…

Did you keep or note original date and time, so as to get more data on its history?
Local database also keeps information, but it may disappear due to cleanup work.

Warnings and error messages from backup jobs describes a way to look into DB.
Its TimeStamp is a UNIX timestamp, so convert at epochconverter.com or similar.
You should be able to see the original upload, plus any prior downloads for testing.
Creating a bug report and posting a link to it is also an option if you prefer that way.
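
For instance, against the job database (sqlite3 or any SQLite browser; the table and column names here are taken from what a 2.0.5.x database shows, as seen later in this thread), something like this lists everything done to a given remote file, with the timestamp converted:

    SELECT datetime(Timestamp, 'unixepoch'), Operation, Path
    FROM RemoteOperation
    WHERE Path LIKE '%dblock.zip.aes%'  -- narrow this to part of the bad file's name
    ORDER BY Timestamp;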

You're probably making an unreliable backup that won't restore if it needs a bad file.
The TEST command, e.g. run from the GUI Commandline screen, could test.
Faster is upload-verification-file; then on the server use utility-scripts/DuplicatiVerify.py.
The Duplicati install location may vary, but you can see if you have /usr/lib/duplicati.
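
If it's there, the server-side run is roughly the line below (the install path assumes a default Linux install; the data folder is wherever your server stores the backup files):

    python /usr/lib/duplicati/utility-scripts/DuplicatiVerify.py /path/to/backup/files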

Assuming that means about 307 % characters at start of file. Was it random later?
You can look at some other examples, but I think encrypted files are pretty random.

Yes, good to be worried and not ignore errors in files.

Can you comment on my earlier question: how exactly are you getting the right DB?
GUI Commandline? Export to true CLI? True CLI (not GUI) likely needs a --dbpath.
If you're on GUI Commandline, then you can go to <job> → Show log → Remote.

If you have a destination URL from an Export As Command-line, you can edit it for
Duplicati.CommandLine.BackendTester.exe (it needs an empty folder) and try that.
That will at least exercise the upload/download path, to eventually gain confidence.

Just ran an fsck -f on my data array; it came back clean.

I also run smartmontools on it and monitor the results every 4 hours; all looks fine according to that output.

My "~ 307 %" was misinterpreted, my bad! The first 3 bytes of that file are the tilde character, 307 (base 8), and the percent character. The rest of the bytes are random.

I used find|cpio to copy the two bad files to separate areas, so timestamps were preserved. One is 1/14/21, the other is 1/15/21. Two different PCs, one ‘corrupt’ file on each.

Re running list/purge broken files: I'm starting in a browser window (localhost:8200) on each PC, then dropping into Commandline from the job in question. I clear all entries except options relating to backup file info: --dbpath…. (I also keep dblock-size, compression-module, encryption-module, and passphrase.)

I have the destination URL, so I will play with the BackendTester and let you know.

I will also add the upload-verification-file to the 2 problem PCs, plus a 3rd PC with no errors as a baseline and report back.

Appreciate all your help!

Rick

So if you like, you can trace those in the DB to see if that’s when they were originally put. Also see list:

[
{"Name":"duplicati-20210130T130245Z.dlist.zip","LastAccess":"2021-01-30T08:02:46.5743717-05:00","LastModification":"2021-01-30T08:02:46.5743717-05:00","Size":745,"IsFolder":false},
{"Name":"duplicati-b460d16ade75f47d09a6eed2dcc682367.dblock.zip","LastAccess":"2021-01-30T08:03:33.4183973-05:00","LastModification":"2021-01-30T08:03:33.4183973-05:00","Size":1135,"IsFolder":false},
{"Name":"duplicati-i6a28db85476149a78c9c1bd0ecb9e06e.dindex.zip","LastAccess":"2021-01-30T08:02:46.2600041-05:00","LastModification":"2021-01-30T08:02:46.2600041-05:00","Size":684,"IsFolder":false}
]

is an example from the Data column of a list Operation after a backup. 8:02:46 matches Explorer time.
I don’t know if anything could have changed your file after it was uploaded, but if so this might detect that.

I don’t know why list-broken-files isn’t complaining. If somehow the file is no longer needed, the compact should have just deleted it instead of downloading it to pull any still-in-use blocks out to be put elsewhere.
The AFFECTED command is another way to ask what files are affected by the deletion of a dblock. Any?

I'm currently running the Python script on 3 different directories (2 that threw the error and 1 that didn't, as a baseline).

Once that completes, I'll look into the DB. I'll also run the 'affected' command. It sounds like maybe list-broken-files needs to be reported as a bug. If I do that, what info should I include?

Thanks

Rick

I have run the Python script on 3 systems, one of which was supposed to be good. I found at least one file on each system that either failed the hash check or was missing. I'm currently in the process of adding the upload-verification switch to every system's backup job. Whether or not I run the script to read the resulting file may vary.

Since this process checks every file, does it eliminate the need for the verify at the end of a Duplicati job? As I understand it, that verify doesn't compare the source files with the compressed block file, but merely checks whether it uploaded correctly to the destination. If that is the case, I think the upload-verification is a better check.

Either way, I'm left with the apparently faulty list/purge-broken commands. Clearly, I need to be able to 'fix' the hash and missing-file errors that the upload-verification singles out. I guess I'm asking for help on how to submit a bug report for those particular functions and what documentation needs to be included.

Thanks!
Rick

The issue template requests steps to reproduce it, because it helps developers a lot if they can do that.
This means that ideally a problem is reproducible from scratch on a system with a simple storage type.
Generally my steps for generic issues start with a new backup to local files because that always exists.
You should probably keep things simple and private enough that you don’t need to redact or explain lots.

If you can’t reproduce the issue, and it relies on your exact backup, the DB bug report might give a clue.
But it would give a clue here too, if you would do it. Also still waiting to hear how the affected turns out.
If it says there’s an issue and list-broken-files says there isn’t, then someone can try investigating.

Much of this area hasn’t changed in a long time, but developers have, so do what you can to help them.
There may be some interactive Q&A. I’m just giving generic thoughts on what makes a good bug report.

If you want to get very technical you can try to see if list-broken-files actually looked at remote directory, using whatever methods are available. I don’t know if Nextcloud can see it. Duplicati live log can show it.
Unfortunately list isn’t logged in DB. I see after-backup list in DB, but not from list-broken-files.

This is using the Test button. Live log says:

Jan 31, 2021 8:49 AM: Backend event: List - Completed: (7 bytes)

but I can't click on it to show what it saw. If I go to <job> → Show log → Remote, I get an expandable entry below:

Jan 31, 2021 8:49 AM: list

list-broken-files does this in live log (Information level is enough):

Jan 31, 2021 8:54 AM: Backend event: List - Completed: (7 bytes)

but job Remote log doesn’t have anything, nor does the DB if I look inside. There my last Operation was

ID	Description     Timestamp
11	ListBrokenFiles	1612101273

but last RemoteOperation was earlier

ID	OperationID	Timestamp	Operation	Path	Data
17	10	        1612100948	list

Previously, I even ran Sysinternals Process Monitor to see if it looked, and it looked like it did, but it didn't log.
But this level of looking is not typical. If you can come up with a reproducible test, that would be great…
The above diversion was partly to see if I could have you see what list-broken-files saw. The answer: not easily.

Here’s a run of an affected where I hid a dblock file by giving it a prefix of hidden- (easy to undo later):

Running commandline entry
Finished!

            
The following filesets are affected:
0	: 1/30/2021 6:06:40 PM
1	: 1/30/2021 8:02:45 AM

A total of 2 file(s) are affected:
C:\backup source\A.txt
C:\backup source\short.txt

The following related log messages were found:
1/30/2021 1:02:45 PM: {"Size":1135,"Hash":"M6MBDkzuoW1M8kAVef6I92X5ImsLOuRfSBMRiJlVbfc="}
1/30/2021 1:02:46 PM: {"DeletedFiles":0,"DeletedFolders":0,"ModifiedFiles":0,"ExaminedFiles":2,"OpenedFiles":2,"AddedF ...
1/30/2021 1:03:55 PM: {"MainOperation":"Test","VerificationsActualLength":3,"Verifications":[{"Key":"duplicati-2021013 ...
1/30/2021 1:03:56 PM: {"Size":1135,"Hash":"M6MBDkzuoW1M8kAVef6I92X5ImsLOuRfSBMRiJlVbfc="}

Return code: 0

Below is a list-broken-files view. It’s on 2.0.5.112, but there are no changes documented in the area.

Running commandline entry
Finished!

            
  Listing remote folder ...
1	: 1/30/2021 8:02:45 AM	(2 match(es))
	C:\backup source\A.txt (1 bytes)
	C:\backup source\short.txt (19 bytes)
0	: 1/30/2021 6:06:40 PM	(2 match(es))
	C:\backup source\A.txt (1 bytes)
	C:\backup source\short.txt (19 bytes)
Return code: 0

I think it should be as good at seeing problem files on the server. Of course it can't see download issues.
The main advantage is that it's probably faster, and it won't tie things up as much on the Duplicati clients.
A disadvantage is that if it runs while Duplicati is doing things, it might view the client's file changes as errors.

Issues is where to file. I covered what to include to try to get an answer. Very vague bugs aren’t workable. There are a lot of old ones sitting there not moving, and a reason is that one can’t make anything of them.

Spent the day testing. According to the Python verify script:

*** Hash check failed for file: /media/nc-data/john-work-pc/files/duplicati-b424ce9a5d2794a819c9102cd22dde13d.dblock.zip.aes

File missing: /media/nc-data/john-work-pc/files/duplicati-20210130T074246Z.dlist.zip.aes

'affected' returns the following:

duplicati.commandline affected webdavs://example.net/remote.php/dav/files/john duplicati-b424ce9a5d2794a819c9102cd22dde13d.dblock.zip.aes --dbpath=C:\programdata\duplicati\data\COHIESFNDE.sqlite

The following filesets are affected:
0 : 1/31/2021 5:18:36 PM
1 : 1/31/2021 4:00:00 PM
2 : 1/30/2021 4:00:00 PM
3 : 1/29/2021 4:00:00 PM
4 : 1/28/2021 4:00:00 PM
5 : 1/27/2021 4:00:00 PM
6 : 1/26/2021 4:00:00 PM
7 : 1/25/2021 4:00:00 PM
8 : 1/19/2021 4:00:00 PM

A total of 2 file(s) are affected:
C:\quickbooks.data\QuickBooksAutoDataRecovery\example.QBW.ADR
C:\quickbooks.data\QuickBooksAutoDataRecovery\example.QBW.ADR.old

Found 2 related log messages (use --full-result to see the data)

This file isn’t mentioned earlier. Is it part of history or newly found as corrupted?
An 'all' test should find the same corruption, but it would probably take longer…

As an aside, if checks are slow even server-side, it would probably be possible
to script a check of only the unchecked files. Once written, they don’t change…
Or if they do change after having checked good once, that would be significant.

If it's a new find, I guess the prior tests (examine timestamp, DB, file data, etc.) apply.
Now that you have a file that affected sees, you can test list-broken-files on it.
The intent was to see if there are cases where only one of the two tests catches it.

I don't use Nextcloud, but from my research it seems it supports numerous protocols for secondary storage (where it acts as the client), but only supports WebDAV for programs to access it. Correct?

Do you have the use-ssl option set on your Duplicati Destination? If not, possibly it would help to avoid undetected transmission errors (I’m not certain), but if unset it makes monitoring packet data possible.

Large data captures need either a lot of storage or quick detection of an issue to turn off capture to look. Chasing down rare problems is not an easy job. I’m in the middle of that with Linux CIFS bugs right now.
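
For completeness, a capture on the Ubuntu box might look like the line below; the interface name and client address are assumptions to adjust, and port 80 only applies to plain WebDAV (with use-ssl it's 443 and the payload won't be readable):

    sudo tcpdump -i eth0 -s 0 -w webdav.pcap host 192.0.2.50 and tcp port 80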

Is the Nextcloud server Ethernet-connected all the way to Duplicati? WiFi or Internet add more suspects.
You can get some low-level stats with netstat -s or ifconfig, but interpreting them is not very simple.
Still, errors at lower levels seem unlikely to be able to turn a whole dblock file into NULs. That’s quite odd.

There are only a few places to get information on how things are going, so might as well hit what we can.

First, the easy answers:

I set up Nextcloud so I wouldn't have to perpetually pay for someone else's storage (like AWS, Dropbox, etc.). Since I'm responsible for 20+ TB, that was running close to $5,000 a year. Spending much less than that bought a whole bunch of shiny new 8 TB drives and a RAID controller, which of course was a one-time expense. To me, hosting my own cloud was the only cost-effective way to go; plus I control the data and no one else does.

The protocol that was suggested to access 'my cloud' from the outside world was WebDAV, which seems to work fine.

The Nextcloud server (on Ubuntu 18.04) is directly connected through a one-gigabit switch to the fiber coming into my office.

The Duplicati destination uses SSL.

Thanks for the context, but I just wanted confirmation that all those protocols are only Nextcloud-as-client.

So perhaps you’re not sure if WebDAV is the only one it serves either. It does look that way from the docs.

I think I’ve seen posts where other protocols have been added independently, accessing Nextcloud’s drive.

I would like to continue debugging WebDAV if you're willing, but if that's not possible, you can try a different access method.

I was intrigued by your comment about changing the Python process to only check new files. I looked at the JSON verification file and it looks like there are no human-friendly delimiters, so how would I modify that script to avoid re-checking every file? Was that just a suggestion on your part (excellent idea, by the way), or do you have a script you use, or can you suggest how I might go about it myself?

Thanks

There are JSON delimiters. I don't have a script, but a couple of ways to do it include:

Go by times. The general scheme is that the files upload, then duplicati-verification.json does.
An issue may be that parallel uploads can finish short uploads first, so be careful…
If that works, then when a new duplicati-verification arrives, check the files added since the prior check.

Another method is to process the new duplicati-verification.json and test files marked with
"State":2, as I suspect those are in state "Uploaded" in the DB and not yet "Verified".

Below is my job Remote log, which in reverse-chronological order shows the work

Feb 1, 2021 6:11 PM: get duplicati-b0e0f59fc3a764159a83bab47b5f4cdad.dblock.zip
Feb 1, 2021 6:11 PM: get duplicati-i3cb0dbfc99d54c82b7131c88a9c584ec.dindex.zip
Feb 1, 2021 6:11 PM: get duplicati-20210201T231149Z.dlist.zip
Feb 1, 2021 6:11 PM: list
Feb 1, 2021 6:11 PM: put duplicati-verification.json
Feb 1, 2021 6:11 PM: put duplicati-20210201T231149Z.dlist.zip
Feb 1, 2021 6:11 PM: put duplicati-i3cb0dbfc99d54c82b7131c88a9c584ec.dindex.zip
Feb 1, 2021 6:11 PM: put duplicati-b0e0f59fc3a764159a83bab47b5f4cdad.dblock.zip

The three get operations at the top are the verification sample. The list below them is checking how all the files look.
This is the one I was trying to get you to look at, to see how the modification time looked after backup.
EDIT: and looking in my database, those files are now Verified, but the file went up before the list.

The hard plan is to keep track somehow of which files have been tested previously, e.g. by names.
You can probably just let records grow forever. If a file gets deleted, you still won’t try checking it…
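
A tiny sketch of that bookkeeping, meant to bolt onto the State-2 loop above (the ledger file name is arbitrary):

    import os

    LEDGER = "checked-names.txt"  # one remote file name per line

    def already_checked():
        if not os.path.exists(LEDGER):
            return set()
        with open(LEDGER) as f:
            return set(line.strip() for line in f)

    def mark_checked(name):
        with open(LEDGER, "a") as f:
            f.write(name + "\n")

    # In the verification loop: skip entries whose e["Name"] is in already_checked(),
    # and call mark_checked(e["Name"]) after the hash comparison succeeds.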

I’d like to stick with WebDAV since I am most familiar with it. Certainly willing to be your eyes/hands to continue debugging.

How do you want to proceed?