B2: How many times does Duplicati try to get an Upload URL when something goes wrong?

algebro · July 5, 2019, 6:05pm

I recently tried to migrate from ACD to Backblaze b2 and after about 20GB, I started getting “Connection reset by peer” errors from B2 and the backup job failed. I reached out to Backblaze support and they said the upload servers fill up and that you have to request a new upload URL and try again to fix it. This explanation is consistent with the documentation. This was particularly bad because it seemed to break my local database and I had to blow it all away to start over.

My question is, how resilient is Duplicati when b2_upload_file fails? The documentation recommends trying at least 5 times, but peeking through the source code it looked like it might have only tried once. Granted, I didn’t do an exhaustive review.

Thanks!

ts678 · July 5, 2019, 7:03pm

Hello @algebro and welcome to the forum!

–number-of-retries (default 5) –retry-delay (default 10 seconds) retry at a higher level than the individual backend. I’m pretty sure it gets a new upload URL each time, but if you see otherwise then that’s a bug.

Backblaze B2 ideally would like Duplicati to look for their Retry-After header (I don’t think it does that), and if that’s not available then do an exponential backoff as described below (with “expotential” as typo):

B2 Integration Checklist. Section technically is referring to 503 error, but a backoff plan might fit anyway, given that “something goes wrong” can happen other ways. Duplicati doesn’t do “exponential” yet AFAIK.

What release are you running? Here are some of the B2 glitches I hit during a big test of 2.0.4.22 canary where you can see the retry message. Use About → Show log → Live → Retry, or set up –log-file and –log-file-log-level=retry, or for the simplest check, look for your log’s BackendStatistics" “RetryAttempts”.

Typically Duplicati seemed able to ride out B2 errors transparently (unless you’re looking at logs closely) until retry attempts were exhausted, then it bailed out quite hard. I’m not sure if that’s a new or old bug…

Reporting options –send-mail-to worked, but the job database never got updated to record an error there. After a few tries, the backup would always sort itself out and keep going though. What errors did you get?

There seemed to be a special noisiness with parallel uploads when a dlist upload would fail and Duplicati would try to clean up by deleting it, only to find that it wasn’t there. This was avoidable by disabling parallel uploads with --asynchronous-concurrent-upload-limit=1 (only in canary and experimental, not yet in beta).

algebro · July 5, 2019, 7:17pm

Thanks a lot for the detailed reply! I am still figuring out how to effectively generate and dig through Duplicati logs so unfortunately I can’t provide a lot of information about that (I also nuked the backup DB and restarted it this morning, so I think that info is gone).

I’m running 2.0.4.22_canary_2019-06-30. I’ll keep an eye on the Retry logs and let you know if I see anything crazy. How bad of a performance hit did you get when you disabled parallel uploads?

ts678 · July 5, 2019, 7:32pm

I wasn’t paying close attention. My cable modem typically gets 6-some Mbits/sec, whether parallel or not. People who have high bandwidth might appreciate the benefits of the new parallel uploads more than I do. Windows Task Manager is what I use to watch the short-term speeds (and it varies moment-to-moment).

If you search for RetryPut in the log, you’ll see the retries and what failed to cause the retry. I won’t post everything just yet, but I counted 124 in 4 hours of roughly 24x7 10-minutes-apart rather-small backups… Some of these may have been network glitches (I even got a DNS failure) and other flaky-network issues.

algebro · July 5, 2019, 11:41pm

Thanks! One more question–when the backup failed my database got incredibly mangled and trying to repair it seemed to make things a lot worse, so I ended up having to nuke the bucket and local database and start over. Is the duplicati database more vulnerable during the initial backup or is this likely to happen again if I run into issues with B2 resetting the connection in subsequent incremental backups?

ts678 · July 6, 2019, 12:33am

The backup in general is more vulnerable until the first one has completed, and the dlist file reflecting its contents has been uploaded. Without a dlist, some operations that might normally work might not work.

Getting at least a stake put down with a small initial backup would ensure you have initial dlist uploaded.

Beyond that, I can’t say much, knowing nothing about what sort of mangling you saw. Reporting issues, especially on canary releases which tend to have new code and associated issues, can be very helpful. There’s been worry about how well new parallel upload code handles errors (but there are old bugs too).

Reproducing the problem, e.g. by deliberately causing errors or setting --number-of-retries low, can help understand how to resolve bugs. Logs and maybe database bug reports (currently somewhat broken…) would help understand the steps Duplicati took to get to the problem, and you can give the manual steps.

But getting back to your upload, if you think you might have a broken database again, you can back it up manually after completing some increment of backup. Especially if you set –no-auto-compact, a backup won’t change files already uploaded, but just add more. So if disaster strikes, you roll everything back…

I think just putting the old database back can have it delete the new unknown backend files, which hugely offends people who didn’t want that, e.g. they restored an older DB from an image backup or something:

Repair command deletes remote files for new backups #3416

I don’t know of a good way to study that, however you might see other errors, as my backup was seeing. For increasing your chance of success, make sure your network connection seems to be working nicely, for example by seeing if an Internet speed test runs as fast as you expect, then raise --number-of-retries, perhaps also raising --retry-delay to try to approximate where exponential backoff would go, if it existed…

algebro · July 14, 2019, 1:26am

Thanks a lot for the help. The initial backup completed but I’m getting the following warnings/errors every backup run:

LimitedWarnings: [
2019-07-13 21:15:36 -04 - [Warning-Duplicati.Library.Main.BackendManager-DeleteRemoteFileFailed]: Delete operation failed for duplicati-20190714T010000Z.dlist.zip.aes with FileNotFound, listing contents
]
LimitedErrors: []
Log data:
2019-07-13 21:15:36 -04 - [Warning-Duplicati.Library.Main.BackendManager-DeleteRemoteFileFailed]: Delete operation failed for duplicati-20190714T010000Z.dlist.zip.aes with FileNotFound, listing contents
Duplicati.Library.Interface.FileMissingException: The requested file does not exist
at Duplicati.Library.Backend.Backblaze.B2.Delete (System.String remotename) [0x000e4] in <5d55b335d43a41fbb92b6c4caaca525f>:0
at Duplicati.Library.Main.BackendManager.DoDelete (Duplicati.Library.Main.BackendManager+FileEntryItem item) [0x0001c] in <63172ad53f6940c395d5db05d1717049>:0

Any idea what this means? I’ve tried repairing the database but I’m still getting the errors.

algebro · July 14, 2019, 4:52pm

Well, I tried a “delete and recreate” operation on the database and that may have either exposed more problems or caused them, I’m really not sure. There are 55 errors saying

 "2019-07-13 23:54:52 -04 - [Error-Duplicati.Library.Main.Operation.RecreateDatabaseHandler-MissingFileDetected]: Remote file referenced as duplicati-b2c5ce3023a794f659ad010a80165cf48.dblock.zip.aes by duplicati-i0131ce6b1b5a44c3a0b19030217cc4f8.dindex.zip.aes, but not found in list, registering a missing remote file",

At this point I’m wondering if I should even continue trying with Backblaze or give up since there seem to be so many issues with keeping the data consistent. Do the errors I’ve mentioned here and in the previous post mean the backup set is compromised?

ts678 · July 14, 2019, 5:56pm

Such out-of-context errors can’t be reliably interpreted. One would want either a log-file or a live log to get more. Typically, though, the deletes during a backup are of files that failed during upload. The retry gets a different name. For dindex and dblock, it’s a new semi-random name. For dlist, seconds is incremented, so maybe on B2 you’ll find duplicati-20190714T010001Z.dlist.zip.aes which replaced the original attempt.

I also talked about logging and retries earlier. The worrisome possibility is that with us both on 2.0.4.22, it could be that “newer” got worse. Would you be able to test B2 with a new backup but using 2.0.4.5 beta?

The deletion failures might just be an attempt to clean up for a retry, although I think I’ve seen it try delete without first looking (by list) to see if there’s a file there. If you can actually supply logs, we can see more.

The missing remote file is more worrisome, especially dblocks which can represent actual missing data. Loss of a dindex or dlist can sometimes be repaired if one has a good database copy with such records.

Testing an earlier release would be very helpful, so we’ll have an idea of whether the newer one is worse. Watching retries and other errors would also be helpful to detect issues “hidden” by the retry mechanism.

algebro · July 14, 2019, 8:37pm

I’ll try again with 2.0.4.5 beta, but it will be awhile before we see the results since the initial backup takes close to a week. I can try with a smaller sample but I don’t know if it will be that comparable with the initial backup.

I noticed several retries during the initial backup, but I believe in all cases, the upload succeeded after 1 retry and was successfully renamed.

algebro · July 15, 2019, 2:43pm

Well, after deleting and creating the database and getting all those errors, my nightly backup completed successfully last night with no errors or warnings, though I’m not sure whether to trust it. Do you think it’s possible the 55 missing files errors were due to retry/rename operations during the initial backup and the old filenames didn’t get cleared out?

ts678 · July 15, 2019, 6:54pm

So you took the database that was throwing all the errors during Recreate and just tried a backup anyway?

Sometimes Duplicati does seem to be able to continue backing up, for example in the test I was running, it eventually got back into its scheduled operating rhythm. What’s not so easy to check for is hidden damage.

Unfortunately B2 charges for downloads (uploads are free). The best proof of good backup is good restore using –no-local-blocks=true to make sure it won’t optimize by restoring from disk-resident blocks as it can.

If you still have the list of missing files, e.g. a dblock, you could try using the affected command to see what source files are affected by it being missing. There’s a lab example here on dealing with damage situations. That also talks about the list-broken-files command to see whether it finds anything it wants to complain on.

algebro · July 15, 2019, 9:45pm

So you took the database that was throwing all the errors during Recreate and just tried a backup anyway?

It’s hard to tell on the internet whether that’s an honest question or sarcasm, but in either case, I’m going to take that as “no, I shouldn’t trust it”

I think a full restore will only cost a few bucks with ~300gb of data, so I’m willing to try that if you’re reasonably confident that would eliminate all doubts about the backups. I’m hoping the errors were just some initial wonkiness and that they’re ironed out now but I’m trying another recreate operation to see if I’m still getting the errors.

Thanks again for all your help, by the way.

ts678 · July 15, 2019, 11:16pm

Lots of people just try again. My scheduled backup test did that automatically without me. There is actually some repair work possible when each backup starts, but repair (and recreate too) are less than perfect… Trying again is popular (though usually it’s not right after a bad recreate…). I’m just not sure how safe it is.

Eliminating all doubt is tough because the backups build on each other, however something like restoring the latest at least shows you can do that one, so maybe you’re fine going forward, and the latest probably shares blocks with old versions. How the backup process works is the technical explanation, but basically changes get uploaded, and non-changed blocks just point to old upload data that already has such blocks.

I think restore is quite careful to make sure that the restored file hashes match original hashes at backup, however there are also tools available that can compare trees, but studying changes could become tiring.

Your 2.0.4.22 has a more efficient Recreate. Presumably you’d notice it, if it was downloading all 300 GB?

You could also (before compact decides it’s time to rearrange files) look at backend files to see if you see signs of retry weirdness, e.g. if you have two dlist files with names one second apart, or dblocks with size that’s not 50 MB, or whatever your remote volume size is. A short one at the end would be normal though.

I’m sure hoping there’s not a delay (and plausibly there could be one) where an upload fails, Duplicati tries delete immediately, delete fails because the file’s not there, then the file shows up. B2 isn’t a simple write operation into a file (same might be true for other cloud services though). File needs to be processed first.

B2 Resiliency, Durability and Availability talks about this, and I suspect it all happens AFTER upload ends.

I guess there’s also a chance that someone else will show up with a solid and simple way to tell all is fine. That would certainly be something one would like in a backup program. I’m just not certain it’s done yet…

Sorry for being indefinite, but authoritative answers require a deep understanding of the code which I lack.

algebro · July 16, 2019, 2:31pm

Is there a way to restore with --no-local-blocks from the web UI? I’m not seeing any boxes for additional options under the restore workflow.

I see the option under general settings, but I wanted to make sure it doesn’t affect other operations (like backups) in the meantime.

edit: nevermind, I just decided to set the general option and cancel my scheduled backups for the time being.

ts678 · July 16, 2019, 3:44pm

For a direct restore (although for a true disaster recovery there would be no local blocks to worry about…) screen 2 on Encryption has a text box where you can type --no-local-blocks and other restore options.

And I think I just found a bug in 2.0.4.22 (and probably earlier) using the “restore from configuration” option. Assuming it’s a real bug, thanks for asking the question and forcing a trial of a less-frequently-used spot…

Perhaps you could export a backup configuration, then see if you can get it to work without lots of popups? Meanwhile I guess I’ll go test some other versions to see how they do. I found no obvious prior reports of it.

EDIT: 2.0.4.5 beta does lots of popups too, so I think this is an old problem, maybe specific to options. Log:

   Warnings: [
        2019-07-16 13:12:38 -04 - [Warning-Duplicati.Library.Main.Controller-UnsupportedOption]: The supplied option ----allow-full-removal is not supported and will be ignored,
        2019-07-16 13:12:38 -04 - [Warning-Duplicati.Library.Main.Controller-UnsupportedOption]: The supplied option ----snapshot-policy is not supported and will be ignored,
        2019-07-16 13:12:38 -04 - [Warning-Duplicati.Library.Main.Controller-UnsupportedOption]: The supplied option ----no-encryption is not supported and will be ignored

I sort of recall seeing how-many-dashes issues before. Sorry for this diversion from the main topic here…

algebro · July 16, 2019, 3:56pm

And I think I just found a bug in 2.0.4.22 (and probably earlier) using the “restore from configuration” option. Assuming it’s a real bug, thanks for asking the question and forcing a trial of a less-frequently-used spot…

Interesting, glad I could help.

Unfortunately, Duplicati seemed to hang when I tried to cancel my restore to rerun it with the correct options, and after killing the process and restarting Duplicati, all of my configurations are now gone. Have you ever seen that happen?

I tried to add the backups back from .config/Duplicati, but I never exported the configurations and I only see the SQLite databases. Are my backup configurations hosed?

Edit: scratch all of that…I forgot that when starting duplicati manually you have to pass --user to systemctl

algebro · July 16, 2019, 6:56pm

So I’ve started the test backup with a direct restore and as soon as the database was recreated, I got the “56 missing files” error, although the restore is still continuing, which I find surprising if there is (evidently) irrecoverable damage to the remote database. I’m continuing to let the restore run but I’m not really sure what I should do about the backup database, since I’ve tried everything I know of to get Duplicati to fix it.

I know it’s hard for you to give direct answers to a lot of this, but what would you personally do in this situation? If the restore doesn’t throw any errors and bomb out due to bad hashes, would you just continue running jobs even though any time the backup db is repaired/recreated it complains about 56 missing files? Would you nuke it all and start over? My concern with that approach is that whatever caused the inconsistencies the first time will just happen again.

Thanks for all your help troubleshooting this!

ts678 · July 16, 2019, 7:43pm

My case might be atypical. I generally advise people not to get into a spot where having to restart backup would be a great loss, so I use Duplicati (still in beta…) for short-term recoveries and disaster protection.

Typically I just restart again if there’s any question. This also lets me save bad backups for analysis when motivation happens and time is available. Typically digging is hard, but easier on my own than on others’, partly because I keep a detailed Profiling log running so there’s a chance of seeing how breakage occurs.

Restart goes well with B2 and its free uploads. If I was paying for upload I’d be somewhat more reluctant.

Specifically for this case, I’d run some of the integrity tests that I’ve mentioned, and then make the choice.

Recurring problems would be annoying. “Unexpected difference in fileset” test case and code clue #3800 was one that was causing me repeated backup start-overs until finally I had some time to track it down…

While I’m still wondering if 2.0.4.5 does better on B2 than newer versions, reducing damage from further Backblaze errors could be done as suggested, e.g. with more retries, and longer delays between retries. OneDrive also throws errors at me sometimes. I can’t easily tell what’s them and what’s network issues.