Constant backup failures - file size was expected to be xxx

Most of my backups fail, thanks to some issue where Duplicati has to re-download files because the downloaded file size was not what was expected. It’ll retry and retry, and occasionally it manages to complete, but it fails something like 80% of the time.

Like so:
Jun 8, 2019 8:19 PM: Operation Get with file duplicati-b032286bc49da4cd6bae8291968e4e3f3.dblock.zip.aes attempt 2 of 5 failed with message: The file duplicati-b032286bc49da4cd6bae8291968e4e3f3.dblock.zip.aes was downloaded and had size 17947404 but the size was expected to be 52418413

For some reason Duplicati is downloading these files very slowly and apparently gives up. Check out the times: 1st attempt: 6 minutes until it was “completed” at 29 MB; 2nd attempt: 4 minutes until it “completed” at 17 MB; 3rd attempt was fast, but again only 17 MB downloaded; 4th attempt succeeded, but it took a full 10 minutes!

Jun 8, 2019 8:29 PM: Backend event: Get - Completed: duplicati-b032286bc49da4cd6bae8291968e4e3f3.dblock.zip.aes (49.99 MB)
Jun 8, 2019 8:19 PM: Backend event: Get - Started: duplicati-b032286bc49da4cd6bae8291968e4e3f3.dblock.zip.aes (49.99 MB)
Jun 8, 2019 8:19 PM: Backend event: Get - Retrying: duplicati-b032286bc49da4cd6bae8291968e4e3f3.dblock.zip.aes (49.99 MB)
Jun 8, 2019 8:19 PM: Backend event: Get - Completed: duplicati-b032286bc49da4cd6bae8291968e4e3f3.dblock.zip.aes (17.12 MB)
Jun 8, 2019 8:19 PM: Backend event: Get - Started: duplicati-b032286bc49da4cd6bae8291968e4e3f3.dblock.zip.aes (49.99 MB)
Jun 8, 2019 8:19 PM: Backend event: Get - Retrying: duplicati-b032286bc49da4cd6bae8291968e4e3f3.dblock.zip.aes (49.99 MB)
Jun 8, 2019 8:19 PM: Backend event: Get - Completed: duplicati-b032286bc49da4cd6bae8291968e4e3f3.dblock.zip.aes (17.12 MB)
Jun 8, 2019 8:15 PM: Backend event: Get - Started: duplicati-b032286bc49da4cd6bae8291968e4e3f3.dblock.zip.aes (49.99 MB)
Jun 8, 2019 8:15 PM: Backend event: Get - Retrying: duplicati-b032286bc49da4cd6bae8291968e4e3f3.dblock.zip.aes (49.99 MB)
Jun 8, 2019 8:15 PM: Backend event: Get - Completed: duplicati-b032286bc49da4cd6bae8291968e4e3f3.dblock.zip.aes (29.25 MB)
Jun 8, 2019 8:09 PM: Backend event: Get - Started: duplicati-b032286bc49da4cd6bae8291968e4e3f3.dblock.zip.aes (49.99 MB)

The backup files are only 52 MB and I can download them directly from Backblaze’s site in a few seconds, so it doesn’t seem to be an actual network issue. I’m struggling to figure out what is going on and what I can do to debug this further.

This has been happening pretty much since I started using Duplicati around 2.0.4.x, and I’m currently using 2.0.4.18_canary_2019-05-12. The client is Ubuntu 16.04, if that matters.

Does anyone have any ideas or suggestions?

Hello @grue and welcome to the forum!

Got any stack traces that might show more about what’s going on? For occasional use, you can watch the server logs at About → Show log → Live → Retry. Alternatively, use --log-file together with --log-file-log-level=retry.
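In case it helps, here’s a minimal sketch of the log-file approach when running a backup from the command line (the target URL and source path are placeholders for whatever your job actually uses):

    # hedged sketch: run a backup with retry-level logging written to a file
    mono Duplicati.CommandLine.exe backup "<target-url>" "<source-path>" \
        --log-file=/tmp/duplicati-retry.log \
        --log-file-log-level=retry

The same two options can also be added to a GUI job as advanced options, so the log gets captured on the regular schedule.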

There’s lots more that could be tried, but if it gets to network-level debugging it can get very technical.

Since you gave one example of a short file, can you say whether the amount of shortness at transfer failure varies?

You could try setting --http-operation-timeout and --http-readwrite-timeout to several minutes, but I haven’t carefully researched whether they apply to B2 and are actually respected. Still, it’s an easy experiment to try.
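For instance (just a sketch, untested with B2 on my side, and the values are arbitrary guesses), the options could look like this as advanced options or command-line flags:

    # hedged sketch: generous HTTP timeouts to rule out premature connection drops
    --http-operation-timeout=10m
    --http-readwrite-timeout=10m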

A less easy experiment is to upgrade mono to the latest. That sometimes solves really odd problems.

A side experiment would be to use Duplicati.CommandLine.BackendTester.exe to see how B2 behaves when driven by the same backend code Duplicati normally uses. You can also use Duplicati.CommandLine.BackendTool.exe to transfer files manually. Both need a URL-style destination, similar to the Target URL shown in Commandline.
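Roughly along these lines (a sketch; point the tester at an EMPTY test folder, since it creates and deletes its own files, and substitute your real B2 target URL):

    # hedged sketch: exercise B2 using Duplicati's own backend code
    mono Duplicati.CommandLine.BackendTester.exe "<b2-url-for-an-empty-test-folder>"

    # manual transfers with the same backend code
    mono Duplicati.CommandLine.BackendTool.exe list "<b2-target-url>"
    mono Duplicati.CommandLine.BackendTool.exe get "<b2-target-url>" <remote-filename>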

Thanks @ts678 for your suggestions.

I’ve done a bunch of debugging and may have found the root cause. Upgrading mono seems to have resulted in more detailed error messages which gave better clues.

Bandwidth throttling simply doesn’t work, and my upstream is dinky enough that if I’m uploading at full blast, my internet is unusable - even DNS queries fail. I think that’s what’s causing my issues. I had set bandwidth throttling to around 70% of my actual upstream (back when I originally set up my backups, somewhere around 2.0.6.5). Somewhere along the way as I upgraded, throttling stopped working, and that must be when my backups started failing.

I installed 2.0.6.18 on a separate box (Windows 7 64-bit) and confirmed I have the same issue as on my Ubuntu server: no throttling (neither with the UI button nor with the throttle-upload/download advanced options), and the backup logs show retries, SHA1 checksum errors, DNS query failures, etc. I downgraded to 2.0.6.5, throttling works, and the problems go away.

I’ll try to pin down exactly which version it broke in.

Throttling has sort of worked for me, although I did find a way to corrupt backups, and it’s a bursty throttle.

Upload throttle corrupts backup, especially OneDrive. Analyzed, with code proposed. #3787

The current last post there has a reference to parallel uploads (introduced in release 2.0.4.16, with a bug fix in 2.0.4.17) defaulting to 4 concurrent uploads, which seems to tickle the pre-existing corruption bug more readily than before. --asynchronous-concurrent-upload-limit can adjust that setting up and down to see if it affects your throttle.
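For example (just a sketch of the experiment, added as an advanced option on the job or on the command line):

    # hedged sketch: compare throttle behavior with a single upload stream...
    --asynchronous-concurrent-upload-limit=1
    # ...versus the post-2.0.4.16 default of 4 parallel uploads
    --asynchronous-concurrent-upload-limit=4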

Throttling sometimes appears bursty in Task Manager. That’s probably OK for fair line sharing, but maybe insufficient if things get so bad in the short term that DNS queries fail. Heavy packet loss can break things. Some routers have bandwidth controls, sometimes including per-port configuration, so maybe you can give port 53 some help to get DNS through. But there’s also your regular file upload traffic to consider. If you can devise a test with more ordinary software (maybe an Internet speed test running during upload testing) that also kills DNS, can your ISP help? My expectation would be that even with slow uploads, heavy packet loss should never happen.

If you can get DNS to fail, you might be able to get a direct idea of (a specific type of) packet loss with ping, which might also help distinguish between packets never coming back and packets being very delayed.
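Something along these lines while an upload is running (a sketch; any reliable external host will do as a target):

    # hedged sketch: measure packet loss and latency to a public resolver during an upload
    ping -c 100 8.8.8.8
    # the summary reports packet loss plus min/avg/max round-trip times:
    # high loss points at drops, very large avg/max times point at queueing delay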

I’m assuming the references to 2.0.6.x mean 2.0.4.x, but if throttling got a lot worse in a way not explained by the corruption bug I cited, then there’s more to chase. The bug scrambles files but leaves sizes correct.

Maybe you can post what mono said? I’m still also hoping for the Duplicati detail requested earlier, such as stack traces. Ultimately, coming up with a test case reproducible by anyone without a slow upstream would be ideal (and could be filed as an issue), but one can’t always get that far, so sometimes there’s a bit more guessing.

EDIT:
Parallel TCP streams vs single ones explains in technical detail why this long-requested feature can yield greater throughput (in short, there’s a latency issue due to hard-to-avoid things like the speed of light), but one phrase in the article jumped out for your current context: “fill the pipe” may be exactly what causes problems for your networking situation (which might not be healthy), e.g. a single stream might leave gaps for DNS to get through:

If a single stream can hit the maximum bandwidth of the link i.e. fill the pipe, then multiple streams won’t help. But if a single stream can’t fill the pipe, then multiple streams will increase throughput.

What an ISP usually sells is capacity in bits per second, but that’s no promise a single stream can get that. Web-based Internet speed tests typically run multiple streams, which is one reason I suggested trying one. See how your DNS fares during the test.
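To watch DNS while the speed test saturates the uplink, even a simple repeated lookup gives a signal (a sketch; substitute any domain you like, and note that dig exits non-zero when a query times out):

    # hedged sketch: probe DNS once a second while the uplink is busy
    while true; do
        dig +time=2 +tries=1 example.com > /dev/null && echo "DNS ok" || echo "DNS FAILED"
        sleep 1
    done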

Argh, wrote a whole reply and lost it, so here I go again.

I’ve managed to find a working configuration that avoids the problem. Here’s the full rundown in case it’s helpful, or in case someone else in the same boat finds this post:

My internet is a bit weird: downstream is about 50 Mbit/s and upstream 9 Mbit/s according to various speed tests. Unthrottled transfers to Backblaze get around 400 KB/s (= ~3.2 Mbit/s) and plain FTP can reach about 600 KB/s (= ~4.8 Mbit/s). What I noticed long ago is that uploading at full speed makes my internet unusable: not a single web page will load, and even DNS queries fail, as mentioned. That’s just a fun feature of my ISP here. I remember this kind of thing from the early 00s, and I’ve always figured it was something like Bufferbloat - Wikipedia. But no idea, really.

Up to and including Duplicati 2.0.4.15 (I tested multiple older versions), unthrottled transfers render my internet unusable, but the backup itself succeeds without issues. When I throttle to 100 KB/s, all is well. The actual transfer rate is around 100-180 KB/s, which is fine, and there’s no noticeable effect on my internet use, so I can live with the slow transfers.

In 2.0.4.17 and 2.0.4.18 (I didn’t test 2.0.4.16), throttling doesn’t work at all: speeds are unthrottled. I can set it to 10 KB/s and watch the transfer go at 400 KB/s, my maximum. I observed this on Windows 7 as well. Previously this “only” made my internet unusable while the backups still succeeded; now, with the multiple parallel uploads, they basically always fail, even after extending timeouts to extremes. So I think 4 parallel uploads + no throttling + my weird internet is the fatal combination here.

Plot twist: if I set the concurrent uploads limit to 1, throttling starts working again!

So my workaround (until my ISP does something, likely never) is:

  • asynchronous-concurrent-upload-limit=1
  • Max upload speed = 100 KB/s

With these two settings, all is working again. Unless I hit that corruption bug, of course :frowning: - I’ve yet to see about that.
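For reference, this is roughly what the workaround looks like as advanced options / command-line flags (a sketch; --throttle-upload is, as far as I can tell, what the “Max upload speed” UI field maps to):

    # hedged sketch: single upload stream plus an explicit upload throttle
    --asynchronous-concurrent-upload-limit=1
    --throttle-upload=100KB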

I’ve attached logs from three transfers as requested:
duplicati.9.log - 2.0.4.18, throttled to 100 KB/s up / 200 KB/s down. Observed upload rate was 400 KB/s, internet was unusable, lots of file retries… (backup cancelled)
duplicati.10.log - 2.0.4.18, throttled to 10 KB/s up / 20 KB/s down. Observed 400 KB/s, internet unusable, file retries… (backup cancelled)
duplicati.11.log - 2.0.4.18, throttled to 100 KB/s up / 200 KB/s down, asynchronous-concurrent-upload-limit=1. Observed ~90 KB/s, no issues, successful backup.

Thanks again for all your help.
logs.zip (6.0 KB)