Very large backup to B2 failing with "GetResponse Timed Out"

For several weeks I’ve successfully used Duplicati (currently running 2.0.2.1_beta_2017-08-01) on Linux to back up a large (currently 1.13 TB) dataset to Backblaze B2. I am using a 50 MB upload volume size. The initial backup and a few incrementals completed successfully.

After a week or so, I added another large directory to my backup set, bringing the total size to around 1.6 TB. My next backup (unsurprisingly) took several days, but it eventually failed with a ThreadAbortException. That’s not the problem, though. The problem is that I can no longer back up at all.

Each attempted backup now runs for almost exactly 10 minutes with the message “Verifying backend data”, and then fails with the error below. Can Duplicati simply not handle a data set of this size? Would I be better off breaking my backup into a handful of smaller (say 500 GB) backups? Would increasing my volume size help? Any other suggestions?

System.Net.WebException: GetResponse timed out ---> System.Net.WebException: Aborted.
  at System.Net.HttpWebRequest.EndGetResponse (IAsyncResult asyncResult) <0x41599aa0 + 0x001a3> in <filename unknown>:0 
  at Duplicati.Library.Utility.AsyncHttpRequest+AsyncWrapper.OnAsync (IAsyncResult r) <0x4158ee70 + 0x000eb> in <filename unknown>:0 
  --- End of inner exception stack trace ---
  at Duplicati.Library.Main.BackendManager.List () <0x415de000 + 0x0016b> in <filename unknown>:0 
  at Duplicati.Library.Main.Operation.FilelistProcessor.RemoteListAnalysis (Duplicati.Library.Main.BackendManager backend, Duplicati.Library.Main.Options options, Duplicati.Library.Main.Database.LocalDatabase database, IBackendWriter log, System.String protectedfile) <0x415dacd0 + 0x0015f> in <filename unknown>:0 
  at Duplicati.Library.Main.Operation.FilelistProcessor.VerifyRemoteList (Duplicati.Library.Main.BackendManager backend, Duplicati.Library.Main.Options options, Duplicati.Library.Main.Database.LocalDatabase database, IBackendWriter log, System.String protectedfile) <0x415d87e0 + 0x000cb> in <filename unknown>:0 
  at Duplicati.Library.Main.Operation.BackupHandler.PreBackupVerify (Duplicati.Library.Main.BackendManager backend, System.String protectedfile) <0x415d76a0 + 0x001af> in <filename unknown>:0

My guess is you’re running into a combination of too many archive files at the destination and bandwidth limitations. As a test, try adding --no-backend-verification to disable the pre-run check that the remote file list matches the local one.
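For reference, if the job is run from the command line rather than the web UI, that option is simply appended to the backup command. This is only a rough sketch; the bucket name, prefix, credentials, and source path below are placeholders, not your actual configuration:

  # Hypothetical one-off run with the pre-backup remote file-list check disabled.
  mono Duplicati.CommandLine.exe backup \
    "b2://my-bucket/my-prefix?b2-accountid=ACCOUNT_ID&b2-applicationkey=APP_KEY" \
    /path/to/source \
    --dblock-size=50MB \
    --no-backend-verification=true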

Depending on how that works we can look at options.

It sounds like you are hitting the response timeout of 10 minutes when fetching the list of remote files.

In the canary builds I have added a new option that can change the timeout value: --http-operation-timeout=20m.

I have also added the option --b2-page-size=100 that allows you to change the number of files returned in each request.
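If it helps to see them in context, on a canary build both options would just be added to the job’s advanced options or to the command line; a minimal sketch, where the target URL and source folder are placeholders and the values are only the examples from above:

  # Sketch only: raise the HTTP operation timeout and shrink the B2 listing page size.
  mono Duplicati.CommandLine.exe backup <target-url> <source-folder> \
    --http-operation-timeout=20m \
    --b2-page-size=100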

I think it usually helps to have smaller, independent backups, as there is less overhead when something needs to be fixed.

But I think the problem here is the number of remote volumes. You can try changing the volume size to something like 200MB, which will reduce the number of remote files.
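For a rough sense of the scale involved (back-of-the-envelope numbers only; real counts depend on compression and deduplication):

  # ~1.6 TB split into fixed-size volumes; each dblock is paired with a small dindex,
  # so the file list returned by the backend is roughly double the dblock count.
  echo $(( 1600 * 1024 / 50 ))    # 50 MB volumes  -> ~32768 dblocks (~65000 files listed)
  echo $(( 1600 * 1024 / 200 ))   # 200 MB volumes ->  ~8192 dblocks (~16000 files listed)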

Assuming default --small-file-size, --no-auto-compact, --small-file-max-count, etc., if the volume size is changed from 50MB to 300MB, will that trigger a re-download / uncompress / re-compress / re-upload of the existing dblocks because their 50MB size is now only about 17% of the new archive size, making them “wasted” files?

If so, it might be worth warning users to expect some bandwidth consumption associated with such a change. And if not, would a larger dblock size help much in this scenario, since the user has already hit the limit?

Actually, I was wondering if:

  1. the response timeout could be dynamic (we’re expecting more than X files at the destination so we should increase the timeout)

  2. the UI should somehow let the user know it’s working on something long (like “fetching a list of your bazillion files from the destination…this could take a while”)

  3. would shifting destination dblock and dindex files into separate “dblock” and “dindex” subfolders improve this situation, as they’d be separate calls that could EACH take 10 minutes? (And yes, I realize this wouldn’t be a fix so much as just pushing the number-of-files-to-timeout limit further down the road)

Good point. IIRC, the limit is 5%, meaning that the files must be 5% of the 300MB volume size (= 15MB) or smaller to be considered “small”.
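Put as plain arithmetic (assuming that 5% figure is correct):

  # The "small file" cutoff is 5% of the configured volume size.
  echo $(( 300 * 5 / 100 ))   # -> 15 (MB); the existing 50 MB volumes are well above
                              #    this, so they should not be flagged as small/wasted.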

No, it would not “fix” the backup, as the timeout still happens. I was thinking more in the “if I start a new backup” scenario.

That is very difficult to do, due to variances in network, server, etc.

Yes, there is some work in progress on using pagination, which would allow a more exact report of how many files are found before they are all loaded.

Yes, splitting into subfolders would also mitigate the problem, at the expense of always having to make multiple calls, even on small backups.

Now that I consider the sub-folder idea (for the millionth time), I am thinking that we could make it more dynamic. Until we reach some file-count limit (e.g. 1000), all files are stored in the root folder as normal. Once we go above that limit, new files are put into subfolders.

This makes the listing/query system slightly more complicated, but it caters to both cases: flat lists for small backups and a hierarchy for larger ones.

The algorithm can scale well by gradually taking more of the filename prefix and introducing sub-subfolders. We can also make the initial verification faster by only checking a (random) subfolder.

If we store the relative path to the remote volume, we do not need to do any guessing as to what folder to look in when we need the file.
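Not an actual implementation, just a sketch of the idea as described above; the threshold, prefix length, and naming details are all hypothetical:

  # Hypothetical mapping from a remote volume name to its path: flat until the
  # file count passes a threshold, then a subfolder taken from the random prefix.
  remote_path() {
    name="$1"; count="$2"; threshold=1000
    if [ "$count" -le "$threshold" ]; then
      echo "$name"                      # small backup: keep the flat listing
    else
      rand="${name#duplicati-}"         # strip the fixed "duplicati-" prefix
      echo "${rand:0:2}/$name"          # e.g. "b1/duplicati-b1a2....dblock.zip.aes"
    fi
  }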

I didn’t want to mention it before, but now that you bring it up… :slight_smile:

Thank you so much for this quick and thorough response. I’m really impressed by the community here, as well as by Duplicati itself.

It sounds like I’ll generally have a lot more flexibility if I break up my backup into a few separate parts, and it’s very easy for me to do so. Additionally, for some of the larger backups (one of them will still be about 750 GB all by itself), it sounds like I’d get some benefit from increasing my volume size, perhaps to 200 MB. If I correctly understand how Duplicati works, this should have very little downside as long as my data doesn’t change very much. (In this case, it indeed does not.)

Does this sound like a sensible plan? It’s easy enough for me to try it, and I’m happy to experiment a bit.

And, in the long term, it’s great to hear that there are some options coming that will support larger B2 backups.

Thanks!

Incidentally, when I go with the “multiple backups” approach, is there a way to schedule them to all run every night, one after the other? For instance, if I schedule three different backups for 2 am, will Duplicati just run them one at a time, starting at 2 am?

Yes, you should get some benefit from that. You might also want to take a look at this:

Kind of. Here are the methods I can think of:

  1. Schedule them as slightly overlapping (when a job is scheduled to start, if there’s one already running, the new one gets added to a queue and will start when the previous one finishes)

  2. Use an external script to call the exported command line for each job in order (which means extra work re-exporting the command lines after any changes)

  3. Use a --run-script-after script to fire off the next backup via the command line (same issue as #2, plus you have to build your script carefully or, if one backup fails, none of the remaining ones will start); a rough sketch of such a script follows below
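For the third option, the hook script might look something like this. It is a sketch only: the install path, target URL, and source path are placeholders you would replace with your own exported command line, and note the caveat above that if this script never runs (e.g. the first backup dies early), the follow-up backup never starts either.

  #!/bin/bash
  # Hypothetical --run-script-after hook for job A that kicks off job B.
  # nohup + & lets this hook return immediately rather than waiting for job B to finish.
  nohup mono /usr/lib/duplicati/Duplicati.CommandLine.exe backup \
    "b2://my-bucket/backup-b?b2-accountid=ACCOUNT_ID&b2-applicationkey=APP_KEY" \
    /path/to/source-b \
    > /tmp/backup-b.log 2>&1 &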

Thanks, this is very useful information. Your first workaround (scheduling the jobs as slightly overlapping) sounds like it will do everything I need. In fact, is there any harm in just scheduling them a minute apart?

I’m not sure exactly where the “hey, I’m running now” flag is set versus where the “is anybody running right now” check happens, so it’s probably safer to give them a 5-minute separation, just in case.

I hope the check is implemented so that it can never result in a conflict with two jobs running at the same time. :slight_smile:

I have multiple backups on a second desktop with different intervals, so sometimes they have the same next run time. Until now there have been no conflicts, so even setting them to the same time isn’t a problem as far as I’ve noticed.

I haven’t heard of any issues like that as long as the jobs are all running on the same server (service or tray icon). I’m just a little bit paranoid (in case you hadn’t seen that yet). :crazy_face:

Note that if multiple servers are running (service + tray, or user1-tray + user2-tray), then two jobs could run at the same time. But unless advanced parameters (such as --dbpath) have been set to point to the same paths, they should happily run concurrently.

Just thought I’d follow up with my (very positive) experiences over the past few months.

Following the advice on this thread, I chopped up my backup set into several smaller sets, none of them larger than 750 GB. (Most are between 100 GB and 400 GB.) This was easy to do, since my dataset is already organized into a handful of top-level groupings.

I scheduled all of the sets to run at exactly the same time each day (1 am). Duplicati has no trouble sequencing the backups one at a time, as confirmed by the logs.

For the past two months, these backups have been completing reliably, except for the occasional network hiccup that I’m not too concerned about. (When there is a hiccup, the next run has always recovered without complaint.) I’ve done a couple of partial restores to spot-check the backup integrity.

Everything looks great. Thank you for an excellent software package, and for your support. I sent a PayPal donation a month ago, to support all the great work you’re doing.

I’m glad to hear things are working well for you - and thank YOU for using and supporting Duplicati! :smiley: