I recently upgraded Duplicati from 2.0.8.1 to 2.1.0.4 and am now running into serious issues when Duplicati enters the verification phase of a backup stored on Backblaze (as well as minor issues outside of that phase).
This is a long-standing backup (791.83 GB / 131 versions) using the B2 storage type. My other backup (to a local FTP server) does not experience any of the issues below.
A normal B2 backup, prior to upgrading, looks something like this:
I’ve truncated the log after the first two failures, but it continues to retry and fail to get files from B2 until Duplicati is killed (it was still not done as of 09:38 the next day).
My observations of the 2.1.0.4 logs are as follows:
- Duplicati has some problems doing the initial upload of the files to be backed up, but does eventually succeed.
- Duplicati has zero issues with the compacting process (get/put/delete).
- Duplicati fails to verify even a single file after backup and compaction are complete.
This backup is currently running again (after I restarted my computer) and it’s following the same pattern as above.
I’m looking for any advice or assistance anyone here can provide. My alternative right now is that I’ll have to downgrade back to 2.0.8.1, since Duplicati just gets eternally stuck on this backup in its current state, and never completes.
The main discussion of “task was canceled” is below. Until the mentioned fix gets released, try setting the read-write-timeout to a longer time, or set to 0 if you want to disable the timeout.
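For reference, assuming the usual `--option=value` syntax for advanced options (the exact value format may differ; adjust to taste), that would look something like one of the following:

```
--read-write-timeout=15m
--read-write-timeout=0
```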
I’m a bit puzzled though. In the successful backup Duplicati made with the timeout disabled, only one of the verification transfers (the first one) took longer than 30 seconds. The rest of them were well under the timeout:
| Operation | Filetype | Events | Average | 80th %ile | Maximum |
|-----------|----------|--------|---------|-----------|---------|
| Get       | dlist    | 32     | 15.4s   | 16s       | 58s     |
| Get       | dindex   | 32     | 0.6s    | 1s        | 1s      |
| Get       | dblock   | 32     | 14.8s   | 15s       | 16s     |
It seems like there’s something more going on here than the timeout simply being too short.
This happens because the first connection to B2 lists the bucket before any transfer starts. Duplicati expects the transfer to begin immediately, but the listing can take more than 30s, and that alone is enough to trigger the timeout.
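As a rough illustration of that failure mode (a minimal, hypothetical sketch, not Duplicati’s actual code), imagine one cancellation budget covering both the bucket listing and the download:

```csharp
// Hypothetical sketch: a single 30s cancellation budget covers both the bucket
// listing and the download. If the listing alone exceeds the budget, the Get is
// reported as "The operation was canceled" even though the transfer itself is fast.
using System;
using System.Threading;
using System.Threading.Tasks;

class TimeoutSketch
{
    // Stand-in for B2's first-connection bucket listing, which can exceed 30s.
    static Task ListBucketAsync(CancellationToken ct) =>
        Task.Delay(TimeSpan.FromSeconds(35), ct);

    // Stand-in for the actual file download, which is well under the timeout.
    static Task GetFileAsync(string name, CancellationToken ct) =>
        Task.Delay(TimeSpan.FromSeconds(15), ct);

    static async Task Main()
    {
        using var cts = new CancellationTokenSource(TimeSpan.FromSeconds(30));
        try
        {
            await ListBucketAsync(cts.Token);                        // consumes the whole budget
            await GetFileAsync("example.dblock.zip.aes", cts.Token); // never gets to run
        }
        catch (OperationCanceledException)
        {
            Console.WriteLine("Operation Get failed with message: The operation was canceled.");
        }
    }
}
```

In this sketch the download itself would finish in about 15s, but the 35s listing has already used up the 30s budget, so the Get is reported as canceled.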
The timeout code has been removed from the latest canary builds and will be re-introduced later in a better way, so that the bucket listing is not counted against the timeout. The timeout has also been increased, and a related upload bug has been fixed in 2.1.0.5.
The GET calls are not done with parallel downloads, so you only get one connection there. Uploads, however, are done in parallel, which is why you can see multiple 30s+ requests.
As a side note, we are looking into a way to avoid performing the bucket listing multiple times with parallel downloads, as that is a wasteful approach.
Does the “first connection” happen again later during the backup, e.g. if some bit of code is restarted?
In the log, it starts up rough as expected, then works, then later (maybe in verification) breaks:
2025-03-01 19:31:09 -08 - [Retry-Duplicati.Library.Main.BackendManager-RetryGet]: Operation Get with file duplicati-20250222T003003Z.dlist.zip.aes attempt 1 of 15 failed with message: The operation was canceled.
System.Threading.Tasks.TaskCanceledException: The operation was canceled.
EDIT 1:
The normal backup in the original post is less messy, and it looks like the first file verification takes longer.
Maybe the backend got deleted and restarted, though I’m not sure why that would be required.
EDIT 2:
The TEST command also exists standalone, so maybe the backend is also specific to just the test?
If so, that might put a minor wrinkle on the “Add state to backends” plan. How long is state kept?
In the 2.1.0.4 code the logic is not super clean, so multiple backend instances are created and not reused, while the rest of the code uses a shared manager with its own pool of connections.
For that reason a “new instance” can happen in many places. For 2.1.0.3+ there was a dirty fix so the initial instance used to list the remote files was re-used for uploads (saving an extra bucket listing).
For the canary builds, this was rewritten to use a single manager so the instances are reused throughout the entire operation.
For both versions, a failure will cause the backend to be disposed, and a new instance will (likely) be created.
The state is kept by the manager, so it is shared across the entire operation but not between runs; each run gets a fresh manager, which keeps it isolated and ensures no “bad state” is carried over.
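To make the shape of that concrete, here is a rough sketch of the pattern being described (hypothetical code, not Duplicati’s actual implementation): a shared manager owns a pool of backend instances, a successful transfer returns the instance to the pool, a failed transfer disposes it, and the whole pool is torn down when the operation ends.

```csharp
// Hypothetical sketch of a shared backend manager with a per-operation pool.
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

interface IBackend : IDisposable
{
    Task GetAsync(string remoteName);
}

sealed class B2Backend : IBackend
{
    public B2Backend() =>
        Console.WriteLine("New backend instance (first use will list the bucket)");

    public Task GetAsync(string remoteName)
    {
        Console.WriteLine($"GET {remoteName}");
        return Task.CompletedTask;
    }

    public void Dispose() => Console.WriteLine("Backend disposed");
}

sealed class BackendManager : IDisposable
{
    private readonly ConcurrentBag<IBackend> _pool = new();
    private readonly Func<IBackend> _factory;

    public BackendManager(Func<IBackend> factory) => _factory = factory;

    public async Task GetAsync(string remoteName)
    {
        // Reuse a pooled instance when one is available; otherwise create a new one.
        var backend = _pool.TryTake(out var pooled) ? pooled : _factory();
        try
        {
            await backend.GetAsync(remoteName);
            _pool.Add(backend);   // success: keep the instance (and its state) for the next call
        }
        catch
        {
            backend.Dispose();    // failure: drop the instance; the next call starts fresh
            throw;
        }
    }

    public void Dispose()
    {
        // The manager (and all pooled state) dies with the operation,
        // so nothing carries over into the next run.
        while (_pool.TryTake(out var b))
            b.Dispose();
    }
}

class Demo
{
    static async Task Main()
    {
        using var manager = new BackendManager(() => new B2Backend());
        await manager.GetAsync("example.dlist.zip.aes");   // creates an instance
        await manager.GetAsync("example.dindex.zip.aes");  // reuses the same instance
    }
}
```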