I recently upgraded Duplicati from 2.0.8.1 to 2.1.0.4 and am now running into serious issues when Duplicati enters the verification phase of a backup stored on Backblaze (as well as minor issues outside of that phase).
This is a long-standing backup (791.83 GB / 131 versions) using the B2 storage type. My other backup (to a local FTP server) does not experience any of the issues below.
A normal B2 backup, prior to upgrading, looks something like this:
I’ve truncated the log after the first two failures, but it continues to retry and fail to get files from B2 until Duplicati is killed (it was still not done as of 09:38 the next day).
My observations of the 2.1.0.4 logs are as follows:
- Duplicati has some problems doing the initial upload of the files to be backed up, but does eventually succeed.
- Duplicati has zero issues with the compacting process (get/put/delete).
- Duplicati fails to verify even a single file after backup and compaction are complete.
This backup is currently running again (after I restarted my computer) and it’s following the same pattern as above.
I’m looking for any advice or assistance anyone here can provide. My alternative right now is that I’ll have to downgrade back to 2.0.8.1, since Duplicati just gets eternally stuck on this backup in its current state, and never completes.
The main discussion of “task was canceled” is below. Until the mentioned fix gets released, try setting the read-write-timeout to a longer time, or set to 0 if you want to disable the timeout.
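For reference, assuming the usual `--option=value` syntax for advanced options (the exact value format may differ; adjust to taste), that would look something like one of the following:

```
--read-write-timeout=15m
--read-write-timeout=0
```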
I’m a bit puzzled though. In the successful backup Duplicati made with the timeout disabled, only one of the verification transfers (the first one) took longer than 30 seconds. The rest of them were well under the timeout:
| Operation | Filetype | Events | Average | 80th %ile | Maximum |
|-----------|----------|--------|---------|-----------|---------|
| Get       | dlist    | 32     | 15.4s   | 16s       | 58s     |
| Get       | dindex   | 32     | 0.6s    | 1s        | 1s      |
| Get       | dblock   | 32     | 14.8s   | 15s       | 16s     |
It seems like there’s something more going on here than the timeout simply being too short.
This happens because the first connection to B2 lists the bucket before any transfer starts. Duplicati expects the transfer to begin immediately, but the listing can take more than 30s, and that alone is enough to trigger the timeout.
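As a rough illustration of that failure mode (a minimal, hypothetical sketch, not Duplicati’s actual code), imagine one cancellation budget covering both the bucket listing and the download:

```csharp
// Hypothetical sketch: a single 30s cancellation budget covers both the bucket
// listing and the download. If the listing alone exceeds the budget, the Get is
// reported as "The operation was canceled" even though the transfer itself is fast.
using System;
using System.Threading;
using System.Threading.Tasks;

class TimeoutSketch
{
    // Stand-in for B2's first-connection bucket listing, which can exceed 30s.
    static Task ListBucketAsync(CancellationToken ct) =>
        Task.Delay(TimeSpan.FromSeconds(35), ct);

    // Stand-in for the actual file download, which is well under the timeout.
    static Task GetFileAsync(string name, CancellationToken ct) =>
        Task.Delay(TimeSpan.FromSeconds(15), ct);

    static async Task Main()
    {
        using var cts = new CancellationTokenSource(TimeSpan.FromSeconds(30));
        try
        {
            await ListBucketAsync(cts.Token);                        // consumes the whole budget
            await GetFileAsync("example.dblock.zip.aes", cts.Token); // never gets to run
        }
        catch (OperationCanceledException)
        {
            Console.WriteLine("Operation Get failed with message: The operation was canceled.");
        }
    }
}
```

In this sketch the download itself would finish in about 15s, but the 35s listing has already used up the 30s budget, so the Get is reported as canceled.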
The timeout code has been removed from the latest canary builds and will be re-introduced later in a better way, so that the bucket listing is not counted against the timeout. The timeout has also been increased, and a related upload bug has been fixed in 2.1.0.5.
The GET calls are not done with parallel downloads, so you only get one connection there. Uploads, however, are done in parallel, which is why you can see multiple 30s+ requests.
As a side note, we are looking into a way to avoid performing the bucket listing multiple times with parallel downloads, as that is a wasteful approach.
Does the “first connection” happen again later during the backup, e.g. if some bit of code is restarted?
In the log, it starts up rough as expected, then works, then later (maybe in verification) breaks:
2025-03-01 19:31:09 -08 - [Retry-Duplicati.Library.Main.BackendManager-RetryGet]: Operation Get with file duplicati-20250222T003003Z.dlist.zip.aes attempt 1 of 15 failed with message: The operation was canceled.
System.Threading.Tasks.TaskCanceledException: The operation was canceled.
EDIT 1:
The normal backup in the original post is less messy, and it looks like the first file verification takes longer.
Maybe the backend got deleted and restarted, though I’m not sure why that would be required.
EDIT 2:
The TEST command also exists standalone, so maybe the backend is also specific to just the test?
If so, that might put a minor wrinkle on the “Add state to backends” plan. How long is state kept?
In the 2.1.0.4 code the logic is not super clean, so multiple backend instances are created and not reused, while the rest of the code uses a shared manager with its own pool of connections.
For that reason a “new instance” can happen in many places. For 2.1.0.3+ there was a dirty fix so the initial instance used to list the remote files was re-used for uploads (saving an extra bucket listing).
For the canary builds, this was rewritten to use a single manager so the instances are reused throughout the entire operation.
For both versions, a failure will cause the backend to be disposed, and a new instance will (likely) be created.
The state is kept by the manager, so it is shared across the entire operation but not between runs; each run gets a fresh manager, which keeps it isolated and ensures no “bad state” is carried over.
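To make the shape of that concrete, here is a rough sketch of the pattern being described (hypothetical code, not Duplicati’s actual implementation): a shared manager owns a pool of backend instances, a successful transfer returns the instance to the pool, a failed transfer disposes it, and the whole pool is torn down when the operation ends.

```csharp
// Hypothetical sketch of a shared backend manager with a per-operation pool.
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

interface IBackend : IDisposable
{
    Task GetAsync(string remoteName);
}

sealed class B2Backend : IBackend
{
    public B2Backend() =>
        Console.WriteLine("New backend instance (first use will list the bucket)");

    public Task GetAsync(string remoteName)
    {
        Console.WriteLine($"GET {remoteName}");
        return Task.CompletedTask;
    }

    public void Dispose() => Console.WriteLine("Backend disposed");
}

sealed class BackendManager : IDisposable
{
    private readonly ConcurrentBag<IBackend> _pool = new();
    private readonly Func<IBackend> _factory;

    public BackendManager(Func<IBackend> factory) => _factory = factory;

    public async Task GetAsync(string remoteName)
    {
        // Reuse a pooled instance when one is available; otherwise create a new one.
        var backend = _pool.TryTake(out var pooled) ? pooled : _factory();
        try
        {
            await backend.GetAsync(remoteName);
            _pool.Add(backend);   // success: keep the instance (and its state) for the next call
        }
        catch
        {
            backend.Dispose();    // failure: drop the instance; the next call starts fresh
            throw;
        }
    }

    public void Dispose()
    {
        // The manager (and all pooled state) dies with the operation,
        // so nothing carries over into the next run.
        while (_pool.TryTake(out var b))
            b.Dispose();
    }
}

class Demo
{
    static async Task Main()
    {
        using var manager = new BackendManager(() => new B2Backend());
        await manager.GetAsync("example.dlist.zip.aes");   // creates an instance
        await manager.GetAsync("example.dindex.zip.aes");  // reuses the same instance
    }
}
```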