Restore stops with "Volume count in Sleepable Dictionary volume table is not zero"

I’ve been using Duplicati for multiple years now (for the last six weeks 2.1.0.106_canary_2025-01-11, before that one of the betas) as part of a Docker stack (on a Raspberry Pi 5) to back up my smart home to Google Drive. Never a problem with backing up or restoring.

Now, for the first time (unfortunately in a really important situation), a restore stops (to be precise: it slows down to a complete halt) with the error message

Volume count in SleepableDictionarys volume table is not zero: 14726 Volumes: -1, 85, 87, 89, 91, 130, 132, 138, 140, 169, 171, 173, 177, 208, 215, 220, 237, 239, 241, 243, 245, 255, 257, 263, 265, 267, 269, 271, 291, 293, 295, 321, 323.

It also shows:
Block count in SleepableDictionarys block table is not zero: 14726

And

Error during file processing
System.NotSupportedException: Stream does not support reading.
   at System.IO.FileStream.ReadAsync(Byte[] buffer, Int32 offset, Int32 count, CancellationToken cancellationToken)
   at System.IO.Stream.ReadAsync(Byte[] buffer, Int32 offset, Int32 count)
   at Duplicati.Library.Main.Operation.Restore.FileProcessor.<>c__DisplayClass1_0.<<Run>b__0>d.MoveNext()
--- End of stack trace from previous location ---
   at CoCoL.AutomationExtensions.RunTask[T](T channels, Func`2 method, Boolean catchRetiredExceptions)

I did not find anything on that message. Does anyone have an idea on what this means and how I can fix it?

Ok, this is odd. Even though I updated to the most recent version, 2.1.0.106, I could only restore my backups by going back to 2.0.8.1_beta_2024-05-07.

Does this make sense to anyone?

Maybe the developer of the new restore can answer, but it’s gotten late for today:

That sounds like an issue with the new restore implementation.
As mentioned, you can switch back to the previous one by setting the advanced option --restore-legacy=true

Maybe @carljohnsen can explain what the issue is.

If so, I think this may be related to “broken” index files that are generated by 2.1.0.2. Due to some quirks, the index files may contain the same blocklist data multiple times. This was tripping up database recreates as well, so perhaps it is also doing unexpected things in the new restore flow.

Thanks!

Since my setup appears to be a pretty normal use case (with a fairly serious bug)…

  1. backups running for years on beta
  2. then updating to the most recent canary
  3. continuing backups on the most recent canary
  4. then restoring after some weeks with that canary

… I am happy to provide details for debugging. Just let me know how I can support.

The bug was pretty scary, though, since NONE of my backups seemed to work, and only out of desperation did I go back to the beta version (to restore what was backed up by the latest canary).

I’d be happy to help! Yes, that issue relates to the new restore flow.

When the restore starts, it constructs a plan for which blocks it needs to restore, followed by a similar plan for which volumes it needs to fetch. This plan drives the caching strategy: it tells the restore which blocks and volumes are needed multiple times and thus benefit from being cached. If they’re only needed once, there’s no need to spend resources keeping them in the caches. Once a block or volume is no longer needed, it is evicted from the block cache (residing in memory) or the volume cache (residing on disk), respectively.

The first error that you’re reporting is related to this restore plan; it states that according to the plan, 14726 volumes have not been fully read, followed by the first couple of volume IDs (for debugging purposes). The second statement is very similar, but reports that the block cache has not yet seen the expected number of reads. If the restore was successful, both counts should be 0. This error is meant as a post-restoration sanity check of whether the restoration went according to the plan.
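The bookkeeping described above can be sketched in Python (hypothetical names and structure, not Duplicati’s actual C# implementation): each block carries a reference count taken from the plan, cache entries are kept only while still needed, and a count of zero at the end is exactly the sanity check the error message performs.

```python
from collections import Counter

class RestorePlan:
    """Toy reference-counted cache plan (hypothetical sketch)."""

    def __init__(self, planned_reads):
        # How many times each block will be read during the restore.
        self.block_refs = Counter(planned_reads)
        self.cache = {}

    def read_block(self, block_id, fetch):
        data = self.cache.get(block_id)
        if data is None:
            data = fetch(block_id)  # e.g. download and unpack the volume
        self.block_refs[block_id] -= 1
        if self.block_refs[block_id] > 0:
            self.cache[block_id] = data     # still needed later: keep it
        else:
            self.cache.pop(block_id, None)  # last planned use: evict
            del self.block_refs[block_id]
        return data

    def sanity_check(self):
        # After a successful restore this must be zero; a non-zero
        # count is what the "... is not zero" error is reporting.
        return sum(self.block_refs.values())
```

If any stage crashes before all planned reads happen, `sanity_check()` comes back non-zero, which matches the 14726 reported in the error above.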

As to why the error occurs, it looks like it is due to the FileProcessor crashing when it tries to read a stream:

My guess is that the reports of incomplete caches are caused by one of the internal restore processes throwing an exception, which prematurely shuts down the operation and leaves the restoration incomplete.

As for a solution to the problem, I need to figure out what happened in the FileProcessor. I’ll get back to you if I need more information.


I found the error; it turns out that the file was opened in an incorrect mode when only part of the file needed restoring. The error has been fixed and a test capturing this behaviour has been added in PR 5886; the fix should be part of a following build. Thank you @covertboard for reporting the bug!
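For readers hitting the same stack trace: `NotSupportedException: Stream does not support reading` is what .NET throws when code reads from a stream that was opened write-only. A minimal Python analogue of this class of bug (illustrative only, with made-up names; not the actual fix in PR 5886):

```python
import io
import tempfile

def patch_file(path, readable):
    """A partial restore must read back the bytes it keeps, so the
    target file has to be opened both readable and writable.
    (Hypothetical illustration, not Duplicati's code.)"""
    mode = "r+b" if readable else "wb"  # "wb" is write-only (and truncates)
    with open(path, mode) as f:
        try:
            f.read(4)  # read the part of the file that stays as-is
            return True
        except io.UnsupportedOperation:
            return False  # same failure class as the .NET exception above

with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"existing contents")

assert patch_file(tmp.name, readable=True)       # correct mode: read works
assert not patch_file(tmp.name, readable=False)  # wrong mode: read fails
```

The fix is simply to open the target with a read-and-write mode whenever existing bytes must be preserved.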

Thanks!

Will try it out again soon, since I have by now recovered from the shock of thinking that none of my backups worked anymore. :smiley: