Restore stopped with "database or disk is full database or disk is full"

I am having a lot of trouble restoring a data drive with about 6 TB of files on Windows 11. My C: drive was not affected, so my local database from my routine backups was in pristine condition. During my first attempt at a restore to an empty data disk, essentially nothing was saved to the data disk after a dozen hours or so… only one or two very large files. I discovered the bug that I reported here, where the download function goes rogue:

Dozens of GB of data were showing up in C:\Users\Admin\AppData\Local\Temp, with dup-whatever file names.

After wrangling with that issue, I read about several other (related) issues with Duplicati restore and decided to stop using the “new and improved” restore and revert to the old one. The new restore has several major drawbacks; the most obvious is that speed plummets to a crawl because it demands enormous volume downloads from the backup, in my case a USB drive, separate from the thrashing issue that I reported. (As I understand it, the new restore doesn’t re-use the downloaded volumes efficiently.)

Frustrated with the slow progress of the new restore method, which was able to store only about 300 GB of large files in the span of 12 hours, I used my secondary backups.

Windows backup/restore and plain copy-pasting were several magnitudes faster than Duplicati.

So now my 12 TB data drive needs another 3 TB from the backup. Ok, fine, so I re-ran Duplicati to restore from the USB drive as before, but this time with the old restore enabled. Basically, I just needed Duplicati to clean up my drive and put the missing files there.

You know what it did? It sat there and did no reading of the USB drive; instead, it began checking the contents of the data disk. I watched the live logs to monitor its progress as it decided which files were and were not already good.

So, 6 hours of reading and comparing files on my data drive to the Duplicati database on drive C: and then, while I was away, it logged:

Error: database or disk is full database or disk is full (I assume the local database)

But drive C: has 600 GB free. And my data drive isn’t full, either; it has TBs of room. What is full?! I didn’t catch the error as it happened, so perhaps something bloated temporarily and then shrank back. I was using standard logging. I can’t imagine a database that large.

Note: my Blocksize is 1 MB. Server db is 144 KB. Local db is 4.7 GB. There is a “lock_v2” file present in the control directory.

Where do I go from here? Do I “vacuum” it? Do I “repair” it?

The log file is pasted below. There doesn’t seem like any way that I had a full hard drive.

code = Full (13), message = System.Data.SQLite.SQLiteException (0x800007FF): database or disk is full
database or disk is full
at System.Data.SQLite.SQLite3.Reset(SQLiteStatement stmt)
at System.Data.SQLite.SQLite3.Step(SQLiteStatement stmt)
at System.Data.SQLite.SQLiteDataReader.NextResult()
at System.Data.SQLite.SQLiteDataReader..ctor(SQLiteCommand cmd, CommandBehavior behave)
at System.Data.SQLite.SQLiteCommand.ExecuteReader(CommandBehavior behavior)
at Duplicati.Library.Main.Database.LocalRestoreDatabase.LocalBlockSource.GetFilesAndSourceBlocks(IDbConnection connection, String filetablename, String blocktablename, Int64 blocksize, Boolean skipMetadata)+MoveNext()
at Duplicati.Library.Main.Operation.RestoreHandler.ScanForExistingSourceBlocks(LocalRestoreDatabase database, Options options, Byte[] blockbuffer, HashAlgorithm hasher, RestoreResults result, RestoreHandlerMetadataStorage metadatastorage)
at Duplicati.Library.Main.Operation.RestoreHandler.DoRunAsync(IBackendManager backendManager, LocalRestoreDatabase database, IFilter filter, CancellationToken cancellationToken)
at Duplicati.Library.Main.Operation.RestoreHandler.RunAsync(String[] paths, IBackendManager backendManager, IFilter filter)
at Duplicati.Library.Main.Controller.<>c__DisplayClass23_0.<<Restore>b__0>d.MoveNext()
--- End of stack trace from previous location ---
at Duplicati.Library.Utility.Utility.Await(Task task)
at Duplicati.Library.Main.Controller.RunAction[T](T result, String[]& paths, IFilter& filter, Func`3 method)
at Duplicati.Library.Main.Controller.Restore(String[] paths, IFilter filter)
at Duplicati.Server.Runner.RunInternal(Connection databaseConnection, EventPollNotify eventPollNotify, INotificationUpdateService notificationUpdateService, IProgressStateProviderService progressStateProviderService, IApplicationSettings applicationSettings, IRunnerData data, Boolean fromQueue)

I did test a very small restore of around 300 GB and a few thousand files. It worked. I watched the database folder, which gained a few temporary db files of a few MB each. The TEMP folder was pulling in volumes and unpacking them there. The whole process was reasonably fast: about an hour, which works out to roughly 85 MB/s. I didn’t allow it to do any verification.

The only major changes I made after the failure were these, in the Config Options tab:

  1. I manually selected two folders known to contain large files, including several files that did not exist on the data volume and several that were already there. This made the restore about 10% of the data size of the failed one.
  2. I shrank restore-cache-max from 24 GB to 16 GB, which is still higher than the 4 GB default.
  3. I created a logfile on a random extra drive and gave it “information” level logs. I verified afterward and this log looked fine in the new location, as well.
  4. And I told it to use a 30-day retention.

Am I to assume that a full restore is impossible? I did not touch the database at all. I have read several stories here of how slow those processes are.

Also of note: I was unaware of changes I may have made a long time ago. These were active on both restore attempts…

?short-timeout=75s&read-write-timeout=90s&list-timeout=20m

I found that by looking at Advanced… Commandline and Target URL. I vaguely remember this USB drive (or one like it) giving me fits if I didn’t do those settings. I am not sure if those are relevant to a restore.

Hi @Nosenko !

I’m the author of the reworked restore flow. I’m sorry you’re experiencing issues with it.

My first guess as to why “nothing is saved to disk” is that it’s related to volume cache management. The current stable release (2.2.0.3) has some issues with cache management that lead to volumes being downloaded multiple times (see this discussion). The current default cache size is 100 volumes (dblocks); e.g., with the default 50 MB volume size, that means a 5 GB cache. You can tune this using --restore-volume-cache-hint (e.g. --restore-volume-cache-hint=200GB to set the cache hint to 200 GB). When this cap is reached, volumes are evicted from the cache on a least-recently-used basis. However, since the volumes may still be in use, the actual disk space used by the cache may temporarily exceed the hint until the volumes are no longer in use and can be deleted.
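The eviction behavior described above can be sketched as a small LRU cache with pinning. This is a minimal illustrative Python model, not Duplicati’s actual implementation; the class and method names are hypothetical:

```python
from collections import OrderedDict

class VolumeCache:
    """Toy LRU cache for downloaded volumes (illustration only).
    Volumes still in use are 'pinned' and survive eviction, which is
    why disk usage can temporarily exceed the configured hint."""

    def __init__(self, cache_hint_bytes):
        self.hint = cache_hint_bytes
        self.used = 0
        self.volumes = OrderedDict()  # name -> (size, pin_count)

    def get(self, name):
        if name in self.volumes:
            self.volumes.move_to_end(name)  # mark as recently used
            return True
        return False  # cache miss: the volume must be downloaded again

    def add(self, name, size, pinned=False):
        self.volumes[name] = (size, 1 if pinned else 0)
        self.used += size
        self._evict()

    def _evict(self):
        # Evict least-recently-used, unpinned volumes while over the hint.
        for name in list(self.volumes):
            if self.used <= self.hint:
                break
            size, pins = self.volumes[name]
            if pins == 0:
                del self.volumes[name]
                self.used -= size
```

With a hint of two volumes' worth of space, adding a third evicts the oldest unpinned one; a later request for it becomes a re-download, which is exactly the repeated-download symptom when files happen to be scattered across more volumes than the cache holds.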

The problem arises when files are scattered across many volumes and the restore order is unlucky, leading to volumes being prematurely evicted and thus requiring them to be downloaded again. Given your default 12 downloaders, I’d assume that the other restore concurrency parameters (--restore-file-processors, --restore-volume-decompressors, and --restore-volume-decryptors) are also 12, meaning 12 files are trying to restore concurrently. This may flood the pipeline with block requests, especially given a slow source.

Are you also restoring to the external drive or a hard drive? If so, try setting --restore-file-processors to 1, as that will also ensure that only one file is restored at a time.

This is expected behavior. The restore process first checks the local file system to determine which files are already present and match the expected size and timestamp. This is done to avoid unnecessary downloads of files that are already available locally. If a file is found to be missing or does not match the expected attributes, it will be marked for restoration, and the process will proceed to download the necessary volumes from the backup source. If you don’t want this behavior, you can set the options --restore-with-local-blocks=false to not use local blocks, and --overwrite=true to overwrite existing files. I can see that the new restore flow still checks the target files, which it could skip if these options are set, so I’ll look into that for a future update.
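The pre-scan decision described above can be sketched roughly like this. This is an illustrative Python sketch of the logic, not Duplicati’s code; the function name and the timestamp-comparison granularity are assumptions:

```python
import os

def needs_restore(target_path, expected_size, expected_mtime,
                  overwrite=False, use_local=True):
    """Sketch of the pre-scan: a file is skipped only when it already
    exists and matches the expected size and timestamp."""
    if overwrite or not use_local:
        return True  # always re-download and rewrite
    if not os.path.exists(target_path):
        return True  # missing -> restore it
    st = os.stat(target_path)
    if st.st_size != expected_size:
        return True  # size mismatch -> restore
    if int(st.st_mtime) != int(expected_mtime):
        return True  # timestamp mismatch -> restore
    return False  # present and matching -> skip, no download needed
```

The scan is cheap per file (a stat call), but over millions of files it still takes hours, which matches the six hours of disk reading you observed before the error.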

I haven’t encountered this issue before, but my guess is that it’s a SQLite error. From reading the documentation (Result and Error Codes and Temporary Files Used By SQLite (2.6 and 2.7)) I see two cases where this might happen:

  1. The temporary directory used by SQLite is full. Is your %TEMP% directory on the C: drive? If so, this shouldn’t be the issue, since you have 600 GB free. Are your %TEMP% and %TMP% environment variables pointing to valid locations with enough free space? Have you set the SQLITE_TMPDIR environment variable? If temporary storage is set to MEMORY, then SQLite will use an in-memory database for temporary storage, which is bounded by RAM rather than disk space.
  2. The temporary tables created during the legacy restore are too large. The query from your log line uses SELECT DISTINCT and ORDER BY, which means SQLite has to materialize the entire result set before it can filter it down to the distinct values and then sort that materialization. I don’t see it reaching 600 GB, but it may hit some internal page limit.

Note that this is a legacy restore issue, the new restore flow doesn’t use this query at all.
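For reference, the place where that scratch space lives is controlled by SQLite’s temp_store pragma, shown here via Python’s sqlite3 module purely as an illustration (Duplicati does not expose this pragma directly):

```python
import sqlite3

# temp_store controls where SQLite keeps scratch data for large
# DISTINCT / ORDER BY materializations: 0 = default, 1 = file, 2 = memory.
con = sqlite3.connect(":memory:")
print(con.execute("PRAGMA temp_store").fetchone()[0])  # 0 (default)
con.execute("PRAGMA temp_store = MEMORY")
print(con.execute("PRAGMA temp_store").fetchone()[0])  # 2 (memory)
```

When temp_store is file-backed, the scratch files go to SQLite’s temp path (on Windows, typically the system temp directory), which is why a full or size-limited temp location surfaces as “database or disk is full” even though the database drive itself has room.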

This is a configuration for the new restore flow, so it won’t affect the legacy restore. This parameter controls how many blocks are kept in memory during the restore process.

No, I agree with you: it should definitely be possible.

I’ll try to dig out an old external USB drive in the coming days and see if I can trash it to reproduce the issue.

I am restoring to a hardware RAID-6 drive using an external USB drive for backup. My restore needs came up due to a RAID-6 triple drive failure, so I had to start from a bare formatted drive. I have never had to restore this much data from a Duplicati backup before.

You made some poor assumptions about use cases, then, and about configurations that are not the default. I was not aware of the 50 MB volume size; I started my backups a few months ago at 500 MB, then, once the baseline data was stored in those mostly unchanging volumes, I switched to 200 MB. That appears to collide with the new restore’s logic quite noticeably.

Perhaps it did run out of space on that attempt. I do not know.

When I re-ran the restores with smaller requests (roughly 1 TB and then up to 2 TB), no errors were thrown and it seemed to run normally. It really increased my babysitting time and effort: I tried to get work done while peering at the logs so I would at least know if and when things were going bad.

The new restore is just not working right, with the two bugs I have found and perhaps this third one you acknowledged. If someone doesn’t modify the --restore-volume-cache-hint setting, they may be toast if their volumes are larger than expected. I think you should immediately re-think and rewrite the code that makes assumptions about volume sizes. There are many resources about Duplicati, which I have been using for most of a decade, where volume sizes are discussed and changes to them are suggested. I wanted to keep the number of files low on my external drive to improve efficiency, but I had no idea it would make restores crawl or halt.

My external drive has about 10,000 files in the 500 MB range but only about 1,700 in the 200 MB range. In other words, my long-term storage dominates. This was by design. But both the new and the old restores seem to struggle with it. I tried tweaking the new method, and it failed due to the thrashing bug I reported, and then failed due to another one.

The old method got me back up and running and finished the restore after several days, but there are so many problems with Duplicati that I must stop using it. The problems start with the counter-intuitive nature of the web interface and terms that need explanation but never get any help text. Obscure settings must be dug up to fix what would be easier for the user with a Q&A, recommended settings, guidance, menus, help text, reminders, suggestions, etc.

I think it’s been 9 years or so for me. It never got polished. Never got easier. And Windows’ underwhelming backup tools, although not very smart, are robust enough and an order of magnitude faster to restore when problems arise, saving precious time.

Things that could be parallel aren’t, such as the restore serially scanning my hard drive to determine what is (and what isn’t) already on the drive and whether it matches the backup. As it builds that list, entries could go into a FIFO buffer and the restore could begin in a separate thread. This would read and write to the disk simultaneously, but some drives (SSDs and some RAID arrays) can do that without much of a performance hit.

But here I am asking for a feature or bug repair again. That’s not the hallmark of a mature, highly efficient program. Why haven’t the developers thought about how to copy a file more efficiently? Wasn’t that in programming class 201? And if you do things like hammering Google Drive or Dropbox with 12 simultaneous file requests, why not do the same on the local machine?

TL;DR

I told it to do one (large) subfolder at a time and my restore completed.

Oh, one more bug:

The log results claim that 0 files were restored, whether there were 2 or 1,000,000 files. The warning counts are always inflated. The only number I trust is maybe the error count. Here’s an example… I can’t remember how many files got restored, but it was a lot…

And this one from the previous day:

Everything I did for restore was broken in some way. These are just two of the last restores; several more looked the same.

That sounds like there should be plenty of capacity for the restore process. Do you see any bottlenecks on the target drive when it’s being run with 12 writing processes?

I can see that’s true for your setup. I would argue that it’s not so much that the new restore flow doesn’t support larger volumes, but rather that re-downloading them is more expensive. With the current default cache size setting, I assumed that 100 downloaded volumes would be sufficient for most backups. However, it looks like it’s not uncommon for files to be spread across more than 100 volumes, leading to the cache issues. Restoring multiple files concurrently further stresses this cache boundary, causing them to compete for resources. Changing configuration parameters is always a trade-off: if there weren’t a trade-off, the default would (or at the very least should) be the optimal value.

Was this with the new or the legacy restore flow? If you try the new restore flow with --restore-file-processors set low as well, then it shouldn’t be requesting too many volumes at once (assuming that blocks haven’t been assigned to more than 100 volumes in a round-robin fashion). For the legacy flow, the issue is likely related to the SQLite error mentioned earlier.

The cache size configuration has been addressed in this PR where the default becomes “use as much cache as possible up until there’s only 1 GB left, then perform premature eviction”. It will take some time to make it to a release. Regarding changing the default concurrency parameters, I don’t have a good “new default” yet.

I understand that this information exists, but it needs updating (some of which I cannot do myself) to reflect the characteristics of the new restore flow. Larger volumes still help with backends that favor fewer but larger requests. This can also hold true for the new restore flow, given that there’s enough space on disk to hold the volumes to allow for parallelism. If there’s no premature eviction, then the only downside to larger volumes is that each volume can only be accessed sequentially. Thus smaller volumes allow for smaller resource locks, in turn increasing the parallel capabilities.

I think the primary issue for the new restore flow is the cache issue. As noted, the new default should help with this and we’re still working on improving the cache management further. The legacy restore flow breaks due to the SQLite related issue.

Right, makes sense given your external drive. The new restore flow can be tuned to work with this, as you’ve already found with setting --restore-volume-downloaders=1 ensuring that only one volume is downloaded at a time. However, with the cache issue one volume is fetched multiple times, which I assume is causing the stalling.

I understand your frustration, and I’m sorry to hear that you’re considering stopping using Duplicati. I noticed that you’re using the old web interface: is it still an issue with the new web interface? Regarding information in general, the docs cover a lot of the settings and use cases. Are there any particular topics missing from these that you think would be helpful to add? Contributions are also welcome there.

This is exactly what I was trying to achieve with the new restore flow: parallelization of the restore process, which is described in detail in the corresponding blog post. Instead of serially looking at each volume and scattering the contained blocks across the target disk, the new restore flow starts by parallelizing at the file level: each file is restored in parallel. This also ensures sequential writes (assuming the whole file needs to be restored), which benefits any disk.

The flow starts with the FileLister communicating the files to restore to the FileProcessors (of which there are multiple running in parallel). For each file, the FileProcessor checks whether the target file exists (e.g. for picking up a previous restore) and whether it has the correct size and hash; if not, it finds which blocks are missing.

Each missing block is then requested from the block cache. If the block is there, it’s returned; if not, the block cache asks the volume cache for the missing block. The volume cache checks whether the volume is already downloaded, and if it is, it sends the volume to the VolumeDecompressors (of which there are multiple running in parallel), which extract the blocks and send them to the block cache. If the volume is not downloaded, it sends the request to the VolumeDownloaders (of which there are multiple running in parallel), which download the volume and pass it to the VolumeDecryptors (of which there are multiple running in parallel), which decrypt the volume and send it back to the volume cache. From there, the volume goes to the VolumeDecompressors as before, and the block cache finally sends the blocks back to the FileProcessor, which writes them to disk.

So the new design has a lot of parallelism, concurrency, pipelining, caching, and asynchrony built in. Each communication step is a FIFO queue, allowing for burst requests. The network is tunable (--restore-file-processors=, --restore-volume-downloaders=, --restore-volume-decryptors=, --restore-volume-decompressors=), allowing for more or less parallelism depending on the bottleneck. The cache is tunable (--restore-volume-cache-hint=, --restore-cache-max=), allowing for more or less caching depending on the disk space and RAM available.

Given the complexity and immaturity of the new restore flow, and because the legacy flow has lighter resource requirements due to its serial nature, the legacy restore flow has been kept.

There is no need to ridicule me or the other contributors.

I’m unsure of what you mean: both the new and legacy restore flows treat local and remote backends the same, trying to maximize work effort. What’s missing on the local machine?

That’s definitely a bug, and I haven’t seen it before. Is this occurring in the legacy restore flow? Does the log file reflect the same counts? Does the new web interface show the same behavior? Are all of the warnings the same, and what are they? When I run a backup and restore with 2.2.0.3 from a local folder to a local folder, I see the correct results under the log, in both the old and new web interfaces.