Resource usage when recreating the local database

I just had an “Unexpected difference in fileset” error, and googling led me to try recreating the local database to solve it. I was surprised by how long the recreation took, so I looked into why and found that the resource usage on my computer was quite odd. This is my backup size:
Source: 179.87 GB
Backup: 171.26 GB / 14 Versions
It is located on a hard drive inside a NAS and accessed over the network from my computer.

First off, I noticed CPU usage never really went above 20% on my 4c/8t CPU, which is fine, I guess. I suppose the recreation process is mostly single-threaded; it would be nice if more cores were used to speed it up, but that’s not really odd, so that’s fine.
However, I noticed that a VERY large amount of data was being written to my disk: https://i.imgur.com/6sLIwK8.png
At the same time, VERY little RAM was being used by Duplicati: https://i.imgur.com/j6SMQOJ.png
Checking the TBW (terabytes written) on my SSDs confirms the heavy writing: https://i.imgur.com/BiyuqNI.png
The whole recreation process has taken about 8 hours so far, and it’s about 80% complete. Measuring the TBW on my SSD over one hour shows about 300 GB written.
The actual local database that has been created comes in at about 2 GB: htt ps://i.imgur.com/qxqxXb8.png (had to break this link due to the limit of 3 links per post as a new user)
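
Putting rough numbers on it (about 300 GB written per hour, roughly 8 hours so far, a 2 GB finished database), a quick back-of-the-envelope calculation:

```python
# Rough estimate from the measurements above (all numbers approximate).
write_rate_gb_per_hour = 300   # measured via the SSD TBW counter over one hour
hours_elapsed = 8              # recreate is about 80% done at this point
final_db_gb = 2                # size of the finished local database

total_written_gb = write_rate_gb_per_hour * hours_elapsed  # ~2400 GB, i.e. ~2.4 TB
amplification = total_written_gb / final_db_gb             # ~1200x more written than kept
print(f"~{total_written_gb / 1000:.1f} TB written for a {final_db_gb} GB database "
      f"(~{amplification:.0f}x write amplification)")
```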

These stats seem quite odd to me. I can’t come up with an explanation for why several terabytes of data need to be written to my SSD (reducing its lifespan, by the way) when it all ends up as a single 2 GB file. If the process needs to iterate with writes, wouldn’t it be better to do so in RAM and only write to the disk when complete? That whole 2 GB file could easily fit in my RAM, and if it didn’t, it could be paged to disk by the OS. I feel like doing so would probably speed up the whole process substantially. If I ever need to run a recreation again, I will at least make sure to have the local database on a RAM disk while it runs and just copy the completed file over afterwards; I assume that would work. Or is there something weird with my system, and I’m the only one experiencing this odd resource usage?
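
To illustrate the kind of approach I mean (just a minimal sketch in Python’s sqlite3 with a made-up table, not anything Duplicati actually does): build the database entirely in memory and persist it to disk once at the end.

```python
import sqlite3

# Build the database in RAM; nothing touches the SSD during this phase.
mem = sqlite3.connect(":memory:")
mem.execute("CREATE TABLE Block (Hash TEXT PRIMARY KEY, Size INTEGER)")  # made-up schema, for illustration

# ... all the iterative inserts/updates happen here, purely in memory ...
mem.executemany("INSERT INTO Block VALUES (?, ?)", [("abc123", 1024), ("def456", 2048)])
mem.commit()

# Write the finished database to disk in a single pass (sqlite3 backup API, Python 3.7+).
disk = sqlite3.connect("recreated.sqlite")
mem.backup(disk)
disk.close()
mem.close()
```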

Hello @MaloW and welcome to the forum!

What Duplicati version are you using? I’m hoping it’s not 2.0.4.18 canary, because this should have been made better in that release, though extended reads of the remote can still happen if data is genuinely missing.

Before the fix, there was a false positive on missing data because of a change where empty files weren’t actually put on the remote. The recreate code hadn’t been adjusted, so it kept looking for an empty file…

Empty source file can make Recreate download all dblock files fruitlessly with huge delay #3747

Although having logs (got any?) would be better, one can estimate what Recreate is doing using the following:

Recreating database logic/understanding/issue/slow

Server logs at Information level or above (e.g. Retry) can show what you’re fetching. I’d guess it’s dblocks.
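
If you’d rather capture this to a file than watch the live log, adding these advanced options to the job should work, if I’m remembering the option names right (the path is just an example):

```
--log-file=C:\Duplicati\recreate-log.txt
--log-file-log-level=Retry
```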

Channels describes what canary is. If you decide to test it to see whether it can Recreate faster, that’d be helpful, but don’t upgrade a production system to canary; it’s difficult to downgrade. Also, if you Recreate onto another system, don’t actually run a backup from there. You never want two systems backing up to one destination.

Typically, the first thing to try for “Unexpected difference in fileset” is deleting the version mentioned in the error, perhaps using the Commandline option of the GUI. Adjust the syntax, which by default is set up to run backup: generally you just need to remove the source paths that backup would use and add the --version option that delete needs.
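
For example, if the error message complained about fileset version 7 (a made-up number here), the Commandline screen would end up roughly like this:

```
Command: delete
Commandline arguments: --version=7
```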

Having tried to explain the busy disk as remote volumes being fetched, I’m bothered to see 0% network activity.

EDIT: Are you a developer? I suspect 2.0.4.5 might be field-patchable using a debugger if you want to hack. :wink:

I’m using version 2.0.4.5, which I thought was the latest version, since when I do “Check for updates now” in Duplicati it says so :slight_smile: So I guess you can disregard everything I wrote if the later version already improved it! I didn’t know later versions were posted here on the forums.
I am indeed a developer, but I’ll probably just go ahead and install the latest version, and hopefully it will be quicker if I ever have to recreate again in the future. If my issue persists after the recreate, I’ll try the delete method you advised.
Thanks!

You have the latest beta. You do not have the latest canary (which is bleeding edge, but people who are willing to use it for non-critical backups and to report the problems they find are very helpful for the next beta).

A chronological view of releases, along with the actual download locations, is here on GitHub. Canary can add nice features (sometimes with bugs), fixes for known bugs, and unknown issues (sometimes rather severe…).

Settings in Duplicati explains how you can adjust the update system. You can probably either set the channel to Canary and let the beta update itself when it finds one (updates go to a different directory and are detected at startup), or uninstall the beta and install canary fresh. Uninstalling Duplicati should keep your old backup configuration.