Recreating database logic/understanding/issue/slow

Hi Wim,
I don’t think it is very computer specific; I am running on an Intel i7 with 12 GB RAM, and it’s constantly 95% idle.

I am using B2 storage.

I am seeing a lot of messages like this:
* Apr 10, 2019 11:04 PM: Operation Get with file duplicati-XXXX46.dblock.zip.aes attempt 4 of 5 failed with message: Remote prematurely closed connection.

* Apr 10, 2019 11:03 PM: Backend event: Get - Started: duplicati-XXXX46.dblock.zip.aes (49.99 MB)

* Apr 10, 2019 11:02 PM: Backend event: Get - Retrying: duplicati-XXXX46.dblock.zip.aes (49.99 MB)

* Apr 10, 2019 11:02 PM: Operation Get with file duplicati-XXXX46.dblock.zip.aes attempt 3 of 5 failed with message: Remote prematurely closed connection.

* Apr 10, 2019 11:01 PM: Backend event: Get - Started: duplicati-XXXX46.dblock.zip.aes (49.99 MB)

* Apr 10, 2019 11:01 PM: Backend event: Get - Retrying: duplicati-XXXX46.dblock.zip.aes (49.99 MB)

* Apr 10, 2019 11:01 PM: Operation Get with file duplicati-XXXX46.dblock.zip.aes attempt 2 of 5 failed with message: Remote prematurely closed connection.

* Apr 10, 2019 10:59 PM: Backend event: Get - Started: duplicati-XXXX46.dblock.zip.aes (49.99 MB)

* Apr 10, 2019 10:59 PM: Backend event: Get - Retrying: duplicati-XXXX46.dblock.zip.aes (49.99 MB)

* Apr 10, 2019 10:59 PM: Operation Get with file duplicati-XXXX46.dblock.zip.aes attempt 1 of 5 failed with message: Remote prematurely closed connection.
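For what it’s worth, those repeated Get failures add their own delays on top of the slow rebuild itself. If the B2 connection keeps dropping like this, a command along these lines might at least keep a long recreate from aborting on transfer errors (bucket, folder and database path are placeholders here, credentials omitted, and I’m assuming the usual --number-of-retries and --retry-delay options):

"C:\Program Files\Duplicati 2\Duplicati.CommandLine.exe" repair "b2://my-bucket/backup-folder" --dbpath="C:\ProgramData\Duplicati\XXXXXXXXXX.sqlite" --passphrase=<pwd> --number-of-retries=10 --retry-delay=30s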

I am having massive issues with the recreate too.
I had a full hard drive crash shortly after World Backup Day, and I thought: good thing you have had Duplicati running for almost half a year.
I had around 70 versions of 200 GB (both source and backup) over those six months, with custom retention.
I had used the restore function several times when I accidentally deleted files (even bigger ones), so I had sort of tested it intermittently and felt comfortable.

So I set up Duplicati again and configured everything to first restore my files and then continue backing up on top of the old backup.
Like many of you, I saw that there was no history to my backup and that I needed to recreate the database first.

The connection is a 100 Mbit FTPS link, which is hardly the bottleneck, and the database is created on a SATA SSD, which is also barely doing anything. The client has a low-power quad core, which is just wandering along at 30% on all cores.
It has now taken 18 hours to get to 90%, and from here it goes really slowly. And all the hits I have found for this issue don’t leave me much hope. Watching the FTPS server’s log I can tell you it downloads a dblock every 5–10 minutes, then writes heavily for 20 seconds at 40 MB/s locally, and then computes for said 5–10 minutes.

This is a big issue. Without this fixed, or an option to make the database part of the backup, Duplicati 2 is only usable against accidental deletion, not for disaster recovery. I’m now torn between letting this run for possibly weeks, or just restoring everything without a database and starting a new backup, ditching all 60 or so earlier versions.

This seems to be a major bug… I hope a few developers join the conversation soon.

I only understand the inner workings of Duplicati 1 & 2 in a very, very basic way. But:
Couldn’t one use the incremental method of Duplicati 1 to back up the database, and use the database to do the block-based deduplication of Duplicati 2 on all the other files?

I have now started the recreation process on a much stronger machine, and it shows exactly the same behaviour.
With the exception that it uses only 10% of 8 threads. It is like Duplicati is giving you the finger.

So that means that from here on this can only be described as a “MAJOR BUG”.
Somewhere in the code there has to be something that makes it deliberately run slow, which seems stupid. If you want to recreate the database of a vital backup, you want your PC to actually do that, and not have it trundling along at near idle for weeks.

Edit: Since the rebuild would at least go a little faster on the stronger machine, I tried to do a restore without a database (where Duplicati builds a temporary database) on the “Backup Source / Restore Target”, but then the log shoves it up your arse:
“11. Apr. 2019 23:03: Processing all of the 3471 volumes for blocklists”
Meaning it won’t go one bit faster, meaning that I can’t access my data for weeks.

If it is a bug, no one cares anymore; if it is by design, …

So in essence, in case of a database loss, which is the standard outcome of a full hard drive failure, this software is basically useless. You are literally better off just printing your stuff out in binary and typing it back in again…

Does none of the devs care? This seems to be a long-known problem, and all the threads and the corresponding GitHub issue, including an open bounty, just linger…

And to make one thing clear: I am always thankful for “free” open source software. In the rare cases where I can actually contribute something, I do it happily, be it bug hunting, providing logs or small financial contributions.
And I know I am throwing a tantrum here.
And not having paid anything, I know I have a “right” to nothing; beggars can’t be choosers.

BUT, everything before the word but is worth nothing:

People put their trust in software like this. Trying to support open source will now cost me a lot of money, at least in my situation, because I am sitting here watching a progress bar, no longer convinced that it will actually work. At least make it clear on the download page:
“Not suited for disaster recovery.” Anything else is just misleading.

Edit 2:
Just to mention it: recreating the database also hammers my SSD with roughly 100 GB of writes per hour.
Over a full recreate, that quickly adds up to a double-digit percentage of the expected life of a consumer SSD.
If yours is a couple of years old, you could very well see it fail while trying to recreate a database.
This is just a mess…
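If the write wear worries you, the database and the temporary files can at least be pointed at a different (more expendable) drive. A rough sketch, assuming the standard --dbpath and --tempdir options, with made-up paths and a placeholder storage URL:

"C:\Program Files\Duplicati 2\Duplicati.CommandLine.exe" repair <storage-url> --dbpath="D:\DuplicatiDB\backup.sqlite" --tempdir="D:\DuplicatiTemp" --passphrase=<pwd>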

Because 90% gets mentioned (sometimes phrased as the “last 10%” here), I’ll refer to my post above to say that I think the 70% point is where the dblock fetching starts; it runs in three passes, with the final pass from 90% to 100% according to the code here. All of this tries to save as much of your data as possible, but it definitely has its costs. The progress bar is hard to get right: one never knows how much of the last 30% will actually be needed.

What’s specifically wasteful is that, in some cases, I think it goes on a fruitless chase, looking for something that will never be found, and if that’s really so, I wish it would recognize that and give up on the searching. This might be hard in the general case, but if the only missing piece is an empty file (or a -1 VolumeID), it could be special-cased.

I’m not a Duplicati developer, and not an expert in the core design, and my SQL is also not up to the task. The latter two points might be true for most of the small number of active developers; it’s a resource limit.
A while ago, the lead developer set out to rewrite recreate and/or repair, but I have no idea where that stands…

“Repair is downloading dblock files” was my stab at an analysis, plus a question on how empty files get handled.

In the current topic, this was another report after more testing, with pictures to help with the visualization. Some of the people in this thread might look in their own databases to see if they’re seeing such oddities. Preventing them might be ideal, but dealing with them if they happen might better help existing backups…
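For anyone who wants to check, a quick look with the sqlite3 shell is enough. This is only a sketch, with a placeholder database path, and it assumes the local database has a Block table with a VolumeID column as in current versions:

sqlite3 "C:\ProgramData\Duplicati\XXXXXXXXXX.sqlite" "SELECT ID, Hash, Size, VolumeID FROM Block WHERE VolumeID < 0;"

A non-empty result would be the kind of oddity meant above.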

While I’m offering free advice, I’ll also point to this, where I suggest a --blocksize increase for big backups, intended to reduce the overhead of tracking lots of 100KB blocks. Maybe the default should increase, assuming benchmarks confirm it helps. There’s no performance test team or well-equipped lab though…
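To be clear, --blocksize only takes effect on a brand-new backup; it cannot be changed on an existing one. A hypothetical example for a few hundred GB of source data, with placeholder storage URL and paths:

"C:\Program Files\Duplicati 2\Duplicati.CommandLine.exe" backup <storage-url> "D:\Data" --blocksize=1MB --dblock-size=50MB --passphrase=<pwd>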

Basically, speed is a problem in at least two parts. One is scaling, and tuning measures for its slowdown. Another is a possible bug which sends Duplicati off to download dblocks, where it will download them all in vain.

There might be other cases that cause that, and some volunteer could move their database aside to see whether they can find a recreate that runs the third pass (maybe a bad sign). The code looks like a log at --log-file-log-level=verbose (possibly due to a bug that doesn’t show these lines at information level) should say:

ProcessingAllBlocklistVolumes

(twice on consecutive lines, with the first one possibly listing ALL of the possible dblocks it will scan)

If anyone can find a test backup not involving an empty file that causes those, that would be good to study.
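For anyone willing to try, the test would look roughly like this, with placeholder paths and storage URL, assuming the usual --log-file and --log-file-log-level options: move the existing database aside, run a recreate with a verbose log file, then search the log for the message.

ren "C:\ProgramData\Duplicati\XXXXXXXXXX.sqlite" XXXXXXXXXX.sqlite.bak
"C:\Program Files\Duplicati 2\Duplicati.CommandLine.exe" repair <storage-url> --dbpath="C:\ProgramData\Duplicati\XXXXXXXXXX.sqlite" --passphrase=<pwd> --log-file="C:\temp\recreate.log" --log-file-log-level=verbose
findstr /C:"ProcessingAllBlocklistVolumes" "C:\temp\recreate.log"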

Indeed, the database recreate process seems to be majorly broken. Not only does it take upwards of 10 times as long as it took to back up the original data in the first place, it’s also totally independent of the power of the machine it’s running on.

Considering how easy it is to arrive at a state that requires a database rebuild (just reboot the machine Duplicati is running on without a proper shutdown), this feature should either be fixed quite soon, or measures should be taken to prevent database corruption in the first place.

EDIT: After yet another database issue today, I’ve had it and will move to a simple rclone script (using the crypt remote and --backup-dir features) for the time being. I’d love to use Duplicati, but it’s just too slow and brittle in its current beta stage to be a reliable backup solution.

And there is no obvious bottleneck. Drives and processors are just trundling along.

It makes the whole thing kind of useless, you are right.

My rebuild got stuck today after 4 days at 92%.
I’m really throwing a tantrum right now. I trusted this software and now I am being punished for it.

I can totally understand the frustration, as I have also donated to keep development going. I seriously hope to live to see the day when I can return to a much-improved Duplicati that doesn’t freak out at the smallest problem.

I know this is not an answer to the issue of recreating the database itself, but did any of you try using Duplicati.CommandLine.RecoveryTool.exe to do the restore?

I’m just curious because I just started using Duplicati for my backups and now I’m quite unsettled reading about all these issues on recovery …

On my second try I am using the “CommandLine” in the GUI. The actual CLI in CMD always throws some exception or stalls outright.

You should be; in its current form the software is not suitable for a real disaster. Should you lose your database, the way back to your data might be very costly or impossible.
After all, it is considered a beta; that should have been a warning for us not to use it in an important environment.

Very sadly, I came to the conclusion that Duplicati is not robust enough to rely on, and IMO nothing can be worse in a backup system.

I am keeping a subset test going, but I would not be sleeping well if it were my only solution.

I am close to losing it right now.
The recreation via the “GUI-CLI” failed for the second time tonight, with the whole program crashing.
I now really regret that I have been tricked into this nightmare…

And the stupid CLI just asks me for my passphrase and then does not do anything.
Could someone please post how exactly a command to recreate a database would have to look?

I’ve been doing a command-line restore with my small test backup, which is just ~1 MB, following the guide at Disaster Recovery - Duplicati 2 User's Manual:

Duplicati.CommandLine.RecoveryTool.exe download <remoteurl> <localfolder> --passphrase=<pwd>
Duplicati.CommandLine.RecoveryTool.exe index <localfolder>
Duplicati.CommandLine.RecoveryTool.exe list <localfolder> 0
Duplicati.CommandLine.RecoveryTool.exe restore <localfolder> 0 --targetpath=<targettempfolder>

But this does not recreate the database …

I did the same with my ~100 GB backup: the database repair/recreation takes ~5 days (it failed twice; the third attempt is still ongoing). Downloading all files and creating the index via the disaster recovery guide took ~8 hours… It would be great if the database could be created from the downloaded files…

Sure, but it’s only an example:

"C:\Program Files\Duplicati 2\Duplicati.CommandLine.exe" repair "file://C:\Duplicati Backups\local test 7 encrypted\" --dbpath="C:\ProgramData\Duplicati\78777088689071838682.sqlite" --encryption-module=aes --compression-module=zip --dblock-size=50mb --passphrase=password --disable-module=console-password-input

It varies depending on your configuration options (I assume you began in the GUI). The trick is specifying the configuration identically enough on the command line, for example by starting from an Export As Command-line and keeping essential options like encryption, while maybe discarding others such as any exclude rules.

Most things seem safe, so the main change when editing the backup syntax into the repair/recreate syntax is to delete the source path information. Whether or not this will work better than the GUI or the GUI Commandline method is unknown; a lot of the code is common. If you want something further away, the post above by @fichtennadel (and the manual section cited) would be good reading, and for even further away there’s the Independent restore program. I think it’s not as nice a restore, but if all else fails it may get your files back.

Restoring files if your Duplicati installation is lost shows a couple of methods that are faster than recreating the database in its entirety (with all its versions and files), but you need the full recreate to continue the backup. Possibly even this will hit the suspected bug that causes wasted downloads of all dblock files, but in theory creating the partial temporary database should be faster than recreating the full original one…
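For completeness, such a direct restore can also be run from the command line, building only the partial temporary database. A sketch with placeholder paths, assuming the restore command’s --no-local-db option; --version could select an older backup than the newest:

"C:\Program Files\Duplicati 2\Duplicati.CommandLine.exe" restore <storage-url> "*" --restore-path="D:\RestoredFiles" --no-local-db --passphrase=<pwd>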

But that seems not completely right. I tried that, and as soon as I choose certain files, it triggers a “full database rebuild” instead of a partial one.

Finally!

    MainOperation: Repair
    RecreateDatabaseResults:
        MainOperation: Repair
        ParsedResult: Warning
        Version: 2.0.4.5 (2.0.4.5_beta_2018-11-28)
        EndTime: 16/04/2019 11:47:09 PM (1555422429)
        BeginTime: 26/03/2019 11:34:59 PM (1553603699)
        Duration: 21.01:12:09.4122930

It only took 21 days, 1 hour and 12 minutes to recreate my database!! It apparently went quickly from about 95%-ish onward, so it seems to be the 90% to 95% range that bogs down.
The last complete backup was 64 GB, but I had increased the backup set to ~130 GB when it crapped out, so I’m not sure which backup size the duration is indicative of. Whatever the case, it is NOT COOL! This is a major bug that needs to be fixed ASAP!

Any of the Duplicati developers here?

I am wondering why there is zero response from them to the tickets and forum threads on this issue. A backup without a working restore is kind of useless…

In the meantime I’ve been considering restic as my backup solution; it looks promising so far (no local database…)

Database recreation definitely needs some improvement. It would be helpful if those experiencing issues can provide the following information:

  1. Operating system version
  2. Duplicati version (this is especially important, as users can be experiencing similar symptoms due to different issues)
  3. Backend storage type
  4. Logs (ideally from a log file rather than the live GUI log; see the sketch below)
  5. Any other information about the system while the recreate is (slowly) in progress. For example, amount of free space in the temporary directory, amount of free RAM, CPU usage, networking activity, etc.
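For point 4, something along these lines captures a log that is detailed enough to be useful. This is just a sketch with placeholder paths and storage URL; the profiling level is very chatty but shows the individual database queries:

"C:\Program Files\Duplicati 2\Duplicati.CommandLine.exe" repair <storage-url> --dbpath="C:\ProgramData\Duplicati\XXXXXXXXXX.sqlite" --passphrase=<pwd> --log-file="C:\temp\recreate-profiling.log" --log-file-log-level=profiling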