Recreating database logic/understanding/issue/slow

Thanks. I’ll check it out. I can read code, but I do not know if I have the time/energy to go into full debug mode.

Honestly, I cannot imagine why the recreate is such a brute-force algorithm (or at least it seems to be). I would think there would be a way to do a “light recreate” that just takes the dindex/dlist files and goes from there. Downloading the whole backup to restart the backup client is infeasible, unless you’re backing up just a few files.

That’s another reason I am looking for documentation. I am shocked that there are no options for the recreate (i.e., it is either “let me download the 500 GB of historical backups” or “stop, delete all your settings and backups, and restart from zero”).

Needing to download the whole thing to continue in case of issues is a major drawback.

Sounds like this might be a request to be able to opt out of the extreme measures, at the risk of somewhat less-than-maximal results, but in a tolerable time. There is probably a rewrite of repair/recreate underway now, so who knows what’s due to come? From another point of view, a repair or other method can sometimes avoid the need for a full recreate, and a partial recreate is a lighter option if it does come down to a recreate…

That’s not what it always does. There are three passes (presumably increasingly painful), with stop points between them.

On top of that, there are other repair tools. Unfortunately, each has its specific task: purge, list-broken-files, purge-broken-files, and repair in its non-recreate behavior. The lack of detailed documentation on which to use makes things worse, but having any manual at all is only about a year old, and documentation is one of the many things that could improve with more volunteer help.
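
For the record, a minimal sketch of what the broken-files tools look like from the command line; the storage URL, database path, and passphrase are placeholders, and list-broken-files is the safe, read-only one to try first:

```
# Sketch only: placeholder URL, path and passphrase.
# list-broken-files reports backup entries that reference missing remote data.
Duplicati.CommandLine.exe list-broken-files "b2://bucket-name/folder" --dbpath="C:\Duplicati\XYZ.sqlite" --passphrase="..."
# purge-broken-files then removes those entries from the backup (it modifies the backup, so run it second).
Duplicati.CommandLine.exe purge-broken-files "b2://bucket-name/folder" --dbpath="C:\Duplicati\XYZ.sqlite" --passphrase="..."
```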

What about a manual? explains the original author’s point that troubleshooting is covered by the forum, and I sometimes agree (until computers are better than people at self-healing their ails) but sometimes don’t. There’s a definite volume problem from a large user base having even very occasional issues per user. There’s also a scaling problem (as with code) of not being able to replicate expertise as fast as desired.

There are options for “recreate” in a loose sense, but the other one isn’t called “recreate”, and there’s no single button for restarting from zero (though I’d have thought you’d want the old settings with a different destination).

The decision on which way to go currently relies on experiment and human discussion, and sadly often only after someone finds that a full database recreate (in the strict sense of reading the backend) is taking too long. People sometimes ask how long it will take, and answers are hard because they vary too much with the situation, and simplistic formulas based on download speed and total destination file size don’t help, because you’re not supposed to have to download everything, and one can’t know what’s missing without some looking. Still, to your point, it might someday be possible to do estimates, based on such factors, to give informed options. There are currently plenty of other hot issues to handle, so I can’t forecast when this idea might happen. Possibly you can write a request in forum Features or GitHub issues so that it’s at less risk of being lost.
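
To illustrate why the simplistic formula is at best a rough bound, here is a back-of-envelope calculation with made-up numbers (a 500 GB destination and a 50 Mbit/s link), done in a shell:

```
# Hypothetical worst case only: if a recreate had to fetch every remote byte,
# 500 GB over a 50 Mbit/s link is about 500 * 8 * 1000 Mbit / 50 Mbit/s = 80,000 s of download time.
echo "500 * 8 * 1000 / 50 / 3600" | bc -l   # roughly 22 hours
# A healthy recreate should download far less than that, and the reports in this thread
# suggest the local database work, not the download, is often the real cost.
```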

Hi ts678,

Just to be clear: I am testing the solution, so I am trying to emulate what I would consider a realistic scenario in which I need to restore files.

So my hypothetical test is: the computer had a total HDD failure. I bought a new HDD and need to restore some key files first, then everything, and then continue as normal. The computer failed while running a backup job. I do not consider any of this a stretch; it is a very plausible situation.

To simulate this, I created a big (but real) backup job for my computer. It is about 500 GB. I let it complete a few times.

Then I started the job and cancelled it in the middle. I went ahead, deleted the .sqlite database, and asked for a recreate.

As mentioned before: in my remote storage everything looks fine (dblock and dindex files are in pairs, there is one dlist per completed job).

That’s where Duplicati’s behavior is very hard to understand. I do not know the internal format of the files, but I assume that the dindex files say what the dblock files contain. So it is hard to understand why any dblock file is needed at all. As mentioned before, that shouldn’t be forced, and the system should allow rebuilding from the dindex files only; forcing it in this case just makes it infeasible (I cannot keep an uninterrupted, unlimited high-bandwidth connection going for an undetermined amount of time).
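
For what it’s worth, here is a rough way to peek inside one dindex file and see that it does describe its paired dblock; a sketch only, assuming the default AES encryption and the SharpAESCrypt.exe helper that ships in the Duplicati program folder, with placeholder file names and passphrase:

```
# Sketch: decrypt one dindex ("d" = decrypt) and list what the zip contains.
SharpAESCrypt.exe d my-passphrase duplicati-i0123456789.dindex.zip.aes dindex.zip
unzip -l dindex.zip
# Expect something like a vol/duplicati-b....dblock.zip.aes entry, whose JSON lists the
# hashes and sizes of the blocks stored in that dblock (plus a list/ folder for blocklists).
# That is the information a "light recreate" would need without touching the dblocks.
```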

I killed my test. It ran for a week and it continued to download (old) dblock files. I just restored the .sqlite database manually (which wouldn’t be possible in my hypothetical scenario).

Thanks.

I have also experienced problems with Recreate Database. I was using Duplicati to back up to a USB-connected WD 4 TB drive. After several hours the computer locked up, forcing me to do a hard reboot. When I restarted the backup, Duplicati warned me about problems, so I did a delete and repair of the database.

Duplicati ran for 10 days recreating the database and did not look close to finishing, so I aborted it. The backup source is 357 GB. I can do a full backup from scratch in less than 3 days.

Recreating the database should not take longer than a full backup.

I am running 2.0.4.5_beta_2018-11-28 on Windows 7. I am backing up to a WD drive connected through USB3.

Wow. That is even worse considering you are using local storage.

My “recreate database” has been running for probably 2 weeks, and a week of that has been spent going from 90% to 93%. I’m estimating it will probably take another month :frowning:
My source data is around 300 GB and I have a terrible upload speed. I have not been able to get a full backup of a 60 GB data set so far. I was pretty keen to use Duplicati, but I am really starting to doubt my decision :cry:

I did a recreate that took maybe an hour or a bit more. A lot depends on the machine you are running this on. I started on a DS216j, which was very slow and would have taken days or weeks. I stopped it and moved it to my laptop, which did the job much faster. Now the backup job is running on the DS216j again. It is over 300 GB in size, with a volume size of 1 GB.

Hi Wim,
I don’t think it is very computer-specific; I am running on an Intel i7 with 12 GB RAM, and it’s constantly 95% idle.

I am using B2 storage.

I am seeing a lot of messages like this:
* Apr 10, 2019 11:04 PM: Operation Get with file duplicati-XXXX46.dblock.zip.aes attempt 4 of 5 failed with message: Remote prematurely closed connection.

* Apr 10, 2019 11:03 PM: Backend event: Get - Started: duplicati-XXXX46.dblock.zip.aes (49.99 MB)

* Apr 10, 2019 11:02 PM: Backend event: Get - Retrying: duplicati-XXXX46.dblock.zip.aes (49.99 MB)

* Apr 10, 2019 11:02 PM: Operation Get with file duplicati-XXXX46.dblock.zip.aes attempt 3 of 5 failed with message: Remote prematurely closed connection.

* Apr 10, 2019 11:01 PM: Backend event: Get - Started: duplicati-XXXX46.dblock.zip.aes (49.99 MB)

* Apr 10, 2019 11:01 PM: Backend event: Get - Retrying: duplicati-XXXX46.dblock.zip.aes (49.99 MB)

* Apr 10, 2019 11:01 PM: Operation Get with file duplicati-XXXX46.dblock.zip.aes attempt 2 of 5 failed with message: Remote prematurely closed connection.

* Apr 10, 2019 10:59 PM: Backend event: Get - Started: duplicati-XXXX46.dblock.zip.aes (49.99 MB)

* Apr 10, 2019 10:59 PM: Backend event: Get - Retrying: duplicati-XXXX46.dblock.zip.aes (49.99 MB)

* Apr 10, 2019 10:59 PM: Operation Get with file duplicati-XXXX46.dblock.zip.aes attempt 1 of 5 failed with message: Remote prematurely closed connection.
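
(For reference, the retry behaviour shown in those messages is tunable; a sketch with example values that could go into the job’s Advanced options or onto a command line. This does not speed up the recreate, but it can help ride out dropped B2 connections.)

```
# Example values only; the log's "attempt 4 of 5" reflects the default of 5 attempts.
--number-of-retries=10   # allow more attempts per remote operation
--retry-delay=30s        # wait longer between attempts so transient connection errors can clear
```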

I am having massive issues with the recreate too.
I had a full hard drive crash shortly after World Backup Day, and I thought: good thing you have had Duplicati running for almost half a year.
I had around 70 versions of 200 GB (both source and backup) over said half year, with custom retention.
I used the restore function several times when I accidentally deleted files (even bigger ones), so I had sort of tested it intermittently and felt comfortable.

So I set up Duplicati again and configured everything to first restore my files and then continue backing up on top of the old backup.
Like many of you, I saw that there was no history to my backup and that I needed to recreate the database first.

The connection is 100 Mbit FTPS, which is hardly the bottleneck, and the database is created on a SATA SSD, which is also barely doing anything. The client has a low-power quad-core CPU, which is just idling along at 30% on all cores.
It has now taken 18 hours to get to 90%, and from here it goes really slowly. All the hits I have found for this issue don’t leave me much hope. Watching the FTPS server’s log, I can tell you it downloads a dblock every 5-10 minutes, then writes heavily for 20 seconds at 40 MB/s locally, and then it computes for the said 5-10 minutes.

This is a big issue. Without this fixed, or an option to make the database part of the backup, Duplicati 2 is only usable against accidental deletion, not for disaster recovery. I’m now torn between leaving this running for possibly weeks, or just restoring everything without a database and starting a new backup, ditching the 60 or so earlier versions.
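
(As an aside on the “make the database part of the backup” idea: Duplicati cannot include its own live job database in that same job, but one possible workaround, sketched below, is to copy the .sqlite file somewhere safe after each run via the real --run-script-after option. The sketch assumes a Linux-style install; the paths and database name are hypothetical, and yours is shown on the job’s Database page.)

```
#!/bin/sh
# after-backup.sh -- hypothetical helper wired up with --run-script-after=/path/to/after-backup.sh
# Copies the job database to a second disk after each run, so a slow full recreate is only
# needed if that copy is lost as well.
cp "$HOME/.config/Duplicati/ABCDEFGHIJ.sqlite" /mnt/other-disk/duplicati-db-copy/
```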

This seems to be a major bug… I hope a few developers join the conversation soon.

I only understand the inner workings of Duplicati 1 & 2 in a very basic way, but:
couldn’t one use the incremental method of Duplicati 1 to back up the database, and use the database to do the block-based deduplication of Duplicati 2 on all the other files?

I have now started the recreation process on a much stronger machine, and it shows exactly the same behaviour,
with the exception that it uses only 10% of 8 threads. It is like Duplicati is giving you the finger.

So from here on, this can only be described as a “MAJOR BUG”.
Somewhere in the code there has to be something that makes it deliberately run slow, which seems stupid. If you want to recreate the database of a vital backup, you want your PC to actually do that, and not have it trundling along at near idle for weeks.

Edit: Since the rebuild would at least go a little bit faster on the stronger machine, I tried to do a restore without a database (where Duplicati builds a temporary database) on the “Backup Source / Restore Target”, but then the log shoves it in your face:
“11. Apr. 2019 23:03: Processing all of the 3471 volumes for blocklists”
Meaning it won’t go one bit faster, and that I can’t access my data for weeks.

If it is a bug, no one seems to care anymore; if it is by design, …

So in essence, in the case of a database loss, which is the standard outcome of a full hard drive failure, this software is basically useless. You are literally better off just printing your stuff out in binary and typing it back in again…

Do none of the devs care? This seems to be a long-known problem, and all the threads and the corresponding GitHub issue, including an open bounty, just linger…

And to make one thing clear: I am always thankful for “free” open source software. In the rare cases where I can actually contribute something, I do it happily, be it bug hunting, providing logs, or small financial contributions.
And I know I am throwing a tantrum here.
And not having paid anything, I know I have a “right” to nothing; beggars can’t be choosers.

BUT, everything before the word but is worth nothing:

People put their trust in software like this. Trying to support open source will now cost me a lot of money, at least in my situation, because I am sitting here watching a progress bar, no longer convinced that it will actually work. At least make it clear on the download page:
Not suited for disaster recovery. Anything else is just misleading.

Edit 2:
Just to mention it: recreating the database also hammers my SSD with roughly 100 GB of writes per hour.
For a full recreate, that quickly adds up to a double-digit percentage of the expected life of a consumer SSD.
If yours is a couple of years old, you could very well see it fail while trying to recreate a database.
This is just a mess…

Because 90% gets mentioned (sometimes phrased as the “last 10%” here), I’ll refer to my post above to say that I think the 70% point is where the dblock fetching starts, in three passes, with the final pass from 90% to 100% according to the code here. All of this tries to save your data as much as possible, but it definitely has its costs. The progress bar is hard to get right; one never knows how much of the last 30% will actually be needed.

What’s specifically wasteful is that, in some cases, I think it goes on a fruitless chase, looking for something that will never be found, and if that’s really so, I wish it would recognize that and give up on the search. This might be hard in the general case, but if the only culprit is an empty file (or a -1 VolumeID), it could be special-cased.

I’m not a Duplicati developer, not an expert in the core design, and my SQL is also not up to the task. The latter two points might be true for most of the small number of active developers; it’s a resource limit.
A while ago, the lead developer set out to rewrite recreate and/or repair, but I have no idea where that stands…

Repair is downloading dblock files was my stab at an analysis, plus a question on how empty files get handled.

In the current topic, this was another report after more testing, with pictures to help with the visualization. Some of the people in this thread might look in their own databases to see if they’re seeing such oddities. Preventing them might be ideal, but dealing with them when they happen might better help existing backups…
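
If anyone wants to do that check, here is a sketch using the sqlite3 command-line tool against a copy of the job database. The table and column names come from my reading of the local schema and may differ between versions; the database path is an example (yours is shown on the job’s Database page):

```
# Work on a copy so the live database is never touched.
cp "$HOME/.config/Duplicati/ABCDEFGHIJ.sqlite" /tmp/dbcopy.sqlite
# Count blocks not attributed to any volume (the "-1 VolumeID" oddity mentioned above).
sqlite3 /tmp/dbcopy.sqlite "SELECT COUNT(*) FROM Block WHERE VolumeID < 0;"
```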

While I’m offering free advice, I’ll also point to this, where I suggest a --blocksize increase for big backups, intended to reduce the overhead of tracking lots of 100KB blocks. Maybe the default should increase, assuming benchmarks confirm it helps. There’s no performance test team or well-equipped lab, though…
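
For concreteness, a sketch of what that looks like when defining a new backup from the command line. The value is an example, the URL and paths are placeholders, and as far as I know --blocksize cannot be changed on an existing backup, so this only helps backups started fresh:

```
# Example only: larger blocks mean far fewer block rows to track for a multi-hundred-GB source.
Duplicati.CommandLine.exe backup "b2://bucket-name/folder" "D:\Data" --blocksize=1MB --passphrase="..."
```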

Basically, speed is a problem in at least two parts. One is scaling, and the tuning measures for its slowdown. The other is a possible bug that sends Duplicati off to download dblocks when it will download them all in vain.

There might be other cases that cause that, and some volunteer could move their database aside to see whether they can find a recreate that runs the third pass (maybe a bad sign). From the code, it looks like a log at --log-file-log-level=verbose (verbose possibly being needed due to a bug that hides these lines at information level) should say:

ProcessingAllBlocklistVolumes

(twice on consecutive lines, with the first one possibly listing ALL the possible dblocks, which it will scan)
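
For anyone volunteering, a sketch of how that could be captured; the URL, paths, and passphrase are placeholders, and pointing --dbpath at a fresh location forces a recreate into a throwaway database:

```
# Recreate into a temporary database while writing a verbose log, then look for the third-pass marker.
Duplicati.CommandLine.exe repair "b2://bucket-name/folder" --dbpath="/tmp/rebuild-test.sqlite" --passphrase="..." --log-file=/tmp/recreate.log --log-file-log-level=verbose
grep ProcessingAllBlocklistVolumes /tmp/recreate.log
```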

If anyone can find a test backup not involving an empty file that causes those, that would be good to study.

Indeed, the recreate database process seems to be majorly broken. Not only does it take upwards of 10 times as long as it took to back up the original data in the first place, it’s also totally independent of the power of the machine it’s running on.

Considering how easy it is to arrive at a state that requires a database rebuild (just reboot the machine Duplicati is running on without a proper shutdown), this feature should either be fixed quite soon, or measures should be taken to prevent database corruption in the first place.

EDIT: After yet another database issue today, I’ve had it and will move to a simple rclone script (using the crypt remote and --backup-dir features) for the time being. I’d love to use Duplicati, but it’s just too slow and brittle in its current beta stage for a reliable backup solution.
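
(The kind of script meant here might look like the following sketch, assuming an rclone crypt remote named secret: is already configured; source and remote paths are placeholders. It gives encryption and a dated archive of changed or deleted files, but no deduplication or versioned restore like Duplicati.)

```
#!/bin/sh
# Mirror the source into the encrypted remote; files changed or deleted since the last run
# are moved into a dated archive folder instead of being overwritten or lost.
rclone sync /home/user/data secret:current --backup-dir "secret:archive/$(date +%F)"
```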

And there is no obvious bottleneck. Drives and processors are just trundling along.

It makes the whole thing kind of useless, you are right.

My rebuild got stuck today after 4 days at 92%.
I’m really throwing a tantrum right now. I trusted this software and now I am being punished for it.

I can totally understand the frustration, as I have also donated to keep development going. I seriously hope to live to see the day when I can return to a much-improved Duplicati that doesn’t freak out at the smallest problem.

I know this is not an answer to the issue of recreating the database by itself, but did any of you try to use Duplicati.CommandLine.RecoveryTool.exe to make the restore?

I’m just curious because I just started using Duplicati for my backups and now I’m quite unsettled reading about all these issues on recovery …
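
(For reference, the RecoveryTool’s documented flow is roughly download, then index, then restore, all working from a local folder of fetched backup files. The sketch below uses a placeholder URL and paths; check the tool’s built-in help for the exact options your version expects.)

```
# Sketch only: fetch and decrypt the remote files, build an index, then restore from it.
Duplicati.CommandLine.RecoveryTool.exe download "b2://bucket-name/folder" C:\recovery --passphrase="..."
Duplicati.CommandLine.RecoveryTool.exe index C:\recovery
Duplicati.CommandLine.RecoveryTool.exe restore C:\recovery
```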

On my second try I am using the “CommandLine” in the GUI. The actual CLI in CMD always throws some exception or stalls outright.

You should be. In its current form, the software is not suitable for a real disaster. Should you lose your database, the way back to your data might be very costly or impossible.
After all, it is considered a beta; that should have been a warning for us not to use it in an important environment.

Very sadly, I came to the conclusion that Duplicati is not robust enough to rely on, and IMO nothing can be worse in a backup system.

I am keeping a subset test going, but I would not be sleeping at night if it were my only solution.

I am close to losing it right now.
The recreation via the “GUI CLI” failed for the second time tonight, with the whole program crashing.
I now really regret that I have been tricked into this nightmare…

And the stupid CLI just asks me for my passphrase and then does not do anything.
Could someone please post what exactly a command to recreate a database would have to look like?
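
(For reference, a hedged sketch of such a command: the recreate is done by the repair command when the file named by --dbpath does not exist yet. The storage URL, database path, and passphrase below are placeholders and backend credentials are omitted; the URL syntax depends on your storage type, and the dbpath for an existing job is shown on its Database page in the GUI.)

```
# Sketch only: placeholder values. If the file at --dbpath is missing, repair rebuilds the local
# database from the remote dlist/dindex files (and, when it decides it must, from dblock files).
Duplicati.CommandLine.exe repair "b2://bucket-name/folder" --dbpath="C:\Duplicati\mybackup.sqlite" --passphrase="my-backup-passphrase"
```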