Recreating database logic/understanding/issue/slow

Hi,

So I am also having issues with the recreate database feature. I triggered the recreate as part of my assessment of the software. It is taking forever (5+ days, and I do not think it is close to finishing).

From the log it looks like it is downloading all the dblock files. I hope it is not.

Is there a place where the logic it applies is documented? I couldn't find it.

I am trying to make sense of the files in the storage folder. I am guessing that the dlist one is created when a backup session completes, and that it contains a list of the dblock/dindex files or similar.

The dindex/dblock files seem to come in pairs, though they do not share the same hex hash in the name (I think they are pairs because the datetime is the same). Some images to show what I am talking about.

The recreate process is downloading dblock files that seem to have a matching dindex file. See images.

That's what is really puzzling me. Per the other entries I have read, the dblock files should only be downloaded if the dindex file is missing. Well… I have 1938 dblock and 1938 dindex files in my storage. I do not think any of my dindex files are missing. Why is it downloading the dblock files?
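For reference, this is roughly how I counted the remote files by type (a quick Python sketch; the storage path is just a placeholder for a local mount or copy of the backend folder):

```python
from collections import Counter
from pathlib import Path

# Placeholder path; point this at a local mount/copy of the backend folder.
storage = Path(r"D:\duplicati-storage")

# Remote files are named duplicati-...dblock.zip.aes, ...dindex.zip.aes, ...dlist.zip.aes
counts = Counter(
    kind
    for f in storage.iterdir()
    for kind in ("dblock", "dindex", "dlist")
    if f".{kind}." in f.name
)
print(counts)  # I get 1938 dblock, 1938 dindex, 5 dlist
```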

Please advise. Thanks.

Technical details:
I am running Duplicati - 2.0.4.15_canary_2019-02-06 on Windows 10 (x64). My target storage is Minio S3 (hosted remotely). I have been heavily testing it for over a month now.

I went ahead and created a backup selection of about 500 GB and 100k+ files. For the initial backups I used a block size of 150 MB; later I changed the block size to 256 MB.

My storage folder has 3881 files: dindex and dblock files (in pairs), plus some dlist ones (I only have 5).

BTW: I just tested recovering from another computer that has local access to the remote storage. That computer is able to read the file information (the "recreate" that happens when you restore directly from the path) in maybe 20 minutes. That's where you get to choose what to restore.

From monitoring the logs and the operating system, it doesn't look like it is accessing the .dblock files, only the .dlist file.

After that I restored one folder just to test. It did a "partial temporary database" recreate. It took maybe an hour to complete; it read a bunch of dindex files and, it seems, only a few dblock ones (I guess the ones where the files actually were).

So, maybe the issue is related to the S3 access? It seems to be ignoring the dindex files in the rebuild.

Another possibility is that the partial temporary database for a single version doesn’t run into whatever the full database recreate ran into that makes it download dblocks. Possibly if you tried all the versions with a direct restore, you could find one that will download dblock files. You could rule out S3 by testing a similar “direct restore” of the same version over S3 instead of local file access.

I've got an incomplete theory on what causes dblock downloads, but seeing whether it fits your recreate will need you to look at your database with an SQLite viewer (or post a link to a database bug report). For example:

(screenshots of database table rows showing a VolumeID of -1)

The theory is that a -1 VolumeID sometimes happens with empty files, and causes a wasted full search. Some other times, though, empty files are stored in a remote dblock volume, just as usual (just shorter).

The recreate above sees the -1 VolumeID, decides information is missing, and just keeps on fetching all the dblocks.
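If you want to check your own database for this, a minimal sketch like the following should work, under the assumption that the table and column names match what I see in my copy (a Block table with a VolumeID column); the database path is just a placeholder:

```python
import sqlite3

# Placeholder path to a copy of the Duplicati job database.
db = sqlite3.connect("backup-job.sqlite")

# Blocks not attributed to any remote dblock volume (VolumeID = -1).
rows = db.execute(
    "SELECT Hash, Size, VolumeID FROM Block WHERE VolumeID = -1"
).fetchall()

for block_hash, size, volume_id in rows:
    print(block_hash, size, volume_id)
```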


How the backup process works has some documentation, including for filelist.json that’s in the dlist file.

Unfortunately I don't think documentation gets as far down as the details of recreate. If you can read source (which also has a few comments), the links I point to might help. It looks to me like it gathers dlists, then dindexes, then (as said earlier) goes to fetch dblock files if data is still missing. Testing with a new backup having an empty source file caused the dlist to have this entry in filelist.json, but nothing in the dindex or dblock:

{"type":"File","path":"C:\\emptytests\\length0.txt","hash":"47DEQpj8HBSa+/TImW+5JCeuQeRkm5NMpJWZG3hSuFU=","size":0,"time":"20190305T000452Z","metahash":"TZm2AFXRe8Y4ja/tCPQ7NU/QSv9mqhpnUcht89kWldU=","metasize":137}
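As a side note, that hash is just the SHA-256 of zero bytes, base64-encoded, which is why there is no block data anywhere to go with it. A quick Python check (the JSON entry is the one quoted above):

```python
import base64
import hashlib
import json

# The filelist.json entry quoted above, for the zero-byte source file.
entry = json.loads(
    r'{"type":"File","path":"C:\\emptytests\\length0.txt",'
    r'"hash":"47DEQpj8HBSa+/TImW+5JCeuQeRkm5NMpJWZG3hSuFU=",'
    r'"size":0,"time":"20190305T000452Z",'
    r'"metahash":"TZm2AFXRe8Y4ja/tCPQ7NU/QSv9mqhpnUcht89kWldU=","metasize":137}'
)

# Duplicati stores hashes as base64-encoded SHA-256; for an empty file the
# content hash is simply the hash of nothing.
empty_hash = base64.b64encode(hashlib.sha256(b"").digest()).decode()

print(entry["hash"] == empty_hash)  # True
```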

I tested a new backup of a 1-byte file, then added a 0-byte file and did another backup. I used a Database Delete, then used Commandline to run a repair, changing the Commandline arguments to --console-log-level=information and (on the next line) --version=(1 for the first backup or 0 for the second). The options can also be added at the screen bottom with Add advanced option. Remember to delete the database as you change versions. You'll see (if you get the same results) that the version with only the 1-byte file ends nicely after the dindex files are read, but the second backup (known as 0 because 0 is newest) continues to read all dblock files.

The recreated database also has the VolumeID of -1, so possibly that's at least one way to get that oddity. Getting deep into terminology, the zero-length file ended at the Blockset table, without BlocksetEntry or Block table rows. Earlier I had described one that got to the Block table with an empty block. There are 3 ways to show this.
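If someone wants to look for that shape in their own database, a sketch along these lines should do it (same schema assumptions as above: Blockset and BlocksetEntry tables; the database path is a placeholder):

```python
import sqlite3

db = sqlite3.connect("backup-job.sqlite")  # placeholder path

# Blocksets with no BlocksetEntry rows at all -- in my test this is where
# the zero-length file ended up after the recreate.
query = """
    SELECT bs.ID, bs.Length, bs.FullHash
    FROM Blockset bs
    LEFT JOIN BlocksetEntry be ON be.BlocksetID = bs.ID
    WHERE be.BlocksetID IS NULL
"""
for blockset_id, length, full_hash in db.execute(query):
    print(blockset_id, length, full_hash)
```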

If you prefer to test using a full database Recreate (instead of specifying --version), live logging as you did originally should be fine. You could also examine your source area to see if you even have any empty files, because there might be some other ways (not involving empty files) to get all the dblock files downloaded.

Usually though, a test case is better than none, so I hope this aids development. I might soon file an issue; however, I'd certainly encourage you to do so if you can replicate something similar to the above test results.

Thanks. I’ll check it out. I can read code, but I do not know if I have the time/energy to go into full debug mode.

Honestly, I cannot imagine why the recreate is such a brute-force algorithm (or at least it seems to be). I would think there would be a way to do a "light recreate" that just takes the dindex/dlist files and goes from there. Downloading the whole backup to restart the backup client is infeasible, unless you're backing up just a few files.

That's another reason I am looking for documentation. I am shocked that there are no options for the recreate (i.e., it is either "let me download the 500 GB of historical backups" OR "stop, delete all your settings and backups, and restart from zero").

Needing to download the whole thing to continue in case of issues is a major drawback.

Sounds like this might be a request to be able to opt out of the extreme measures, at risk of maybe some less-than-max-results but in a tolerable time. There is probably a rewrite of repair/recreate underway now, so who knows what’s due to come? From another point of view, a repair or other method can sometimes avoid the need for a full recreate, and a partial recreate is a lighter option if it comes down to a recreate…

That's not what it always does. There are three passes (presumably increasingly painful) with stop-points in between.

On top of that, there are other repair tools. Unfortunately, each has its specific task: purge, list-broken-files, purge-broken-files, and repair in its non-recreate behavior. Lack of detailed documentation on what to use does make things worse, but having any manual at all is only about a year old, and this is possibly one of the many things that could improve with more volunteer help.

What about a manual? explains the original author's point that troubleshooting is covered by the forum, and I sometimes agree (until computers are better than people at self-healing their ailments) but sometimes don't. There's a definite volume problem from a large user base having even very occasional issues per user. There's also a scaling problem (as with code) of not being able to replicate expertise as fast as desired.

There are options for “recreate” in a loose sense, but the other one isn’t called “recreate”, and there’s no single button for restart from 0 (though I’d have thought you’d want the old settings, different destination).

The decision on which way to go now uses experiment and human discussion, but sadly often only after someone finds that a full database recreate (in the tight sense of reading the backend) is taking too long. It's sometimes asked how long it will take, and answers are hard because they vary too much with the situation, and simplistic formulas based on download speed and total destination file size don't help because you're not supposed to have to download everything, and one can't know what's missing without some looking. Still, to your point, it might someday be possible to do estimates, based on such factors, to allow informed choices. There are currently plenty of other hot issues to handle, so I can't forecast when this idea might happen. Possibly you can write a request in the forum Features category or GitHub issues so that it's at less risk of being lost.

Hi ts678,

Just to be clear: I am testing the solution, so I am trying to emulate what I would consider a realistic scenario where I need to restore files.

So my hypothetical test is: the computer had a total HDD failure. I bought a new HDD and need to restore some key files, then everything, and then continue as normal. The computer had the failure while running a backup job. I do not consider any of this a stretch; it is a very possible situation.

To simulate this, I created a big (but real) backup job for my computer. It is about 500 GB. I let it complete a few times.

Then I started the job and cancelled it in the middle. I then deleted the .sqlite database and asked it to recreate.

As mentioned before: in my remote storage everything looks fine (dblock and dindex files are in pairs, there is one dlist per completed job).

That's where the behavior (what Duplicati is doing) is very hard to understand. I do not know the internal format of the files, but I assume that the dindex files say what the dblock files contain. So it is hard to understand why any dblock file is needed at all. As mentioned before, that shouldn't be forced, and the system should allow rebuilding based on the dindex files only; forcing it in this case just makes it infeasible (I cannot wait an undetermined amount of time on an uninterrupted, unlimited high-bandwidth connection).

I killed my test. It ran for a week and it continued to download (old) dblock files. I just restored the .sqlite database manually (which wouldn’t be possible in my hypothetical scenario).

Thanks.

I have also experienced problems with Recreate Database. I was using Duplicati to back up to a USB-connected WD 4 TB drive. After several hours the computer locked up, forcing me to do a hard reboot. When I restarted the backup, Duplicati warned me about problems, so I did a database delete and repair.

Duplicati ran for 10 days recreating the database and did not look close to finishing, so I aborted it. The backup source is 357 GB. I can do a total backup from scratch in less than 3 days.

Recreating the database should not take longer than a full backup.

I am running 2.0.4.5_beta_2018-11-28 on Windows 7. I am backing up to a WD drive connected through USB3.


Wow. That is even worse considering you are using local storage.

My "recreate database" has been running for probably 2 weeks, and a week of that has been spent going from 90% to 93%. I'm estimating it will probably take another month.
My source data is around 300 GB and I have a terrible upload speed. I have not been able to get a full backup of a 60 GB data set so far. I was pretty keen to use Duplicati, but I am really starting to doubt my decision.


I did a recreate that took maybe an hour or a bit more. A lot depends on the machine you are running this on. I started on a DS216j, which was very slow and would have taken days or weeks. I stopped it and moved it to my laptop, which did the job much faster. Now the backup job is running on the DS216j. It is over 300 GB in size, with a volume size of 1 GB.

Hi Wim,
I don't think it is very computer-specific; I am running on an Intel i7 with 12 GB RAM and it's constantly 95% idle.

I am using B2 storage.

I am seeing a lot of messages like this:
* Apr 10, 2019 11:04 PM: Operation Get with file duplicati-XXXX46.dblock.zip.aes attempt 4 of 5 failed with message: Remote prematurely closed connection.
* Apr 10, 2019 11:03 PM: Backend event: Get - Started: duplicati-XXXX46.dblock.zip.aes (49.99 MB)
* Apr 10, 2019 11:02 PM: Backend event: Get - Retrying: duplicati-XXXX46.dblock.zip.aes (49.99 MB)
* Apr 10, 2019 11:02 PM: Operation Get with file duplicati-XXXX46.dblock.zip.aes attempt 3 of 5 failed with message: Remote prematurely closed connection.
* Apr 10, 2019 11:01 PM: Backend event: Get - Started: duplicati-XXXX46.dblock.zip.aes (49.99 MB)
* Apr 10, 2019 11:01 PM: Backend event: Get - Retrying: duplicati-XXXX46.dblock.zip.aes (49.99 MB)
* Apr 10, 2019 11:01 PM: Operation Get with file duplicati-XXXX46.dblock.zip.aes attempt 2 of 5 failed with message: Remote prematurely closed connection.
* Apr 10, 2019 10:59 PM: Backend event: Get - Started: duplicati-XXXX46.dblock.zip.aes (49.99 MB)
* Apr 10, 2019 10:59 PM: Backend event: Get - Retrying: duplicati-XXXX46.dblock.zip.aes (49.99 MB)
* Apr 10, 2019 10:59 PM: Operation Get with file duplicati-XXXX46.dblock.zip.aes attempt 1 of 5 failed with message: Remote prematurely closed connection.

I am having massive issues with the recreate too.
I had a full hard drive crash shortly after World Backup Day, and I thought: good thing you have had Duplicati running for almost half a year.
I had around 70 versions of 200 GB (both source and backup) over those 6 months, with custom retention.
I had used the restore function several times when I accidentally deleted files (even bigger ones), so I had sort of tested it intermittently and felt comfortable.

So I set up Duplicati again and configured everything to first restore my files and then continue backing up on top of the old backup.
Like many of you, I saw that there was no history to my backup and that I needed to recreate the database first.

The connection is 100 Mbit FTPS, which is barely the bottleneck, and the database is created on a SATA SSD, which is also barely doing anything. The client has a low-power quad-core, which is just wandering along at 30% on all cores.
It has now taken 18 hours to get to 90%, and now it goes really slowly. All the hits I have found for this issue don't leave me much hope. Watching the FTPS server's log, I can tell you it downloads a dblock every 5-10 minutes, then it writes heavily for 20 seconds at 40 MB/s locally, and then it computes for the said 5-10 minutes.

This is a big issue. Without this fixed, or an option to make the database part of the backup, Duplicati 2 is only usable against accidental deletion, not for disaster recovery. I'm now torn between leaving this running for possibly weeks, or just restoring everything without a database and starting a new database, ditching all 60 or so previous versions.


This seems to be a major bug… I hope a few developers join the conversation soon.


I only understand the inner workings of Duplicati 1 & 2 in a very, very basic way. But:
Couldn't one use the incremental method of Duplicati 1 to back up the database, and use the database to do the block-based deduplication of Duplicati 2 on all the other files?

I have now started the recreation process on a much stronger machine, and it shows the exact same behaviour, with the exception that it uses only 10% of 8 threads. It is like Duplicati is giving you the finger.

So that means that, from here on, this can only be described as a "MAJOR BUG".
Somewhere in the code there has to be something that makes it deliberately run slow, which seems stupid. If you want to recreate the database of a vital backup, you want your PC to do that, and not have it trundling along at near idle for weeks.

Edit: Since on the stronger machine the rebuild would at least go a little bit faster, I tried to do a restore without a database (where Duplicati builds a temporary database) on the "Backup Source / Restore Target", but then the log rubs it in your face:
"11. Apr. 2019 23:03: Processing all of the 3471 volumes for blocklists"
Meaning it won't go one bit faster, meaning that I can't access my data for weeks.

If it is a bug, no one cares anymore; if it is by design, …

So in essence, in case of a database loss, which is the standard outcome of a full hard drive failure, this software is basically useless. You are literally better off just printing your stuff out in binary and typing it back in again…

Do none of the devs care? This seems to be a long-known problem, and all the threads and the corresponding GitHub issue, including an open bounty, just linger…

And to make one thing clear: I am always thankful for "free" open source software. On the rare issues where I can actually contribute something, I do it happily, be it bug hunting, providing logs, or small financial contributions.
And I know I am throwing a tantrum here.
And not having paid anything, I know I have a "right" to nothing; beggars can't be choosers.

BUT, everything before the word but is worth nothing:

People put their trust in software like this. Trying to support open source will now cost me a lot of money, at least in my situation, because I am sitting here watching a progress bar, no longer convinced that it will actually work. At least make it clear on the download page:
Not suited for disaster recovery. Otherwise this is just misleading.

Edit 2:
Just to mention it: recreating the database also hammers my SSD with roughly 100 GB per hour.
Over a full recreate, that quickly gets into a double-digit percentage of the expected life of a consumer SSD.
If yours is a couple of years old, you could very well see it fail while trying to recreate a database.
This is just a mess…

Because 90% gets mentioned (sometimes phrased as "the last 10%" here), I'll refer to my post above to say that I think the 70% point is where the dblock fetching runs in three passes, with the final pass from 90% to 100% according to the code here. All of this tries to save your data as much as possible, but it definitely has its costs. The progress bar is hard to get right; one never knows how much of the last 30% will actually be needed.
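For anyone watching their own progress bar, here is the mapping I believe applies; a sketch under the assumption that the three passes split the remaining 70-100% range evenly (the post above only pins down the 70% start and the 90%-100% final pass, so treat the 80% boundary as my guess):

```python
def recreate_phase(progress: float) -> str:
    """Rough guess at what a database recreate is doing at a given progress fraction."""
    if progress < 0.70:
        return "reading dlist and dindex files"
    if progress < 0.80:
        return "dblock pass 1"
    if progress < 0.90:
        return "dblock pass 2"
    return "dblock pass 3 (may download every remaining dblock)"

print(recreate_phase(0.92))  # the range several people in this thread are stuck in
```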

What's specifically wasteful is that, in some cases, I think it goes on a fruitless chase, looking for something that will never be found, and if that's really so, I wish it would recognize that and give up on the searching. This might be hard in the general case, but if the only missing item is an empty file (or a -1 VolumeID), it could be special-cased.

I'm not a Duplicati developer, not an expert in the core design, and my SQL is also not up to the task. The latter two items might be true for most of the small number of active developers. It's a resource limit.
A while ago, the lead developer set out to rewrite recreate and/or repair, but I have no idea where that stands…

Repair is downloading dblock files was my stab at analysis, plus a question on how empty files get done.

In the current topic, this was another report after more testing, with pictures to help with the visualization. Some of the people in this thread might look in their own databases to see if they're seeing such oddities. Preventing them might be ideal, but dealing with them if they happen might better help existing backups…

While I'm offering free advice, I'll also point to this, where I suggest a --blocksize increase for big backups, intended to reduce the overhead from trying to track lots of 100KB blocks. Maybe the default should increase, assuming benchmarks confirm it helps. There's no performance test team or well-equipped lab, though…
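To make the scale concrete, a rough back-of-the-envelope calculation (Python sketch; the 500 GB figure comes from this thread, the 100 KB from the default --blocksize) of how many block records the local database has to track:

```python
# Rough block-count arithmetic behind the --blocksize suggestion.
source_bytes = 500 * 1024**3  # ~500 GB source, as in this thread

for blocksize in (100 * 1024, 1024**2, 5 * 1024**2):  # 100 KB (default), 1 MB, 5 MB
    blocks = source_bytes // blocksize
    print(f"{blocksize // 1024:>5} KB blocks -> ~{blocks:,} block records to track")
```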

Basically, speed is a problem in at least two parts. One is scaling, and tuning measures for its slowdown. Another is a possible bug which sends Duplicati off to download dblocks that it will download in vain.

There might be other cases that cause that, and some volunteer could move their database aside to see whether they can find a recreate that runs the third pass (maybe a bad sign). From the code, it looks like a log at --log-file-log-level=verbose (needed possibly due to a bug that doesn't show these lines at information level) should say:

ProcessingAllBlocklistVolumes

(twice on consecutive lines, with the first one possibly listing ALL the possible dblocks, which it will scan)
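If anyone wants to check a log they already have, a trivial scan like this should catch it (Python sketch; the log path is a placeholder for whatever was given to --log-file):

```python
# Placeholder path; point this at the file passed to --log-file.
log_path = "duplicati-recreate.log"

with open(log_path, encoding="utf-8", errors="replace") as log:
    hits = [line.rstrip() for line in log if "ProcessingAllBlocklistVolumes" in line]

# Two consecutive hits suggest the recreate has entered the pass that
# scans all remaining dblock volumes.
for line in hits:
    print(line)
```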

If anyone can find a test backup not involving an empty file that causes those, that would be good to study.


Indeed, the recreate database process seems to be majorly broken. Not only does it take upwards of 10 times as long as it took to back up the original data in the first place, it's also totally independent of the power of the machine it's running on.

Considering how easy it is to arrive at a state that requires a database rebuild (just reboot the machine Duplicati is running on without a proper shutdown), this feature should either be fixed quite soon or measures taken to prevent database corruption in the first place.

EDIT: After yet another database issue today, I've had it and will move to a simple rclone script (using the crypt remote and --backup-dir features) for the time being. I'd love to use Duplicati, but it's just too slow and brittle in its current beta stage to be a reliable backup solution.

And there is no obvious bottleneck. Drives and processors are just trundling along.

It makes the whole thing kind of useless, you are right.

My rebuild got stuck today after 4 days at 92%.
I'm really throwing a tantrum right now. I trusted this software, and now I am being punished for it.


I can totally understand the frustration, as I have also donated to keep development going. I seriously hope to live to see the day when I can return to a much-improved Duplicati that doesn't freak out at the smallest problem.