Recreating database logic/understanding/issue/slow

I know this is not an answer to the issue of recreating the database itself, but did any of you try using Duplicati.CommandLine.RecoveryTool.exe to perform the restore?

I’m just curious because I only recently started using Duplicati for my backups, and now I’m quite unsettled reading about all these recovery issues …

On my second try I am using the “CommandLine” in the GUI. The actual CLI in CMD always throws some exception or stalls outright.

You should be; in its current form the software is not suitable for a real disaster. Should you lose your database, the way back to your data may be very costly or impossible.
After all, it is considered a beta; that should have been a warning for us not to use it in an important environment.

Very sadly I came to the conclusion that Duplicati is not robust enough to rely on, and imo nothing can be worse in a backup system.

I am keeping a subset test going, but I would not be sleeping well if it were my only solution.

I am close to losing it right now.
The recreation via the “GUI-CLI” failed for the second time tonight, with the whole program crashing.
I now really regret that I have been tricked into this nightmare…

And the stupid CLI just asks me for my passphrase and then does nothing.
Could someone please post what exactly a command to recreate a database has to look like?

I’ve been doing a command line restore with my small test backup, which is just ~1 MB, following the guide at Disaster Recovery - Duplicati 2 User's Manual:

Duplicati.CommandLine.RecoveryTool.exe download <remoteurl> <localfolder> --passphrase=<pwd>
Duplicati.CommandLine.RecoveryTool.exe index <localfolder>
Duplicati.CommandLine.RecoveryTool.exe list <localfolder> 0
Duplicati.CommandLine.RecoveryTool.exe restore <localfolder> 0 --targetpath=<targettempfolder>

But this does not recreate the database …

Did the same with my ~100 GB backup: database repair/recreation takes ~5 days (it failed twice; the third attempt is still ongoing). Downloading all files and creating the index via the disaster recovery guide took ~8 hours… it would be great if the database could be created from the already-downloaded files…

Sure, but it’s only an example:

"C:\Program Files\Duplicati 2\Duplicati.CommandLine.exe" repair "file://C:\Duplicati Backups\local test 7 encrypted\" --dbpath="C:\ProgramData\Duplicati\78777088689071838682.sqlite" --encryption-module=aes --compression-module=zip --dblock-size=50mb --passphrase=password --disable-module=console-password-input

It varies depending on your configuration options (I assume you began in the GUI). The trick is specifying the configuration identically enough on the command line, for example by starting with an Export As Command-line and keeping essential options like encryption, while perhaps discarding others such as any exclude rules.

Most things seem safe, so the main change when editing the backup syntax into the repair/recreate syntax is to delete the source path information. Whether this will work better than the GUI or GUI/Commandline method is unknown; a lot of the code is common. If you want something further removed from the GUI, the post above by @fichtennadel (and the manual section cited) would be good reading, and for something even more independent there’s Independent restore program. I think it’s not as nice a restore, but if all else fails it may get your files back.
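To make that concrete, here is a sketch with made-up placeholder paths, URL, and passphrase (your actual Export As Command-line output will differ). An export might look like:

"C:\Program Files\Duplicati 2\Duplicati.CommandLine.exe" backup "file://X:\backuptarget\" "C:\Users\me\Documents\" --dbpath="C:\ProgramData\Duplicati\XXXXXXXX.sqlite" --encryption-module=aes --compression-module=zip --passphrase=<pwd>

and the edit is simply to change backup to repair and delete the source path:

"C:\Program Files\Duplicati 2\Duplicati.CommandLine.exe" repair "file://X:\backuptarget\" --dbpath="C:\ProgramData\Duplicati\XXXXXXXX.sqlite" --encryption-module=aes --compression-module=zip --passphrase=<pwd>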

Restoring files if your Duplicati installation is lost shows a couple of methods that are faster than recreating the database in its entirety (with all its versions and files), but you need the full recreate to continue the backup. Possibly even this will hit the suspected bug that causes a wasted download of all dblock files, but in theory creating the partial temporary database should be faster than recreating the full original one…

But that seems not completely right. I tried that, and as soon as I choose certain files, it triggers a “full database rebuild” instead of a partial one.

Finally!

    MainOperation: Repair
    RecreateDatabaseResults:
        MainOperation: Repair
        ParsedResult: Warning
        Version: 2.0.4.5 (2.0.4.5_beta_2018-11-28)
        EndTime: 16/04/2019 11:47:09 PM (1555422429)
        BeginTime: 26/03/2019 11:34:59 PM (1553603699)
        Duration: 21.01:12:09.4122930

Only took 21 days, 1 hour and 12 minutes to recreate my database!! It apparently went quickly from about 95% onward, so it seems to be the 90% → 95% range that bogs down.
The last complete backup was 64 GB, but I had increased the backup set to ~130 GB when it crapped out, so I’m not sure which backup size the duration is indicative of. Whatever the case, it is NOT COOL! This is a major bug that needs to be fixed asap!

Any of the Duplicati developers here?

I am wondering why there is zero response from them to the tickets and forum threads on this issue. A backup without a working restore is kind of useless…

In the meantime I’ve been considering restic as my backup solution; it looks promising so far (no local database…)

Database recreation definitely needs some improvement. It would be helpful if those experiencing issues can provide the following information:

  1. Operating system version
  2. Duplicati version (this is especially important, as users can be experiencing similar symptoms due to different issues)
  3. Backend storage type
  4. Logs
  5. Any other information about the system while the recreate is (slowly) in progress. For example, amount of free space in the temporary directory, amount of free RAM, CPU usage, networking activity, etc.

I’m probably the most annoyed/annoying person here, but I’m willing to comply and answer everything warwickmm has asked for, to make the situation better for everyone:

I am currently trying to recreate the same database via two PCs, although I aborted one because it was definitely slower than the other.
I will skip question 4, but will deliver immediately once somebody tells me exactly which log to submit, because Duplicati seems to have approximately 10 different logs with 7 different verbosity levels each.

System One:
i7 3770, 16 GB dual-channel DDR3 1333 MHz RAM, 850 Evo SATA SSD for system and temp drive

  1. OS: Win 10 1809
  2. Version : 2.0.4.5
  3. Backend Storage: Local via Sata, 3 TB Harddrive via SATA 2
  5. 5-10% CPU usage; Duplicati uses roughly 40 MB of RAM while 8 GB are still free; no network used, obviously. The temp folder lies, as mentioned, on a mostly free SATA SSD. As soon as it reaches 90%, where it continues to grab all of the 50 MB dblocks, it processes one roughly every 5-10 minutes.

System 2:
Intel Pentium N3530 quad-core, 8 GB single-channel DDR3 1300 MHz RAM, MX500 SATA SSD for temp and system drive

  1. Win 10 1809
  2. Version 2.0.4.5
  3. FTPS-Server, 100MBit Connection
  5. CPU usage is exactly 30% for the most part, with some spikes. RAM: again 40-50 MB used by Duplicati while roughly 50% (4 GB) is free; the temp drive is mostly free as well. On this system it takes 15 minutes to process a dblock, while the transfer of a 50 MB dblock takes approx. 5-6 seconds.

Temp drive usage (time-wise) is always nonzero, but on both machines it never reaches 100% in Task Manager. If I monitor the usage with all other applications closed, it does approx. 100 GB per hour though.

Please feel free to ask for more information. The recreate on the i7 system is still running and is in its 90% phase, so I could still capture some logs of this recreate.

The server logs at the profiling level would be most useful.

I am working on it.
How many lines of it would you need? I would guess from one dblock to another, as these operations take most of the time. And does it contain any personal information I would have to censor?

The profiling logs are huge, so it doesn’t need to be the entire log. I think having enough lines that capture a reasonable window of time (10 or 20 minutes?) during the slow phase will be helpful to the developers. If in doubt, the more logs the better.
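As a side note, in case logging wasn’t enabled up front: as far as I know, the 2.0.4.x builds accept --log-file and --log-file-log-level as advanced options (do verify the names against your version), so a profiling log can be captured by appending something like this to the repair/recreate command, or adding the equivalent Advanced options in the GUI job settings:

--log-file="C:\tmp\duplicati-profiling.log" --log-file-log-level=Profiling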

dup2.zip (84.4 KB)

Here is a 15-minute log.
In hindsight, sadly, I did not turn on logging to file beforehand, so this is copied out of the live log.
It’s not a real zip file; I just renamed it to circumvent the upload filter and the character limit.
Just rename it to *.txt or *.log.

Sample log output shows it downloading a dblock, then doing a sequence of SELECTs with no other SQL:

SELECT "VolumeID" FROM "Block" WHERE "Hash" = "04++WDfbojSWsXsKjPSmxdrOgKkD9D8W4XZaow8nhsk=" AND "Size" = 102400

A sequence of about 500 per dblock is expected, given the default dblock size of 50 MB and the default dedup block size of 100 KB (50 MB / 100 KB ≈ 512 blocks, hence ~512 such SELECTs per dblock).

Reversing from the SQL suggests this is m_findHashBlockCommand being run in UpdateBlock(),

and (speculating further), an empty source file in the backup can cause a reference in the dlist without actually having an empty entry in the dblock (which implies it’s not in the dindex for a dblock file either).

Jumping way ahead, I speculate that this is due to incorrectly treating the empty file as a loop terminator,

which means you wind up with no empty block put out. You do wind up with an empty blockset though.

This is reproducible. Just back up an empty source file, watch the live log at Information level, and do a Recreate.
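For anyone wanting to try that reproduction from the command line, here is a sketch; all paths and the passphrase are placeholders, and the exact syntax should be checked against your install:

    rem back up a folder containing a single empty file
    type nul > C:\source\empty.txt
    Duplicati.CommandLine.exe backup "file://C:\backuptarget\" "C:\source\" --dbpath="C:\tmp\test.sqlite" --passphrase=test
    rem delete the local database, then ask for a recreate
    del C:\tmp\test.sqlite
    Duplicati.CommandLine.exe repair "file://C:\backuptarget\" --dbpath="C:\tmp\test.sqlite" --passphrase=test

If the speculation above is right, the repair should be seen downloading the dblock file even though the dlist and dindex files are all present.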

Lacking a full log, the way we’d have to look at this case would be to determine whether we dare copy the DB currently being recreated. I’d sure hate to have the recreate die due to a locking error. Windows generally allows me to leave DB Browser for SQLite open read-only on a database until it needs to be deleted or such, implying that a copy here is probably also safe; however, I’ll leave it to @f4253a38b0 to make that call…

Finding the database-in-progress would probably mean looking in the temporary files, guessing at files, copying, and test-opening with a database browser. The suspect is a Blockset of Length 0, and the further forecast of a wasted dblock loop is the Hash not being in the Block table at all, or being there with VolumeID -1.
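If you do get a copy (or a read-only open) of a candidate database, here is a minimal sketch of that check using Python’s sqlite3 module; the table and column names are the ones visible in this thread, and the file path is a placeholder:

    import sqlite3

    # Open read-only so a recreate that is still running is not disturbed.
    db = sqlite3.connect("file:C:/tmp/copied.sqlite?mode=ro", uri=True)

    # Suspect: a blockset of Length 0, i.e. an empty source file.
    for (blockset_id,) in db.execute('SELECT "ID" FROM "Blockset" WHERE "Length" = 0'):
        print("empty blockset:", blockset_id)

    # Forecast of the wasted dblock loop: the empty-block hash missing from
    # the Block table entirely, or present with VolumeID -1.
    empty_hash = "47DEQpj8HBSa+/TImW+5JCeuQeRkm5NMpJWZG3hSuFU="
    rows = list(db.execute('SELECT "VolumeID" FROM "Block" WHERE "Hash" = ?', (empty_hash,)))
    print("empty block VolumeIDs:", rows if rows else "not in Block table at all")

    db.close()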

I was creating some materials for a GitHub issue, but maybe I’ll at least see what I can get started here.

lengthtest - length 0 backup delete repair.zip (43.8 KB) shows the atypical situation from an empty file

lengthtest - length 1 backup delete repair.zip (45.2 KB) shows the more typical situation from a 1-byte file

The test was to do the initial backup, a bug report, a recreate (File Explorer delete + GUI Repair), then another bug report. There are separate Profiling (plus --profile-all-database-queries) logs from the backup and the repair too.
A final bonus is the actual source file, plus sqldiff output showing DB differences after Recreate such as:

INSERT INTO Block(ID,Hash,Size,VolumeID) VALUES(2,'47DEQpj8HBSa+/TImW+5JCeuQeRkm5NMpJWZG3hSuFU=',0,-1);
INSERT INTO BlocksetEntry(BlocksetID,"Index",BlockID) VALUES(1,0,2);

which is how Recreate changes the DB manifestation of the problem from a missing BlocksetEntry to a -1 VolumeID. Either way, the empty file has made a break in the usual chain from File to BlocksetEntry to Block to Volume.
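Incidentally, that Hash value can be independently confirmed as the empty block: it is the Base64-encoded SHA-256 digest of zero bytes, which a couple of lines of Python reproduce:

    import base64, hashlib

    # Base64 of SHA-256 over zero bytes, the hash recorded for an empty block.
    print(base64.b64encode(hashlib.sha256(b"").digest()).decode())
    # prints: 47DEQpj8HBSa+/TImW+5JCeuQeRkm5NMpJWZG3hSuFU=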

Do you need a copy while it is being created, or is it OK after the fact?
And wouldn’t I be submitting a listing of all my backed-up files this way?

The copy is for you to open and look at, as described. Ordinarily a DB bug report would sanitize the listing of files (and other things), but I doubt Duplicati will be willing to create one while the DB isn’t finally settled yet.

After the fact, a DB bug report is fine, but we don’t know how far away that is, and I’m not sure how well a pixel-counting approach from 90% to 100% would go. A log from the start would have helped, but we don’t have one.

I’m about to file a GitHub issue pointing back here. Maybe the code-talk can continue there instead of here.

EDIT: It’s Empty source file can make Recreate download all dblock files fruitlessly with huge delay #3747

There is a recovery tool written in Python that restores from scratch, and another written in Rust.

I do remember reading that a database recreate runs slower while the web UI is open. You need to close the browser once you hit Recreate database; open it again later to check.