Recreating database logic/understanding/issue/slow

ts678 · August 6, 2019, 1:19pm

Specific cases of “why is it downloading” can’t be explained without a DB bug report and much work, but roughly estimating whether it’s in the 70%-80%, 80%-90%, or 90%-100% pass will give some insight into why it had to be done. The question of how the need for your dblock downloads began goes much deeper.

You can look at history of the issue in this topic by pressing Control-F and searching for the word empty.

Searching for the word passes will find discussion of the three levels of dblock search, if any is needed.
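As a rough aid for reading the progress bar, the three ranges mentioned above can be turned into a tiny lookup. This is only a sketch of the rule of thumb from this post: the function name is made up, the boundaries are approximate, and treating everything under 70% as "before the dblock passes" is an assumption.

```python
def dblock_pass(progress_percent):
    """Map overall Recreate progress (0-100) to a dblock search pass.

    Returns 1, 2 or 3 for the 70-80, 80-90 and 90-100 ranges,
    or None when progress is still below 70 percent.
    """
    if not 0 <= progress_percent <= 100:
        raise ValueError("progress must be between 0 and 100")
    if progress_percent < 70:
        return None  # assumed: not yet in the dblock passes
    if progress_percent < 80:
        return 1
    if progress_percent < 90:
        return 2
    return 3
```

A progress bar sitting at about 85%, for example, would map to the second pass under this reading.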

To cover it further, v2.0.4.18-2.0.4.18_canary_2019-05-12 has a fix announced in the forum as follows:

Ignoring empty remote files on restore, which speeds up recovery, thanks @pectojin

Empty source file can make Recreate download all dblock files fruitlessly with huge delay #3747
is the GitHub issue on this, and probably describes the experience of many (one usually has empty files).

Check block size on recreate #3758 is the fix for this probably widespread but very specific problem case.

The fix is unfortunately not in a beta, but is in beta candidate v2.0.4.21-2.0.4.21_experimental_2019-06-28 which was not suitable for a beta due to an FTP problem. Instead, Release: 2.0.4.23 (beta) 2019-07-14 is basically 2.0.4.5 plus a warning that had to be done. Click that release announcement for more about that.

It would probably be worth trying 2.0.4.21 experimental on a separate machine for the restore test, but if installed on an existing backup system, it will upgrade databases and make it difficult to revert to 2.0.4.23.
Downgrading / reverting to a lower version covers that. Though it’s DB-centric, even systems without a DB also have potential downgrade issues from design changes. At least databases group the issues in a few spots.

Backing up the DB in a different job that runs after a source file backup would be a fine safeguard in case fixes so far (such as mentioned) still leave you in dblock downloads (last 10% may be especially lengthy).

Keeping more than one version of the DB would be best, because sometimes the DB self-checking at the start of backups and other operations finds problems introduced in a prior backup. In that case the most recent DB backup would have the problem, whereas the one before it might be good. However, old DBs also tend to remove newer backup files from the remote if one runs repair, because the old DB has never seen those files. A fix for that issue is being discussed, and repair/recreate is being entirely and very slowly redesigned anyway, so I don’t know what the future holds.
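Keeping several DB versions could be scripted along these lines. This is a minimal sketch, not anything Duplicati ships: `rotate_db_copy`, its parameters and the file naming are all made up for illustration, and it assumes it runs after the backup has finished so that nothing is writing to the database while it is copied.

```python
import shutil
from datetime import datetime
from pathlib import Path


def rotate_db_copy(db_path, backup_dir, keep=3, stamp=None):
    """Copy db_path into backup_dir under a timestamped name.

    Only the newest `keep` copies are retained. `stamp` can be passed
    explicitly (mainly for testing); otherwise the current time is used.
    """
    if keep < 1:
        raise ValueError("keep must be at least 1")
    db_path = Path(db_path)
    backup_dir = Path(backup_dir)
    backup_dir.mkdir(parents=True, exist_ok=True)

    stamp = stamp or datetime.now().strftime("%Y%m%dT%H%M%S")
    target = backup_dir / f"{db_path.stem}-{stamp}{db_path.suffix}"
    shutil.copy2(db_path, target)

    # Timestamped names sort chronologically, so the oldest copies come first.
    copies = sorted(backup_dir.glob(f"{db_path.stem}-*{db_path.suffix}"))
    for old in copies[:-keep]:
        old.unlink()
    return target
```

The copies would then be the source of the separate job that backs up the DB. Remember the caveat above about running repair with an old copy.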

I’d note that some of the backup checking is between the DB and the remote, e.g. is everything still there, with expected content that hasn’t been corrupted on upload or on remote? Things do corrupt sometimes.

Keeping duplicate records has advantages over single-copy records, but it does lead to messages about unexpected differences. It also requires reconstruction of the duplicate records (i.e. the DB) if they’re lost. The flip side of that is that lost remote dlist and dindex files can be recreated from the database’s records.

Local DB is a tradeoff IMO, with pros and cons and (for the time being) beta bugs that need to be shaken out…

Another tradeoff IMO is the slicing and tracking that any block-based deduplicating backup has to do, but direct copying of source files (which some people do feel more comfortable with) is just hugely inefficient.

Be super-sure never to have two machines backing up to the same destination, doing repairs, etc. Each will form the remote into what it thinks is right, and they will step all over each other. Direct restore is OK.

For best certainty on test restore from machine to itself, add --no-local-blocks to its backup configuration.

I’m not clear on all the machines being used, but I think Linux backup needs similar UNIX restore system.

archon810 · August 6, 2019, 9:50pm

You know, I did read about Check block size on recreate by Pectojin · Pull Request #3758 · duplicati/duplicati · GitHub, but I assumed that since I’m using the latest beta, with a version higher than when it was merged, the fix was already in the beta. I see now I was wrong. I’ll upgrade to the experimental version and try a restore to another machine again.

I’ll also try another suggestion, from Logic and usecase for remote subfolders - #14 by kees-z, and up the internal block size.

Yesterday, I imported the profile I was trying to restore into my local Windows desktop and tried a db rebuild. To my surprise, it reached the same 2GB db size in sqlite relatively quickly, however, the backup did not begin at this point, and I saw no physical changes to the db size or the restore (the restored files never started appearing) - instead, the job kept doing something (I think downloading dblocks and very slowly doing some queries) and I ran out of patience and killed it this morning. What was it doing? I’m not sure, but it was definitely taking entirely too long to begin doing anything.

After my frustrations with duplicati, I also installed and compared it to duplicacy cli. While duplicacy web gui was kind of broken, I found that with cli, both backups and restores were robust and fast, so there’s a good benchmark for any future duplicati performance for me.

Be super-sure never to have two machines backing up to the same destination, doing repairs, etc.

Yeah, I didn’t set up any schedules or start doing any backups, only restores. In both cases, a partial db restore from a direct restore and a full db rebuild from a configuration restore, the process ended up extremely slow and never finished building the database in a reasonable amount of time.

For best certainty on test restore from machine to itself, add --no-local-blocks to its backup configuration.

And probably --no-local-db as well, right?

I’m not clear on all the machines being used, but I think Linux backup needs similar UNIX restore system.

I think what I found during a smaller test restore of a Linux backup to a Windows machine was that the Linux symlinks didn’t get restored (as opposed to maybe restoring files they point to in place of symlinks). I didn’t expect that part and permissions/ownership to work, so I didn’t look into whether there are special restore flags related to such a Linux → Windows scenario.

ts678 · August 7, 2019, 2:24am

Maybe. I’m not familiar with that one. I see descriptions making it sound like this is like the partial DB recreate done for direct restore, which is a good test for disaster recovery. For seeing if restores using the local DB are working (i.e. the non-disaster case), you would not want this switch because the local DB then isn’t used.

I might be thinking of someone who wanted to move a file tree to a different OS and continue its backup.
Message in code below looks like restore has tolerance for differences in OS, which is probably good…

github.com/duplicati/duplicati/blob/4e1f94bd3479ec10ef812f005266c70d3c6da37b/Duplicati/Server/webroot/ngax/scripts/controllers/RestoreController.js#L304-L310

          if ($scope.RestoreLocation != 'custom' && dirsep != $scope.SystemInfo.DirectorySeparator)
          {
              DialogService.confirm(gettextCatalog.getString('This backup was created on another operating system. Restoring files without specifying a destination folder can cause files to be restored in unexpected places. Are you sure you want to continue without choosing a destination folder?'), function(ix) {
                  if (ix == 1)
                      $scope.onStartRestoreProcess();
              });
          }

archon810 · August 7, 2019, 5:07am

I’m back after trying 2.0.4.21 (2.0.4.21_experimental_2019-06-28) as opposed to 2.0.4.23_beta_2019-07-14, blowing up my backups, and setting them up with 10MB internal blocks instead of 100KB.

Now this time, a restore of a remote Linux test backup of 15545 files (21.97 GB) went to my local Windows desktop without a db extremely fast. Granted, this is a test set and the real set of 2mln files and 400GB is still backing up, but the db repair/recreation step alone was taking ages before and now seemed to complete extremely fast: only 30 seconds.

The db itself is also a lot smaller thanks to a larger internal block size, which helps the speed tremendously.
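To get a feel for why the block size matters so much for the db, here is a back-of-the-envelope estimate of how many blocks have to be tracked. It is a simplified model and not Duplicati's actual accounting: it assumes every file is cut into fixed-size blocks, and it ignores deduplication and metadata. The function name and the sample sizes are made up.

```python
KIB = 1024
MIB = 1024 * KIB
GIB = 1024 * MIB


def estimate_blocks(file_sizes, block_size):
    """Estimate how many blocks are needed to store the given files.

    Each file is split separately, so a file smaller than block_size
    still costs one block (an empty file costs none in this model).
    """
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    # -(-a // b) is integer ceiling division.
    return sum(-(-size // block_size) for size in file_sizes)


# Hypothetical source set: one 1 GiB file and one 4 KiB file.
sizes = [1 * GIB, 4 * KIB]
small_blocks = estimate_blocks(sizes, 100 * KIB)
large_blocks = estimate_blocks(sizes, 10 * MIB)
```

Large files shrink to roughly a hundredth of the block count when going from 100KB to 10MB blocks. Files that are already smaller than a block do not shrink at all, so a set with millions of tiny files will see less of a gain than one with fewer big files.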

I’ll be back with the results of the full restore when it completes, but things are looking a lot more promising with the fixes in experimental and this larger block size, at least in my case.

Now to get these fixes to beta and stable…

archon810 · August 8, 2019, 12:35am

The large backup finally finished, and when I tried to recreate the db locally from the profile, this time it took only 25 minutes with the experimental version, whereas the beta was taking so long, I didn’t think it’d finish after a day.

I will try restoring both locally and to the same server next.

ts678 · August 8, 2019, 12:50am

Did you notice if it was downloading large numbers of dblocks (either from logs or from looking at the progress bar)?

Possibly the fix for the presumably common empty-source-file bug was enough (but it won’t always be…).

Bobblehead · September 13, 2019, 6:51pm

I can confirm the improvement in database rebuild performance. Using Beta 2.0.4.23 on Windows the rebuild job was aborted after more than 8 hours. Exact same rebuild using Canary 2.0.4.28 took 29 minutes. Job was a cloud backup of just under 100GB of files using 100MB block size.

Sami_Lehtinen · September 14, 2019, 1:42pm

I’ve had similar experiences with recreate, including the worst part of it, which is failure to recreate. That story is in a different thread, but it is highly correlated with everything in this thread.

Personally I see that as the only major drawback of current Duplicati. Slow rebuild would be acceptable in disaster recovery situations. But failure shouldn’t be an option.

Everything else is already working well enough that I’m happy about it. But this is the thing which makes me sweat at night, because I could find any backup to be un-restorable at random after an extremely slow database restore process. But I did see from some other thread that the rebuild / recreate task is being improved, which hopefully solves this issue.

sfatula · November 10, 2019, 6:03am

I was having the recreate issue as well on Ubuntu 18. So I installed the 2.0.4.34 canary, and it was able to recreate it. Thanks to whoever first posted this. I had not seen a Linux confirmation of this technique yet. Backend is B2.

jcsogo · November 12, 2019, 6:39pm

I had to recreate my database, and was suffering the same issue described above with the version 2.0.4.23_beta_2019-07-14. I have installed 2.0.4.34 canary and the reconstruction of the database of a 175GB backup stored in the cloud has been performed in less than 15 minutes.

Catfriend1 · May 7, 2021, 7:09am

My database got corrupted because of “disk full” (my fault). I’ve thrown it away and hit the repair button on the web UI. It’s been running now for ~ 20 hours and about 70% completion is shown on the UI. The backup size is ~ 800 GByte. Is this a normal speed? Duplicati seems to read every file of the backup set as my GBit network adapter shows full congestion to the backup storage on the local network. It’s running within a 4 cpu core debian 10.9 virtual machine and according to htop uses one cpu core 100%.

Can I do anything to speed it up?

drwtsn32 · May 7, 2021, 1:09pm

Check About → Show Log → Live → Verbose and watch for new events there. Eventually you should see a message along the lines of “processing X of Y”. What are those X and Y numbers? And is it processing dblock files?

My understanding is that older versions of Duplicati may write dindex files incorrectly on the back end, and during a database recreation this is detected. Duplicati is forced to download some/all dblock files in those situations. A potentially very slow process. (Normally if the dlist and dindex files are correct, no dblock files need to be downloaded during a database recreation.)
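The idea can be pictured with a toy model. This is not Duplicati's code, and the names and data shapes are invented for illustration. It assumes the dlist files say which block hashes are needed and each dindex file says which block hashes its dblock holds. If every needed hash is accounted for by some dindex file, nothing has to be downloaded; anything left over is what forces the dblock search.

```python
def blocks_not_located(blocks_needed, dindex_files):
    """Return the block hashes that no dindex file accounts for.

    blocks_needed: iterable of block hashes referenced by the dlist files.
    dindex_files: dict mapping a dindex file name to the set of block
                  hashes it lists (hypothetical representation).
    """
    located = set()
    for hashes in dindex_files.values():
        located.update(hashes)
    return set(blocks_needed) - located


def needs_dblock_download(blocks_needed, dindex_files):
    """True when at least one needed block cannot be found via the dindex files."""
    return bool(blocks_not_located(blocks_needed, dindex_files))
```

With a complete set of dindex files the first function returns an empty set. A single incorrectly written dindex file is enough to leave hashes unaccounted for, and then the recreate has to start looking inside dblock files.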

If you are on the new beta version, there is a way to fix the root problem (incorrectly written dindex files), but unfortunately it requires a functioning database first. So you really do need to let this process finish.

Catfriend1 · May 7, 2021, 2:47pm

I currently see this:

So, yes, it’s reading the dblock files. I’m still waiting for the “Processing X of Y” line to appear. Machine is still quite busy.

Network activity has calmed down.

Still running:

I noticed it’s still heavily reading the disk, but the local disk only contains the database files of Duplicati.

Update a while later: The rebuild has now successfully finished after ~ 1 day 16 hours. I did not understand why it first read all the files and then spent hours doing nothing but disk reads on the drive where the database resides.