Why is Fetching Path Info for Restore so slow?

Duplicati - 2.0.3.3_beta_2018-04-02
Mac OSX
According to the backup log there are about 550,000 files to examine.

When I start a Restore, there are long delays before I can get to the file I want to restore.
Fetching Path Information takes about 4 to 5 minutes. The Search for Files box is left blank.

This only gets me to the path information for the latest backup. If I then click on an earlier backup date, it takes another 3 to 4 minutes to get the folder tree.

Then you have to drill down the folder structure to find the file or folder to restore. Each click takes about 60 to 80 seconds to show the next level down. Depending on how deep you need to go, this can take another 10 minutes.

So we’re talking about 20 minutes just to reach the file I want to restore, plus the time to actually restore it.

Why is this taking so long? In CrashPlan I can drill down to the file I want to restore instantly. Is this an inherent problem in how Duplicati handles the backup set, or is there anything I can do to speed this up to a more usable level?

It’s unoptimized queries in the database :frowning: And they only get worse the larger the database grows. With databases of 300 to 600 MB it’s usually fairly manageable.

We have a thread on it, “Optimizing load speed on the restore page”, where we identified some performance problems on the restore page and ways of dealing with them.

Sadly I haven’t gotten around to implementing any of the solutions yet :frowning:

Strange, that sounds like my experience with the old 2.0.2.1 beta. Someone optimized the database queries and now it is MUCH faster. Version 2.0.3.3 beta definitely has those fixes.

Not sure why you are still seeing poor performance.

The optimizations are in all newer releases and they helped a ton, but on larger databases it has still been troublesome.

My database is 3GB and restore browsing is pretty fast.

What OS version are you on?

It’s very system dependent :slight_smile:

It’s all about CPU and disk speed since it’s 100% about how quickly your machine can do database lookups.

I just made my first backup with the latest Duplicati, 2.0.4.23_beta_2019-07-14: about 2 million files (319GB). It takes minutes to open each folder just to see what’s inside, even where the directory structure is a chain of single folders nested inside each other until you get deeper.

Would definitely love some optimizations on this front.

The db file is 2GB btw.

What are your system specs? CPU, RAM, etc…

It’s a really beefy machine: 64GB of RAM (with lots of it free) and a 16-thread CPU (a Linode 64GB cloud server).

Have you ever vacuumed your database? If not, try turning on the auto-vacuum option and let it vacuum at the end of the next backup job. See if it helps…
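Duplicati’s local database is an SQLite file, and vacuuming it is SQLite’s `VACUUM` command. As a minimal sketch of what that does (using Python’s standard `sqlite3` module, not Duplicati’s own code path), the helper below rebuilds a database and reports SQLite’s free-page count before and after. `free_pages` and `vacuum` are hypothetical names; if you try this on a real Duplicati database, use a copy or make sure Duplicati is not using the file.

```python
import sqlite3

def free_pages(conn: sqlite3.Connection) -> int:
    # Pages still allocated in the database file but holding no live data.
    return conn.execute("PRAGMA freelist_count").fetchone()[0]

def vacuum(conn: sqlite3.Connection) -> tuple:
    """Rebuild the database compactly; return free pages before and after."""
    conn.commit()  # VACUUM fails inside an open transaction
    before = free_pages(conn)
    conn.execute("VACUUM")
    return before, free_pages(conn)
```

Vacuuming mainly reclaims space left behind by deletions, so on a database that has only seen a single backup there is little for it to do.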

I haven’t, but considering I’ve taken just one backup and Duplicati was installed a day earlier, I wouldn’t consider vacuuming an acceptable fix.

The issue is likely a design flaw of some kind: a missing index, or inefficient storage or queries.
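To check the “missing index” theory for a given query, SQLite can report whether it scans a whole table or searches an index. The sketch below runs `EXPLAIN QUERY PLAN` on a hypothetical flat `paths` table (not Duplicati’s real schema) with a range query for everything under one folder, before and after adding an index. The exact plan wording varies between SQLite versions.

```python
import sqlite3

def plan(conn, sql, params=()):
    """Return SQLite's query plan for `sql` as one readable string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # The last column of each row describes one step, e.g. "SCAN ..." or "SEARCH ...".
    return " | ".join(str(row[-1]) for row in rows)

# Hypothetical flat table of full paths, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE paths (id INTEGER PRIMARY KEY, path TEXT NOT NULL)")

# "0" is the character right after "/" in ASCII, so this half-open range
# matches exactly the paths that start with "/Users/me/".
query = "SELECT id FROM paths WHERE path >= ? AND path < ?"
bounds = ("/Users/me/", "/Users/me0")

before = plan(conn, query, bounds)  # no index yet: full table scan
conn.execute("CREATE INDEX paths_by_path ON paths (path)")
after = plan(conn, query, bounds)   # the range can now be searched on the index
print(before)
print(after)
```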

Oh, I didn’t realize you had only just started using it. Vacuuming shouldn’t be needed.

Your experience doesn’t really mesh with mine. I have a larger dataset and larger database. But you have way more files so that’s probably the reason.

The issue “Very slow folder browsing during restore” said it’s “caused by a large table of paths needing to be filtered”, which I think refers to the File table, where the Path of each file is kept and a new row is created whenever the file changes. That’s a flat list to search through. There used to be no folder organization, but there is now.
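To illustrate why a flat list of paths gets slower as it grows, here is a minimal sketch of listing one folder’s direct children by filtering every stored path, assuming folders are stored as full paths ending in `/`. The function name and sample data are hypothetical; the point is that each folder click touches every row, so the cost tracks the total number of paths rather than the size of the folder being opened.

```python
def immediate_children(all_paths, folder):
    """Return the direct children of `folder` from a flat list of full paths.

    Every call walks the whole list, so opening any folder costs O(total paths).
    """
    if not folder.endswith("/"):
        folder += "/"
    children = set()
    for path in all_paths:
        if path.startswith(folder) and path != folder:
            rest = path[len(folder):]
            # Keep only the first path component; subfolders keep their trailing "/".
            head, sep, _ = rest.partition("/")
            children.add(head + sep)
    return sorted(children)

paths = ["/a/", "/a/b/", "/a/b/c.txt", "/a/d.txt", "/e/f.txt"]
print(immediate_children(paths, "/a"))  # ['b/', 'd.txt']
```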

“Local database” in the documentation is a somewhat outdated, simplified view showing how a fileset points to files, which in turn point to a set of data blocks and a set of metadata blocks. Here’s what has changed, which might be relevant to fixing the issue:

Pull request “Feature/fix path storage2” #3468 made File a view and added the PathPrefix and FileLookup tables to the schema.

broken: v2.0.4.13-2.0.4.13_canary_2019-01-29

Changed the internal storage of paths to use a prefix method. This should reduce the size of the database significantly and enable much faster database queries later on

I didn’t write that text, but because the prefix is a folder, it may open the door to faster folder browsing…
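A rough sketch of why a prefix layout could help: if each entry is stored as (parent folder prefix, name), opening a folder becomes an indexed lookup of one prefix instead of a filter over every path. The table names borrow PathPrefix and FileLookup from the pull request, but the columns, the index, and the convention that subfolders are stored as names ending in `/` are assumptions for illustration, not Duplicati’s actual schema or queries.

```python
import sqlite3

def build_db(paths):
    """Store each full path as (parent prefix, name) in an in-memory database."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE PathPrefix (ID INTEGER PRIMARY KEY, Prefix TEXT NOT NULL UNIQUE);
        CREATE TABLE FileLookup (ID INTEGER PRIMARY KEY,
                                 PrefixID INTEGER NOT NULL REFERENCES PathPrefix (ID),
                                 Path TEXT NOT NULL);
        CREATE INDEX FileLookupByPrefix ON FileLookup (PrefixID);
    """)
    for full in paths:
        # "/a/b/c.txt" -> ("/a/b/", "c.txt"); a folder "/a/b/" -> ("/a/", "b/").
        cut = full.rstrip("/").rfind("/") + 1
        prefix, name = full[:cut], full[cut:]
        conn.execute("INSERT OR IGNORE INTO PathPrefix (Prefix) VALUES (?)", (prefix,))
        (prefix_id,) = conn.execute(
            "SELECT ID FROM PathPrefix WHERE Prefix = ?", (prefix,)).fetchone()
        conn.execute("INSERT INTO FileLookup (PrefixID, Path) VALUES (?, ?)",
                     (prefix_id, name))
    conn.commit()
    return conn

def list_folder(conn, folder):
    """Direct children of `folder`; Prefix is UNIQUE and PrefixID is indexed."""
    rows = conn.execute(
        "SELECT f.Path FROM PathPrefix p JOIN FileLookup f ON f.PrefixID = p.ID "
        "WHERE p.Prefix = ? ORDER BY f.Path", (folder,)).fetchall()
    return [path for (path,) in rows]

conn = build_db(["/a/", "/a/b/", "/a/b/c.txt", "/a/d.txt", "/e/f.txt"])
print(list_folder(conn, "/a/"))  # ['b/', 'd.txt']
```

Unlike the flat filter, the work per folder here depends on how many entries that folder holds, not on how many paths the whole backup contains.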

Do you know if the above change made it into v2.0.4.23-2.0.4.23_beta_2019-07-14, which was slow for me?

I’m still backing up with, and testing, v2.0.4.21-2.0.4.21_experimental_2019-06-28, which I presume has this change, and in my initial smaller test the file list speed was very good. I’ll know whether it’s actually better than the beta in a few hours, when the backup finishes.

2.0.4.23 is 2.0.4.5 plus “This update only contains warnings …” per the release note, so it should not have it.

2.0.4.21 experimental is a genuine feature/bugfix update off the canary chain at the 2.0.4.20 level, so it should.

Thing is, it doesn’t look like the code has been changed yet to take advantage of the path prefix redesign.

One change you might notice during backup, if your uplink is fast enough, is the new parallel uploaders…

v2.0.4.16-2.0.4.16_canary_2019-03-28

Added option for parallel uploads, thanks @seantempleton
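Parallel uploading in general means keeping several volume transfers in flight at once, capped at some limit. As a hypothetical sketch of that idea (not Duplicati’s implementation; `upload_all`, `upload`, and `max_parallel` are made-up names), a bounded thread pool captures it:

```python
from concurrent.futures import ThreadPoolExecutor

def upload_all(volumes, upload, max_parallel=4):
    """Call `upload` on every volume with at most `max_parallel` calls running at once."""
    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        # map() returns results in input order and re-raises the first exception.
        return list(pool.map(upload, volumes))
```

With a slow uplink the concurrent transfers just share the same bandwidth, which is why the benefit depends on having a fast enough connection.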

Sadly, you’re right. Now that the large backup is complete, the restore list does still take as long as it did before with the beta. At least rebuilding the db on my local machine now took only 25 minutes with the experimental instead of days like with the beta.

The open issue “Listing directories for restore very slow” #1715 just got a note pointing here. Maybe someone will volunteer to change the code to make better use of the path prefix, if the new design actually helps with this.

Could you open two web browser tabs to Duplicati…

On the first tab, select one of your backup sets and click Restore files. When you see the file tree with C:\ (or whatever), leave it there.

Go to the second Duplicati web browser tab, go to About / Show Log / Live, and select Profiling.

Then go back to the first tab and expand C:. When it’s done, go back to the tab showing the Live log. Near the top (the second line when I tried this) it shows the total time it took. Below that it should show several lines like “… ExecuteNonQuery … took [time]” or “… ExecuteReader … took [time]”.

Just curious which one of those Execute lines stands out as taking the biggest chunk of the total time.
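If the Live log is long, it can help to copy it into a text file and rank the Execute lines by duration. The sketch below assumes each such line contains `took` followed by a duration like `0:00:01:02.345` (days optional, then hours, minutes, seconds); that format is an assumption, so adjust the regular expression to whatever your log actually shows. `seconds_taken` and `slowest_queries` are hypothetical helpers.

```python
import re

# Assumed duration format after "took": [D:]HH:MM:SS[.fff]
TOOK = re.compile(r"took (?:(\d+):)?(\d+):(\d+):(\d+(?:\.\d+)?)")

def seconds_taken(line):
    """Parse the duration after 'took' into seconds, or None if absent."""
    m = TOOK.search(line)
    if m is None:
        return None
    days, hours, minutes, secs = m.groups()
    return int(days or 0) * 86400 + int(hours) * 3600 + int(minutes) * 60 + float(secs)

def slowest_queries(log_text, top=3):
    """Return the `top` slowest ExecuteReader/ExecuteNonQuery lines as (seconds, line)."""
    timed = []
    for line in log_text.splitlines():
        if "ExecuteReader" in line or "ExecuteNonQuery" in line:
            secs = seconds_taken(line)
            if secs is not None:
                timed.append((secs, line.strip()))
    timed.sort(reverse=True)
    return timed[:top]
```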