Why is Fetching Path Info for Restore so slow?

I just took the first backup with the latest Duplicati 2.0.4.23_beta_2019-07-14, about 2 million files (319GB), and it’s taking minutes to browse and open each folder just to see what’s inside (even where the directory structure is single dirs inside single dirs until you get deeper).

Would definitely love some optimizations on this front.

The db file is 2GB btw.

What are your system specs? CPU, RAM, etc…

It’s a really beefy machine. 64GB of RAM (with lots of it free), 16 thread CPU (Linode 64GB Cloud Server Plans and Pricing - Linode).

Have you ever vacuumed your database? If not, try turning on the auto-vacuum option and let it vacuum at the end of the next backup job, and see if it helps.
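If you’d rather vacuum by hand instead of waiting for the next backup, here’s a minimal sketch using Python’s sqlite3. The database path below is just a placeholder (use the path shown on the job’s Database screen), and it should only be run while no Duplicati job is using that database:

```python
import sqlite3

# Placeholder path: use the local database path shown on the job's Database screen.
DB_PATH = "/path/to/duplicati/JOBDB.sqlite"

# VACUUM rebuilds the database file and reclaims free pages; run it only while
# Duplicati is not using this database.
con = sqlite3.connect(DB_PATH)
try:
    con.execute("VACUUM")
finally:
    con.close()
```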

I haven’t, but considering I just took a single backup and Duplicati was installed only a day prior, I wouldn’t consider vacuuming an acceptable fix.

The issue is likely a technical design flaw of some kind - lack of index or inefficient storage or queries.

Oh, I didn’t realize you only just started using it. Vacuum shouldn’t be needed.

Your experience doesn’t really mesh with mine. I have a larger dataset and larger database. But you have way more files so that’s probably the reason.

Very slow folder browsing during restore said it’s “caused by a large table of paths needing to be filtered”, which I think refers to the File table, where the Path for each file is kept (and a new row is created if the file changes). That’s a flat list to search through. There used to be no folder organization, but there is now.
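To illustrate the “flat list” problem (this is not Duplicati’s actual query, just a sketch of the idea): listing the immediate children of one folder from a table of full paths means prefix-filtering every row and then trimming each match down to its next path segment.

```python
import sqlite3

# Illustrative only: a flat table of full paths, similar in spirit to the old
# File table. Table and column names here are simplified, not Duplicati's schema.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE File (Path TEXT)")
con.executemany(
    "INSERT INTO File (Path) VALUES (?)",
    [("/home/a/x.txt",), ("/home/a/sub/y.txt",), ("/home/b/z.txt",)],
)

def list_children(folder):
    """Return the immediate children of `folder` by prefix-filtering every path."""
    prefix = folder.rstrip("/") + "/"
    rows = con.execute("SELECT Path FROM File WHERE Path LIKE ?", (prefix + "%",))
    children = set()
    for (path,) in rows:
        rest = path[len(prefix):]
        children.add(rest.split("/", 1)[0])  # keep only the next path segment
    return sorted(children)

print(list_children("/home"))    # ['a', 'b']
print(list_children("/home/a"))  # ['sub', 'x.txt']
```

Every folder expansion has to scan and filter the whole path list, which is why it gets slower as the file count grows.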

Local database is a somewhat obsolete, simplified view showing how a fileset points to files, which point to a set of data blocks and a set of metadata blocks. What’s changed, and might be relevant to fixing the issue:

Feature/fix path storage2 #3468 made File a view and added PathPrefix and FileLookup tables to schema.

v2.0.4.13-2.0.4.13_canary_2019-01-29

Changed the internal storage of paths to use a prefix method. This should reduce the size of the database significantly and enable much faster database queries later on

I didn’t write that text, but because the prefix is a folder, it possibly opens the door to faster folder opens…
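For anyone curious, here’s a rough sketch of the idea behind that change (simplified, not the exact Duplicati schema or view definition): the folder part of each path is stored once in PathPrefix, FileLookup keeps the file name plus a PrefixID, and File becomes a view that glues the two back together.

```python
import sqlite3

# Simplified sketch of the PathPrefix/FileLookup/File split, not the exact schema.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE PathPrefix (ID INTEGER PRIMARY KEY, Prefix TEXT NOT NULL);
CREATE TABLE FileLookup (
    ID INTEGER PRIMARY KEY,
    PrefixID INTEGER NOT NULL,   -- folder part, stored once in PathPrefix
    Path TEXT NOT NULL,          -- file name within that prefix
    BlocksetID INTEGER NOT NULL,
    MetadataID INTEGER NOT NULL
);
CREATE VIEW File AS
    SELECT FileLookup.ID AS ID,
           PathPrefix.Prefix || FileLookup.Path AS Path,
           FileLookup.BlocksetID AS BlocksetID,
           FileLookup.MetadataID AS MetadataID
      FROM FileLookup
      JOIN PathPrefix ON FileLookup.PrefixID = PathPrefix.ID;
""")

con.execute("INSERT INTO PathPrefix VALUES (1, '/home/a/')")
con.execute("INSERT INTO FileLookup VALUES (1, 1, 'x.txt', 10, 20)")
print(con.execute("SELECT Path FROM File").fetchall())  # [('/home/a/x.txt',)]
```

Since each prefix is effectively a folder, a folder listing could in principle become a lookup keyed on PathPrefix instead of a scan over every path.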

Do you know if the above change made it into v2.0.4.23-2.0.4.23_beta_2019-07-14, which was slow for me?

I’m still backing up using and testing v2.0.4.21-2.0.4.21_experimental_2019-06-28, which I presume has this change, and in my initial smaller test, the file list speed was very good. I’ll know if it’s actually better than the beta in a few hours when the backup finishes.

2.0.4.23 is 2.0.4.5 plus “This update only contains warnings …” per the release note, so it should not have that change.

2.0.4.21 experimental is a genuine feature/bugfix update off of the canary chain at the 2.0.4.20 level, so it should.

Thing is, it doesn’t look like the code has been changed yet to take advantage of the path prefix redesign.

One change that you might notice in backup if you have a fast enough uplink is new parallel uploaders…

v2.0.4.16-2.0.4.16_canary_2019-03-28

Added option for parallel uploads, thanks @seantempleton

Sadly, you’re right. Now that the large backup is complete, the restore list does still take as long as it did before with the beta. At least rebuilding the db on my local machine now took only 25 minutes with the experimental instead of days like with the beta.

Open issue Listing directories for restore very slow #1715 just got a note to look here. Maybe someone will volunteer to change the code to make better use of the path prefix, if the new design actually helps with this.

Could you open two web browser tabs to Duplicati…

On the first tab, select one of your backup sets and click Restore files. When you see the file tree with C:\ (or whatever), leave it there.

Go to the second Duplicati web browser tab and go to About / Show Log / Live / and select Profiling.

Then go back to the first tab and expand C:. When it is complete, go back to the tab that shows the Live log. Near the top (second line when I tried this) it shows the total time it took. But below that it should show several lines like “… ExecuteNonQuery … took [time]” or “… ExecuteReader … took [time]”

Just curious which one of those Execute lines stands out as taking the biggest chunk of the total time.
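As an alternative to eyeballing the live log, you can write the same output to a file (Duplicati’s --log-file option with its log level set to Profiling) and rank the Execute lines by duration. A rough sketch follows; the “took 0:00:00:01.234”-style duration format it assumes is approximate, so adjust the regex if your log lines look different:

```python
import re
import sys

# Rough sketch: scan a Duplicati profiling log and print the ten slowest
# ExecuteReader / ExecuteNonQuery / ExecuteScalar lines. The duration format
# assumed here is approximate; tweak the regex to match your actual log.
TOOK = re.compile(r"Execute\w+.*took\s+([0-9:.]+)")

def to_seconds(stamp: str) -> float:
    """Convert a duration like '0:00:00:01.234' (days:hours:mins:secs) to seconds."""
    units = (1, 60, 3600, 86400)
    parts = [float(p) for p in reversed(stamp.split(":"))]
    return sum(value * unit for value, unit in zip(parts, units))

def main(path: str) -> None:
    timed = []
    with open(path, errors="replace") as fh:
        for line in fh:
            match = TOOK.search(line)
            if match:
                timed.append((to_seconds(match.group(1)), line.strip()))
    for seconds, line in sorted(timed, reverse=True)[:10]:
        print(f"{seconds:10.3f}s  {line[:120]}")

if __name__ == "__main__":
    main(sys.argv[1])
```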

Does anyone have an idea why my Windows-based installation is WAY faster to browse than my smaller Linux-based installation?

  • Loading a folder’s content on the Windows-based setup (500GB) takes less than a second.
  • Loading a folder’s content on the Linux-based setup (100GB) takes about 30 seconds on average.

Any ideas? I can’t find an existing issue about this, so I might have to open my own. Thanks

How do the CPUs compare between the two machines? Also, what version of Duplicati are you using on both?

Linux Duplicati Config:

  • Duplicati Canary (2.0.5.103_canary_2020-02-18)
  • 15-20% CPU while browsing restore
    • Takes about 40 seconds to get the root restore directory tree loaded and ready
    • Takes about 20 seconds to expand a directory in the tree and get it loaded and ready
  • 25-30% CPU while backing up

What I realized while testing is that this backup has a much more complex directory structure. It contains hundreds of directories, as it’s a backup of my developer workstation. I read in this thread that expanding a node in the tree needs to parse the whole directory list each time, making it slower with such a complex directory layout. So that would explain my issue, IMO.


Windows Duplicati Config:

  • Duplicati Beta (2.0.5.1_beta_2020-01-18)
  • 2-5% CPU while browsing restore
  • Takes about 2 seconds to get the root restore directory tree loaded and ready
  • Takes about 2 seconds to expand a directory in the tree and get it loaded and ready
  • 10-15% CPU while backing up

Yep, I would say that’s correct.

In addition to CPU usage, I was curious about the CPU model. This is largely a CPU-intensive task, and I believe it’s single-threaded, like most (all?) SQLite queries. So single-threaded CPU performance will also affect this.

Just FYI:

The Windows box CPU (the one with the fastest restore browsing) is an Intel Core i5-4670 quad-core 3.4GHz processor (LGA1150 Haswell, 6MB cache).

The Linux laptop (workstation) has an 8th Generation Intel® Core™ i7-8550U processor (8M cache, up to 4.0GHz).

Anyway in my book, performance is very acceptable. I used to be a Crashplan client before they messed my whole account up and their app was MUCH slower for browsing. So yay for Duplicati and open source software :stuck_out_tongue:

I’m also a Crashplan refugee! Welcome to the forum :slight_smile:


Awesome! What backend do you use? Backblaze, Wasabi, S3?
I’m currently trying Wasabi.

I use B2. I also tried Wasabi and had no complaints. B2 offers the ability to ship a bucket snapshot on USB drive, which I thought might be a nice option for faster disaster recovery.