Restore: Why is loading directories for restore so impossibly slow?

I love the concepts behind Duplicati, but I find it impossible to restore files from my backup. I installed it and ran my first backup yesterday: 400G containing over 5,000,000 files, most of them very small. The backup destination is B2. Yes, that many small files would present a challenge for any backup solution; you either pay (in time) at the backup stage (for indexing in the DB) or at restoration (for finding what you need to restore). The first stage was great: the backup ran flawlessly, and was faster than expected, much faster than MSP306 and similar products.

But evaluating backup software wouldn’t be complete without trying a restore, and it took about five minutes to load the /home directory and another five minutes to browse below that. The source system is an Ubuntu server on AWS with 8 vCPUs, 32G memory, and 1TB NVMe SSD storage, which should be enough power to handle the database operations needed to display the directory hierarchy. But I can’t do it. In an emergency you really need to do restores ASAP, and it seems that isn’t possible with Duplicati. Or am I missing something? Would it run faster if I did not use encryption?

Welcome to the forum @Richard_Gerber

I don’t think encryption affects file listing speed, which I believe is all database queries. It might speed up other work.
You can see if any of the remote files are being accessed via About → Show log → Live → Information

Can you give an idea of how many files are in a folder? Your slowness is in opening specific folders, correct?

One couldn’t reasonably ask for more hardware than yours, but that isn’t proof that it’s sufficient. Can you investigate?
This gets kind of complicated, and I’m not an expert. I don’t know how good your SQL is, so begin with timing.

Duplicati allows logging of individual SQL statements, and how long each takes. It can make a very big log, or alternatively you can watch the server log at About → Show log → Live → Profiling as it hits a slow spot.

If you see a slow SQL statement, you can note that, but it’s not always clear (especially without good SQL expertise available – any SQL experts out there?) exactly how to go about improving a slow SQL strategy.
Certainly it could be something else, but looking at SQL time is much easier than other things you can do.
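If you want to dig in yourself, one approach (just a sketch; work on a copy of the job database so nothing locks or modifies the live one, and note that statements referencing the temporary Filesets-*/Filenames-* tables can’t be re-run verbatim, since those tables only exist during the operation) is to paste a suspect statement from the profiling log into the sqlite3 command-line shell with timing turned on:

sqlite> .timer on
sqlite> -- example statement copied from a profiling log
sqlite> SELECT "ID", "Timestamp" FROM "Fileset" ORDER BY "Timestamp" DESC;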

Here’s what my output looked like in my usual huge log, kept to help debug in case something goes wrong:

2021-03-08 10:23:24 -05 - [Information-Duplicati.Library.Main.Controller-StartingOperation]: The operation List has started
2021-03-08 10:23:24 -05 - [Profiling-Timer.Begin-Duplicati.Library.Main.Controller-RunList]: Starting - Running List
2021-03-08 10:23:24 -05 - [Profiling-Timer.Begin-Duplicati.Library.Main.Database.ExtensionMethods-ExecuteScalarInt64]: Starting - ExecuteScalarInt64: INSERT INTO "Operation" ("Description", "Timestamp") VALUES ("List", 1615217004); SELECT last_insert_rowid();
2021-03-08 10:23:24 -05 - [Profiling-Timer.Finished-Duplicati.Library.Main.Database.ExtensionMethods-ExecuteScalarInt64]: ExecuteScalarInt64: INSERT INTO "Operation" ("Description", "Timestamp") VALUES ("List", 1615217004); SELECT last_insert_rowid(); took 0:00:00:00.113
2021-03-08 10:23:24 -05 - [Profiling-Timer.Begin-Duplicati.Library.Main.Database.ExtensionMethods-ExecuteReader]: Starting - ExecuteReader: SELECT "ID", "Timestamp" FROM "Fileset" ORDER BY "Timestamp" DESC
2021-03-08 10:23:24 -05 - [Profiling-Timer.Finished-Duplicati.Library.Main.Database.ExtensionMethods-ExecuteReader]: ExecuteReader: SELECT "ID", "Timestamp" FROM "Fileset" ORDER BY "Timestamp" DESC took 0:00:00:00.000
2021-03-08 10:23:24 -05 - [Profiling-Timer.Begin-Duplicati.Library.Main.Database.ExtensionMethods-ExecuteNonQuery]: Starting - ExecuteNonQuery: CREATE TEMPORARY TABLE "Filesets-CFAAD7436E4696408CE483765485681D" AS SELECT DISTINCT "ID" AS "FilesetID", "IsFullBackup" AS "IsFullBackup" , "Timestamp" AS "Timestamp" FROM "Fileset"  WHERE  "Timestamp" <= 1615214451
2021-03-08 10:23:24 -05 - [Profiling-Timer.Finished-Duplicati.Library.Main.Database.ExtensionMethods-ExecuteNonQuery]: ExecuteNonQuery: CREATE TEMPORARY TABLE "Filesets-CFAAD7436E4696408CE483765485681D" AS SELECT DISTINCT "ID" AS "FilesetID", "IsFullBackup" AS "IsFullBackup" , "Timestamp" AS "Timestamp" FROM "Fileset"  WHERE  "Timestamp" <= 1615214451 took 0:00:00:00.000
2021-03-08 10:23:24 -05 - [Profiling-Timer.Begin-Duplicati.Library.Main.Database.ExtensionMethods-ExecuteNonQuery]: Starting - ExecuteNonQuery: CREATE INDEX "Filesets-CFAAD7436E4696408CE483765485681D_FilesetIDTimestampIndex" ON "Filesets-CFAAD7436E4696408CE483765485681D" ("FilesetID", "Timestamp" DESC)
2021-03-08 10:23:24 -05 - [Profiling-Timer.Finished-Duplicati.Library.Main.Database.ExtensionMethods-ExecuteNonQuery]: ExecuteNonQuery: CREATE INDEX "Filesets-CFAAD7436E4696408CE483765485681D_FilesetIDTimestampIndex" ON "Filesets-CFAAD7436E4696408CE483765485681D" ("FilesetID", "Timestamp" DESC) took 0:00:00:00.000
2021-03-08 10:23:24 -05 - [Profiling-Timer.Begin-Duplicati.Library.Main.Database.ExtensionMethods-ExecuteReader]: Starting - ExecuteReader: SELECT DISTINCT "A"."FilesetID", "A"."IsFullBackup", "B"."FileCount", "B"."FileSizes" FROM "Filesets-CFAAD7436E4696408CE483765485681D" A LEFT OUTER JOIN ( SELECT "A"."FilesetID" AS "FilesetID", COUNT(*) AS "FileCount", SUM("C"."Length") AS "FileSizes" FROM "FilesetEntry" A, "File" B, "Blockset" C WHERE "A"."FileID" = "B"."ID" AND "B"."BlocksetID" = "C"."ID" AND "A"."FilesetID" IN (SELECT DISTINCT "FilesetID" FROM "Filesets-CFAAD7436E4696408CE483765485681D") GROUP BY "A"."FilesetID"  ) B ON "A"."FilesetID" = "B"."FilesetID" ORDER BY "A"."Timestamp" DESC 
2021-03-08 10:23:24 -05 - [Profiling-Timer.Finished-Duplicati.Library.Main.Database.ExtensionMethods-ExecuteReader]: ExecuteReader: SELECT DISTINCT "A"."FilesetID", "A"."IsFullBackup", "B"."FileCount", "B"."FileSizes" FROM "Filesets-CFAAD7436E4696408CE483765485681D" A LEFT OUTER JOIN ( SELECT "A"."FilesetID" AS "FilesetID", COUNT(*) AS "FileCount", SUM("C"."Length") AS "FileSizes" FROM "FilesetEntry" A, "File" B, "Blockset" C WHERE "A"."FileID" = "B"."ID" AND "B"."BlocksetID" = "C"."ID" AND "A"."FilesetID" IN (SELECT DISTINCT "FilesetID" FROM "Filesets-CFAAD7436E4696408CE483765485681D") GROUP BY "A"."FilesetID"  ) B ON "A"."FilesetID" = "B"."FilesetID" ORDER BY "A"."Timestamp" DESC  took 0:00:00:00.023
2021-03-08 10:23:24 -05 - [Profiling-Timer.Begin-Duplicati.Library.Main.Database.ExtensionMethods-ExecuteNonQuery]: Starting - ExecuteNonQuery: CREATE TEMPORARY TABLE "Filenames-0D9624F40D895144815C696CC3DDA905" ("Path" TEXT NOT NULL)
2021-03-08 10:23:24 -05 - [Profiling-Timer.Finished-Duplicati.Library.Main.Database.ExtensionMethods-ExecuteNonQuery]: ExecuteNonQuery: CREATE TEMPORARY TABLE "Filenames-0D9624F40D895144815C696CC3DDA905" ("Path" TEXT NOT NULL) took 0:00:00:00.000
2021-03-08 10:23:24 -05 - [Profiling-Timer.Begin-Duplicati.Library.Main.Database.ExtensionMethods-ExecuteNonQuery]: Starting - ExecuteNonQuery: INSERT INTO "Filenames-0D9624F40D895144815C696CC3DDA905" SELECT DISTINCT "Path" FROM "File" WHERE "Path" LIKE "C:\PORTABLEAPPS\%"
2021-03-08 10:23:24 -05 - [Profiling-Timer.Finished-Duplicati.Library.Main.Database.ExtensionMethods-ExecuteNonQuery]: ExecuteNonQuery: INSERT INTO "Filenames-0D9624F40D895144815C696CC3DDA905" SELECT DISTINCT "Path" FROM "File" WHERE "Path" LIKE "C:\PORTABLEAPPS\%" took 0:00:00:00.007
2021-03-08 10:23:24 -05 - [Profiling-Timer.Begin-Duplicati.Library.Main.Database.ExtensionMethods-ExecuteNonQuery]: Starting - ExecuteNonQuery: DELETE FROM "Filenames-0D9624F40D895144815C696CC3DDA905" WHERE "Path" NOT IN (SELECT DISTINCT "Path" FROM "File", "FilesetEntry" WHERE "FilesetEntry"."FileID" = "File"."ID" AND "FilesetEntry"."FilesetID" IN (SELECT "FilesetID" FROM "Filesets-CFAAD7436E4696408CE483765485681D") ) 
2021-03-08 10:23:24 -05 - [Profiling-Timer.Finished-Duplicati.Library.Main.Database.ExtensionMethods-ExecuteNonQuery]: ExecuteNonQuery: DELETE FROM "Filenames-0D9624F40D895144815C696CC3DDA905" WHERE "Path" NOT IN (SELECT DISTINCT "Path" FROM "File", "FilesetEntry" WHERE "FilesetEntry"."FileID" = "File"."ID" AND "FilesetEntry"."FilesetID" IN (SELECT "FilesetID" FROM "Filesets-CFAAD7436E4696408CE483765485681D") )  took 0:00:00:00.036
2021-03-08 10:23:24 -05 - [Profiling-Timer.Begin-Duplicati.Library.Main.Database.ExtensionMethods-ExecuteNonQuery]: Starting - ExecuteNonQuery: CREATE TEMPORARY TABLE "Filenames-271A73A35C151A4E9FCB8667C4270F6B" ("Path" TEXT NOT NULL)
2021-03-08 10:23:24 -05 - [Profiling-Timer.Finished-Duplicati.Library.Main.Database.ExtensionMethods-ExecuteNonQuery]: ExecuteNonQuery: CREATE TEMPORARY TABLE "Filenames-271A73A35C151A4E9FCB8667C4270F6B" ("Path" TEXT NOT NULL) took 0:00:00:00.000
2021-03-08 10:23:24 -05 - [Profiling-Timer.Begin-Duplicati.Library.Main.Database.ExtensionMethods-ExecuteReader]: Starting - ExecuteReader: SELECT DISTINCT "Path" FROM "Filenames-0D9624F40D895144815C696CC3DDA905" 
2021-03-08 10:23:24 -05 - [Profiling-Timer.Finished-Duplicati.Library.Main.Database.ExtensionMethods-ExecuteReader]: ExecuteReader: SELECT DISTINCT "Path" FROM "Filenames-0D9624F40D895144815C696CC3DDA905"  took 0:00:00:00.000
2021-03-08 10:23:24 -05 - [Profiling-Timer.Begin-Duplicati.Library.Main.Database.ExtensionMethods-ExecuteNonQuery]: Starting - ExecuteNonQuery: CREATE INDEX "Filenames-271A73A35C151A4E9FCB8667C4270F6B_PathIndex" ON "Filenames-271A73A35C151A4E9FCB8667C4270F6B" ("Path")
2021-03-08 10:23:24 -05 - [Profiling-Timer.Finished-Duplicati.Library.Main.Database.ExtensionMethods-ExecuteNonQuery]: ExecuteNonQuery: CREATE INDEX "Filenames-271A73A35C151A4E9FCB8667C4270F6B_PathIndex" ON "Filenames-271A73A35C151A4E9FCB8667C4270F6B" ("Path") took 0:00:00:00.000
2021-03-08 10:23:24 -05 - [Profiling-Timer.Begin-Duplicati.Library.Main.Database.ExtensionMethods-ExecuteReader]: Starting - ExecuteReader: SELECT "C"."Path", "D"."Length", "C"."FilesetID" FROM (SELECT "A"."Path", "B"."FilesetID" FROM "Filenames-271A73A35C151A4E9FCB8667C4270F6B" A, (SELECT "FilesetID", "Timestamp" FROM "Filesets-CFAAD7436E4696408CE483765485681D" ORDER BY "Timestamp" DESC) B ORDER BY "A"."Path" ASC, "B"."Timestamp" DESC) C LEFT OUTER JOIN (SELECT "Length", "FilesetEntry"."FilesetID", "File"."Path" FROM "Blockset", "FilesetEntry", "File" WHERE "File"."BlocksetID" = "Blockset"."ID" AND "FilesetEntry"."FileID" = "File"."ID" AND FilesetEntry."FilesetID" IN (SELECT DISTINCT "FilesetID" FROM "Filesets-CFAAD7436E4696408CE483765485681D") ) D ON "C"."FilesetID" = "D"."FilesetID" AND "C"."Path" = "D"."Path"
2021-03-08 10:23:24 -05 - [Profiling-Timer.Finished-Duplicati.Library.Main.Database.ExtensionMethods-ExecuteReader]: ExecuteReader: SELECT "C"."Path", "D"."Length", "C"."FilesetID" FROM (SELECT "A"."Path", "B"."FilesetID" FROM "Filenames-271A73A35C151A4E9FCB8667C4270F6B" A, (SELECT "FilesetID", "Timestamp" FROM "Filesets-CFAAD7436E4696408CE483765485681D" ORDER BY "Timestamp" DESC) B ORDER BY "A"."Path" ASC, "B"."Timestamp" DESC) C LEFT OUTER JOIN (SELECT "Length", "FilesetEntry"."FilesetID", "File"."Path" FROM "Blockset", "FilesetEntry", "File" WHERE "File"."BlocksetID" = "Blockset"."ID" AND "FilesetEntry"."FileID" = "File"."ID" AND FilesetEntry."FilesetID" IN (SELECT DISTINCT "FilesetID" FROM "Filesets-CFAAD7436E4696408CE483765485681D") ) D ON "C"."FilesetID" = "D"."FilesetID" AND "C"."Path" = "D"."Path" took 0:00:00:00.240
2021-03-08 10:23:25 -05 - [Profiling-Timer.Begin-Duplicati.Library.Main.Database.ExtensionMethods-ExecuteNonQuery]: Starting - ExecuteNonQuery: DROP TABLE IF EXISTS "Filenames-0D9624F40D895144815C696CC3DDA905" 
2021-03-08 10:23:25 -05 - [Profiling-Timer.Finished-Duplicati.Library.Main.Database.ExtensionMethods-ExecuteNonQuery]: ExecuteNonQuery: DROP TABLE IF EXISTS "Filenames-0D9624F40D895144815C696CC3DDA905"  took 0:00:00:00.000
2021-03-08 10:23:25 -05 - [Profiling-Timer.Begin-Duplicati.Library.Main.Database.ExtensionMethods-ExecuteNonQuery]: Starting - ExecuteNonQuery: DROP TABLE IF EXISTS "Filenames-271A73A35C151A4E9FCB8667C4270F6B"
2021-03-08 10:23:25 -05 - [Profiling-Timer.Finished-Duplicati.Library.Main.Database.ExtensionMethods-ExecuteNonQuery]: ExecuteNonQuery: DROP TABLE IF EXISTS "Filenames-271A73A35C151A4E9FCB8667C4270F6B" took 0:00:00:00.000
2021-03-08 10:23:25 -05 - [Profiling-Timer.Begin-Duplicati.Library.Main.Database.ExtensionMethods-ExecuteNonQuery]: Starting - ExecuteNonQuery: DROP TABLE IF EXISTS "Filesets-CFAAD7436E4696408CE483765485681D" 
2021-03-08 10:23:25 -05 - [Profiling-Timer.Finished-Duplicati.Library.Main.Database.ExtensionMethods-ExecuteNonQuery]: ExecuteNonQuery: DROP TABLE IF EXISTS "Filesets-CFAAD7436E4696408CE483765485681D"  took 0:00:00:00.000
2021-03-08 10:23:25 -05 - [Profiling-Timer.Begin-Duplicati.Library.Main.Database.ExtensionMethods-ExecuteNonQuery]: Starting - ExecuteNonQuery: PRAGMA optimize
2021-03-08 10:23:25 -05 - [Profiling-Timer.Finished-Duplicati.Library.Main.Database.ExtensionMethods-ExecuteNonQuery]: ExecuteNonQuery: PRAGMA optimize took 0:00:00:00.000
2021-03-08 10:23:25 -05 - [Profiling-Timer.Finished-Duplicati.Library.Main.Controller-RunList]: Running List took 0:00:00:00.960

Search for “took” and see how long yours take. Note that logging can show path info; for example, you can see that I expanded the PortableApps folder for a look. It’s a tiny folder, so yours will certainly take longer.

If the live log isn’t sufficient and you pursue log-file=<path> log-file-log-level=profiling, you can also set the option --profile-all-database-queries, which sometimes gets low-level details that are otherwise omitted.
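For reference, set as advanced options on the backup job, those might look roughly like this (the log path is just a placeholder):

--log-file=/path/to/duplicati-profiling.log
--log-file-log-level=profiling
--profile-all-database-queries=true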


Thanks for your response. I checked out the log as you suggested, but the detailed profiling data was not present. (Is this something I need to specify in the configuration?) Anyway, I found the database in /usr/lib/duplicati/data, and sure enough, the File table does contain close to 7,000,000 rows. I ran a smaller backup with only 30,000 files, and the file browsing was quick, as you would expect it to be. But for the large one, I have to wait several minutes before the “/” directory appears, and then if I try to expand /var, it takes another 5 minutes. If I open /var/www, it takes 4 minutes; /var/www/html takes 2 minutes, and so on. There are millions of leaves in the directory tree. I will test command line restores, because maybe this is just an issue of presenting the directories in a browsable form. Richard.

Did you get anything? If you omit log-file-log-level=profiling, I think it does Warnings and above.

I’m not an expert in the algorithm or its SQL, but speed might depend on files per folder, not the total files.

Could be. I don’t know, but I do see (in Wireshark) Duplicati give the entire file list to my browser to show.


I love the simple interface of Duplicati, but I suppose it is not the optimal solution for me. I have directories that include thousands of small files, and as I pointed out, there are over 6,500,000 files overall in about 400G. I ran a trial of the SolarWinds product, which was way slower than Duplicati on the backup part but extremely fast for restoration, which is what I need. CloudBerry isn’t as fast as SolarWinds, but it is extremely fast compared to Duplicati. The backups for CloudBerry have a substantially larger footprint, about as large as the filesystem being backed up, which is substantial, considering that most of my files can be compressed significantly. I suppose they use this space for indexing. Perhaps Duplicati isn’t built for backup situations like mine, with large directory structures and the need for quick restoration of single files. Backup storage is cheap, and maybe a backup program should use as much of it as is necessary to do fast lookups on restores. Rich.

If Duplicati reads a lot of info into memory, it will get slower as the number of entries increases.

If they are loading everything into memory immediately, it could be optimized differently: load into memory only what is needed when a folder is expanded in the restore process.

But I haven’t really looked at that part of Duplicati; I’m just throwing this out there because that’s what it sounds like is happening.

As an example, they only need to load the top-level folder names. One click on a folder, and they only need to load the folders and files immediately inside it, nothing else. That is really fast.

They do need to find that stuff in the archive or db. But that should also be fast if they can just get everything at said path and the latest info to go with it. A db setup for it should make it really fast.
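Just to illustrate the idea (a rough sketch of the concept, not a claim about how Duplicati actually does it; I’m assuming the newer job-database layout where FileLookup rows carry a PrefixID pointing into the PathPrefix folder table), expanding one folder could then be a single lookup:

-- hypothetical per-folder listing: fetch only entries directly inside one folder
SELECT f."Path", b."Length"
  FROM "FileLookup" f
  JOIN "PathPrefix" p ON p."ID" = f."PrefixID"
  LEFT JOIN "Blockset" b ON b."ID" = f."BlocksetID"
 WHERE p."Prefix" = '/var/www/html/';

With an index on PrefixID, a query like that stays fast no matter how large the rest of the tree is.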

If they were instead hunting all around the backup files or loading everything into memory it would be quite slow for sure.

You are right, but since Duplicati is open source and free, we can’t really complain. It worked much faster for smaller backups. When I tried to back up parts of the filesystem (with 500,000 files each), it worked fine, so to back up 5,000,000 files it would suffice to create 10 separate backups with a smaller tree in each. I actually did something like that, and it worked well. So if Duplicati handled large backups that way, chopping them into parts, it would work fine. For me it is too much of a pain to manage backups like that, but it does give some clues about strategies that could be used to handle backups like mine. It is a shame that I don’t have time to check out the code.

There is room for performance improvement in some of the database queries. Duplicati can slow down because of these suboptimal queries when the number of blocks gets very large, and maybe, as you’ve demonstrated, when the number of files is really large.

Someone with strong SQL query optimization skills could help improve this, if they are willing. :slight_smile:

Here’s a recent change that improved some database operations significantly. Not sure if it applies to OP’s issue.

I’m still looking for SQL profiling times. Based mostly on name and SQL, I’m guessing the code is here:

and even with my tiny folder expansion logged earlier, this was the slowest SQL. Let me pretty-print that:

SELECT "C"."Path"
	,"D"."Length"
	,"C"."FilesetID"
FROM (
	SELECT "A"."Path"
		,"B"."FilesetID"
	FROM "Filenames-19C57E44B990C0428B3C69351597D57A" A
		,(
			SELECT "FilesetID"
				,"Timestamp"
			FROM "Filesets-10AA35FC3949D041B20F0B443D64DA0F"
			ORDER BY "Timestamp" DESC
			) B
	ORDER BY "A"."Path" ASC
		,"B"."Timestamp" DESC
	) C
LEFT OUTER JOIN (
	SELECT "Length"
		,"FilesetEntry"."FilesetID"
		,"File"."Path"
	FROM "Blockset"
		,"FilesetEntry"
		,"File"
	WHERE "File"."BlocksetID" = "Blockset"."ID"
		AND "FilesetEntry"."FileID" = "File"."ID"
		AND FilesetEntry."FilesetID" IN (
			SELECT DISTINCT "FilesetID"
			FROM "Filesets-10AA35FC3949D041B20F0B443D64DA0F"
			)
	) D ON "C"."FilesetID" = "D"."FilesetID"
	AND "C"."Path" = "D"."Path"

I think that query gets assembled in code below. I traced some but not all. Seeing “cartesian” worries me.
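One way to see what the SQLite optimizer does with a statement like that (a sketch, run against a copy of the job database; the Filesets-* temporary table only exists during the live operation, so I substitute a stand-in, and I’m only showing the right-hand subquery of the join) is EXPLAIN QUERY PLAN:

sqlite> CREATE TEMPORARY TABLE "Filesets-X" AS
   ...>   SELECT "ID" AS "FilesetID", "IsFullBackup", "Timestamp" FROM "Fileset";
sqlite> EXPLAIN QUERY PLAN
   ...> SELECT "Length", "FilesetEntry"."FilesetID", "File"."Path"
   ...>   FROM "Blockset", "FilesetEntry", "File"
   ...>  WHERE "File"."BlocksetID" = "Blockset"."ID"
   ...>    AND "FilesetEntry"."FileID" = "File"."ID"
   ...>    AND "FilesetEntry"."FilesetID" IN (SELECT DISTINCT "FilesetID" FROM "Filesets-X");

Output lines showing SCAN over a big table (rather than SEARCH ... USING INDEX) are the usual suspects for where the time goes.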

Documentation for the local database format. Any SQL experts out there who might have any opinions?

If this is a direct restore from backend files, there’s a problem where some files are downloaded twice.

I doubt it. That would use a temporary database (probably in a dup-* file in /tmp).
This one looks like a regular restore, although seemingly --portable-mode is used.
That can be seen from the database being in the data folder under the install folder.

I guess we could ask @Richard_Gerber if the disaster recovery method mentioned here was used:

Restoring files if your Duplicati installation is lost

To me, this sounds like slowness in opening folders (with lots of files) in the GUI restore tree, which probably happens after the dlist files are turned into the version selector dropdown (maybe with partials).

I did a test with Duplicati 2.0.5.1 and one million (small) files. I broke the files up into 30 folders, each with about 33k files.

Did a backup, then went to do a restore. Expanding one of the folders in the restore UI took 45 seconds. Watching task manager, I could see that Duplicati was busy about 5 seconds of that, and the remaining 40 seconds was the browser (Edge) presumably working to update the directory listing display.

Upgraded Duplicati to 2.0.5.113. Ran another backup (so the job database would be upgraded). Re-ran the test and the timings were about the same.

Installed Firefox and re-ran the test. Browser time was about 25 seconds instead of 40 seconds with Edge, but memory usage was higher.

The main issue seems to be with the browser UI trying to display tens of thousands of files in a single folder.

Same initial conclusion here, testing with 10,000 files that I had lying around, put into 2 subfolders.
Expanding a subfolder does SQL, but finishes it in half a second or so. The list reaches the browser in about 1 second.
The remaining 16 seconds or so seem to be rendering. Meanwhile, some other JavaScript does its usual polling.
Testing on Firefox. Closing and reopening the same folder is fast. It can get slow again by starting Restore again.

Anybody good at profiling or somehow measuring where the slow spots are in such displays? I’m not…
I should note it’s not a pure HTML render (I think). Below is a sample of what the data sent to the browser looks like:

    {
      "Path": "C:\\VirtualBox shared folder\\emptyfiles\\sub2\\9998",
      "Sizes": [
        0
      ]
    },
    {
      "Path": "C:\\VirtualBox shared folder\\emptyfiles\\sub2\\9999",
      "Sizes": [
        0
      ]
    }
  ]
}

EDIT:

The data sent to the browser covered only the subfolder that was expanded, so I’m not sure the total file count matters.

Yet the above test seems to say it does, although I’m not sure if the individual folder sizes shrank as well.

Although the easiest way to let us know how many files per folder you’re facing might be to look at the source filesystem, the database can also be used. The PathPrefix table holds the folders. Find yours, and take its ID to the FileLookup table to enter in the PrefixID filter field; e.g. in DB Browser for SQLite, I would enter =3 to count my files in sub2.
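If the sqlite3 shell is handier for you than DB Browser, the same check might look roughly like this (I’m assuming the PathPrefix table stores the folder path in a Prefix column; substitute one of your slow folders for the pattern):

sqlite> SELECT "ID", "Prefix" FROM "PathPrefix" WHERE "Prefix" LIKE '%/var/www/html/%';
sqlite> SELECT COUNT(*) FROM "FileLookup" WHERE "PrefixID" = 3; -- the ID returned above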


I was restoring from the first backup, immediately after the backup was done; hence whatever artifacts were made on my server were still there. It took several minutes just to browse the top levels of the directory tree. Remember, there were 7,000,000 files spread all over the place. I was restoring from a backup on S3 standard (not Glacier or any of the low-cost options), but gave up after a while.

Just to ensure that it wasn’t just a browser issue, I also tried doing a manual restore from the command line, on several files deep in the directory hierarchy. It eventually restored the files into /tmp, after about 1/2 hour of waiting.

As for browser issues, when I use a file browser through Webmin, for example, there is zero delay opening a directory, even in my overpopulated filesystem. So Duplicati could do something similar, perhaps with an Ajax call to open up a directory. Of course I don’t know how the program works, since I am just a user at this point. But restructuring the database seems like it would be a large undertaking.

Here goes:

sqlite> select count(*) from PathPrefix;
404591
sqlite> select count(*) from File;
6529739

PS: I would really like to see a head-to-head comparison between Duplicati and CloudBerry. Duplicati is the winner on backup speed, but not on restore. But I didn’t profile any of this, which is really what is necessary.

Could you please say how many files are in the folders you expanded? Do you have a file lister, even ls? Using a non-GUI sqlite tool might make it hard to browse. In DB Browser, just type the path into the Prefix column filter.

The total file count was already given. The most interesting thing from the PathPrefix count is that there are lots of folders. I can compute an average file count per folder, but I’m most interested in the folders that expand slowly.
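For example, something along these lines (again just a sketch against the same two tables) should show the most heavily populated folders, which would be my first guess at the ones that expand slowly:

sqlite> SELECT p."Prefix", COUNT(*) AS "Files"
   ...>   FROM "FileLookup" f
   ...>   JOIN "PathPrefix" p ON p."ID" = f."PrefixID"
   ...>  GROUP BY f."PrefixID"
   ...>  ORDER BY "Files" DESC
   ...>  LIMIT 10;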

To say more about the data sent to the browser, it begins like the sample below, and I’m not going to post all of the middle.

...{
  "prefix-only": "false",
  "folder-contents": "true",
  "time": "2021-03-09T21:03:56-05:00",
  "filter": "@C:\\VirtualBox shared folder\\emptyfiles\\sub2\\",
  "Filesets": [
    {
      "Version": 0,
      "IsFullBackup": 1,
      "Time": "2021-03-09T21:03:56-05:00",
      "FileCount": 20000,
      "FileSizes": 0
    }
  ],
  "Files": [
    {
      "Path": "C:\\VirtualBox shared folder\\emptyfiles\\sub2\\0000",
      "Sizes": [
        0
      ]
    },
    {
      "Path": "C:\\VirtualBox shared folder\\emptyfiles\\sub2\\0001",
      "Sizes": [
        0
      ]
    },

Query was:

GET /api/v1/backup/5/files/C%3A%5CVirtualBox%20shared%20folder%5Cemptyfiles%5Csub2%5C?prefix-only=false&folder-contents=true&time=2021-03-09T21%3A03%3A56-05%3A00&filter=%40C%3A%5CVirtualBox%20shared%20folder%5Cemptyfiles%5Csub2%5C HTTP/1.1

which possibly means code was:

and then there’s more processing. I don’t really do JavaScript, so I won’t attempt to interpret what it does.

If you or any other contributor wants to analyze this further, I am happy to share the database. It is just 3.3G.

Creating a bug report explains how to share a sanitized version. The problem is that it won’t identify which folder expanded slowly for you (because names are sanitized). Or is expanding slow even on a small folder?

The other reason a database by itself is not that useful is that it’s not enough to get into the Restore tree. There are several of us trying to match your result, but we need hints on what it is that you’re expanding.

@drwtsn32 maybe has more and larger backups than I do. Do you have any with a big total file count? Meanwhile maybe we can hear more on whether this is a large-folder or large-total-files expansion issue. Browser-level slowdown would probably be based on folder size. Large total files might stress SQL time.

And that reminds me that we’re still looking for profiling log examination that shows SQL statement times.

No, nothing that large. My largest backup set is only about 100,000 files.

I used a PowerShell script on a test system to create the million small files.

I am thinking the same thing.