I’m left feeling like I made a mistake in choosing Duplicati. I am trying to check my backups via the web UI to make sure I am not forgetting about a file inside a folder I accidentally deleted, but as of writing it has been stuck at “Fetching path information …” for 1 hour 39 minutes. I can only tell it’s still working by checking ‘htop’ and seeing CPU activity from Duplicati. Is there any way to speed this up?
System
Arch Linux
Samsung Chromebox 3 running coreboot
PCIe USB 3.0 card for external drives
512 GB mSATA SSD internal storage
Intel Core i5-2450M CPU
16 GB RAM
Hi @ShapeShifter499, that seems like an excessive amount of time for a simple search.
What size is the local database?
How many files (roughly) do you have in the database?
Is the hanging screen the initial one, or did you see some folders and then it crashed?
If you have a screenshot, that would help in pinpointing the bottleneck.
It eventually loads a list, but it appears to take multiple hours to sort through the files.
In the web UI, I go to home → job → operations → restore files. That brings me to the restore screen, but if I don’t remember the file exactly (name or otherwise), it can take a really long time to check each folder it might have been in, per backup version (57 versions in my current list, spanning two years).
Source: 2.83 TB
Backup: 11.23 TB / 57 Versions
One of the most recent successful backups in my logs shows this
I should add that it’s taking hours to load the initial list and every time I click on a folder in the list.
If this were a bare-metal file system and a simple file explorer, that action would take seconds, a minute at worst, on most of my computers. I feel like if all I am doing is searching for files and names, it should not take hours. Only when I actually restore a file should it take any significant time.
Thanks, that gives an idea. The numbers only mention what was processed in that backup, not the total number of files in the backup (I would need NotProcessedFiles as well, to calculate total = Examined + NotProcessed).
For your use case (finding a missing file), we do not currently have a great UI.
What you can do instead, is use the “commandline” feature of the UI:
In there you can choose “find” as the operation, leave the “Target URL” as-is, and then type in the filename you are looking for, with */ before and * after. Finally, set the option --all-versions=true to get a search across all versions.
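As a rough example (report.odt is just a placeholder for whatever filename you are looking for), the “Commandline arguments” box would contain something like:
*/report.odt*
--all-versions=true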
Yes. The expression you type in as the “commandline arguments” is a filter expression, so it can match a folder as well (technically all the files in the folder are matched):
/path/to/folder/*
If you have a terminal, you may want to skip the UI and use the real commandline interface.
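A sketch of what that could look like (duplicati-cli is the usual wrapper name on Linux packages; the storage URL and filename here are placeholders, and pointing --dbpath at the job’s local database lets it reuse the existing database instead of building one):
duplicati-cli find "file:///mnt/backups/job" "*/report.odt*" --all-versions=true --dbpath=/path/to/job-database.sqlite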
Ah, that number was 0, so the search is done over ~3 million files and folders.
Thanks, that makes it easier to set up an experiment for measuring and speeding up the query.
On the topic of slow and not optimized: I ran a version of that ‘find’ command shortly after posting, and it’s still running right now. htop shows I/O and CPU usage. I’m not sure if it’s my older hardware, but does it really need to take multiple days (possibly weeks) for this sort of ‘find’ command?
It should not, but there has been little work done to optimize this part of Duplicati. It is possible that it is buffering a huge response if there are many files in the folder.
From the perspective of a user, waiting days for an answer is not useful.
I looked briefly at the code, and it supports many complex things that just slow it down.
One thing is the use of a regex, which reverts to evaluating the filters in C#; that will be a bit slow due to the back-and-forth with the database.
Essentially, it is an SQLite database, so a query over 3 million strings should take seconds, even on older hardware.
If you are familiar with SQL, you could also query the database directly to locate the path prefix and return any filenames in that folder. Let me know if you want to go that route, and I can assist in crafting the queries.
@kenkendk I am not kidding when I say that ‘find’ command is still ongoing. I’m wondering if there’s a way to restore all files under a folder, with rename for files that differ, or whether that would actually be any faster.
@kenkendk I’m going to give up on waiting. I can make a copy of the database to work on. What should I know before working on the database? I’m just trying to get a list of files from a folder in each version, to check for any deleted files.
In the “Browse Data” tab, you can find the “File” table and see all the paths there. You can filter at the top, under the column name, and drill down to the files you need.
You can also use the “Execute SQL” area to write the queries directly.
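If you prefer the terminal, the stock sqlite3 client can run the same queries against your copy of the job database (the path below is just a placeholder for wherever you put the copy):
sqlite3 /path/to/copy-of-job-database.sqlite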
To find the names of files in a folder, use this query (replace /Users/ with your prefix):
SELECT "Path" FROM "File" WHERE "BlocksetID" > 0 AND "Path" LIKE '/Users/%';
To get the timestamps of the backups these files appear in, use a query like:
SELECT "Timestamp" FROM "Fileset" WHERE "FilesetID" IN (
SELECT DISTINCT "FilesetID" FROM "FilesetEntry" WHERE "FileID" IN (
SELECT "ID" FROM "File" WHERE "BlocksetID" > 0 AND "Path" LIKE '/Users/%'));
The timestamps you get back are in Unix epoch format and can be converted to “normal” time with an online tool.
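Alternatively, SQLite can do the conversion itself with the built-in datetime() function and its 'unixepoch' modifier; for example, this lists every backup version with a readable time:
SELECT "ID", datetime("Timestamp", 'unixepoch') AS "BackupTime" FROM "Fileset";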