Very slow folder browsing during restore

ronald · September 4, 2017, 11:12am

When I want to recover some file. I noticed that opening a folder takes more than 30 seconds. This is for each subfolder including the root folder.

Are there options to speed this up?

30 seconds per folder click is way too long for interactieve work.

kenkendk · September 5, 2017, 11:59am

Not at this point …

I agree, it should be instant.
It requires a change in the database layout for this to be better.

ronald · September 10, 2017, 11:41am

Sorry, for late answer. Is this change in database layout something that is possible to implement? Or is it hard to implement?

tuzo · September 11, 2017, 7:10am

I’m wondering what relative performance I should expect to see? If the number of versions increases to the hundreds or thousands would there be additional performance degradation and what would the expected slowdown be (e.g. linear with number of versions)?

I’m planning on backing up my local personal files every hour in order to capture any “work in progress” files (in case I do something stupid or something bad happens) with a longish retention (3 months - 1 year). Initial size is 150GB and I would expect to see frequent small changes. Does this setup make sense and how will restore files behave with large number of versions?

kees-z · September 11, 2017, 7:51am

I can’t read the code, but I wouldn’t expect any relationship between speed of folder browsing and number of versions. Every backup has its own DLIST file, containing all filenames, which I expect (not sure about this) to be reflected in the local database.
I guess the total number of files (maybe also the total number of blocks/file fragments) can decrease performance of the file browser.

kenkendk · September 11, 2017, 10:18am

It is fairly involved, it requires changing the code in all the places that works with files and paths.

leika · September 13, 2017, 2:01pm

Hmm it looks mostly like a caching issue per backupset. For me it might be possible to collect a full filetree ina background process and store the tree in a local cache. this information should not change untill a version is finished.

In case the cache is available the query could be much faster (basically read the json dump and select on path).

If the cache is not or no longer available (quota on caches) it would fall back to DB Queries.

Stefan

tophee · September 27, 2017, 8:16am

I have not yet had to restore anything, but the idea of waiting half a minute to even open a folder is somewhat horrifying. When you lost data, that’s certainly not the kind of UX you want. So I’m wondering: what are the plans regarding this issue?

Two solutions have been mentioned:

One is to change the layout of the database which

and the other suggests that

which makes sense to me and may not be as laborious to implement?

kenkendk · September 28, 2017, 8:30am

Yes, it would be possible to simply dump all filenames to the client, but since there is a big delay already, I think dumping all files is likely to just cause out-of-memory.

tuzo · October 4, 2017, 3:55pm

I now have 284 versions (122 GB). When trying to restore it took over 4 minutes of the “Getting file versions …” message before populating the versions drop down list. This is much slower than before. The files list still hasn’t rendered after 20 minutes (Core i7 16GB w/ SSD so not hardware related). So, I don’t think this is very usable.

I know it’s a big body of work with some potential risk but any plans to implement a solution?

davegold · October 27, 2017, 1:53am

Is there an issue opened in github for this one already? I couldn’t find one, but this seems like something that should be logged there.
This is one I’d put a bounty on

JonMikelV · October 27, 2017, 2:23am

I’m not seeing one either so feel free to create one!

davegold · October 27, 2017, 2:43am

This seems similar to Listing directories for restore very slow · Issue #1715 · duplicati/duplicati · GitHub

While @kenkendk didn’t note the level of detail in that issue that he did in this thread, the core issue is indexing. I can watch disk usage while the list extracts from the live database to a temporary one for each level that is being walked, and while technically I would want to see this all pre-indexed, it may be that poses its own issues when dealing with recreating local databases, or even the amount of time, cpu and disk resources that are required to do so at the end of a backup–and what happens if the system is shut down while the index is being created.

I see a whole lot of work to make this work fast, and I’m only tertiarily knowledgable about fast browsing of databases.

Thoughts?

JonMikelV · October 27, 2017, 2:49am

How the heck did my GitHub search for “restore” not find that issue???
You are correct - it is likely a database design issue. As discussed elsewhere in this forum, it is caused by storing paths as full strings rather than splitting them up into tress - and, yes, it’s a BIG rewrite to fix it the right way because that table is touched by so many places in the code

That being said, I have a completely un-verified THEORY that some clever indexing and/or views might allow for incremental updates of the code base rather than needing a major re-write to get this to happen.

Unfortunately, development time is a bit scarce lately and most of it has been spent working on bugs or performance issues in more frequently used code. But at least you can know that the issue is recognized as needing to be fixed…

davegold · October 28, 2017, 12:03am

I ended up opening a new issue. The previous issue seems to have devolved from the main point. @kenkendk noted that this was related to #1672, and that was noted as fixed.
So either there is a regression, or this is really a different issue that has similar effect.

While the frequency of code use may be small, the importance of the restore working easily is not small.

If I can get some time, I’ll see if I can ask a friend who does db design how they get good responsiveness for database queries to list files.

Tapio · October 28, 2017, 12:39pm

Thanks for your effort guys. An “as easy as possible to use” and reliable restore process is ESSENTIAL for the trust in a backup software - better folder browsing on restore would help here.
My basic thought is always: when I die, will relatives be able to restore files? My answer currently is: Maybe, but also maybe they give up somewhere in the process. I.e. the computer illiterate point of view.

davegold · October 30, 2017, 12:34am

@Tapio I have to concur. I’m actually a bit surprised to see that the restore browsing is as slow as it is, based on the rather high quality of the rest of Duplicati.
My guy says that this was the original code put forth, and hasn’t really been touched since.

I really really wish I was a programmer, and I’d go update the code myself.

kenkendk · October 30, 2017, 12:31pm

I agree with the statements that restoring (and particularly the browsing of data) should be smooth. The slowdown is not easy to fix, as it is caused by a large table of paths needing to be filtered.

I am not aware of an efficient way to write this in SQLite, but anyone who does can look at the query here:

github.com

duplicati/duplicati/blob/master/Duplicati/Library/Main/Database/LocalListDatabase.cs#L217


            var dirsep = Duplicati.Library.Utility.Utility.GuessDirSeparator(s);



            s = s.Substring(ppl);

            var ix = s.IndexOf(dirsep, StringComparison.Ordinal);

            if (ix > 0 && ix != s.Length - 1)

                s = s.Substring(0, ix + 1);

            yield return prefix + s;

        }

}



public IEnumerable<IFileversion> SelectFolderContents(Library.Utility.IFilter filter)

{

    var tbname = "Filenames-" + Library.Utility.Utility.ByteArrayAsHexString(Guid.NewGuid().ToByteArray());

    try

    {

        string pathprefix;

        if (filter == null || filter.Empty)

            pathprefix = "";

        else if (filter as Library.Utility.FilterExpression == null || ((Library.Utility.FilterExpression)filter).Type != Duplicati.Library.Utility.FilterType.Simple || ((Library.Utility.FilterExpression)filter).GetSimpleList().Length != 1)

            throw new ArgumentException("Filter for list-folder-contents must be a path prefix with no wildcards", nameof(filter));

        else

My long-term fix is to rewrite the way the paths are stored in the database, which will make this query really fast.

Tapio · November 4, 2017, 4:21pm

I don’t see that in what you write. You rather seem to agree…?! I think we agree in that Duplicati should be as easy to use as possible, failsafe, fast, transparent and highly configurable.
E.g. Arq, has a good “fire and forget” mentality, well thought out at many points, but is too bare bones, too few possibilities which seems to be his his philosophy… (well, Apple origins, need I say more…).

tophee · November 5, 2017, 9:45am

Doesn’t to concur mean to agree?