I’m wondering what relative performance I should expect to see? If the number of versions increases to the hundreds or thousands would there be additional performance degradation and what would the expected slowdown be (e.g. linear with number of versions)?
I’m planning on backing up my local personal files every hour in order to capture any “work in progress” files (in case I do something stupid or something bad happens) with a longish retention (3 months - 1 year). Initial size is 150GB and I would expect to see frequent small changes. Does this setup make sense, and how will restoring files behave with a large number of versions?
I can’t read the code, but I wouldn’t expect any relationship between speed of folder browsing and number of versions. Every backup has its own DLIST file, containing all filenames, which I expect (not sure about this) to be reflected in the local database.
I guess the total number of files (maybe also the total number of blocks/file fragments) can decrease performance of the file browser.
Hmm, it looks mostly like a caching issue per backup set. To me it seems possible to collect a full file tree in a background process and store the tree in a local cache. This information should not change until a version is finished.
In case the cache is available the query could be much faster (basically read the json dump and select on path).
If the cache is not available, or no longer available (e.g. because of a quota on caches), it would fall back to DB queries.
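To make the cache-with-fallback idea concrete, here is a minimal sketch in Python. It is not Duplicati code: the `files` table, the `tree-<version>.json` file name and all function names are made up for illustration, and paths are assumed to use `/` as separator.

```python
import json
import os
import sqlite3


def cache_path(cache_dir, version_id):
    # Hypothetical naming scheme: one JSON dump per finished backup version.
    return os.path.join(cache_dir, f"tree-{version_id}.json")


def build_cache(conn, cache_dir, version_id):
    # Meant to run in the background once a version is finished,
    # since its file list should not change afterwards.
    rows = conn.execute(
        "SELECT path FROM files WHERE version_id = ?", (version_id,)
    )
    paths = [row[0] for row in rows]
    with open(cache_path(cache_dir, version_id), "w", encoding="utf-8") as fh:
        json.dump(paths, fh)


def load_paths(conn, cache_dir, version_id):
    # Prefer the JSON dump; fall back to the database if the cache file
    # is missing or unreadable (e.g. evicted because of a quota).
    try:
        with open(cache_path(cache_dir, version_id), encoding="utf-8") as fh:
            return json.load(fh), "cache"
    except (OSError, ValueError):
        rows = conn.execute(
            "SELECT path FROM files WHERE version_id = ?", (version_id,)
        )
        return [row[0] for row in rows], "db"


def list_folder(conn, cache_dir, version_id, folder):
    # Return the names directly under `folder`, plus where the data came from.
    paths, source = load_paths(conn, cache_dir, version_id)
    prefix = folder.rstrip("/") + "/"
    names = {p[len(prefix):].split("/", 1)[0] for p in paths if p.startswith(prefix)}
    names.discard("")
    return sorted(names), source
```

Note that this still reads the whole file list for a version on every call, so for very large backups the dump would probably need to be split per folder or kept in memory between requests.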
I have not yet had to restore anything, but the idea of waiting half a minute to even open a folder is somewhat horrifying. When you have lost data, that’s certainly not the kind of UX you want. So I’m wondering: what are the plans regarding this issue?
Two solutions have been mentioned:
One is to change the layout of the database which
and the other suggests that
which makes sense to me and may not be as laborious to implement?
Yes, it would be possible to simply dump all filenames to the client, but since there is a big delay already, I think dumping all files is likely to just cause out-of-memory.
I now have 284 versions (122 GB). When trying to restore it took over 4 minutes of the “Getting file versions …” message before populating the versions drop down list. This is much slower than before. The files list still hasn’t rendered after 20 minutes (Core i7 16GB w/ SSD so not hardware related). So, I don’t think this is very usable.
I know it’s a big body of work with some potential risk but any plans to implement a solution?
Is there an issue opened in github for this one already? I couldn’t find one, but this seems like something that should be logged there.
This is one I’d put a bounty on
While @kenkendk didn’t go into the level of detail in that issue that he did in this thread, the core issue is indexing. I can watch disk usage while the list is extracted from the live database into a temporary one for each level that is being walked. Technically I would want to see all of this pre-indexed, but that may pose its own issues: recreating local databases, the amount of time, CPU and disk resources required to build the index at the end of a backup, and what happens if the system is shut down while the index is being created.
I see a whole lot of work to make this work fast, and I’m only tertiarily knowledgeable about fast browsing of databases.
How the heck did my GitHub search for “restore” not find that issue???
You are correct - it is likely a database design issue. As discussed elsewhere in this forum, it is caused by storing paths as full strings rather than splitting them up into trees - and, yes, it’s a BIG rewrite to fix it the right way because that table is touched by so many places in the code.
That being said, I have a completely un-verified THEORY that some clever indexing and/or views might allow for incremental updates of the code base rather than needing a major re-write to get this to happen.
Unfortunately, development time is a bit scarce lately and most of it has been spent working on bugs or performance issues in more frequently used code. But at least you can know that the issue is recognized as needing to be fixed…
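As an illustration of the kind of indexing I have in mind (again, unverified, and not taken from the Duplicati schema): if full path strings are kept but indexed, a folder can be selected with a range condition on the path instead of filtering the whole table. The `file` table, the `file_path` index and the function names below are hypothetical, and `/` is assumed as the path separator.

```python
import sqlite3


def make_db(paths):
    # Hypothetical table with full path strings plus an index on them.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE file (path TEXT NOT NULL)")
    conn.execute("CREATE INDEX file_path ON file(path)")
    conn.executemany("INSERT INTO file (path) VALUES (?)", [(p,) for p in paths])
    return conn


def list_folder(conn, folder):
    prefix = folder.rstrip("/") + "/"
    # "0" is the character right after "/" in ASCII, so every string that
    # starts with `prefix` sorts in the half-open range [prefix, upper).
    upper = prefix[:-1] + "0"
    rows = conn.execute(
        "SELECT path FROM file WHERE path >= ? AND path < ?", (prefix, upper)
    )
    # The range still contains all descendants, so reduce the rows to the
    # first path component below the folder.
    names = {path[len(prefix):].split("/", 1)[0] for (path,) in rows}
    names.discard("")
    return sorted(names)
```

The catch is that the range covers everything below the folder, not just its direct children, so opening a folder near the root still touches most of the table. That is why I only call it a theory.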
I ended up opening a new issue. The previous issue seems to have devolved from the main point. @kenkendk noted that this was related to #1672, and that was noted as fixed.
So either there is a regression, or this is really a different issue that has similar effect.
While the frequency of code use may be small, the importance of the restore working easily is not small.
If I can get some time, I’ll see if I can ask a friend who does db design how they get good responsiveness for database queries to list files.
Thanks for your effort guys. An “as easy as possible to use” and reliable restore process is ESSENTIAL for the trust in a backup software - better folder browsing on restore would help here.
My basic thought is always: when I die, will relatives be able to restore files? My answer currently is: Maybe, but also maybe they give up somewhere in the process. I.e. the computer illiterate point of view.
@Tapio I have to concur. I’m actually a bit surprised to see that the restore browsing is as slow as it is, based on the rather high quality of the rest of Duplicati.
My guy says that this was the original code put forth, and hasn’t really been touched since.
I really really wish I was a programmer, and I’d go update the code myself.
I agree with the statements that restoring (and particularly the browsing of data) should be smooth. The slowdown is not easy to fix, as it is caused by a large table of paths needing to be filtered.
I am not aware of an efficient way to write this in SQLite, but anyone who does can look at the query here:
My long-term fix is to rewrite the way the paths are stored in the database, which will make this query really fast.
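To give an idea of what I mean, here is a rough sketch, not the actual planned schema: each path component becomes a row that points to its parent, so listing a folder is a single indexed lookup on the parent. The `node` table, the `node_parent_name` index and the helper names are invented for this example, and `/` is assumed as the separator.

```python
import sqlite3

ROOT_ID = 1  # hypothetical fixed id for the root folder


def make_db():
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE node (
            id INTEGER PRIMARY KEY,
            parent_id INTEGER REFERENCES node(id),
            name TEXT NOT NULL
        );
        CREATE UNIQUE INDEX node_parent_name ON node(parent_id, name);
        INSERT INTO node (id, parent_id, name) VALUES (1, NULL, '');
    """)
    return conn


def parts(path):
    # Split "/home/docs" into ["home", "docs"]; "/" gives an empty list.
    return [p for p in path.split("/") if p]


def lookup(conn, parent, name):
    row = conn.execute(
        "SELECT id FROM node WHERE parent_id = ? AND name = ?", (parent, name)
    ).fetchone()
    return None if row is None else row[0]


def add_path(conn, path):
    # Walk the components, creating missing nodes on the way down.
    node = ROOT_ID
    for name in parts(path):
        child = lookup(conn, node, name)
        if child is None:
            cur = conn.execute(
                "INSERT INTO node (parent_id, name) VALUES (?, ?)", (node, name)
            )
            child = cur.lastrowid
        node = child
    return node


def list_folder(conn, path):
    # Resolve the folder, then fetch only its direct children via the index.
    node = ROOT_ID
    for name in parts(path):
        node = lookup(conn, node, name)
        if node is None:
            return []
    rows = conn.execute(
        "SELECT name FROM node WHERE parent_id = ? ORDER BY name", (node,)
    )
    return [row[0] for row in rows]
```

With this layout the cost of opening a folder depends on its depth and its number of children, not on the total number of paths in the backup.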
I don’t see that in what you write. You rather seem to agree…?! I think we agree in that Duplicati should be as easy to use as possible, failsafe, fast, transparent and highly configurable.
E.g. Arq has a good “fire and forget” mentality, well thought out at many points, but it is too bare bones, with too few possibilities, which seems to be his philosophy… (well, Apple origins, need I say more…).