Release: 2.0.3.10 (canary) 2018-08-30


#21

The variable must get into the server process, and I’d expect a batch file to use a different process. You can try Sysinternals Process Explorer, which can show you what environment variables are set in a selected process right now.

My suspicion is that the system-wide environment variables are the right place to set it, followed by a restart of the Duplicati service:
Control Panel --> System and Security --> System --> Advanced system settings --> Environment variables


#22

No, it is not intended to be a feature. I just included the environment variable to get performance numbers. You need to set the environment variable for the entire process, which is why it is easier to test from the commandline, as each run is a new process.

Is this from the WebUI, or from the commandline? It seems @Mikael_Andersson got significantly better performance on the commandline with the new query, but horrible performance from the WebUI.

All: you should get a warning message when the environment variable kicks in, with something like:

Using performance test query version 1 as the TEST_QUERY_VERSION environment variable is set

If you do not get any warnings from running the job, the environment variable was not set correctly.


#23

WebUI. I only tested version 1 at first, but after testing with version 2 I can see that this version is faster.
Yes, a warning message kicks in when the environment variable is set.


#24

Confirming what others have said: upgrading from 2.0.3.9 to 2.0.3.10 has caused my main PC’s backup job (800 GB) to go from 5 or 6 minutes per run to over 70 minutes per run. (Edit to add: this PC is Windows 8.1.)

Edit: I just updated my work laptop (Windows 10) from 2.0.3.9 to 2.0.3.10 as well, and ran my B2 backup (~6 GB). The previous automated run, today around noon, took about 2 minutes. The one I ran after upgrading took 1 minute 9 seconds - almost twice as fast. (Meanwhile, the automated job on my home PC that ran overnight last night took 99 minutes, considerably more than the ~70 it was taking initially…)

Sooo… my results are confusing, if nothing else. Could the slowness be OS-dependent to some extent? Can any other users (@Mikael_Andersson, etc.) confirm which OS they’re using when experiencing the slow version? It would be good to establish some sort of pattern. I have no idea whether it would be the OS, but that’s the first and most obvious thing I can think of.


#25

FYI, I did some playing around with queries. First, might I suggest something like this:

SELECT File.ID AS FileID, FilesetEntry.Lastmodified, FileBlockset.Length,
       MetaBlockset.FullHash AS Metahash, MetaBlockset.Length AS Metasize
FROM FilesetEntry
INNER JOIN Fileset ON (Fileset.ID = FilesetEntry.FilesetID)
INNER JOIN File ON (File.ID = FilesetEntry.FileID)
INNER JOIN Metadataset ON (Metadataset.ID = File.MetadataID)
INNER JOIN Blockset AS MetaBlockset ON (MetaBlockset.ID = Metadataset.BlocksetID)
-- LEFT JOIN so entries without a file blockset (e.g. folders) still return a row
LEFT JOIN Blockset AS FileBlockset ON (FileBlockset.ID = File.BlocksetID)
WHERE FilesetEntry.FilesetID = ? AND File.Path = ?
LIMIT 1

From an SQL perspective, this query is a little nicer. First, prefer explicit JOINs to join conditions in the WHERE clause, as this allows both variance in HOW the join occurs, and short-circuiting of the joining process (i.e. if a JOIN can’t happen, it won’t even attempt to look for data in the other tables); a sketch of the implicit-join style this argues against follows just after this list.
Second, the more important joins (the ones more likely to cut down the number of resulting rows) are listed earlier.
Third, the quicker WHERE condition should go first (presumably a numerical lookup on a smaller set of data is faster than a text-based lookup, though both are indexed).
Finally, and this is the MOST important: ADD THE LIMIT 1 BACK!
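
To illustrate the first point: I’m only guessing at what the pre-JOIN form looked like (I don’t have the original query in front of me), but the implicit-join style would be roughly:

SELECT File.ID AS FileID, FilesetEntry.Lastmodified, FileBlockset.Length,
       MetaBlockset.FullHash AS Metahash, MetaBlockset.Length AS Metasize
FROM FilesetEntry, Fileset, File, Metadataset,
     Blockset AS MetaBlockset, Blockset AS FileBlockset
WHERE Fileset.ID = FilesetEntry.FilesetID
  AND File.ID = FilesetEntry.FileID
  AND Metadataset.ID = File.MetadataID
  AND MetaBlockset.ID = Metadataset.BlocksetID
  AND FileBlockset.ID = File.BlocksetID  -- effectively an INNER join here
  AND FilesetEntry.FilesetID = ? AND File.Path = ?

Note that this form cannot even express the LEFT JOIN, so any entry without a file blockset would silently drop out of the results.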

In my own testing, dropping that LIMIT 1 changes the fetch time from, say, 10-15 ms to over 350 ms. Even though your initial query is guaranteed to return exactly ONE result, the SQL engine will KEEP LOOKING for any other matches unless there is a LIMIT 1. Adding the LIMIT 1 back will increase the performance of your query DRASTICALLY.
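
If anyone wants to reproduce this against a copy of their local database, the comparison is just the same lookup with and without the clause (the fileset ID and path below are placeholder values; in the sqlite3 shell, .timer on prints the elapsed time per statement):

-- Without LIMIT 1: the engine keeps scanning for further matches,
-- even though only one row can ever qualify.
SELECT File.ID
FROM FilesetEntry
INNER JOIN File ON (File.ID = FilesetEntry.FileID)
WHERE FilesetEntry.FilesetID = 1 AND File.Path = '/music/example.flac';

-- With LIMIT 1: the engine stops at the first match.
SELECT File.ID
FROM FilesetEntry
INNER JOIN File ON (File.ID = FilesetEntry.FileID)
WHERE FilesetEntry.FilesetID = 1 AND File.Path = '/music/example.flac'
LIMIT 1;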

I’m NOT an SQL expert, but I have used it enough to get some sense of optimizing it. But as I said, the most important thing for the query is the LIMIT 1; from my own experiments, that is easily the root cause of this query taking 40x longer in the ‘new’ version.


#26

To cross-reference, the topic Slow backup in version v2.0.3.10 downgrade to version 2.0.3.5 can add some data points.


#27

I suspected that LIMIT 1 could cause a speedup, but I did not expect it to be that significant.


#28

I am preparing v2.0.3.11 with changes to the lookup logic, and have explained how the environment variable will work in that version:


#29

Hi all,
Using v2.0.3.11, most backup sets seem to run OK, except for one which became slow. The upgrade from 2.0.3.5 to 2.0.3.11 brought it from 8 minutes to almost 4 hours.
I tested TEST_QUERY_VERSION=1 and 2 for the time being, but what actually troubles me is the
ExaminedFiles: 60229
OpenedFiles: 25372
although the files did NOT change.
I’m using Debian stretch and the problematic backup set is a 400 GB music library.
The number of files examined and opened is constant across tests (5 as of now).


#30

Is the longer duration also constant across your 2.0.3.11 runs?


#31

Yes, it is, over 3 (TEST_QUERY_VERSION=1) and 5 (TEST_QUERY_VERSION=2) different tests.
I moved on to test TEST_QUERY_VERSION=3 and everything is awfully slow now. More details on repeatability over time later.


#32

Here is a summary of reproducible results for TEST_QUERY_VERSION on my server using 2.0.3.11.
Values 3 and 4 give absolutely terrible results: all backup sets need 1 to 4 hours instead of a few minutes.
Value 1 gives about the usual (2.0.3.5 and earlier) results plus 20%, except for the music library.
Value 2 gives the best results and likely has a slight edge over previous results, except for the music library.
For all cases and backup sets except the music library, “opened files” is close to none, likely corresponding to modified files.
For the music library, which did not change over the tests, opened files is 25372 out of 60229 examined files. For values 1 and 2 this gives 3 to 4 hours instead of less than 10 minutes on 2.0.3.5.

Could the 25372 opened files be the result of a change in 2.0.3.11, or of a damaged database? Should I test something on the database?
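
For my own testing I am thinking of starting with a rough sanity check like the following, assuming the table layout from the query in #25 (the ? placeholders would be fileset IDs looked up in the Fileset table first). If the music library really is unchanged, two consecutive filesets should reference mostly the same File rows, so a low overlap count would point at Duplicati recording the files as changed rather than at a damaged database:

-- How many file entries do the two most recent filesets share?
SELECT COUNT(*) AS SharedEntries
FROM FilesetEntry cur
INNER JOIN FilesetEntry prev ON (prev.FileID = cur.FileID)
WHERE cur.FilesetID = ?   -- newest fileset ID
  AND prev.FilesetID = ?; -- previous fileset ID

-- And which shared entries got a different timestamp recorded?
SELECT File.Path, cur.Lastmodified AS NewTs, prev.Lastmodified AS OldTs
FROM FilesetEntry cur
INNER JOIN FilesetEntry prev ON (prev.FileID = cur.FileID)
INNER JOIN File ON (File.ID = cur.FileID)
WHERE cur.FilesetID = ? AND prev.FilesetID = ?
  AND cur.Lastmodified <> prev.Lastmodified
LIMIT 20;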