Release: 2.0.3.10 (canary) 2018-08-30

No, it is not intended to be a feature. I just included the environment variable to get performance numbers. You need to set the environment variable for the entire process, which is why it is easier to test from the commandline, as each run is a new process.
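
For example, on Linux you can set it for a single run like this (a minimal sketch; the storage URL and source folder are placeholders for your own job):

# Set the variable for this one process only; each commandline run is a new process.
# The storage URL and source folder are placeholders for your own job.
TEST_QUERY_VERSION=1 mono Duplicati.CommandLine.exe backup <storage-url> <source-folder>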

Is this from the WebUI, or from the commandline? It seems @Mikael_Andersson sees significantly increased performance on the commandline with the new query, but horrible performance from the WebUI.

All: you should get a warning message when the environment variable kicks in, something like:

Using performance test query version 1 as the TEST_QUERY_VERSION environment variable is set

If you do not get any warnings from running the job, the environment variable was not set correctly.

WebUI. I only tested version 1 at first, but after testing version 2 I see that that version is faster.
Yes, a warning message kicks in when the environment variable is set.

Confirming what others have said - upgrading from 2.0.3.9 to 2.0.3.10 has caused my main PC’s backup job (800 GB) to go from 5 or 6 minutes per run to over 70 minutes per run. (Edit to add: this PC is Windows 8.1.)

Edit: I just updated my work laptop (Windows 10) from 2.0.3.9 to 2.0.3.10 as well, and ran my B2 backup (~6 GB). The previous automated run, which was today around noon, took about 2 minutes. The one I ran after upgrading took 1 minute, 9 seconds - almost twice as fast. (Meanwhile, the automated job on my home PC that ran overnight last night took 99 minutes, considerably more than the ~70 minutes it was taking initially…)

Sooo… my results are initially confusing, if nothing else. Could the slowness be OS-dependent to some extent? Can any other users (@Mikael_Andersson, etc.) confirm what OS they’re using when experiencing the slow version? It would be good to establish some sort of pattern - I have no idea whether it would be the OS, but that’s the first and most obvious thing I can think of.

FYI, I did some playing around with the queries. First, might I suggest something like this:

SELECT File.ID AS FileID, FilesetEntry.Lastmodified, FileBlockset.Length,
       MetaBlockset.FullHash AS Metahash, MetaBlockset.Length AS Metasize
FROM FilesetEntry
INNER JOIN Fileset ON (Fileset.ID = FilesetEntry.FilesetID)
INNER JOIN File ON (File.ID = FilesetEntry.FileID)
INNER JOIN Metadataset ON (Metadataset.ID = File.MetadataID)
INNER JOIN Blockset AS MetaBlockset ON (MetaBlockset.ID = Metadataset.BlocksetID)
LEFT JOIN Blockset AS FileBlockset ON (FileBlockset.ID = File.BlocksetID)
WHERE FilesetEntry.FilesetID = ? AND File.Path = ?
LIMIT 1

From an SQL perspective, this query is a little nicer. First, prefer JOIN over WHERE for joining, as this allows both variance in HOW the join occurs and short-circuiting of the joining process (i.e. if a JOIN can’t happen, it won’t even attempt to look for data from the other tables).
Second, the more important joins (the ones more likely to cut down the number of resulting rows) are listed earlier.
Third, the quicker WHERE condition should go first (presumably a numerical lookup on a smaller set of data is faster than a text-based lookup, though both are indexed).
Finally, and this is the MOST important: ADD THE LIMIT 1 BACK!

In my own testing, removing that LIMIT 1 changes the fetch time from roughly 10-15 ms to over 350 ms. Even though your initial query is guaranteed to return exactly ONE result, the SQL engine will KEEP LOOKING for any other matches unless there is a LIMIT 1. Adding the LIMIT 1 back will improve the performance of your query DRASTICALLY.

I’m NOT an SQL expert, but I have used it enough to get some sense of optimizing it. As I said, the most important thing for the query is the LIMIT 1; from my own experiments, that is easily the root cause of this query taking 40x longer in the ‘new’ version.
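
If anyone wants to reproduce this, here is a rough sketch using the sqlite3 shell against a copy of the local job database (the database path, fileset ID, and file path are placeholders; substitute values from your own database):

# Time the same lookup with and without LIMIT 1.
# Run against a COPY of the job database, never the live one.
sqlite3 /tmp/backup-copy.sqlite <<'EOF'
.timer on
-- with LIMIT 1: the engine can stop at the first match
SELECT File.ID FROM FilesetEntry
INNER JOIN File ON (File.ID = FilesetEntry.FileID)
WHERE FilesetEntry.FilesetID = 1 AND File.Path = '/home/user/example.txt'
LIMIT 1;
-- without LIMIT 1: the engine keeps scanning for further matches
SELECT File.ID FROM FilesetEntry
INNER JOIN File ON (File.ID = FilesetEntry.FileID)
WHERE FilesetEntry.FilesetID = 1 AND File.Path = '/home/user/example.txt';
EOF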

To cross-reference: the topic Slow backup in version v2.0.3.10 downgrade to version 2.0.3.5 can add some data points.

I suspected that LIMIT 1 could cause a speedup, but I did not expect it to be that significant.

I am preparing v2.0.3.11 with changes to the lookup logic, and have explained how the environment variable will work in that version:

Hi all,
Using v2.0.3.11, most backup sets seem to run OK, except for one which became slow. The upgrade from 2.0.3.5 to 2.0.3.11 brought it from 8 minutes to almost 4 hours.
I tested TEST_QUERY_VERSION=1 and 2 for the time being, but what actually troubles me is the
ExaminedFiles: 60229
OpenedFiles: 25372
although the files did NOT change.
I’m using Debian stretch and the problematic backup set is a 400 GB music library.
The number of files examined and opened is constant across tests (5 as of now).

Is the longer duration also constant across your 2.0.3.11 runs?

Yes, it is, over 3 (TEST_QUERY_VERSION=1) and 5 (TEST_QUERY_VERSION=2) different tests.
I moved on to test TEST_QUERY_VERSION=3, and everything is awfully slow now. More on repeatability over time, and details, later.

Here is a summary of reproducible results for TEST_QUERY_VERSION on my server using 2.0.3.11.

  • 3 and 4 give absolutely terrible results: backup sets need 1 to 4 hours instead of a few minutes.
  • 1 gives about the usual (2.0.3.5 and earlier) results +20%, except for the music library.
  • 2 gives the best results, and likely has a slight edge over previous results, except for the music library.

For all cases and backup sets except the music library, “opened files” is close to none, likely corresponding to modified files.
For the music library, which did not change over the tests, opened files is 25372 out of 60229 examined files. For values 1 and 2 this gives 3 to 4 hours instead of less than 10 minutes on 2.0.3.5.

Could the 25372 opened files be the result of a change in 2.0.3.11, or of a damaged database? Should I test something on the database?

Did you move any further with this testing?

The opened file count could be related to the 2.0.3.11 update, but if so, it would likely only show up on the first run after the update. Are you still seeing that high number?

Sorry, my previous message (Release: 2.0.3.10 (canary) 2018-08-30) was a reply to myself.
Opened files remains constant at 25372 out of 60229 examined files despite several backups.
I moved on to using the default (2) for TEST_QUERY_VERSION and decided to create a new backup set from scratch for my music library; I always wanted to increase the chunk size. For the time being I can also keep the old (faulty?) set for test purposes. However, this new initial backup is taking about 2 weeks; there are still about 3 days to go. I’ll post new results ASAP.

Hi,

My new backup set is working fine (4 minutes, none or very few opened files), but the old backup is still broken, with the very same figures (opened files 25372 out of 60229 examined) and about 4 hours per run. So I assume something in the database is broken and is not corrected/completed by running the backup. I’ll keep the “broken” backup as long as I can if you need me to run special tests.

Is that still with 2.0.3.10 or have you updated to 2.0.3.11?

Not that I expect it to resolve the old backup issue, just curious.

No, I’m using 2.0.3.11, but the issue first appeared with 2.0.3.10.

I installed Duplicati on a text-mode-only Debian system; how can I get access to the web interface? Is there a configuration file you can edit or something?

If you can run a web browser on the local box try 127.0.0.1:8200 or localhost:8200.

If you’re trying to get to it from another system you’ll need to start Duplicati with the parameter to allow external connections (which I can’t recall the name of at the moment, sorry).

I ended up getting access via port forwarding over ssh:

ssh user@server-ip -f -N -L8200:127.0.0.1:8200

So you can access the local interface through the URL localhost:8200.

For the record it looks like the newly supported return codes are:

  • 0 (run the job)
  • 1 (abort job silently)
  • 3 (abort job with warning)
  • 5 (abort job with error)
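
Assuming these are the exit codes for a --run-script-before script, here is a minimal sketch of how they might be used (the mount point and free-space threshold are made-up examples):

#!/bin/sh
# Hypothetical pre-backup script, e.g. passed via --run-script-before.
# The exit code tells Duplicati what to do with the job (see the list above).
free_kb=$(df --output=avail /mnt/backup-target | tail -n 1)
if [ "$free_kb" -lt 1048576 ]; then
    echo "Less than 1 GB free on the target; aborting." >&2
    exit 5  # abort job with error
fi
exit 0      # run the job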