Performance tips / what is normal?

I see a number of threads here about performance issues, but I'm not sure how to figure out whether my performance is as expected or whether something is wrong. Many operations start off with

SELECT COUNT(*) FROM "Block" WHERE "Size" > 102400

On my system, that one query takes three or more hours. The local SQLite file is 8.6 GB, plus a 23 MB journal (at least at the moment). Almost anything I do with the database takes several hours.

This is all on a physical drive (I have no SSDs on this box). The box itself is an Intel Core 2 Quad CPU Q9300 @ 2.50GHz.

Is this all normal? I expect that if the database were MySQL/PostgreSQL/etc. it would be much faster for queries like this; is there anything on the roadmap that would allow users to replace SQLite with a more robust database engine?

Your biggest bottleneck is disk I/O. What is your disk setup?

This is what smartctl reports for the disk where the database is stored:

Model Family:     Seagate Barracuda 7200.14 (AF)
Device Model:     ST3000DM001-1ER166
Serial Number:    Z5002VR6
LU WWN Device Id: 5 000c50 066871a84
Firmware Version: CC43
User Capacity:    3,000,592,982,016 bytes [3.00 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Rotation Rate:    7200 rpm
Form Factor:      3.5 inches
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ACS-2, ACS-3 T13/2161-D revision 3b
SATA Version is:  SATA 3.1, 6.0 Gb/s (current: 3.0 Gb/s)

hdparm:

 Timing cached reads:   3194 MB in  2.00 seconds = 1597.17 MB/sec
 Timing buffered disk reads: 222 MB in  3.12 seconds =  71.07 MB/sec

The external drive where the backup data itself is stored:

 Timing cached reads:   2878 MB in  2.00 seconds = 1439.13 MB/sec
 Timing buffered disk reads: 116 MB in  3.05 seconds =  38.01 MB/sec

There are several other internal and external drives on this box, but they all have similar performance stats. I realize this is all pretty old hardware; does this just mean I can't get very practical performance out of Duplicati?

8.6 GB is a pretty large local database. How large is the set of source files, and how many backups do you have? Many people experience significant performance improvements after running a vacuum command (create a backup of the database first just in case).
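
For reference, a minimal sketch of what that looks like, assuming the database is opened directly with the sqlite3 shell (stop Duplicati and copy its .sqlite file somewhere safe first):

-- Rebuilds the whole database file, dropping free pages and defragmenting it.
-- Needs roughly the database's size in free temporary space while it runs.
VACUUM;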

Is it taking forever to start returning results from the query, or just to output them?

Is it taking forever to start returning results from the query, or just to output them?

Just based on the debug output, the count(*) query begins shortly after the CLI operation is invoked, but then takes about three hours to complete.

8.6 GB is a pretty large local database. How large is the set of source files, and how many backups do you have? Many people experience significant performance improvements after running a vacuum command (create a backup of the database first just in case).

It's a few terabytes of source data, mostly large video files that don't change. Only half a dozen backup versions or so. I'll try a vacuum when my latest delete operation completes.

Your slowdown may actually just be the display output being the limiting factor. Try redirecting the output of the query to a file and see if that is significantly faster.

vacuum fails with Error: database or disk is full. The database is 8.6 GB and the drive in question has 22 GB free, so it would appear that it's not really a full-disk problem. What does it mean for the "database" to be full? Could this be connected to the performance issues?

To clarify, I'm not doing any queries myself, I'm just looking at the debug output of Duplicati. None of the query output is going to the screen; it's just that each step of a Duplicati process such as a delete operation takes hours. Nothing is on the screen other than the SQL statement itself and the time it took to execute.

I think this means that the temporary directory is full.
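
That would fit how VACUUM works: it rebuilds the database into a temporary copy, so SQLite needs roughly the database's size in free space in its temporary directory (often /tmp), not just on the drive holding the .sqlite file. A minimal sketch of pointing that temporary space somewhere roomier from the sqlite3 shell; the path is a placeholder, and on Unix the SQLITE_TMPDIR environment variable does the same job without the deprecated pragma:

-- temp_store_directory is deprecated but still honoured; the path below is a placeholder.
PRAGMA temp_store_directory = '/mnt/bigdisk/tmp';
VACUUM;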

I know this is an OLD thread, but there is at least a colourable argument that this is actually a database design flaw/bug, and I am going to raise a bug report for it: frequent, inefficient query on "Block" table is very slow on large databases · Issue #3884 · duplicati/duplicati · GitHub

I have a suggested workaround too, see below.

The query originates from LocalDatabase.cs

    public long GetBlocksLargerThan(long fhblocksize)
    {
        using(var cmd = m_connection.CreateCommand())
            return cmd.ExecuteScalarInt64(@"SELECT COUNT(*) FROM ""Block"" WHERE ""Size"" > ?", -1, fhblocksize);
    }

This function in turn is called from various places, such as simply running a backup and compacting the database. Unfortunately, I'm not a dev, so I'm having a little trouble following the source code, but it also seems like it usually returns zero and is just a sanity check.

While it's true that the disk I/O is the bottleneck, an important latent question is how SQLite is executing this query in a manner that causes so much disk I/O in the first place.

The slowdown is caused by the fact that the "Size" column is non-indexed and the planner is choosing an index with poor selectivity:

SELECT COUNT(1) 
  FROM "Block" 
 WHERE "Size" > 102400;

/* Result: 1 rows returned in 1528ms
At line 1:
SELECT COUNT(1) FROM "Block" WHERE "Size" > 102400; */

EXPLAIN QUERY PLAN 
SELECT COUNT(1) 
  FROM "Block" 
 WHERE "Size" > 102400;

  id   parent   notused                         detail                         
 ---- -------- --------- ----------------------------------------------------- 
   3        0         0   SCAN TABLE Block USING COVERING INDEX BlockHashSize  

This choice is actually correct; it has no better alternative. The planner has nothing available to tell it where the matching rows are, so it scans the whole thing!

This table has (in my case, much smaller than yours) 1,727,853 rows, but only 50,891 distinct "Size" values! The wasted effort is massive because, in fact, I have zero matching rows, so the planner gathers a huge amount of data only to find it must throw it all away!

SELECT 'total' as DETAIL, COUNT(1) as ROWCOUNT
  FROM "Block"
  UNION ALL SELECT 'distinct', count(DISTINCT "Size")
              FROM "Block" ;

   DETAIL    ROWCOUNT  
 ---------- ---------- 
  total       1727853  
  distinct      50891  

The job of BlockHashSize is to ensure that the combination of the Hash and Size columns on this table is unique; it is not well equipped to answer this query. A professional database like Postgres or Oracle will also perform poorly here.
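
For context, that unique index presumably looks something like the following (the exact definition is my assumption, inferred from its name and the plan output above). With "Hash" as the leading column, a predicate on "Size" alone cannot seek into it, so every entry has to be visited:

-- Presumed definition; "Size" is the trailing column, so WHERE "Size" > ? cannot seek on it.
CREATE UNIQUE INDEX "BlockHashSize" ON "Block" ("Hash", "Size");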

As proof, we can recreate identical DDL in Postgres and import the data. (This database happens to already have an unrelated block table, so the output below shows duplicati_block instead.)

EXPLAIN (VERBOSE, COSTS, ANALYZE) 
SELECT count (1) -- Note: PG evaluates (*) differently from (1)
  FROM "duplicati_block" 
 WHERE "Size" > 102400;                           

QUERY PLAN                                                                                                                                        
--------------------------------------------------------------------------------------------------------------------------------------------------
Finalize Aggregate  (cost=29634.47..29634.48 rows=1 width=8) (actual time=577.102..577.103 rows=1 loops=1)                                        
  Output: count(1)                                                                                                                                
  ->  Gather  (cost=29634.26..29634.47 rows=2 width=8) (actual time=576.687..613.457 rows=3 loops=1)                                              
        Output: (PARTIAL count(1))                                                                                                                
        Workers Planned: 2                                                                                                                        
        Workers Launched: 2                                                                                                                       
        ->  Partial Aggregate  (cost=28634.26..28634.27 rows=1 width=8) (actual time=440.321..440.322 rows=1 loops=3)                             
              Output: PARTIAL count(1)                                                                                                            
              Worker 0: actual time=373.249..373.250 rows=1 loops=1                                                                               
              Worker 1: actual time=373.232..373.232 rows=1 loops=1                                                                               
              ->  Parallel Seq Scan on public.duplicati_block  (cost=0.00..28634.23 rows=10 width=0) (actual time=440.303..440.304 rows=0 loops=3)
                    Output: "ID", "Hash", "Size", "VolumeID"                                                                                      
                    Filter: (duplicati_block."Size" > 102400)                                                                                     
                    Rows Removed by Filter: 575951                                                                                                
                    Worker 0: actual time=373.235..373.235 rows=0 loops=1                                                                         
                    Worker 1: actual time=373.208..373.208 rows=0 loops=1                                                                         
Planning Time: 993.489 ms                                                                                                                         
Execution Time: 613.816 ms                                                                                                                        

As we can see, the PG planner, which is vastly more sophisticated, chose an essentially structurally equivalent execution plan. The only real advantage it has is that Postgres is capable of parallelizing the scan (not all queries are parallelizable; SQLite does not have this feature at all, to my understanding).

Possible one-off fix:

create index BlockCountSizeIdx on "Block"("Size");

SELECT COUNT(1) 
  FROM "Block" 
 WHERE "Size" > 102400;

/* Result: 1 rows returned in 9ms
At line 17:
SELECT COUNT(1) FROM "Block" WHERE "Size" > 102400; */

EXPLAIN QUERY PLAN 
SELECT COUNT(1) 
  FROM "Block" 
 WHERE "Size" > 102400;

  id   parent   notused                                 detail                                
 ---- -------- --------- -------------------------------------------------------------------- 
   3        0         0   SEARCH TABLE Block USING COVERING INDEX BlockCountSizeIdx (Size>?)  

Notice in the 'detail' column how we've switched from "SCAN TABLE", which is SQLite-talk for the horrible, expensive solution of walking through the entire table (AKA "Sequential Scan" in Postgres), to the vastly more efficient "SEARCH TABLE" with an index predicate. I'm not as familiar with the output format of the SQLite EXPLAINer, but I believe we're looking at the equivalent of an "Index Only Scan" here. If so, the planner realized it could answer this question solely by examining the index, so it wouldn't have fetched any table pages at all, and hence hardly any disk I/O.

And FWIW, this is Postgres' take on this optimization:

QUERY PLAN                                                                                                                                         
---------------------------------------------------------------------------------------------------------------------------------------------------
Aggregate  (cost=8.01..8.02 rows=1 width=8) (actual time=0.054..0.055 rows=1 loops=1)                                                              
  Output: count(1)                                                                                                                                 
  ->  Index Only Scan using blockcountsizeidx on public.duplicati_block  (cost=0.43..8.01 rows=1 width=0) (actual time=0.036..0.036 rows=0 loops=1)
        Output: "Size"                                                                                                                             
        Index Cond: (duplicati_block."Size" > 102400)                                                                                              
        Heap Fetches: 0                                                                                                                            
Planning Time: 1.229 ms                                                                                                                            
Execution Time: 0.282 ms                                                                                                                           

As before, it chose a very similar plan.

As was mentioned, there are a lot of postings here and there about the "SQLite performance problem", but to be very honest, I'm not certain whether there really is an SQLite performance problem in the sense of hitting some theoretical limit of SQLite (despite the inefficient way writes work), or whether there is (as here) a DDL or query-design problem.

But SQLite is not instrumented for performance tracing the way a professional database is. It has no session history, no system catalogue views to inspect locks or blocking, no slow-query log, and so on. This makes it difficult, bordering on impossible, to offer intelligent performance analyses.

On a 955 MB database, adding that index increased the file size by only 25 MB for a performance gain of roughly 170x (1528 ms vs 9 ms), so this seems like a viable solution. Maybe the system could monitor the timings and create the index only when the query takes longer than, say, one minute.
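
If something along those lines is ever adopted, the one-off fix above can at least be expressed so it is safe to apply repeatedly; a sketch, using the index name I suggested earlier:

-- A no-op if the index already exists; ANALYZE then refreshes the planner's statistics.
CREATE INDEX IF NOT EXISTS "BlockCountSizeIdx" ON "Block" ("Size");
ANALYZE "Block";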

A pull request that adds some indices to the Block table has been submitted:
