VACUUM - performance improvement is huge


#1

Hello, thanks to https://www.duplicati-monitoring.com I noticed that a backup which previously completed in 2-3 hours now, after six months and 180 versions, takes 6-7 hours.

The number of files and the total size are the same as 6 months ago:

Backup times 6 months ago:
image

Backup times now (after 180 versions):
image

The SQLite database size was 17 GB, so I thought: what if I vacuum the DB? I’m using 2.0.3.5, so vacuum is not run automatically.

The vacuum took 1.5 hours and during the process it used 47 GB on the HDD (1x the original DB, 1x the new file in temp, 1x the journal for the original file).
After the vacuum the DB size is 15.3 GB.
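For anyone curious what that manual vacuum amounts to, here is a minimal sketch using the sqlite3 command-line shell, assuming Duplicati (server/tray icon) is stopped so the database is not locked, sqlite3.exe is on the PATH, and the file name below is just a placeholder for your job’s database (typically under %LOCALAPPDATA%\Duplicati when Duplicati runs as your user):

rem VACUUM rewrites the whole database into a new file in the temp directory and
rem keeps a journal next to the original, so plan for roughly 2x the DB size in free space.
sqlite3 "%LOCALAPPDATA%\Duplicati\XXXXXXXXXX.sqlite" "VACUUM;"

(The posts below do the same thing through Duplicati’s own vacuum command instead.)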

So I started a backup to see if there was any improvement…
Backups after the vacuum are in the red rectangle:
image

So the speed improvement is noticeable.

Maybe Duplicati should inform the user about running vacuum after a certain number of versions? It’s a pity that after 2.0.3.3, backups will gradually slow down for all users…


#2

Great write-up, thanks for figuring that out!

I don’t know anything about the vacuum process, but maybe there’s a FAST way for Duplicati to check the database for potential improvements and only run a vacuum when at least X improvement can be expected.

Alternatively, I’d love to be able to schedule (in Duplicati) the running of other tasks like vacuum or test restores.

Until then I’m afraid we’ll just have to keep using an external scheduler to fire off command line runs.
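As a rough sketch of that external-scheduler approach on Windows (the batch file name, path and schedule are just examples, and vacuum-job.bat is assumed to contain your exported Duplicati command line with "backup" replaced by "vacuum"):

rem Register a weekly Scheduled Task that fires the batch file, e.g. Sunday at 03:00.
schtasks /Create /TN "Duplicati vacuum" /TR "C:\Scripts\vacuum-job.bat" /SC WEEKLY /D SUN /ST 03:00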


#3

@JonMikelV nice idea :slight_smile: but:

kenkendk commented on 24 Aug

Interesting! It does not seem that there is a quick and easy way to check if VACUUM has any impact:
SQLite - programmatic way of determining fragmentation?
So we would be stuck with some derived logic, like “after n blocks are deleted” or similar.

crazy4chrissi commented on 24 Aug

…Also it is not as if sqlite3_analyzer would be fast on a 19 GiB database. If it takes you 1 hour to analyze if your DB needs being VACUUMed, it may not be worth it, you may have better used the time to actually do the VACUUM…
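For reference, sqlite3_analyzer is a standalone tool from the SQLite project; running it against a Duplicati database looks roughly like this (the database path is a placeholder), and as quoted above it reads the whole file, so it is slow on multi-GB databases:

rem Writes a per-table space report, including unused bytes in the file, to analysis.txt.
sqlite3_analyzer "%LOCALAPPDATA%\Duplicati\XXXXXXXXXX.sqlite" > analysis.txt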


#4

I definitely like the sound of a scheduled cleanup. All kinds of other services do that.

My Plex server runs multiple different maintenance tasks every night, so I don’t see any reason Duplicati can’t do the same, as long as it’s configurable and transparent :slight_smile:


#5

--auto-vacuum lets you configure it somewhat, but only on or off, and it might wind up running more often than you like. The links below also mention that VACUUM is not instant for large databases (and this post adds another timing). Just pointing this out because it might be enough in certain cases, and might be extensible for others…

Don’t use VACUUM automatically #2578

Progress on #2578 #2626
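For what it’s worth, a sketch of what that on/off switch looks like on an exported command line (the storage URL, source folder and remaining options are placeholders for whatever your job export contains):

rem --auto-vacuum is just on or off; there is no schedule control for it.
Duplicati.CommandLine.exe backup <storage-URL> "D:\Data" --auto-vacuum=true [other exported options]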


#6

I’d love for there to be an --auto-vacuum-like option where it runs once a week or something.


#7

So turn it from a bool to an int indicating days between runs (requiring a log somewhere of the last run)?

I still feel we should be able to figure out what causes the need and monitor that. Is it database size, the number of rows marked as deleted, etc.? There must be some SQLite command to get unreleased table space…
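SQLite can at least report unused pages: the freelist_count pragma counts pages that are still allocated in the file but not in use, and multiplying that by page_size gives a rough lower bound on what VACUUM could reclaim (it won’t capture fragmentation inside pages). A minimal sketch via the sqlite3 shell, with a placeholder database path:

rem page_count = total pages in the file, freelist_count = unused pages, page_size = bytes per page.
rem freelist_count * page_size is roughly the space VACUUM could give back.
sqlite3 "%LOCALAPPDATA%\Duplicati\XXXXXXXXXX.sqlite" "PRAGMA page_count; PRAGMA freelist_count; PRAGMA page_size;"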


#8

On the other hand, auto-vacuum can be pretty dangerous for large databases. For a vacuum in Duplicati, the user needs about 2x the database size in free space on the disk.
So if your DB is 17 GB beforehand, you need another 30-34 GB of free disk space to vacuum it.
I described it in more detail in Canary 2.0.3.6 and slow queries · Issue #3194 · duplicati/duplicati · GitHub
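A rough manual pre-flight check on Windows, with placeholder paths (per the first post, the rewritten copy goes to the temp directory and the journal sits next to the original, so check both drives if they differ):

rem The listing shows the database size and, in the footer, the bytes free on its drive.
dir "%LOCALAPPDATA%\Duplicati\XXXXXXXXXX.sqlite"
rem Free space on the drive that holds the temp directory:
dir "%TEMP%"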

PS: Another successful vacuum, this time on a backup job with 1.5 million files:

Before:
image

After the vacuum, the database shrank from 13 GB to 12 GB:

image


#9

I wonder if Duplicati checks free space before starting a vacuum…


#10

@JonMikelV It does! At least Duplicati.CommandLine.exe does.

I was not able to start a vacuum from the command line when I had low free space on drive C:\, even though the database was on disk D:\ and temp was on disk Y:\.
But in the web GUI it worked, so I probably did something wrong.

Another question is whether the low-free-space check requires 1x or 2x the DB size in free space.


#11

Is there a way to run vacuum from the command line for all my databases, or do I have to run it on each one individually?


#12

Only individually, but it’s a pretty easy process via the web GUI: click Commandline…, select vacuum from the dropdown menu, delete the text in “Commandline arguments” and click Run.


#13

You could also create a script or batch file to run Duplicati.CommandLine.exe for each backup, if that helps (see the sketch at the end of this post).

C:\Program Files\Duplicati 2>Duplicati.CommandLine.exe help vacuum

Usage: vacuum <storage-URL> [<options>]

  Rebuilds the local database, repacking it into a minimal amount of disk
  space.

Exporting the job as a command line and then editing it (similar to @mr-flibble’s GUI approach) should work.
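A minimal sketch of such a batch file, assuming each line is a job’s exported command line with "backup" changed to "vacuum" (the storage URLs, database paths and remaining options are placeholders for your own export):

@echo off
cd /d "C:\Program Files\Duplicati 2"
rem One line per backup job:
Duplicati.CommandLine.exe vacuum "<storage-URL-job1>" --dbpath="<path-to-job1.sqlite>" [other exported options]
Duplicati.CommandLine.exe vacuum "<storage-URL-job2>" --dbpath="<path-to-job2.sqlite>" [other exported options]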