VACUUM - performance improvement is huge


#1

Hello, thanks to https://www.duplicati-monitoring.com I noticed that a backup which previously completed in 2-3 hours now takes 6-7 hours, after six months and 180 versions.

The number of files and their total size are the same as 6 months ago.

Backup times 6 months ago:
image

Backup times now (after 180 versions):
image

The SQLite database was 17 GB, so I thought: what if I vacuum the DB? I’m using 2.0.3.5, so vacuum is not run automatically.

Vacuum took 1.5 hours, and during the process it used 47 GB on the HDD (1x original DB, 1x new file in temp, 1x journal for the original file).
After the vacuum, the DB size is 15.3 GB.

So I started a backup to see if there was any improvement …
The backups after the vacuum are in the red rectangle:
image

So the speed improvement is noticeable.

Maybe Duplicati should advise users to run vacuum after a certain number of versions? It’s a pity that after 2.0.3.3, backups will gradually slow down for all users…


#2

Great write-up, thanks for figuring that out!

I don’t know anything about the vacuum process, but maybe there’s a FAST way for Duplicati to check the database for potential improvements and run it when at least X improvement can be expected.

Alternatively, I’d love to be able to schedule (in Duplicati) the running of other tasks like vacuum or test restores.

Until then I’m afraid we’ll just have to keep using an external scheduler to fire off command line runs.


#3

@JonMikelV nice idea :slight_smile: but

kenkendk commented on 24 Aug

Interesting! It does not seem that there is a quick and easy way to check if VACUUM has any impact:
SQLite - programmatic way of determining fragmentation?
So we would be stuck with some derived logic, like “after n blocks are deleted” or similar.

crazy4chrissi commented on 24 Aug

…Also it is not as if sqlite3_analyzer would be fast on a 19 GiB database. If it takes you 1 hour to analyze if your DB needs being VACUUMed, it may not be worth it, you may have better used the time to actually do the VACUUM…


#4

I definitely like the sound of a scheduled cleanup. All kinds of other services do that.

My Plex server runs multiple different maintenance tasks every night, so I don’t see any reason Duplicati can’t, as long as it’s configurable and transparent :slight_smile:


#5

--auto-vacuum lets you configure it somewhat, just on or off, and it might wind up running more than you like. The links below also mention that VACUUM is not instant for large databases (and this post adds another timing). Just pointing this out because it might be enough in certain cases, and might be extensible for others…

Don’t use VACUUM automatically #2578

Progress on #2578 #2626


#6

I’d love for there to be an --auto-vacuum like option where it runs once a week or something.


#7

So turn it from a bool to an int indicating days between runs (requiring a log somewhere of the last run)?

I still feel we should be able to figure out what causes the need & monitor that. Is it database size, numbers of rows marked as deleted, etc? There must be some sqlite command to get unreleased table space…
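SQLite does expose a cheap counter for completely unused pages: `PRAGMA freelist_count`, alongside `PRAGMA page_count` and `PRAGMA page_size`. Below is a minimal Python sketch of reading them. The function name `free_page_stats` and the idea of using the ratio as a trigger are assumptions for illustration, not something Duplicati is known to do. The freelist only counts wholly free pages, so it says nothing about partially filled pages or fragmentation, and a low value does not prove that VACUUM would not help.

```python
import sqlite3


def free_page_stats(db_path):
    """Return page statistics that hint at how much space VACUUM could release.

    Hypothetical helper: only wholly unused pages are counted, so the result
    is a lower bound on reclaimable space, not a fragmentation measure.
    """
    conn = sqlite3.connect(db_path)
    try:
        page_count = conn.execute("PRAGMA page_count").fetchone()[0]
        freelist_count = conn.execute("PRAGMA freelist_count").fetchone()[0]
        page_size = conn.execute("PRAGMA page_size").fetchone()[0]
    finally:
        conn.close()

    free_ratio = freelist_count / page_count if page_count else 0.0
    return {
        "page_count": page_count,
        "freelist_count": freelist_count,
        "page_size": page_size,
        "free_ratio": free_ratio,
        "reclaimable_bytes": freelist_count * page_size,
    }
```

These pragmas read only the database header, so they should return quickly even on a multi-gigabyte file. Run this against a copy or while no backup is using the database.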


#8

On the other hand, auto-vacuum can be pretty dangerous for large databases. To vacuum in Duplicati, the user needs free disk space equal to 2x the database size.
So if your DB is 17 GB beforehand, you need another 30-34 GB of free disk space to vacuum it.
I described it in more detail in https://github.com/duplicati/duplicati/issues/3194#issuecomment-436178411
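The 2x rule of thumb above can be turned into a quick pre-flight check. This is a rough sketch, assuming the temp file and journal land on the same volume as the database (in the first post they were on separate disks, so adjust `work_dir` accordingly). The helper name `vacuum_space_check` and the default `factor=2.0` are illustrative assumptions, not Duplicati’s actual check.

```python
import os
import shutil


def vacuum_space_check(db_path, work_dir=None, factor=2.0):
    """Compare free space against `factor` times the database size.

    Returns (needed_bytes, free_bytes, ok). `work_dir` is the directory
    whose volume is checked; it defaults to the database's own directory.
    """
    db_size = os.path.getsize(db_path)
    needed = int(db_size * factor)

    if work_dir is None:
        work_dir = os.path.dirname(os.path.abspath(db_path))
    free = shutil.disk_usage(work_dir).free

    return needed, free, free >= needed
```

For a 17 GB database with `factor=2.0` this asks for 34 GB free, which matches the upper end of the 30-34 GB estimate above.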

PS: Another successful vacuum, not in backup job with 1.5 million files:

Before:
image

After vacuum, from 13 GB down to 12 GB:

image


#9

I wonder if Duplicati checks free space before starting a vacuum…


#10

@JonMikelV It does! At least Duplicati.CommandLine.exe does.

I was not able to start vacuum from the command line when free space on drive C:\ was low, even though the database was on disk D:\ and temp was on disk Y:
But in the web GUI it worked, so I probably did something wrong.

Another question is whether the low free space check requires 1x or 2x the DB size in free space.


#11

Is there a way to run vacuum on commandline for all my databases or do I have to run it on each individually?


#12

Only individually, but it’s a pretty easy process via the web GUI: click on Commandline …, select Vacuum from the dropdown menu, delete the text in “Commandline arguments” and click Run.


#13

You could also create a script or batch file to run Duplicati.CommandLine.exe for each backup, if that helps.

C:\Program Files\Duplicati 2>Duplicati.CommandLine.exe help vacuum

Usage: vacuum <storage-URL> [<options>]

  Rebuilds the local database, repacking it into a minimal amount of disk
  space.

Exporting the job as a command line and then editing it (similar to @mr-flibble GUI approach) should work.
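A script along those lines might look like the following Python sketch. It only relies on the `vacuum <storage-URL> [<options>]` usage shown above. The install path is taken from the prompt in the help output, while the entries in `JOBS` (URLs and the `--dbpath` values) are hypothetical placeholders that would need to be replaced with what each job’s exported command line actually contains.

```python
import subprocess

# Install path as shown in the console prompt above; adjust if yours differs.
DUPLICATI_CLI = r"C:\Program Files\Duplicati 2\Duplicati.CommandLine.exe"

# Hypothetical placeholder jobs: copy the real storage URL and options
# from each job's "Export as command line" output.
JOBS = [
    {"url": "file://D:/backups/job1", "options": ["--dbpath=D:\\duplicati\\job1.sqlite"]},
    {"url": "file://D:/backups/job2", "options": ["--dbpath=D:\\duplicati\\job2.sqlite"]},
]


def build_vacuum_command(url, options=()):
    """Build the argument list for: vacuum <storage-URL> [<options>]."""
    return [DUPLICATI_CLI, "vacuum", url, *options]


def vacuum_all(jobs, dry_run=True):
    """Run (or, with dry_run, only print) the vacuum command for each job.

    Returns a list of (command, return_code) pairs; return_code is None
    for a dry run.
    """
    results = []
    for job in jobs:
        cmd = build_vacuum_command(job["url"], job.get("options", ()))
        if dry_run:
            print(" ".join(cmd))
            results.append((cmd, None))
            continue
        completed = subprocess.run(cmd)
        results.append((cmd, completed.returncode))
    return results


if __name__ == "__main__":
    # Dry run by default so nothing is executed until the jobs are verified.
    vacuum_all(JOBS, dry_run=True)
```

Running the jobs one after another, rather than in parallel, keeps the temporary disk usage to one database at a time.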


#14

What command did you use to vacuum the database? I’m very interested in that!

I have a client with very large databases where this option could be a lifeline for a problem I am facing. After 80 or 90 versions, crashes occur during the backup and I have had to delete the job and start from scratch. Even though it is a Xeon server, it is a very old machine; it works perfectly for the job, but during the backup it sits at 100% CPU.


#15

I tried to test this. In the web interface I selected the job Test1, chose the “Command line” option, and ran vacuum. When it runs, it returns the following error:

Found 2 commands but expected 1, commands:
“mega://Teste?auth-username=mail@gmail.com” “E:\cssg”
Return code: 200

Version: 2.0.4.4_canary_2018-11-14


#16

Hello, you are doing it right. In the web interface select Commandline, then choose Vacuum.
But after that, remove the text from the “Commandline arguments” field.
It should be empty:
image


#17

I’d usually link to the command’s entry in Using Duplicati from the Command Line, but there isn’t an entry, so:

C:\Program Files\Duplicati 2>Duplicati.CommandLine.exe help vacuum

Usage: vacuum <storage-URL> [<options>]

  Rebuilds the local database, repacking it into a minimal amount of disk
  space.



C:\Program Files\Duplicati 2>

Above is what @mr-flibble was aiming you at. Keeping options is generally fine, and sometimes necessary.


#18

I ran the command through the web interface to see if I could copy the complete command to run via Task Scheduler or crontab.

To test it the way you described, I created a backup job that backs up 20 exe files totalling 200 KB to mega.nz, and the task has been running for 10 minutes without completing.

In the meantime I tried to stop the job and could not. It was as if Duplicati was locked while the service was executing, and even closing and reopening it does not unlock it.


#19

Friends, while searching Google for Duplicati’s vacuum, I found this: SQLite VACUUM

After reading the article, I decided to test on a client that has a 163 MB database. I ran the following command:

sqlite3 72667174897076837983.sqlite VACUUM;

After running this command, the database shrank to 145 MB.

With the vacuum done, I tried running the backup job again and it ran to completion without failing. \o/

So I believe that, at least for me, it is more feasible to run the vacuum outside of Duplicati via a script (PowerShell or shell script) than with Duplicati.CommandLine.exe.
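For machines without the `sqlite3` command-line tool, the same VACUUM can be issued from Python, whose standard `sqlite3` module bundles the SQLite library. The sketch below is one way to script it, not the method used in this post. The helper name `vacuum_database` is an assumption, and it presumes that no backup is using the database while it runs.

```python
import os
import sqlite3


def vacuum_database(db_path):
    """Run VACUUM on one SQLite file and return (size_before, size_after) in bytes."""
    if not os.path.isfile(db_path):
        # sqlite3.connect would silently create an empty file otherwise.
        raise FileNotFoundError(db_path)

    before = os.path.getsize(db_path)

    # isolation_level=None puts the connection in autocommit mode;
    # VACUUM cannot run inside an open transaction.
    conn = sqlite3.connect(db_path, isolation_level=None)
    try:
        conn.execute("VACUUM")
    finally:
        conn.close()

    after = os.path.getsize(db_path)
    return before, after


if __name__ == "__main__":
    import sys

    # Usage: python vacuum_db.py <database.sqlite> [<more.sqlite> ...]
    for path in sys.argv[1:]:
        size_before, size_after = vacuum_database(path)
        print(f"{path}: {size_before} -> {size_after} bytes")
```

Passing several database files on the command line vacuums them one after another, so the extra disk space is needed for only one database at a time.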


#20

Hello
Tacio, excellent.
On Windows I could not find a way to execute this command.
It looks like SQLite is not installed.
Anderson