"database is full" with TMPDIR pointing at a 5TB partition: 3̶2̶-̶b̶i̶t̶ ̶b̶u̶g̶?

conrad · January 31, 2020, 5:15pm

I’ve just received a “database is full”/“disk is full” error after over a year of no complaints; ran into these problems before though and so have been running with TMPDIR set to a partition with vast amounts of free space.

It occurs to me that this is a RAID array which just expanded from 4TB to 8TB before this latest backup run, so the free space has gone from fits-in-32-bits to doesn’t-fit-in-32-bits. Could this be a bug?

[running 2.0.3.3_beta_2018-04-02 because ran into trouble with the database format upgrade last time I tried it; will maybe give 2.0.05.1 a try soon]

drwtsn32 · January 31, 2020, 6:00pm

I don’t think the volume size is the issue. My NAS has a 16TB volume and Duplicati works fine. I’m using the latest beta though.

ts678 · January 31, 2020, 6:49pm

TMPDIR sounds like Linux, so make sure the mono --version is at least 5 if you decide to try 2.0.5.1.

“database or disk is full” is an SQLite error message, perhaps involving DB corruption. There are many references on the Internet, and suggestions to fix this, from people who know more about it than I do…

SQLite3 database or disk is full / the database disk image is malformed is one example of such dialog.

Some folks recommend PRAGMA integrity_check in the sqlite3 command line utility or sqlitebrowser. That’s probably not as effective as doing dump-and-reconstruct but might be a nice sanity check of DB.

When does this error occur? Do any operations run? Perhaps try something safe like a sample restore.

Rather than try to fiddle around with SQLite-level repair, I usually run Recreate if the DB has a problem. 2.0.3.3 might work fine. 2.0.4.5/23 added a bug that makes Recreate slow. 2.0.5.1 has a fix for the bug.

conrad · January 31, 2020, 10:00pm

Somewhat late I remember that 32 bits is exceeded at 4 GB of course, so yep, definitely shouldn’t be that. D’oh.

Trying out the sqlite checks now thanks, will see how that goes (even pragma quick_check is taking substantial time on my 24GB sqlite database). Last time I did a rebuild/recreate I think it took two weeks, so yeah really hope it’s not a corrupt DB sigh.

conrad · February 1, 2020, 5:53pm

Woaah. So so far, “pragma quick_check” (!) has taken seven hours, of which 9 minutes was actual CPU time. It’s become increasingly depressing how a single backup run with only minor changes to 1.4TB of data is taking 24-48h to complete; I wonder is there anything I can do to optimise database performance on this front? It’s on some pretty good 7200rpm drives, and the database is 24GB.

Wondering whether adding the index described here might help; will give a try once the integrity check completes.

Ingmyv · February 1, 2020, 7:26pm

It will help if that specific query is running slowly. Temporarily increasing the log verbosity to ‘trace’ can cast light on which queries are running slow (the log volume produced is extreme!) and you can check the xplan for them to confirm that the same logic is applicable.

SQLITE claims to be able to handle ludicrously enormous databases, up to 140 terabytes; (nearing the sheer size that people start to be seduced by glitzy “big data” marketing from Snowflake, Hadoop, etc) however, it is curiously nonspecific about the maximum size of a table; that is given in rows, not bytes, with the note:

The theoretical maximum number of rows in a table is 2⁶⁴ […] This limit is unreachable since the maximum database size of 140 terabytes will be reached first.

One thought. When was this database last vacuumed and analyzed? Vacuuming is a form of ‘garbage collection’ and on SQLITE it’s a major feature of the architecture.

(Unlike some other database that are full client/server architecture with a persistent ‘kernel’ there’s no inherent way for vacuuming and analysis it to be done automatically on SQLITE so it has to be periodically manually done. Duplicati has an option to vacuum after backup, but not analyze).

Vacuuming rebuilds the datafile without ‘blank space’ from deleted rows, and in sequential order. Both of these will speed things up, possibly quite a lot, and might shrink the database, also possibly quite a lot! Analyzing gathers stats about the database tables that allow the planner to make good choices about how to execute queries and may or may not speed things up too if SQLITE is making bad execution plans

I don’t know what the consequences of vacuuming a corrupt databse could be so probably better to wait for a “good sign” from the integrity check first. Also both vacuum (potentially very, very slow) and analyze, like all SQLITE writer operations causes a full lock so you won’t be able to back up or restore etc during this time.

https://www.sqlite.org/dbstat.html
Examples queries to check various storage attributes that give clues as to if vaccuming is likely to be useful.

conrad · February 1, 2020, 7:33pm

Thanks very much for all that @Ingmyv: much appreciated! I’ll give all of the above a try; have never either vacuumed or analysed, so will try those two first, assuming the checks succeed. I’ve also been recording backup performance stats for months, so am in a good position (if this works) to report back with performance and database size impacts.

ts678 · February 1, 2020, 8:19pm

Run PRAGMA optimize upon closing database connection #3745 reported a tremendous speedup, however you’d have to run 2.0.5.1, which also allows SQLite VACUUM from web UI Commandline.

VACUUM - performance improvement is huge

YMMV but if it helps you can make it automatic.

–auto-vacuum
–auto-vacuum-interval

It’s too late now, unless you want to restart the backup, but the –blocksize=100KB default may be smaller than optimum for large backups. It results in a whole lot of DB work tracking all the blocks.

Choosing sizes in Duplicati

Thinning versions can also help keep DB (and destination) sizes under control. –retention-policy.

I agree with @Ingmyv that getting this going at all should be done before trying to make it faster.

conrad · February 4, 2020, 6:02pm

Wow. So, I killed the integrity_check running on my NAS after 67 hours (and an average of 1.3% CPU usage), moved the database to a machine with an SSD where the integrity_check took 56 minutes and a vacuum took 25 minutes, then moved it back. I wasn’t super optimistic because the vacuum only reduced database size by 16%, and trying it on another Duplicati database during the 67-hour wait had pretty much no effect. But this backup just ran in 6 hours, compared to a bimodal performance previously of either 24h or 48h. So that’s an even bigger performance improvement than in the articles referenced above, and it’s now pretty much four times faster than it has ever previously run. Will see if this persists, but one learning I’d offer straight off is:

If you need to run big operations like integrity_check or vacuum, you can gain a hundredfold performance improvement by temporarily moving your database to an SSD for the job.
- I mean I know this is not surprising, but it’s useful to be reminded.

I think maybe an FAQ entry for large-ish backups could be helpful, and given this it might be useful if Duplicati flagged up the value of running vacuum every now and then; perhaps the current version already does?

Will continue to prod this, looking at the impact of analyze, upgrade, etc. over the next few weeks.

Ingmyv · February 5, 2020, 12:58am

The main bit from the speed up is probably the sorting effect from vacuum which reduces the random page selection cost. On ‘metal’ disks the random page selection cost can be really bad

ts678 · February 5, 2020, 4:15pm

Add auto vacuum interval option #3840 allows “every now and then” tuning. Here’s 2.0.5.1 Beta help:

C:\ProgramData\Duplicati\duplicati-2.0.5.1_beta_2020-01-18>Duplicati.CommandLine.exe help auto-vacuum-interval
  --auto-vacuum-interval (Timespan): Minimum time between auto vacuums
    The minimum amount of time that must elapse after the last vacuum before
    another will be automatically triggered at the end of a backup job.
    Automatic vacuum can be a long-running process and may not be desirable
    to run after every single backup.
    * default value: 0m

however it looks like you still need to turn it on yourself, and preferably pick an interval above 0m.
–auto-vacuum-interval = 0m is the default value? discusses the difficulty of picking the right value.

  --auto-vacuum (Boolean): Allow automatic rebuilding of local database to
    save space.
    Some operations that manipulate the local database leave unused entries
    behind. These entries are not deleted from a hard drive until a VACUUM
    operation is run. This operation saves disk space in the long run but
    needs to temporarily create a copy of all valid entries in the database.
    Setting this to true will allow Duplicati to perform VACUUM operations at
    its discretion.
    * default value: false

conrad · February 14, 2020, 7:54am

Some notes on performance before and after vacuum and analyze:

Recent historic performance: avg. 27h18m
After vacuum: avg. 6h18m (77% better)
After analyze: avg. 6h8m (2-3% better)

Will try logging next and see whether anything stands out. Basic message is “run vacuum”!