Limit sqlite database backups - feedback please

I’d like to implement one or two options to limit sqlite database backups. I’m referring to the database backups that are created before a database schema change is applied (after some Duplicati version upgrades), e.g. backup XXXXXXXXXX 20191101000000.sqlite.

Right now there is no mechanism to delete old backups so they can build up over time.

What do you think of the following option ideas:

--db-backup-keep-count
keep only the X most recent backups; delete any in excess of this number (oldest first)
(default = 2?)

--db-backup-keep-age
keep backup files only for a certain amount of time
(default = 14d?)

Does it make sense to have both of these options available, or only one (and if so, which)? I’m torn on this, but I think keep-age is probably the better choice. It would be pretty easy to provide both, though.
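For what it’s worth, here’s a rough sketch of what keep-count pruning could look like. Nothing in it exists today: the function name is made up, and it leans on the “backup <dbname> <YYYYMMDDHHMMSS>.sqlite” naming so that a reverse lexicographic sort puts the newest file first. A real implementation would need to group per database; this treats the folder as one set.

```shell
# Hypothetical sketch of the proposed --db-backup-keep-count pruning.
# Relies on the "backup <dbname> <YYYYMMDDHHMMSS>.sqlite" naming, so a
# reverse lexicographic sort puts the newest backup first.
prune_db_backups() {
    dir=$1
    keep=$2
    # list newest-first, skip the first $keep entries, delete the rest
    ls "$dir"/backup*.sqlite 2>/dev/null | sort -r \
        | tail -n +$((keep + 1)) \
        | while IFS= read -r f; do rm -f -- "$f"; done
}

# e.g. prune_db_backups /config 2   (keep the 2 newest backups)
```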

My initial feeling is that age is a less reliable indicator that a backup is no longer of use, however I don’t know the detailed algorithm. My guess is that a backup of a job DB is easy to trigger, e.g. even starting down the Restore path without actually doing restore might be enough. Any other DB access might too.

Because DB buildup is likely more of a problem for Canary users, I don’t want the fix to risk Beta users.

The whole DB backup scheme seems not very robust. As soon as you do another backup, a DB revert surprises Duplicati with extra backend files, and a Repair will reconcile the difference by deleting files…

Repair command deletes remote files for new backups #3416

On the other hand, the DB backups are probably better than nothing, but revert must be done carefully.

I wonder if there’s any hope of cleaning up old DB backups that were just dated without the name clue? Doing that based on age (even if it’s file age) might be possible but it would seem like quite a big guess.

I believe the automatic backups are a “just in case” the schema upgrade fails. Honestly those backups are probably useless almost immediately after the upgrade completes successfully and you run your next backup. I don’t think the intent of these backups is to protect you against database corruption during normal Duplicati usage.

You wouldn’t be able to reliably match those backups to the original database (in order to keep just X past versions), but we could potentially delete all “backup ##########.sqlite” files based on age.
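An age-based cleanup of those generically named files could be as simple as this sketch (the function name is mine, and note that -mtime looks at the file’s age on disk, not the timestamp embedded in the name):

```shell
# Sketch of --db-backup-keep-age style cleanup (proposed option, not an
# existing one). -mtime compares the file's modification time on disk,
# not the timestamp embedded in the filename.
prune_db_backups_by_age() {
    # $1 = folder holding the databases, $2 = age cutoff in days
    find "$1" -maxdepth 1 -name 'backup *.sqlite' -mtime +"$2" -delete
}

# e.g. prune_db_backups_by_age /config 14
```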

I don’t think it’s ideal to leave Duplicati the way it is as those backup files build up, and on some machines they can be quite large.

If we were to implement both deletion options, I personally would set the count to 1 and age option to 1 week. That would be more than long enough for me. I don’t know if we should be that aggressive for defaults though - I was trying to be a bit conservative there.

Appreciate your thoughts.

It’s not that, but more on what happens if you get a new release and find some really bad problem and want to revert. If people ask, I usually tell them to decide fast because it gets harder with longer delay. Testing less critical backups first would also be good, both in cases of breakage and to ease reverting.

Yes, and this would make me nervous if done automatically. I’d classify that as being overly aggressive.

Yep good point, that too. While looking at the source last night I saw that Duplicati will automatically restore the backup if the schema update fails, which is why that was on my mind.

I’m not very comfortable with that idea either.

I do like the idea of having Duplicati delete older backups though, but it would be limited to backups that included the original database name plus the timestamp.

My $.02 is that effort on this would be better spent implementing Support for postgresql/mariadb · Issue #3013 · duplicati/duplicati · GitHub (“real” db backend). I’ve lost one or more of the sqlite databases due to service restarts/crashes far more often than I’ve had any issue with running out of storage.

When you get a corrupt database, the UI won’t start properly; you might see a “database disk image is malformed” error, and it appears all of your backup configurations are lost. The only way to figure out what’s wrong is to run a command like this for each of the mysteriously-named sqlite files under /config:

```
# sqlite3 LIYMTOCYDG.sqlite "PRAGMA integrity_check"
*** in database main ***
Multiple uses for byte 2834 of page 316635
Error: database disk image is malformed
```

The sole remedy is to identify any that are corrupted, blow them away, restart Duplicati and invoke db rebuild(s).
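Looped over the whole folder, that check might look like this sketch (the function name is mine, and /config is the layout from the Docker image; the only real command used is the documented PRAGMA integrity_check):

```shell
# Sketch: run PRAGMA integrity_check on each job database and report
# the corrupt ones. /config is the Docker image layout; adjust as needed.
check_databases() {
    for db in "$1"/*.sqlite; do
        [ -e "$db" ] || continue
        status=$(sqlite3 "$db" "PRAGMA integrity_check" 2>&1) || true
        [ "$status" = "ok" ] || echo "CORRUPT: $db"
    done
}

# e.g. check_databases /config
```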

Solidify SQLite dependencies #4024 notes that Duplicati may use an SQLite that’s on the system, so it may be old. Mono has similar age issues. I wonder if old SQLite versions have more corruption issues?

How To Corrupt An SQLite Database File lists the possible causes, but hearing your experiences helps. There was a Duplicati contributor who wanted to torture-test Duplicati with kills. That sounds worthwhile, even just from the bottom-level view of whether the DB can be opened again in the normal way.

If you come up with the steps to break it on demand, please file a GitHub issue so somebody can look. Switching databases sounds to me like a potentially big change. I also think DB developers are scarce. Volunteers are welcome, if any are out there. Fortunately, Duplicati SQL hasn’t needed fixing in a while.

My motivation for creating the backups was that a schema upgrade could break things (delete stuff). But I never had a good idea for when to delete the old ones.

Looking back at what users have reported, I think there was only one instance where we needed a rollback, and that was “fixed” by supplying the SQL commands to revert; since the databases had already been updated afterwards, simply using the old copy was meaningless.

This makes me think that the backups are perhaps not really useful and can be deleted. Would one or two successful backups with the new database be a good marker for deleting the old DB backups?

One or two successful backups - yes I like that idea.