Some observations about database maintenance

ZebCorp · May 23, 2020, 10:30am

Database maintenance describes the repair process as “If the backup and the remote storage is out of sync, Duplicati will require that you perform a repair operation to synchronize the database.” But what it really means is that repair synchronizes not to the local state, but to the last known database state.

Let’s consider a small test backup consisting of files test1, test2, test3 and Duplicati local database1. Run our test again to add a few more files, test4, test5 and test6, plus the newly created local database2. Now let’s assume a system crash and a restore from an image. We now have database1 and the local backup containing test files 1-6. If we run “repair and delete local database”, everything is fine (Duplicati synchronizes properly), but if we choose repair only, we end up in a situation where Duplicati restores the local backup to the level of database1: we keep files test1-3, but test files 4-6 are deleted from the local backup. I think that behavior is something to consider in crash-and-restore situations.
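To make the scenario concrete, here is a toy model in Python. It is not Duplicati’s code: it assumes the behavior reported in issue #3416 (Repair deleting backend files that a stale database doesn’t know about), and the functions `repair_only` and `recreate` are hypothetical names used only for illustration.

```python
"""Toy model of the crash-and-restore scenario (hypothetical, not Duplicati code).

The "database" is just the set of backup files it knows about; the "backend"
is the set of files actually present at the backup destination.
"""


def repair_only(database: set[str], backend: set[str]) -> set[str]:
    # Simplified model of the reported behavior: backend files unknown to the
    # (stale) database are treated as extras and deleted. Returns what remains.
    return backend & database


def recreate(backend: set[str]) -> set[str]:
    # Recreate builds a fresh database from whatever the backend holds,
    # so nothing at the backend is deleted.
    return set(backend)


database1 = {"test1", "test2", "test3"}          # restored from the old system image
backend = {f"test{i}" for i in range(1, 7)}      # test1..test6 still at the destination

after_repair = repair_only(database1, backend)
lost = sorted(backend - after_repair)
print("kept after repair:", sorted(after_repair))
print("deleted by repair:", lost)
print("database after recreate:", sorted(recreate(backend)))
```

In this model the difference is simply which side is treated as the source of truth: `repair_only` trusts the stale database, while `recreate` trusts the backend.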

P.S. Repairing the database only shows this set of errors:

2020-05-23 12:18:37 +02 - [Error-Duplicati.Library.Main.Operation.RepairHandler-FailedNewIndexFile]: Failed to accept new index file: duplicati-ic9accf3a0e0f4e408e4e07705f02dcce.dindex.zip, message: Volume duplicati-b64b930df98344e08b3423003b5c78878.dblock.zip has local state Deleting

Duplicati ver. 2.0.5.1 Windows x64, Home

ts678 · May 23, 2020, 6:48pm

I’m not sure the new version is better than the original. It implies that some state is saved, but there isn’t. The original is vague, but that (maybe sadly) also gives it leeway to synchronize by changing either side.

Repair command deletes remote files for new backups #3416, and the links referencing it, describe your case. Comments on this open issue agree it’s a problem, but it’s apparently hard to handle in Repair, which probably assumes that the DB and the remote are close. If not (as you saw), Recreate starts from no database…

ZebCorp · May 23, 2020, 10:46pm

Well, in this hypothetical system-failure scenario we have two “states”: one is the local backup, the other is the local database. The database isn’t needed to restore from the local backup (which is one of Duplicati’s virtues); it’s “just” a maintenance feature that speeds things up and does other “useful database stuff” (if I understand correctly). If so, why does the state of the database have a “higher” priority than the state of the local backup when the two are inconsistent? I think the state of the local backup should be considered first, since it is probably more current than the local database, or at least it should not be modified without the user’s consent.

I just imagined a situation where my system disk is destroyed. I’d probably restore the system from an image (which is typically a month old) and then use Duplicati to restore the rest of the data. I use Duplicati’s scheduled backups, so the chance of a scheduled backup running with a month-old database after restoring a system image is not that small. It could easily happen, and choosing repair database (instead of repair and delete) is a recipe for losing a month of backed-up data. In that context, I find the description “synchronize the database” a bit misleading (or vague, if you will), because, as you mention, it doesn’t say what exactly will be synchronized.

ts678 · May 23, 2020, 11:11pm

That didn’t make me think of the backup as a database state, but fine, if that’s what you were referring to.

Possibly a typo there. There’s only one database.

I don’t know the design, but I think sometimes it changes the database to fit the backup, and in some (maybe undesired) cases it changes the backup. See comments later about potential dissimilarities…

I agree somewhat, but that’s not the current design. I’d say “file an issue on the bug”, but it’s already filed.

Meanwhile, a workaround is to avoid restoring the stale database and then running Repair. Do a Recreate instead.

Compacting files at the backend is one reason why it’s not a simple matter of Repair noticing the later backend files and recording their information in the database. The COMPACT command will leave the database you restored full of references to files that no longer exist, because compact packed their contents into new files. It’s sort of like picking a random database and a random backup, and telling Duplicati to synchronize things that don’t come close to matching. Probably what should happen is that the situation is recognized as hopeless and the user is told to Recreate, but such code isn’t there yet.
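A small Python model can show why compact defeats a stale database. This is a hypothetical simplification, not Duplicati’s implementation: volume names like `dblock-A` and block hashes like `h1` are invented, and compact is reduced to “repack the still-used blocks from volumes with wasted space into one new volume, then delete the old volumes”.

```python
"""Hypothetical toy model of why compact breaks a stale database.

Volume names and block hashes are illustrative only; real Duplicati volumes
are compressed (and usually encrypted) archives with generated names.
"""
import copy

# Backend: volume name -> set of block hashes it contains.
backend = {
    "dblock-A": {"h1", "h2", "h3"},
    "dblock-B": {"h4", "h5"},
}
# Blocks still referenced by some backup version (h2 and h5 are now wasted).
live_blocks = {"h1", "h3", "h4"}

# The database captured in the system image: block hash -> volume holding it.
stale_db = {h: vol for vol, hashes in backend.items() for h in hashes}


def compact(backend: dict[str, set[str]], live: set[str], new_name: str) -> None:
    """Repack live blocks from volumes containing wasted space into one new
    volume, then delete those old volumes (a simplification of compacting)."""
    wasteful = [v for v, hs in backend.items() if hs - live]
    repacked: set[str] = set()
    for vol in wasteful:
        repacked |= backend[vol] & live
        del backend[vol]
    if repacked:
        backend[new_name] = repacked


current = copy.deepcopy(backend)   # the backend as it looks after later compacts
compact(current, live_blocks, "dblock-C")

missing = sorted({vol for vol in stale_db.values() if vol not in current})
print("volumes now on backend:", sorted(current))
print("volumes the stale DB expects but cannot find:", missing)
```

Every live block still exists after the compact, just in a volume the stale database has never heard of, while every volume the database does know about is gone.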

ZebCorp · May 24, 2020, 8:43am

@ts678 - thanks for the details, much appreciated.

ts678 · May 24, 2020, 11:55am

How the backup process works describes some concepts.

Here’s more, but it’s a bit deep. It’s useful for understanding some of the challenges of bringing a stale database up to date. The compact operation changes the dblock files whose wasted space it reclaims, which in turn changes the associated dindex files (one per dblock). The old files that fed into the compact are deleted. I don’t recall exactly how Duplicati reacts to missing files. It may not be pretty; however, in the specific case of missing dindex files on the remote, these are recreated from DB data. Missing dlist files are also recreated, so that’s a case where changes from local to remote are good.

Another challenging case is when someone runs The PURGE command, or possibly The PURGE-BROKEN-FILES command, which change the dlist files (one per backup version, describing the files in that version). The dlist files don’t have to change for a compact, because they reference file blocks by hash, and the hashes stay constant even as the blocks themselves move from dblock to dblock during compacting.
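The point about hashes can be illustrated with a simplified Python model. It assumes a dlist entry is just a file path mapped to an ordered list of block hashes, and that a hash-to-data lookup is built from whatever volumes exist (roughly the role the dindex files play). File names, volume names and the choice of SHA-256 here are illustrative, not taken from Duplicati’s file formats.

```python
"""Hypothetical toy model: a dlist names blocks by hash, not by volume.

File names, volume names and block contents are invented for illustration.
"""
import hashlib


def block_hash(data: bytes) -> str:
    # Content hash of a block; the same bytes always give the same hash.
    return hashlib.sha256(data).hexdigest()


blocks = [b"alpha", b"beta", b"gamma"]
hashes = [block_hash(b) for b in blocks]

# A dlist entry: file path -> ordered list of block hashes (no volume names).
dlist = {"docs/report.txt": hashes}

# Before compact: blocks spread over two volumes.
before = {"dblock-A": {hashes[0]: blocks[0], hashes[1]: blocks[1]},
          "dblock-B": {hashes[2]: blocks[2]}}
# After compact: the same blocks repacked into one new volume.
after = {"dblock-C": {h: b for vol in before.values() for h, b in vol.items()}}


def restore(dlist: dict[str, list[str]],
            volumes: dict[str, dict[str, bytes]]) -> dict[str, bytes]:
    # Build a hash -> data lookup from the volumes that currently exist,
    # then reassemble each file from its list of block hashes.
    lookup = {h: data for vol in volumes.values() for h, data in vol.items()}
    return {path: b"".join(lookup[h] for h in hs) for path, hs in dlist.items()}


# The unchanged dlist restores the same content before and after the move.
assert restore(dlist, before) == restore(dlist, after)
print(restore(dlist, after)["docs/report.txt"])
```

Only the hash-to-volume mapping has to follow the blocks around; the dlist itself never mentions a volume, which is why it survives a compact untouched.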

The DELETE command, and any deletion done by the configured version retention, can also surprise an old DB, because the versions it knows about may be missing later. A delete means the corresponding dlist file goes away, and compact may then run to reclaim space from data that are no longer in any version.

The REPAIR command has always had a difficult time because there are so many ways things can fail, even when one isn’t trying to update stale databases from image backups into something that’s current. There ARE some additional commands (e.g. list-broken-files and purge-broken-files) that can also help.

Disaster Recovery covers the general topic, and maybe it would be good for it to say, at least temporarily, not to try the plan of restoring the DB from an image and then running Repair. As a rule, the manual doesn’t cover avoiding specific bugs, but it does have a few sections that guide people through something, and this is one of them…

Meanwhile, maybe someone can find a way to keep a confused Repair run from deleting new backups.

ZebCorp · May 24, 2020, 9:31pm

Hello, thanks for the further insights. My “hypothetical case of disaster” is probably a bit specific (if not entirely realistic), because neither the remote backup nor the local database is corrupted or lost; they’re “just” mismatched by a few days. I know my remote backup is fine and dandy (well, I assume…), but Duplicati can’t make that assumption, of course. I read somewhere that one of the design choices was “never trust destination”, so in case of inconsistency the remote is checked against the database, and the database “wins” (so to speak).

Showing the consequences of an action before it actually takes place (for example, “If you choose Repair, you might lose files from date x to date y; please consider the Recreate option instead”) would be a sensible idea. Alas, I understand this is a work in progress.
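The suggested warning could look something like the following Python sketch. Nothing here exists in Duplicati: `preview_repair`, the backup file names and the dates are all hypothetical. It only shows the idea of computing what a Repair would remove, and reporting the affected date range, before anything is deleted.

```python
"""Hypothetical sketch of the suggested pre-Repair warning.

None of these names exist in Duplicati; file names and dates are invented.
"""
from datetime import date
from typing import Optional


def preview_repair(db_known: set[str], backend: dict[str, date]) -> Optional[str]:
    # Backend files unknown to the database are the ones a confused Repair
    # might delete (the scenario described earlier in the thread).
    at_risk = {name: d for name, d in backend.items() if name not in db_known}
    if not at_risk:
        return None  # database and backend agree; nothing would be lost
    first, last = min(at_risk.values()), max(at_risk.values())
    return (f"If you choose Repair, you might lose {len(at_risk)} backup file(s) "
            f"from {first} to {last}; please consider Recreate instead.")


# Usage: the restored database only knows the oldest backup.
db_known = {"backup-2020-04-20"}
backend = {
    "backup-2020-04-20": date(2020, 4, 20),
    "backup-2020-05-01": date(2020, 5, 1),
    "backup-2020-05-20": date(2020, 5, 20),
}
warning = preview_repair(db_known, backend)
if warning:
    print(warning)
```

A real check would also need to account for the compact, purge and delete cases described earlier, where a file missing from one side doesn’t necessarily mean data would be lost.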