I was wondering about this: Duplicati can get into an inconsistent state when the local database and the remote storage get out of sync.
Wouldn’t it be more robust if Duplicati backed up the local database (encrypted, of course), or the data in it, next to the remote indexes and archives, so that the remote offers everything needed for a full restore? In terms of process, this step would be done after the backup is complete.
Technically the remote side already does have everything needed to do a restore. The local database can be rebuilt - if needed - by reading existing backup data on the remote side.
That being said, I do something similar to what you are suggesting. I have a second backup job set up in Duplicati whose sole purpose is to back up the database from the first job. Even though this isn’t strictly necessary, I think it can be faster than doing a database rebuild the “normal” way.
I don’t see how you can do this type of backup without setting up a second job though. With only one backup job the database itself is being modified/updated while that job is running. How can it back up the database while at the same time updating it? Even if it did, you wouldn’t be able to restore the database without first having a local database to begin with.
That’s why you either need to back it up using a second backup job, or use a method other than Duplicati to protect the database (which could be triggered automatically at the end of the backup job).
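As a sketch of the “method other than Duplicati” idea: SQLite’s online backup API can take a consistent copy of a database file even while it is open elsewhere, which makes it a reasonable candidate for a script triggered at the end of a backup job. This is my own illustration, not a Duplicati feature; the source folder (`~/.config/Duplicati`) and the destination are assumptions you would adjust to your installation.

```python
# Sketch (assumed paths): copy Duplicati job databases with SQLite's
# online backup API, which yields a consistent snapshot page by page.
import sqlite3
from pathlib import Path

DB_DIR = Path.home() / ".config" / "Duplicati"     # typical Linux/Mac location
DEST = Path.home() / "duplicati-db-copies"         # arbitrary destination

def snapshot(db_path: Path, dest_dir: Path) -> Path:
    """Copy one SQLite database consistently; return the copy's path."""
    dest_dir.mkdir(parents=True, exist_ok=True)
    target = dest_dir / db_path.name
    src = sqlite3.connect(db_path)
    dst = sqlite3.connect(target)
    with dst:
        src.backup(dst)   # SQLite online backup: safe even if src is in use
    src.close()
    dst.close()
    return target

if __name__ == "__main__":
    if DB_DIR.is_dir():
        for db in DB_DIR.glob("*.sqlite"):
            print("copied", snapshot(db, DEST))
```

A script like this could be pointed at by run-script-after so it fires automatically when a job finishes.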
It’s an option to do this, although I think the primary reason it’s not built in is that it’s more a workaround than anything else.
Ideally repairing the database should be manageable enough that you wouldn’t care about whether or not you have a copy of it backed up.
Some improvements are happening in this regard, but I can see how people hesitate to rely entirely on that. I spent 7-8 hours recreating my database last night on the newest canary release, which is much improved over the previous time I had to do so, but it’s still a long time.
I’ve only had to do a repair once. But by then the local and remote had gotten out of sync and I could not get the repair to work. It went like this during the migration of an old Mac to a new one:
1. Backup old Mac to a local hard disk with Time Machine
2. Backup old Mac to a local hard disk with Duplicati
3. New Mac: migrate user data etc. from the Time Machine backup
Now the local database is out of sync (the local backup information is older than what is on the backup storage). I recall I had a lot of problems getting this to work again. If step 2 had put the local information in the backup and been able to use it (reverting to the state on storage), it would have been easy. I should have swapped steps 1 and 2 (what I actually did was a bit more complicated, but the effect was equivalent to the three steps above).
So, while the local info should have been rebuildable from remote, it wasn’t possible because the local info was out of date.
I’m not sure I follow. You can rebuild the local database regardless of the state of the local files (the ones you back up with Duplicati). Rebuilding the local database reads the remote data only.
I have done this as a test more than once. I set up a new VM (on a different machine) that didn’t have ANY of the data files protected by Duplicati. I then told the VM to rebuild the local database and it succeeded, and then I was able to restore data files.
That being said, if you are doing a computer migration it should not be necessary to do a database rebuild. You can instead copy the database from your old machine. Some care needs to be taken when doing this but it isn’t too difficult.
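The copy itself can be as simple as replicating the profile folder while Duplicati is stopped on both machines. A minimal sketch, assuming a Mac-style layout where the old disk is mounted under /Volumes; both paths in the `__main__` section are placeholders, not anything Duplicati prescribes:

```python
# Sketch: copy a Duplicati profile folder (databases + config) to a new
# machine. Run only while Duplicati is NOT running on either machine, so
# no database or journal file is mid-write. Paths below are placeholders.
import shutil
from pathlib import Path

def migrate_profile(old_profile: Path, new_profile: Path) -> int:
    """Recursively copy every file; return the number of files copied."""
    copied = 0
    for src in old_profile.rglob("*"):
        if src.is_file():
            dst = new_profile / src.relative_to(old_profile)
            dst.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(src, dst)  # copy2 also preserves timestamps
            copied += 1
    return copied

if __name__ == "__main__":
    old = Path("/Volumes/OldMac/Users/me/.config/Duplicati")  # placeholder
    if old.is_dir():
        n = migrate_profile(old, Path.home() / ".config" / "Duplicati")
        print("copied", n, "files")
```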
This is not about the files you back up. This is about Duplicati’s configuration. In my case, the contents of ~root/.config/Duplicati/ on my Mac.
In the scenario that I sketched, I was unable to repair or rebuild from the location where my list and block files are stored; Duplicati kept complaining. This was when I encountered it: “missing remote file” was the complaint when trying repair or recreate.
Ah ok. I don’t have experience with doing it on a Mac, but I have done a PC transition where I shut down Duplicati on the original PC, copied over the databases in that folder, and then installed Duplicati on the new PC. Once it started everything worked perfectly.
Although it’s not explicitly said, I guess step 3 is how the out-of-sync Duplicati database got installed.
Reversing the first two steps may have avoided that, and would have been easier than manual copy.
Moving back to the question in the topic’s title, though: one drawback of DB backups is that the databases sometimes get very large, with widely distributed changes that hinder deduplication, so the backup may be slow. Tuning key settings for big backups can help, but those same settings would also help recreate, meaning that making DB sizes small enough to back up nicely may reduce the benefit of doing the backup at all.
Or so I think from forum posts. Someone who’s been doing backups might chime in with their findings.
Yes. If I had reversed 1 and 2, 3 would have installed the latest version.
The point of all of this is that while it is stated that the backup itself has everything it needs to be repaired, this is not always the case in practice. In this situation, there was no way to use repair or recreate that worked.
Of course, if Duplicati were able to let the remote storage override the local information in the configuration database, that would work too, i.e. a command to ‘reset local backup information from remote storage’.
The problem, though, is that even the program’s own directions suggest repair first, which apparently isn’t a safe path when the local database is intact but old, i.e. restored from an old version. Whether it does better for non-intact (i.e. broken) databases I don’t know, but the main issue IMO is damage to the backend, which then apparently can break Recreate (per some reports) or at least delete new backups (per others).
C:\Program Files\Duplicati 2>Duplicati.CommandLine.exe help repair
Usage: repair <storage-URL> [<options>]
Tries to repair the backup. If no local db is found or the db is empty, the
db is re-created with data from the storage. If the db is in place but the
remote storage is corrupt, the remote storage gets repaired with local data.
The above doesn’t say a non-recreate repair ever flows information from the remote into the local database; however, I suspect it can. It does say information flows from local data to remote storage (but apparently not always well).
Repair and recreate have been the target of a rewrite, but it’s heavy lifting and I’m not sure where it is… Maybe getting some news on that would help know whether to work on workarounds and stopgap fixes.
No, because VSS is not application-aware, at least for SQLite. Where databases are concerned you need to snapshot everything at the same time, so unless VSS is aware of any journal files or other temporary files for the database, they won’t get “snapped” at the same moment and you’ll end up with an inconsistent backup.
This reminded me to try out something that could help with this, let me know what you think:
1. Create a new backup job to back up the Duplicati profile folder, i.e. the one with all the SQLite files, which for my service-based setup is C:\Windows\System32\config\systemprofile\AppData\Local\Duplicati, but do not schedule it so it stays manual. Optionally include any other files you think might be useful for the configuration, e.g. I’ve also included the SSL certificate used by the web service
2. Test that the new backup job works
3. Export the backup job as a command-line, including any passwords, and copy/paste it into an editor
4. Create a .CMD batch file and add the exported backup job command-line. You’ll need to prefix the command with start “Duplicati Backup” /D “C:\Program Files\Duplicati 2” so that the spawned process is separate from the original backup job, allowing it to end, and you must also replace any % in the job command-line with %% or it won’t work
5. Edit your other backup jobs and, in the Options section, add the advanced parameter “run-script-after” pointing at the batch file you just created
6. Run the backup
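To make the batch file step concrete, here is a sketch of what the .CMD might contain. The destination URL, source path, and passphrase are placeholders, not real values; your own exported command line goes there instead:

```
@echo off
rem Spawn the DB backup detached so the calling job's run-script-after can return.
rem Paste your exported command line below, and remember to double every
rem percent sign (e.g. %TEMP% becomes %%TEMP%%) or the batch file will mangle it.
start "Duplicati Backup" /D "C:\Program Files\Duplicati 2" Duplicati.CommandLine.exe backup "file://D:\DuplicatiDBBackup" "C:\Windows\System32\config\systemprofile\AppData\Local\Duplicati" --passphrase="..."
```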
Basically the idea is to back up the databases at the end of each backup job, and as we don’t care as much about this special backup’s own database, it doesn’t matter that it doesn’t get backed up cleanly. If it does get lost, it should be quicker to recover this one database with a repair than all the others you really care about.
I’m going to try this for a few days on one of my machines and see how it goes.