Any communication error - rebuild database

Any time the backup can not progress due to either an internet issue, or, any sort of communication issue, the database gets corrupted and it has to be rebuilt. This has hapenned at least a dozen times over the past year, country internet, lol. Today was an easy case as far as reproducing. I had moved the backup users home directory, but forgot to change the backup. So, the backup was looking for a ssh key file that did not exist as it had the wrong directory. Sure enough, database corrupted. I have since fixed the directory and am rebuilding the database yet again.

ubuntu 20.04, duplicati 2.0.5.114_canary_2021-03-10

Is this any sort of known issue?

Please give some details, such as an error message. If there are several steps (e.g. initial failure, and the next backup fails, or whatever), please describe. Sorry you’ve had so many, but please clarify the pattern.

In my opinion this is still a potential weak spot that needs testing. So far, there is a scenario where errors that exceed number-of-retries (which you might consider raising) can make Duplicati error about missing files, however the repair button on the message fixes that problem. If you see a different error, it would be worth exploring with some extra logs or description to see if the failure can be reproduced for developers.

Ideally, a step-by-step sequence is found, then it can be examined closely, and fixed after it is understood.
Requiring specific destinations makes things a little less ideal. If I understand yours correctly, you use the Duplicati ssh-keyfile option for server access, but a home directory move broke that plan, so no connect?

I’m trying to reproduce this with a working SFTP backup that I break with a username change. It fails with:

May 14, 2021 5:19 PM: Failed while executing "Backup" with id: 12
Renci.SshNet.Common.SshAuthenticationException: Permission denied (password).
   at Duplicati.Library.Main.BackendManager.List()
   at Duplicati.Library.Main.Operation.FilelistProcessor.RemoteListAnalysis(BackendManager backend, Options options, LocalDatabase database, IBackendWriter log, IEnumerable`1 protectedFiles)
   at Duplicati.Library.Main.Operation.FilelistProcessor.VerifyRemoteList(BackendManager backend, Options options, LocalDatabase database, IBackendWriter log, IEnumerable`1 protectedFiles)
   at Duplicati.Library.Main.Operation.BackupHandler.PreBackupVerify(BackendManager backend, String protectedfile)
   at Duplicati.Library.Main.Operation.BackupHandler.<RunAsync>d__20.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
   at CoCoL.ChannelExtensions.WaitForTaskOrThrow(Task task)
   at Duplicati.Library.Main.Controller.<>c__DisplayClass14_0.<Backup>b__0(BackupResults result)
   at Duplicati.Library.Main.Controller.RunAction[T](T result, String[]& paths, IFilter& filter, Action`1 method)
   at Duplicati.Library.Main.Controller.Backup(String[] inputsources, IFilter filter)
   at Duplicati.Server.Runner.Run(IRunnerData data, Boolean fromQueue)

in About → Show log → Stored (fatal backup errors tend to land there instead of in the backup job logs…).

Watching it fail in About → Show log → Live → Retry looks like this. I haven’t clicked on error, but I could…

May 14, 2021 5:19 PM: The operation Backup has failed with error: Permission denied (password).
May 14, 2021 5:19 PM: Fatal error
May 14, 2021 5:19 PM: Backend event: List - Failed: ()
May 14, 2021 5:19 PM: Operation List with file attempt 5 of 5 failed with message: Permission denied (password).
May 14, 2021 5:19 PM: Backend event: List - Started: ()
May 14, 2021 5:19 PM: Backend event: List - Retrying: ()
May 14, 2021 5:19 PM: Operation List with file attempt 4 of 5 failed with message: Permission denied (password).
May 14, 2021 5:19 PM: Backend event: List - Started: ()
May 14, 2021 5:19 PM: Backend event: List - Retrying: ()
May 14, 2021 5:19 PM: Operation List with file attempt 3 of 5 failed with message: Permission denied (password).
May 14, 2021 5:19 PM: Backend event: List - Started: ()
May 14, 2021 5:19 PM: Backend event: List - Retrying: ()
May 14, 2021 5:19 PM: Operation List with file attempt 2 of 5 failed with message: Permission denied (password).
May 14, 2021 5:19 PM: Backend event: List - Started: ()
May 14, 2021 5:19 PM: Backend event: List - Retrying: ()
May 14, 2021 5:19 PM: Operation List with file attempt 1 of 5 failed with message: Permission denied (password).
May 14, 2021 5:19 PM: Backend event: List - Started: ()
May 14, 2021 5:19 PM: The operation Backup has started

All this is normal handling of communication errors. To try your next step, I fix the username and run again.
Works fine. Testing 2.0.5.114 on Windows 10, with a small backup (one short file). What’s the trick to fail?

So, here’s the logged error(s) from the failed ssh connection that caused me to have to rebuild the database. As far as how that specifically works, backups runs automagically each day. I get an email when backup fails. So, I see email, I go to duplicati in web interface and it says there was an error (don’t have that exact message). So, if I run the backup again, or do nothing and wait a day, the next backup attempt says the database is corrupted.

In this case, changing number of retries would not help as it would never be able to login.

Let me know if the error log tells you more, or, if you need additional info. I am sure I can easily make it happen all I want, I could change change the key or whatever to capture any information you may need. So, if you want me to do that, provide details as to exactly what you want me to capture and how. Hopefully, it’s not a difference between Windows and Linux.

Duplicati.Library.Interface.UserInformationException: Failed to parse the keyfile, check the key format and passphrase. Error message was Could not find a part of the path "/home/nextcloud/.ssh/id_backup". ---> System.IO.DirectoryNotFoundException: Could not find a part of the path "/home/nextcloud/.ssh/id_backup".
  at System.IO.FileStream..ctor (System.String path, System.IO.FileMode mode, System.IO.FileAccess access, System.IO.FileShare share, System.Int32 bufferSize, System.Boolean anonymous, System.IO.FileOptions options) [0x00164] in <533173d24dae460899d2b10975534bb0>:0 
  at System.IO.FileStream..ctor (System.String path, System.IO.FileMode mode, System.IO.FileAccess access, System.IO.FileShare share) [0x00000] in <533173d24dae460899d2b10975534bb0>:0 
  at (wrapper remoting-invoke-with-check) System.IO.FileStream..ctor(string,System.IO.FileMode,System.IO.FileAccess,System.IO.FileShare)
  at System.IO.File.Open (System.String path, System.IO.FileMode mode, System.IO.FileAccess access, System.IO.FileShare share) [0x00000] in <533173d24dae460899d2b10975534bb0>:0 
  at Renci.SshNet.PrivateKeyFile..ctor (System.String fileName, System.String passPhrase) [0x00019] in <8c66a76a693c4e52b456ec0712c863dd>:0 
  at Renci.SshNet.PrivateKeyFile..ctor (System.String fileName) [0x00000] in <8c66a76a693c4e52b456ec0712c863dd>:0 
  at Duplicati.Library.Backend.SSHv2.ValidateKeyFile (System.String filename, System.String password) [0x0001f] in <3dd3bd71357d4450ba0d2b0f8a2dbe3e>:0 
   --- End of inner exception stack trace ---
  at Duplicati.Library.Main.BackendManager.List () [0x00049] in <e4ba2224a87c42cbadcd524985619ee3>:0 
  at Duplicati.Library.Main.Operation.RecreateDatabaseHandler.DoRun (Duplicati.Library.Main.Database.LocalDatabase dbparent, System.Boolean updating, Duplicati.Library.Utility.IFilter filter, Duplicati.Library.Main.Operation.RecreateDatabaseHandler+NumberedFilterFilelistDelegate filelistfilter, Duplicati.Library.Main.Operation.RecreateDatabaseHandler+BlockVolumePostProcessor blockprocessor) [0x00084] in <e4ba2224a87c42cbadcd524985619ee3>:0 
  at Duplicati.Library.Main.Operation.RecreateDatabaseHandler.Run (System.String path, Duplicati.Library.Utility.IFilter filter, Duplicati.Library.Main.Operation.RecreateDatabaseHandler+NumberedFilterFilelistDelegate filelistfilter, Duplicati.Library.Main.Operation.RecreateDatabaseHandler+BlockVolumePostProcessor blockprocessor) [0x00037] in <e4ba2224a87c42cbadcd524985619ee3>:0 
  at Duplicati.Library.Main.Operation.RepairHandler.RunRepairLocal (Duplicati.Library.Utility.IFilter filter) [0x000ba] in <e4ba2224a87c42cbadcd524985619ee3>:0 
  at Duplicati.Library.Main.Operation.RepairHandler.Run (Duplicati.Library.Utility.IFilter filter) [0x00158] in <e4ba2224a87c42cbadcd524985619ee3>:0 
  at Duplicati.Library.Main.Operation.BackupHandler.PreBackupVerify (Duplicati.Library.Main.BackendManager backend, System.String protectedfile) [0x000d5] in <e4ba2224a87c42cbadcd524985619ee3>:0 
  at Duplicati.Library.Main.Operation.BackupHandler.RunAsync (System.String[] sources, Duplicati.Library.Utility.IFilter filter, System.Threading.CancellationToken token) [0x01048] in <e4ba2224a87c42cbadcd524985619ee3>:0 
  at CoCoL.ChannelExtensions.WaitForTaskOrThrow (System.Threading.Tasks.Task task) [0x00050] in <9a758ff4db6c48d6b3d4d0e5c2adf6d1>:0 
  at Duplicati.Library.Main.Operation.BackupHandler.Run (System.String[] sources, Duplicati.Library.Utility.IFilter filter, System.Threading.CancellationToken token) [0x00009] in <e4ba2224a87c42cbadcd524985619ee3>:0 
  at Duplicati.Library.Main.Controller+<>c__DisplayClass14_0.<Backup>b__0 (Duplicati.Library.Main.BackupResults result) [0x0004b] in <e4ba2224a87c42cbadcd524985619ee3>:0 
  at Duplicati.Library.Main.Controller.RunAction[T] (T result, System.String[]& paths, Duplicati.Library.Utility.IFilter& filter, System.Action`1[T] method) [0x0026f] in <e4ba2224a87c42cbadcd524985619ee3>:0 
  at Duplicati.Library.Main.Controller.Backup (System.String[] inputsources, Duplicati.Library.Utility.IFilter filter) [0x00074] in <e4ba2224a87c42cbadcd524985619ee3>:0 
  at Duplicati.Server.Runner.Run (Duplicati.Server.Runner+IRunnerData data, System.Boolean fromQueue) [0x00349] in <d38f460035ab4f3b9bab2fc91e2cec32>:0

That one doesn’t, but is there a log from unexpected error below, e.g. is there a log that announces:

To reproduce it, a good test would be a small backup like mine that can be made to fail, which would eventually turn into a GitHub Issue with steps, logs, etc. The forum can’t track issues, if you find one.

This might get into heavier duty logging with log-file=<path> and some more intense log-file-log-level, however it’s probably best to start slowly. Right now I don’t even know what the corruption error says.

Ok, I can pursue further once I get a chance. Meanwhile, here’s the log from the next run after the above logfile, if it tells you anything either.

Duplicati.Library.Interface.UserInformationException: The database was attempted repaired, but the repair did not complete. This database may be incomplete and the backup process cannot continue. You may delete the local database and attempt to repair it again.
  at Duplicati.Library.Main.Operation.BackupHandler.RunAsync (System.String[] sources, Duplicati.Library.Utility.IFilter filter, System.Threading.CancellationToken token) [0x00122] in <e4ba2224a87c42cbadcd524985619ee3>:0 
  at CoCoL.ChannelExtensions.WaitForTaskOrThrow (System.Threading.Tasks.Task task) [0x00050] in <9a758ff4db6c48d6b3d4d0e5c2adf6d1>:0 
  at Duplicati.Library.Main.Operation.BackupHandler.Run (System.String[] sources, Duplicati.Library.Utility.IFilter filter, System.Threading.CancellationToken token) [0x00009] in <e4ba2224a87c42cbadcd524985619ee3>:0 
  at Duplicati.Library.Main.Controller+<>c__DisplayClass14_0.<Backup>b__0 (Duplicati.Library.Main.BackupResults result) [0x0004b] in <e4ba2224a87c42cbadcd524985619ee3>:0 
  at Duplicati.Library.Main.Controller.RunAction[T] (T result, System.String[]& paths, Duplicati.Library.Utility.IFilter& filter, System.Action`1[T] method) [0x0026f] in <e4ba2224a87c42cbadcd524985619ee3>:0 
  at Duplicati.Library.Main.Controller.Backup (System.String[] inputsources, Duplicati.Library.Utility.IFilter filter) [0x00074] in <e4ba2224a87c42cbadcd524985619ee3>:0 
  at Duplicati.Server.Runner.Run (Duplicati.Server.Runner+IRunnerData data, System.Boolean fromQueue) [0x00349] in <d38f460035ab4f3b9bab2fc91e2cec32>:0

means what it says, but what’s not clear is when the repair ran, or whether communication error was involved. As a side note, I had my Wi-Fi fail this morning, and a backup failure, but no issue next time.

Do you have auto-cleanup set in Advanced options? That can run the Repair before backup if needed.

and if you don’t use that, then you might need to check history to see when Repair ran, and how it failed.
Once a Repair is attempt and does not complete successfully, chances are it won’t the next time it tries.

Do you have a complete working backup when your communications are working? If so, maybe set up a log-file at log-file-log-level=retry to collect info for whenever next failure happens, or set up a new backup, get it working, then see if you can break it with at least that amount of logging going, to see how it breaks.

Yeah, so, I did not run any repair until AFTER it tried to repair it on the run after the connection error. There are only 2 log files, and I sent them both. There is no other log file related to the event. So, runs every day, backup fails, next run it tries to auto repair and fails every time, but there is no communication error since it’s been resolved before then. Then, I run delete and repair database and it works every time.

As far as options, at settings level outside of backup, auto cleanup, auto vacuum, retention policy, and throttle upload. Within the backup, ssh fingerprint and keyfile, run script before, and run script after. The scripts take care of putting nextcloud into main mode and does sql backup, then when backup done takes out of maint mode.

Wondering if the “auto cleanup” is actually the cause. I believe I can easily reproduce this, so, could reproduce with auto cleanup and current settings, just to make sure it still causes the issue, and, then turn it off and try again (after remaking database of course).

It’s not likely (though not impossible) to be the bottom of the cause, but it might be a step. If it manages to run, not succeed, not log in a way that you see (please increase your log level), then backup will error out before even starting, because it will notice the database isn’t repaired, and won’t backup to bad database.

It’s possible that there’s some hidden flaw in the backup files on the destination, which is why I suggest a small fresh backup, proven to run, then see if you can break it with some logs running to see what it does.

Ok, I can do that I suppose. But the same thing happened when I was backing up to B2. I converted over around 2 months ago, and started the backup sets over. I have 4 backup sets on 2 machines, all of them do this.

Please do. I tested a new SFTP backup using auto-cleanup, trying to get an error like thisby stopping the SFTP server. Stopping it then starting a backup recovered fine on next backup. So did stopping it in middle of file upload. I did not test stopping it during a compact but I’m not sure yours compacts all the time either. You can check the per-backup logs to see if you’re compacting a lot. This will also be in the log file you get.

Then you should easily be able to get logs. Starting with a fresh backup is more useful though, then file an issue with steps, and that will maybe allow a developer to reproduce it. Without a way to see it, little to say.

I made a small backup, created it, changed the keyfile name to one that did not exist, ran it again, failed, got a message “Attempt to write a read-only database attempt to write a readonly database” whatever that might mean since it runs as root locally. I changed the keyfile name back to correct name. Ran it again, got “The database was attempted repaired, but the repair did not complete. This database may be incomplete and the backup process cannot continue. You may delete the local database and attempt to repair it again.”.

Run with bad keyfile after first good run:

Mono.Data.Sqlite.SqliteException (0x80004005): Attempt to write a read-only database

attempt to write a readonly database
  at Mono.Data.Sqlite.SQLite3.Reset (Mono.Data.Sqlite.SqliteStatement stmt) [0x00084] in <83c72f6e53eb49f28420feee73a4aa07>:0 
  at Mono.Data.Sqlite.SQLite3.Step (Mono.Data.Sqlite.SqliteStatement stmt) [0x0003d] in <83c72f6e53eb49f28420feee73a4aa07>:0 
  at Mono.Data.Sqlite.SqliteDataReader.NextResult () [0x00104] in <83c72f6e53eb49f28420feee73a4aa07>:0 
  at Mono.Data.Sqlite.SqliteDataReader..ctor (Mono.Data.Sqlite.SqliteCommand cmd, System.Data.CommandBehavior behave) [0x0004e] in <83c72f6e53eb49f28420feee73a4aa07>:0 
  at (wrapper remoting-invoke-with-check) Mono.Data.Sqlite.SqliteDataReader..ctor(Mono.Data.Sqlite.SqliteCommand,System.Data.CommandBehavior)
  at Mono.Data.Sqlite.SqliteCommand.ExecuteReader (System.Data.CommandBehavior behavior) [0x00006] in <83c72f6e53eb49f28420feee73a4aa07>:0 
  at Mono.Data.Sqlite.SqliteCommand.ExecuteNonQuery () [0x00000] in <83c72f6e53eb49f28420feee73a4aa07>:0 
  at Duplicati.Library.Main.Database.ExtensionMethods.ExecuteNonQuery (System.Data.IDbCommand self, System.Boolean writeLog, System.String cmd, System.Object[] values) [0x0005e] in <e4ba2224a87c42cbadcd524985619ee3>:0 
  at Duplicati.Library.Main.Database.ExtensionMethods.ExecuteNonQuery (System.Data.IDbCommand self, System.String cmd, System.Object[] values) [0x00000] in <e4ba2224a87c42cbadcd524985619ee3>:0 
  at Duplicati.Library.Main.Database.LocalDatabase.Dispose () [0x00052] in <e4ba2224a87c42cbadcd524985619ee3>:0 
  at Duplicati.Library.Main.Database.LocalBackupDatabase.Dispose () [0x00000] in <e4ba2224a87c42cbadcd524985619ee3>:0 
  at Duplicati.Library.Main.Operation.BackupHandler.RunAsync (System.String[] sources, Duplicati.Library.Utility.IFilter filter, System.Threading.CancellationToken token) [0x010c6] in <e4ba2224a87c42cbadcd524985619ee3>:0 
  at CoCoL.ChannelExtensions.WaitForTaskOrThrow (System.Threading.Tasks.Task task) [0x00050] in <9a758ff4db6c48d6b3d4d0e5c2adf6d1>:0 
  at Duplicati.Library.Main.Operation.BackupHandler.Run (System.String[] sources, Duplicati.Library.Utility.IFilter filter, System.Threading.CancellationToken token) [0x00009] in <e4ba2224a87c42cbadcd524985619ee3>:0 
  at Duplicati.Library.Main.Controller+<>c__DisplayClass14_0.<Backup>b__0 (Duplicati.Library.Main.BackupResults result) [0x0004b] in <e4ba2224a87c42cbadcd524985619ee3>:0 
  at Duplicati.Library.Main.Controller.RunAction[T] (T result, System.String[]& paths, Duplicati.Library.Utility.IFilter& filter, System.Action`1[T] method) [0x0026f] in <e4ba2224a87c42cbadcd524985619ee3>:0 
  at Duplicati.Library.Main.Controller.Backup (System.String[] inputsources, Duplicati.Library.Utility.IFilter filter) [0x00074] in <e4ba2224a87c42cbadcd524985619ee3>:0 
  at Duplicati.Server.Runner.Run (Duplicati.Server.Runner+IRunnerData data, System.Boolean fromQueue) [0x00349] in <d38f460035ab4f3b9bab2fc91e2cec32>:0 

Fixed keyfile, ran again (supposed to work):

Duplicati.Library.Interface.UserInformationException: The database was attempted repaired, but the repair did not complete. This database may be incomplete and the backup process cannot continue. You may delete the local database and attempt to repair it again.
  at Duplicati.Library.Main.Operation.BackupHandler.RunAsync (System.String[] sources, Duplicati.Library.Utility.IFilter filter, System.Threading.CancellationToken token) [0x00122] in <e4ba2224a87c42cbadcd524985619ee3>:0 
  at CoCoL.ChannelExtensions.WaitForTaskOrThrow (System.Threading.Tasks.Task task) [0x00050] in <9a758ff4db6c48d6b3d4d0e5c2adf6d1>:0 
  at Duplicati.Library.Main.Operation.BackupHandler.Run (System.String[] sources, Duplicati.Library.Utility.IFilter filter, System.Threading.CancellationToken token) [0x00009] in <e4ba2224a87c42cbadcd524985619ee3>:0 
  at Duplicati.Library.Main.Controller+<>c__DisplayClass14_0.<Backup>b__0 (Duplicati.Library.Main.BackupResults result) [0x0004b] in <e4ba2224a87c42cbadcd524985619ee3>:0 
  at Duplicati.Library.Main.Controller.RunAction[T] (T result, System.String[]& paths, Duplicati.Library.Utility.IFilter& filter, System.Action`1[T] method) [0x0026f] in <e4ba2224a87c42cbadcd524985619ee3>:0 
  at Duplicati.Library.Main.Controller.Backup (System.String[] inputsources, Duplicati.Library.Utility.IFilter filter) [0x00074] in <e4ba2224a87c42cbadcd524985619ee3>:0 
  at Duplicati.Server.Runner.Run (Duplicati.Server.Runner+IRunnerData data, System.Boolean fromQueue) [0x00349] in <d38f460035ab4f3b9bab2fc91e2cec32>:0 

So, what else should I capture for a new case? Set anything else? Send export of configuration of the test backup?

Corrupted database after communication / connection errors #4516 is the issue, I guess. Thanks for filing.
I might post a reverse link back to this post anyway, in case it helps, but the work on issue is looking good.