Release: 2.0.9.108 (Canary) 2024-10-03

In Docker you can normally specify the time zone, typically via the TZ environment variable; at least I can with all the containers I run (I use Podman, but I’m sure Docker has this as well).

For the stuck job, these are the Duplicati files in use:

[screenshots]

No disk activity
[screenshot]

CPU usage (the machine has 4 cores), from Process Explorer:


What does View → System Information → I/O show for I/O and Disk (which likely differ)?

[screenshots of I/O and Disk]

Thanks. That seems (to me) pretty conclusive that it’s not just Windows File Caching.

The only other tool I can think of short of a debugger is Sysinternals Process Monitor.
I’m not counting on it, since it “looks” like there’s no I/O, and it won’t show CPU detail.

I’m hoping you already ran About → Show log → Live → Profiling to look for clues.
The problem with that is that its focus is on the SQL, and the newly found slowdown was not in the SQL:

Blog post: Speeding up inner workings of DoCompact() by up to 1000x

Maybe someone more familiar with compact could tell if your slowdown is in that spot.

I was grabbing the output from an earlier Process Monitor run when I saw that it finally completed:

As you can see, it ran again once it had caught up on the other missed backups, so all is still well.

Here’s the output from Process Monitor in case it helps: Logfile.CSV - Pastebin.com

That problem would occur if you have a lot of remote files (maybe thousands); prior to the fix, the time spent grew roughly exponentially as the number of remote files increased. It only happens during the compacting phase, so if you have a lot of files at the destination, this is likely it. A sketch of the kind of pattern that causes this follows the symptom list below.

The symptoms would be:

  • No I/O activity
  • A single core/thread running at 100%
  • No log or profiling output during the stuck phase
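
For illustration only (this is not the actual DoCompact() code, and all the names are made up), a pattern like the following produces exactly those symptoms: a purely in-memory loop whose cost grows with the square of the remote file count, so it pegs one core with no I/O and nothing in the logs.

```csharp
// Hypothetical sketch of this class of slowdown; not Duplicati code.
using System;
using System.Collections.Generic;
using System.Linq;

// Pretend these are the remote volume names known to the backup.
List<string> remoteFiles = Enumerable.Range(0, 76_000)
    .Select(i => $"duplicati-b{i:x8}.dblock.zip.aes")
    .ToList();

// Slow: a linear scan of the whole list for every file -> O(n^2).
// At ~76,000 files that is billions of string comparisons: minutes of pure CPU, no I/O.
int slowCount = remoteFiles.Count(f => remoteFiles.Contains(f));

// Fast: build a hash set once, then every lookup is O(1) -> the whole pass is O(n).
var known = new HashSet<string>(remoteFiles);
int fastCount = remoteFiles.Count(f => known.Contains(f));

Console.WriteLine($"{slowCount} / {fastCount}");
```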

Thanks! I will try to scan it and see if something comes up.

Could this explain the slowdown during “deleting unwanted files” for this backup? A week earlier I removed hundreds of files, and the backup job’s remote folder is currently at 76,000+ files.

Yes! With 76k+ files, there would be a noticeable slowdown.
A new build, which includes the fix, is running.


At some point the LZMA compression was broken, but I haven’t ever seen anyone confirm that it’s NOT broken anymore.

Bug report @kenkendk

– Issue with latest version, maybe related to the changes to the recovery logic with this version? –

Probably some strange code path which isn’t covered by tests and contains just a simple error in the SQL code? (My guess.)

Based on this, I assume it’s trivial to fix. If necessary, someone can file this on GitHub as a separate issue and link it here. - Thanks

2.0.9.108_canary_2024-10-03

duplicati-iffc7a6eb9ca24ecbbec6d46d662f5afa.dindex.zip.aes (18.26 KB)
2024-11-06 09.02.57 +02 - [Verbose-Duplicati.Library.Main.Operation.RecreateDatabaseHandler-ProcessingIndexlistVolumes]: Processing indexlist volume 2373 of 2373
2024-11-06 09.03.11 +02 - [Warning-Duplicati.Library.Main.Database.LocalRecreateDatabase-MissingVolumesDetected]: Found 8 missing volumes; attempting to replace blocks from existing volumes
2024-11-06 09.06.45 +02 - [Error-Duplicati.Library.Main.Controller-FailedOperation]: SQL logic error no such column: Temporary
code = Error (1), message = System.Data.SQLite.SQLiteException (0x800007BF): SQL logic error
no such column: Temporary
   at System.Data.SQLite.SQLite3.Prepare(SQLiteConnection cnn, String strSql, SQLiteStatement previous, UInt32 timeoutMS, String& strRemain)
   at System.Data.SQLite.SQLiteCommand.BuildNextCommand()
   at System.Data.SQLite.SQLiteDataReader.NextResult()
   at System.Data.SQLite.SQLiteDataReader..ctor(SQLiteCommand cmd, CommandBehavior behave)
   at System.Data.SQLite.SQLiteCommand.ExecuteNonQuery(CommandBehavior behavior)
   at Duplicati.Library.Main.Database.ExtensionMethods.ExecuteNonQuery(IDbCommand self, Boolean writeLog, String cmd, Object[] values)
   at Duplicati.Library.Main.Database.ExtensionMethods.ExecuteNonQuery(IDbCommand self, String cmd, Object[] values)
   at Duplicati.Library.Main.Database.LocalRecreateDatabase.CleanupMissingVolumes()
   at Duplicati.Library.Main.Operation.RecreateDatabaseHandler.DoRun(LocalDatabase dbparent, Boolean updating, IFilter filter, NumberedFilterFilelistDelegate filelistfilter, BlockVolumePostProcessor blockprocessor)
   at Duplicati.Library.Main.Operation.RestoreHandler.Run(String[] paths, IFilter filter)
   at Duplicati.Library.Main.Controller.<>c__DisplayClass13_0.<Restore>b__0(RestoreResults result)
   at Duplicati.Library.Main.Controller.RunAction[T](T result, String[]& paths, IFilter& filter, Action`1 method)
2024-11-06 13.16.55 +02 - [Information-Duplicati.Library.Main.Controller-StartingOperation]: The operation Restore has started
2024-11-06 13.16.55 +02 - [Information-Duplicati.Library.Main.Operation.RestoreHandler-NoLocalDatabase]: No local database, building a temporary database
2024-11-06 13.16.58 +02 - [Information-Duplicati.Library.Main.BasicResults-BackendEvent]: Backend event: List - Started:  ()
2024-11-06 13.16.58 +02 - [Information-Duplicati.Library.Main.BasicResults-BackendEvent]: Backend event: List - Completed:  (4.63 KB)

Found 8 missing volumes; attempting to replace blocks from existing volumes
=> SQL logic error: no such column: Temporary

code = Error (1), message = System.Data.SQLite.SQLiteException (0x800007BF): SQL logic error
no such column: Temporary
   at System.Data.SQLite.SQLite3.Prepare(SQLiteConnection cnn, String strSql, SQLiteStatement previous, UInt32 timeoutMS, String& strRemain)
   at System.Data.SQLite.SQLiteCommand.BuildNextCommand()
   at System.Data.SQLite.SQLiteDataReader.NextResult()
   at System.Data.SQLite.SQLiteDataReader..ctor(SQLiteCommand cmd, CommandBehavior behave)
   at System.Data.SQLite.SQLiteCommand.ExecuteNonQuery(CommandBehavior behavior)
   at Duplicati.Library.Main.Database.ExtensionMethods.ExecuteNonQuery(IDbCommand self, Boolean writeLog, String cmd, Object[] values)
   at Duplicati.Library.Main.Database.ExtensionMethods.ExecuteNonQuery(IDbCommand self, String cmd, Object[] values)
   at Duplicati.Library.Main.Database.LocalRecreateDatabase.CleanupMissingVolumes()
   at Duplicati.Library.Main.Operation.RecreateDatabaseHandler.DoRun(LocalDatabase dbparent, Boolean updating, IFilter filter, NumberedFilterFilelistDelegate filelistfilter, BlockVolumePostProcessor blockprocessor)
   at Duplicati.Library.Main.Operation.RestoreHandler.Run(String[] paths, IFilter filter)
   at Duplicati.Library.Main.Controller.<>c__DisplayClass13_0.<Restore>b__0(RestoreResults result)
   at Duplicati.Library.Main.Controller.RunAction[T](T result, String[]& paths, IFilter& filter, Action`1 method)
   at Duplicati.Library.Main.Controller.Restore(String[] paths, IFilter filter)
   at Duplicati.CommandLine.Commands.Restore(TextWriter outwriter, Action`1 setup, List`1 args, Dictionary`2 options, IFilter filter)
   at Duplicati.CommandLine.Program.ParseCommandLine(TextWriter outwriter, Action`1 setup, Boolean& verboseErrors, String[] args)
   at Duplicati.CommandLine.Program.RunCommandLine(TextWriter outwriter, TextWriter errwriter, Action`1 setup, String[] args)

Yes, the problem was that SQLite was updated to no longer support " (the double quote character) as a string quote.
This is a great move for future stability and was applied throughout Duplicati, except in this one place where it was missed.
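
For anyone curious what that quoting change looks like in practice, here is a minimal sketch. The table and queries are made up for illustration and are not the actual CleanupMissingVolumes() SQL; the failure in the first form depends on the SQLite build having double-quoted string literals disabled (SQLITE_DQS=0).

```csharp
// Minimal illustration of the quoting change; not the real Duplicati query.
using System;
using System.Data.SQLite;

using var con = new SQLiteConnection("Data Source=:memory:");
con.Open();

using var cmd = con.CreateCommand();
cmd.CommandText = "CREATE TABLE RemoteVolume (Name TEXT, State TEXT)";
cmd.ExecuteNonQuery();

try
{
    // With double-quoted string literals disabled, "Temporary" is parsed as a
    // column name, so this throws: SQL logic error - no such column: Temporary
    cmd.CommandText = @"DELETE FROM RemoteVolume WHERE State = ""Temporary""";
    cmd.ExecuteNonQuery();
}
catch (SQLiteException ex)
{
    Console.WriteLine(ex.Message);
}

// Single quotes always denote a string literal, so this works regardless of the setting.
cmd.CommandText = "DELETE FROM RemoteVolume WHERE State = 'Temporary'";
cmd.ExecuteNonQuery();
```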

The issue has been fixed and will be in the next canary build (in the oven now).


Got any early evidence, personal evidence, or citations? I haven’t done much research on this recently.

I don’t think it’s used enough to say definitively, but the 2.0.9.107 report that continues here is awaiting further isolation by @Taomyn, because the error is likely not fixable without an actual case that could be given to the SharpCompress project as an issue or pull request for an LZMA fix:

We got it narrowed from three dblock files to one, but even one dblock can contain sensitive data. The same is true for a single block, but with a reduced chance. Maybe copy the database for a future check of which files use the bad block (a rough query sketch is below). As more time passes, that check will become less trustworthy, because source file versions will be deleted by retention and the connection to the blocks will weaken.
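
If it helps later, something along these lines could be run against the copied database to list the source paths that still reference the bad block. This is only a sketch: it assumes the usual local-database layout (Block, BlocksetEntry, and the File view), the hash and path are placeholders, and metadata blocks would need a similar lookup.

```csharp
// Rough sketch; table names follow the usual Duplicati local database layout,
// but verify them against your database version. Hash and path are placeholders.
using System;
using System.Data.SQLite;

string badBlockHash = "REPLACE-WITH-BASE64-HASH-OF-BAD-BLOCK";

using var con = new SQLiteConnection(@"Data Source=C:\temp\copy-of-backup-db.sqlite");
con.Open();

using var cmd = con.CreateCommand();
cmd.CommandText = @"
    SELECT DISTINCT f.Path
    FROM Block b
    JOIN BlocksetEntry be ON be.BlockID = b.ID
    JOIN File f ON f.BlocksetID = be.BlocksetID
    WHERE b.Hash = @hash";
cmd.Parameters.AddWithValue("@hash", badBlockHash);

using var reader = cmd.ExecuteReader();
while (reader.Read())
    Console.WriteLine(reader.GetString(0));
```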

I can see that the only line that appears to throw that exact error is here:

I am not into LZMA details enough to figure out why, but it seems to be used infrequently enough that this is not a reported error on SharpCompress.

There are at least two others, according to a current GitHub web search of the repository.

This part I did research on, and I was not able to find an open issue. If we can get the bad block isolated, then perhaps we can open one. I doubt the issue would progress without that.

I also tried looking into release notes and file change history (but I could have missed some). My tentative suspicion is that this is probably not a bug in old SharpCompress that has since been fixed. The troublesome .zip file should have dates and a manifest inside it that can help trace its creation.
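
In case it saves a step once the file is at hand, here is a rough sketch of pulling that information out after the volume has been decrypted (e.g. with AES Crypt or SharpAESCrypt). The entry name manifest and its JSON content are assumptions about how the volumes are normally laid out, and the path is a placeholder.

```csharp
// Rough sketch: dump the manifest entry from a decrypted dblock/dindex volume.
// Entry name and JSON layout are assumptions; the path is a placeholder.
using System;
using System.IO;
using System.IO.Compression;

using var zip = ZipFile.OpenRead(@"C:\temp\duplicati-...-decrypted.dindex.zip");
var entry = zip.GetEntry("manifest");
if (entry == null)
{
    Console.WriteLine("No manifest entry found");
}
else
{
    using var reader = new StreamReader(entry.Open());
    // The JSON should include the creation time and the app version that wrote the volume.
    Console.WriteLine(reader.ReadToEnd());
}
```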

This information has been requested too. It’s probably sitting on the Desktop of the affected system.

Yes, but not from the LZMA Decompressor?

Without trying to figure out which are relevant to Duplicati’s usage, the ones not in LzmaDecoder.cs are in LzmaStream.cs and LzOutWindow.cs. I’m just noting the use, not trying to follow all the code.

While getting set up for 2.0.9.109, I found that the 2.0.9.108 TrayIcon’s icon was no longer visible for me to Quit from. Possibly I had already done a Quit some days ago to free up port 8200 for testing with the Windows service.

The current network status of the TrayIcon and the Server (respectively) is below; the problem was masked because I would have gone to localhost:8200 for the service, and that would have come up OK:

C:\>netstat -ano | findstr 33620
  TCP    0.0.0.0:8200           0.0.0.0:0              LISTENING       33620
  TCP    127.0.0.1:60694        127.0.0.1:8200         ESTABLISHED     33620

C:\>netstat -ano | findstr 20568
  TCP    127.0.0.1:8200         0.0.0.0:0              LISTENING       20568
  TCP    127.0.0.1:8200         127.0.0.1:57939        ESTABLISHED     20568
  TCP    127.0.0.1:8200         127.0.0.1:60694        ESTABLISHED     20568

EDIT:

Stopping the Windows service let localhost get to the icon-less TrayIcon. It appears healthy.
I’d previously looked at its external log file and suspected this. I’ll wait a bit before killing it.

Is this an instance where the WindowsService is running, and the TrayIcon is then connected to the server? So the TrayIcon can crash on its own?

No. It’s my usual standalone TrayIcon, which I had intended to Quit (and likely tried) to release port 8200 for the Windows service. These days I try to run 2.0.9.x on 8200 because I don’t like it taking higher ports and breaking my tab layout. This was less of a problem before port assignments got sticky.

Ok, in that case I think this is related to the similar disappearing icons on Linux.
My guess is that Avalonia somehow crashes, but the process hangs because the server is still running.
I will create an issue tracking this.

Sorry it’s a bit mysterious, but I’m pretty sure I would have asked for a Quit (but didn’t verify it).

Although its timing doesn’t fit, FWIW I did get a message (with a date, for a change) in the console:

warn: Microsoft.AspNetCore.Server.Kestrel[22]
      As of "11/06/2024 07:59:29 +00:00", the heartbeat has been running for "03:55:19.8264513" which is longer than "00:00:01". This could be caused by thread pool starvation.

EDIT:

The time is probably the Windows 3 A.M. wakeup to check for updates, or whatever happens then…