Really frustrated with Duplicati's lack of feedback

Welcome to the forum @cyberfox

Yes, it doesn’t scale well. Possibly the right developers can improve this, but they’re typically scarce.

Yes, sometimes a profiling log view in live log is necessary to figure out that it has gotten into a slow query. Regarding SQLite, is the problem that the embedded DB’s resource use is hard to observe separately? Duplicati aims for a simple install, without a separate database to install, maintain, and maybe even back up.

Ah, you answered this later, going to the effort of running gdb to see where it was at, and commenting:

Recent comment from the primary developer:

and there’s plenty of other potential future stuff, but progress continues to be limited by resources.
Previously I would have said volunteers, but there’s now some company funding, so there’s a possibility of hiring.

Delivered as promised. I’ll try to reply, but some of this is already known, and fixing it is limited by the available resources.

Yes, that’s one of the workarounds, and it’s effective against the block loading generated by lots of files.

A small number of large files can be kept under control with a large blocksize to keep the block count lower. Unfortunately, blocksize cannot be changed on an existing backup, and there is little guidance in advance.
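
To make that concrete, here’s a rough back-of-the-envelope calculation (Python just for the arithmetic; the 100 KB figure is the usual default blocksize, and the other sizes are only examples):

```python
# Rough block-count estimate. The block count is a big part of what drives the
# size of the local SQLite database and the cost of many of its queries.
def block_count(source_bytes, blocksize_bytes):
    return source_bytes // blocksize_bytes

gb = 1000 ** 3
for blocksize_kb in (100, 1000, 5000):  # 100 KB is the usual default
    blocks = block_count(500 * gb, blocksize_kb * 1000)
    print(f"500 GB source at {blocksize_kb} KB blocksize -> about {blocks:,} blocks")
# 100 KB  -> about 5,000,000 blocks
# 1000 KB -> about 500,000 blocks
# 5000 KB -> about 100,000 blocks
```

That’s the whole point of raising blocksize on big backups: keep the block count somewhere near the roughly 1 million that the rule of thumb below mentions.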

The Options screen top item is “Remote volume size”, and it does try to steer anybody who’s paying attention to Remote Volume Size in the manual. Above that is a discussion of the block size, which could say more.

I’ve dropped hints that maybe block size deserves a similar visible question somewhere in job creation.

So there’s potential user manual work (for those who read it), potential GUI work (for those who read it), potential speedups (typically someone reports a slow spot, and maybe someone figures out what to do), after-the-fact workarounds as support (kind of late), and potential testing to steer all of the above.

There is typically a desire to make things reliable, and making them work fast is sometimes less urgent.

is an exception, but it’s probably being driven by .NET Framework and mono (and libraries for them, etc.) starting to fade away. Duplicati’s migration to newer .NET adds speed (though maybe not in SQL) as a bonus.

I’m surprised it got to 0. Seeing it go up and down is normal. It’s an average speed, and upload is occasional. During non-initial backups, you might see the average slowly sink, then rise again as a remote volume gets uploaded.

Run PRAGMA optimize upon closing database connection #3745
is (I think) where things stand now, and there was a discussion of whether to analyze or rely on optimize.
PRAGMA optimize is (at least in current docs) also suggested every few hours. I don’t think we do that, though. Running it at close also doesn’t happen (I assume) if a process gets killed rather than closing normally, so maybe optimize should at least be run at open, even if that’s normally redundant with the run done by a clean close.
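
For anyone curious what running it at open as well would look like, here’s a minimal sketch. It’s Python’s sqlite3 standing in for Duplicati’s C# data layer, so treat it as an illustration of the idea rather than the actual code:

```python
import sqlite3

def open_job_database(path):
    # Run PRAGMA optimize up front as a fallback for the case where the
    # previous process was killed before its clean-close optimize could run.
    con = sqlite3.connect(path)
    con.execute("PRAGMA optimize")
    return con

def close_job_database(con):
    # The clean-close case, roughly what issue #3745 is about.
    con.execute("PRAGMA optimize")
    con.close()
```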

This is pretty close to the rough rule of thumb of blocksize being tuned for 100 GB (1 million blocks), but it’s not something that’s been well explored, given the limits of equipment and available personnel to go explore. There are also special situations that are the exception to the general rule, so let’s see what’s coming next:

disable-synthetic-filelist

--disable-synthetic-filelist = false
If Duplicati detects that the previous backup did not complete, it will generate a filelist that is a merge of the last completed backup and the contents that were uploaded in the incomplete backup session.

If it’s an actual SQLite bug, their team seems quite responsive, but they need an accurate description of the bug, and producing one is a good step even if it’s just Duplicati and a routine query that turns disastrous in query planning.

First step is a reproducible test case. Next is maybe a look at About → Show log → Live → Profiling, but the behavior of that is a little unreliable for me. It skips output; during a DB recreate, for example, this gets very obvious.

log-file=<path> log-file-log-level=profiling is a better start, and you can make an even huger file by adding

  --profile-all-database-queries (Boolean): Activates logging of all database
    queries
    To improve performance of the backups, frequent database queries are not
    logged by default. Enable this option to log all database queries, and
    remember to set either --console-log-level=Profiling or
    --log-file-log-level=Profiling to report the additional log data
    * default value: false

Once the query is known, it can potentially be isolated to run in something like sqlitebrowser for possible replication of the slow query; then one can run EXPLAIN QUERY PLAN to try to figure out why it got slow.
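
If you’d rather script it than click through sqlitebrowser, the same check works from Python’s sqlite3 against a copy of the job database. The file name here is a placeholder, and you’d paste the statement from the profiling log in place of the harmless example query:

```python
import sqlite3

# Use a copy of the job database so a live backup isn't disturbed.
con = sqlite3.connect("copy-of-job-database.sqlite")

# Replace this harmless example with the slow statement from the profiling log.
slow_query = "SELECT name FROM sqlite_master WHERE type = 'table'"

for row in con.execute("EXPLAIN QUERY PLAN " + slow_query):
    print(row)  # 'SCAN <table>' means a full scan; 'SEARCH ... USING INDEX' is what you hope for
```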

Sometimes one rewrites it, sometimes one adds indexes. I’m not an SQL guru, but we could use some.

If there’s something going wrong that’s not SQL, the profiling log (or a verbose log, or even less) can help because the GUI (thankfully?) isn’t throwing all that stuff at you all the time. But sometimes you need that.

EDIT 1:

A developer would have to weigh in on this, but a possible alternative to the constant flood of internal detail that the profiling log tends to create (especially with `profile-all-database-queries`) would be logs-as-needed: operations that take too long get noticed, and a note is made somewhere that isn’t a job database, because job databases get rolled back on kills due to the lack of a clean commit if the job doesn’t finish completely.
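
Very roughly, and only as a sketch of the concept (Python rather than C#, with the threshold and file name invented for illustration):

```python
import sqlite3, time

SLOW_SECONDS = 10  # invented threshold; a real version would make this configurable

def run_logged(con, sql, params=(), log_path="slow-operations.log"):
    # Run a statement, but only leave a trace when it was slow. The note goes
    # to a plain file rather than the job database, so it survives a killed
    # process that never gets to commit.
    started = time.monotonic()
    rows = con.execute(sql, params).fetchall()
    elapsed = time.monotonic() - started
    if elapsed >= SLOW_SECONDS:
        with open(log_path, "a") as log:
            log.write(f"{elapsed:.1f}s\t{sql}\n")
    return rows

# usage sketch
con = sqlite3.connect(":memory:")
run_logged(con, "SELECT 1")
```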

I’m not sure of your technical background (looks better than average), but there are lots of chances to help.

EDIT 2:

Maybe something can also show in the GUI, but I’m not sure how to give meaningful info on internal status. Status bar could probably use a few more phases, but it likely can’t show every potentially slow SQL query. Putting timeouts in various spots likely to be slow or hang could also be handy for things like backend work.

EDIT 3:

seems to be where the timing information comes from, but it just waits for Dispose instead of monitoring. While I’m not a C# developer, I’m pretty sure it would be possible to notice a slow query while it’s running.
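
To sketch what “noticing while it’s running” could mean (Python again rather than C#, and the 30-second warning threshold is invented), one cheap approach is a watchdog timer that fires while the statement is still in flight:

```python
import sqlite3, threading

def run_with_watchdog(con, sql, warn_after=30):
    # Warn while the statement is still running, instead of only measuring
    # it after the fact when the timing object is disposed.
    timer = threading.Timer(warn_after,
                            lambda: print(f"still running after {warn_after}s: {sql}"))
    timer.start()
    try:
        return con.execute(sql).fetchall()
    finally:
        timer.cancel()  # stop the watchdog whether the query finished or failed

# usage sketch
con = sqlite3.connect(":memory:")
run_with_watchdog(con, "SELECT 1", warn_after=5)
```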

is an example of how a query gets into SQLite, and there are a finite number of them, all using this pattern. Results of the queries vary, and one might have to be careful of a slow consumer looking like a slow query…
