Why the heck CAN'T we change the blocksize?

Yes, based on my rough rule of thumb from DB Recreate times with very small blocksizes.
I use a small blocksize to get a large number of blocks from a backup small enough for my hardware to hold.

The Recreate (which also applies somewhat to a disaster-recovery Direct restore from backup files) gets
slower at a more-than-linear rate. This has been noted before in forum reports. Raw test note:

blocksize   backup      dindex  BlocksetEntry   recreate
1 KB        00:22:18    :10     05:52:20        06:02:54
2 KB        00:20:24    :03     00:43:07        00:46:13
4 KB        00:06:49    :02     00:05:20        00:06:41
8 KB        00:05:17    :01     00:00:59        00:01:48
10 KB       00:03:34    :01     00:00:23        00:01:04

1 KB:
2020-03-24 15:12:47 -04 - [Profiling-Timer.Finished-Duplicati.Library.Main.Database.ExtensionMethods-ExecuteNonQuery]: ExecuteNonQuery: INSERT INTO "BlocksetEntry" ("BlocksetID", "Index", "BlockID") SELECT DISTINCT "H"."BlocksetID", "H"."Index", "H"."BlockID" FROM (SELECT "E"."BlocksetID" AS "BlocksetID", "D"."FullIndex" AS "Index", "F"."ID" AS "BlockID" FROM (  
        SELECT
            "E"."BlocksetID",
            "F"."Index" + ("E"."BlocklistIndex" * 32) AS "FullIndex",
            "F"."BlockHash",
            MIN(1024, "E"."Length" - (("F"."Index" + ("E"."BlocklistIndex" * 32)) * 1024)) AS "BlockSize",
            "E"."Hash",
            "E"."BlocklistSize",
            "E"."BlocklistHash"
        FROM
            (
                    SELECT * FROM
                    (
                        SELECT 
                            "A"."BlocksetID",
                            "A"."Index" AS "BlocklistIndex",
                            MIN(32 * 32, ((("B"."Length" + 1024 - 1) / 1024) - ("A"."Index" * (32))) * 32) AS "BlocklistSize",
                            "A"."Hash" AS "BlocklistHash",
                            "B"."Length"
                        FROM 
                            "BlocklistHash" A,
                            "Blockset" B
                        WHERE 
                            "B"."ID" = "A"."BlocksetID"
                    ) C,
                    "Block" D
                WHERE
                   "C"."BlocklistHash" = "D"."Hash"
                   AND
                   "C"."BlocklistSize" = "D"."Size"
            ) E,
            "TempBlocklist-B89B6D32B7B3CA4BBF9E7E189433275B" F
        WHERE
           "F"."BlocklistHash" = "E"."Hash"
        ORDER BY 
           "E"."BlocksetID",
           "FullIndex"
) D, "BlocklistHash" E, "Block" F, "Block" G WHERE "D"."BlocksetID" = "E"."BlocksetID" AND "D"."BlocklistHash" = "E"."Hash" AND "D"."BlocklistSize" = "G"."Size" AND "D"."BlocklistHash" = "G"."Hash" AND "D"."Blockhash" = "F"."Hash" AND "D"."BlockSize" = "F"."Size"  UNION SELECT "Blockset"."ID" AS "BlocksetID", 0 AS "Index", "Block"."ID" AS "BlockID" FROM "Blockset", "Block", "TempSmalllist-8B3E068157D37942AC2D46A8DF3907D8" S WHERE "Blockset"."Fullhash" = "S"."FileHash" AND "S"."BlockHash" = "Block"."Hash" AND "S"."BlockSize" = "Block"."Size" AND "Blockset"."Length" = "S"."BlockSize" AND "Blockset"."Length" <= 1024 ) H WHERE ("H"."BlocksetID" || ':' || "H"."Index") NOT IN (SELECT ("ExistingBlocksetEntries"."BlocksetID" || ':' || "ExistingBlocksetEntries"."Index") FROM "BlocksetEntry" "ExistingBlocksetEntries" ) took 0:05:52:19.986

I don’t recall the detailed history of the above. I might have looked for when things got slow, then looked for some SQL that seemed to account for a large part of the slowness. At least that’s what the table at the top suggests.

There also seems to be an open question of whether the slowdown comes from the number of blocks at the destination (corresponding roughly to the number of dblock files, given the fixed blocksize; note in the SQL above how the blocksize factors into the query), or from the number and length of source paths. I’m guessing blocks, but it needs testing somewhat better than what I started above (which was probably just a backup at some blocksize, followed by a look at Recreate time). I doubt I worked very hard at crafting special sources.

There are possibly just some better ways to write the SQL. Not my area, and a DB expert could help us.
I know enough to mostly explain the DB layout and its rough usage, but my SQL expertise is not great…

Failing that, there may be some ways to brute-force it by giving SQLite more resources, as I suggested at Backups suddenly taking much longer to complete. So far in testing, @amloessb seems to have found a “dramatic effect on backup duration” (is that a concern for you?), but the etilqs temporary file theory was not supported (at least at first glance). My theory was an effect similar to virtual-memory thrashing, which can cause a sudden drop in performance. At a lower level, cache limitations can really slow down a CPU.
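
To make “more resources” concrete, here is a minimal sketch of SQLite pragmas one could try in a session against a copy of the job database; the exact values are illustrative assumptions, not tested recommendations:

-- Illustrative sketch only: run in a SQLite session on a COPY of the Duplicati database
-- to see whether memory limits contribute to the slowness, then re-run a slow query.
PRAGMA cache_size = -200000;   -- assumed value: roughly 200 MB of page cache (negative means KiB)
PRAGMA temp_store = MEMORY;    -- keep temporary tables and indices (the etilqs files) in RAM
PRAGMA mmap_size = 268435456;  -- assumed value: allow up to 256 MB of memory-mapped I/O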

I don’t know that it needs PMs just to discuss generalities, because this is a very new area. You know better than I do where it hurts for you. You were in the My experience with Database Rebuild / Recreate topic, and wound up at “I went with deleting and recreating”, which I think means the backup, because the DB recreate was running slow.

Is this hardware Windows, Linux, or something else? The OS performance tools will probably be useful in trying to track down why things were slow. At the Duplicati level, the profiling level log gives the SQL times.
Use your favorite large-log viewer (I usually use glogg) to look for a “took” time that’s long, e.g. into minutes.

I forget what sorts of hardware and OS @Sami_Lehtinen has, but Sami was also in that Recreate topic.
Slow DB Recreation On OneDrive had a comment about CPU performance and a single thread maxed out, which is likely what one is going to see (you might need to switch Task Manager to Logical processors) when Duplicati has SQLite busy in some slow single SQL statement. Profiling is the log level where SQLite statements become visible, so it’s the likely way to find slow ones and make them fast, but that needs time and SQL skill (little of either is available right now, AFAIK), so anybody who’s willing, please chip away at what you can.

That’s pretty much all the project can do. It can’t hire, so it needs volunteer contributions for progress.

Thanks for considering helping out in some way. It’s clear from the original post that this is a pain point for you. Though small systems need support too, maybe we can at least get better data on large-backup behavior.

The hardware is a ProxMox server. AMD Ryzen 7 with 16 cores, 64GB RAM and a ZFS file system of about 22TB of which about 7T is in use so far. I can stand up a VM in there of just about any OS we want, and allocate it an arbitrary amount of disk space for short-term use. I also have 500Gbit raw outbound network bandwidth depending on the backup targets we want to test.

I definitely saw that – one vCPU maxed out, all else idle – when I was running that massive DB recreate last month. That was on Debian.

The theory is that large backups without a large blocksize keep SQLite very busy. One contributing factor might be SQLite not having good statistical information for query planning because ANALYZE hasn’t yet run.
Run PRAGMA optimize upon closing database connection #3745 helps regular work. Maybe not recreate?
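
If someone wants to test that theory, the statements involved are plain SQLite and could be tried by hand on a copy of a recreated database; this is a sketch of the idea, not a confirmed fix:

-- Sketch for testing the query-planner theory on a COPY of the database.
ANALYZE;           -- gather table and index statistics for the query planner
PRAGMA optimize;   -- what issue #3745 arranges to run when the connection closes
-- then re-run one of the slow statements (e.g. the BlocksetEntry INSERT above) and compare times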

One thing I’d personally like to see is whether my rough rule of thumb of 1 million blocks holds for genuine large backups, instead of just my artificial tiny-blocksize testing. Whatever other areas get explored, it may be necessary to see whether results found with normal large backups scale down OK to a developer-friendly size.
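
For reference, the block count of an existing backup is easy to read from its job database; a rough sketch, using the table name seen in the queries above:

-- Rough check of how many blocks a backup has; run against a copy of the job database.
SELECT COUNT(*) AS block_count FROM "Block";
-- Source size divided by blocksize gives roughly the same figure,
-- e.g. 100 GB at the default 100 KB blocksize is about 1 million blocks.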

I’ve been asking around for staff OS preference for that VM. No clear preference yet. Based on backups in https://usage-reporter.duplicati.com/, Windows has more. I don’t know which OS has more large backups.

I think this is largely tailored around your large-backup pains, so maybe drive the tests based on where it hurts, also considering which OS you might find more suitable for the needed setup and performance findings.

I’m pretty sure that drive speed is sometimes a bottleneck, though I’m not sure how well tools in a VM can measure it. On my physical Windows system I look at things like Task Manager disk % busy, but after it hits 100% I use Resource Monitor to see the disk queue length. I forget whether Linux has equivalent drive measurements.

So for my first test I can

  1. Clone my largest backup destination files
  2. Time recreations of the DB in identical VMs with blocksizes of 50M, 10M, 5M, 1M, 500K, 100K

That will at least tell us if there’s any “there” there.

I can start tonight after work. What other measures will be of interest? And should I install Windows for this? My go-to these days is Debian 11. If Windows, what flavor? Please, no 11; 10, or a Server edition.


That sounds like a wonderful intent, but the blocksize is already set if you start from destination files.
Maybe I misunderstand the details of the steps; I’d have thought a fresh backup would be involved.
Although the first focus is on recreate, that would be a good chance to look at the other times and sizes too.

I guess ultimately the goal is to find slow spots that are potentially addressable. Broad metrics would include CPU load (maybe both aggregate and per logical processor), drive load (ideally including the “overload” factor that queue length implies), and someday network activity, although a DB recreate that becomes painfully slow is typically processing small files such as dindex, so download isn’t the limiting factor.

In terms of processing steps, a log file at verbose level shows them. A live log is fine if a peek is needed, but a log file has better time resolution (seconds) and is easier to match up with other metrics.

A profiling-level log is good for finding slow SQL. One can (I think) also see some patterns, e.g. where a dindex file is downloaded, then taken apart for block processing. I don’t recall the exact rhythm I once observed, but as an example, a default 50 MB dblock at 100 KB blocksize might have 500 blocks (or more) to process. The recreate reads the dindex file that corresponds to a specific dblock.
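
To get a feel for that ratio on a real database, a rough sketch like the one below could be run against a copy of the job database; the "VolumeID" column on "Block" is an assumption about the local DB layout, taken from the compaction query quoted later in this topic:

-- Sketch: blocks recorded per remote volume, to compare against dblock size divided by blocksize.
SELECT "VolumeID", COUNT(*) AS blocks_in_volume
FROM "Block"
GROUP BY "VolumeID"
ORDER BY blocks_in_volume DESC
LIMIT 10;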

My rough experiment was just looking for the “long pole” in SQL time, maybe using a regular expression to look in the “took” times for a minutes digit that stopped being 0. I’m pretty sure I didn’t script anything to find smaller delays that repeated a lot. The results table in my raw note showed one query soon dominating the time.

I still hope for a volunteer with SQL expertise who can help sanity-check and troubleshoot the slow spot.

Even though SQLite can occupy a logical processor, I suspect it’s also busy working with database files. Possibly it’s also using internal temporary files (named with an etilqs prefix). One can casually look at disk load with a friendly tool such as Task Manager, but Resource Monitor is good at naming names, Performance Monitor is better at logging, and Sysinternals Process Monitor is good for seeing the details. Because that’s a lot to wade through, @amloessb suggested here that ProcessActivityView has a readable summary (and posted some).

Because several people have been looking at performance, I’d encourage community input, not just mine.

I’m also not as familiar with Linux performance tools, although there seem to be a lot of them (varying by distro?). Of the ones that can log, I’ve used sar in the past, and strace can show the low-level details, including which paths are being accessed, e.g. if one wants to see whether etilqs files are part of the slowness.

Windows 10 is common. I have it here, and probably a lot of people do. Does Server provide better ways for studying performance? If so, have at them, but also try to find something that those on 10 can look at.
If there’s no good reason to use Server, probably just run with Windows 10 if you decide to run Windows.

If you go with Debian, I suspect the tools it has are available in other distros (though maybe not by default), so that might be a good fit for whichever developer follows up (provided you can scale the test down).

There are so few developer volunteers that reproducible issues and some problem isolation can help a lot. The steps so far are external, using readily available tools, but (if you like) code-level instrumentation can be done too.

Thanks for volunteering some equipment and time. Every step taken potentially adds to the understanding.

Okay! I have this set up. About 7.6T of raw data to back up, and a suitable destination prepared. Here is my initial backup config:

mono /usr/lib/duplicati/Duplicati.CommandLine.exe backup \
file:///home/david/dupli-target/ /media/david/nas-media/ \
--backup-name=Backup_50M \
--dbpath=/usr/lib/duplicati/data/SJKWAPRIEO.sqlite \
--encryption-module=aes \
--compression-module=zip \
--dblock-size=512MB \
--retention-policy="1W:1D,4W:1W,12M:1M" \
--blocksize=50MB \
--log-file=/home/david/50M_backup.log \
--log-file-log-level=Profiling \
--disable-module=console-password-input

If we need to change anything, now’s the time. I plan to re-execute this unchanged each time, except for the --blocksize parameter, and then I will post an extract from the logs.


I assume there’s a Recreate between the different --blocksize backups. Starting at 50MB should run faster.
You will probably get a smaller database too, although the table with the paths won’t be shrinking any.
Your destination size will probably be larger at 50 MB, so that might be the tradeoff for faster operations…

well, yah. That’s what we’re really testing.

I would also like to chime in with some data. I have a Duplicati task that backs up ~300GB of .VHDX files every day on a 30-day rolling window. The database size is currently about 5GB.

When the backup task was created I had left the blocksize at the default 100KB. For the most part the backup jobs themselves run fast enough and are primarily limited by my 10Mbps upload speed. I run into problems whenever a Compaction is called for. I have never tried a Recreate on this job, but I’m guessing it would take a very long time.

During the last compaction, the job appeared to be stuck so I switched on the Profiling logs to see what it was up to. It was running a SQL query that ended up taking longer than five hours to complete… and as soon as that query finished it launched into another that appeared to be taking just as long.

I ended up killing the task and repairing the database, because at that rate the job might as well never finish… but to satisfy my curiosity I copied the query it was trying to run into a SQL DB browser and ran it from there, and it returned a result in about four seconds. Given that I had interrupted the backup job, I don’t know if the query was pertinent anymore, but I found the execution-speed disparity suspicious.

I tried looking into the Duplicati source to see how it was building the query and what it planned on doing with the result, but my C# has gotten rusty enough that I didn’t make very good progress.

The query, if anyone is interested, was this one:

SELECT 
  "A"."Hash", 
  "C"."Hash" 
FROM 
  (
    SELECT 
      "BlocklistHash"."BlocksetID", 
      "Block"."Hash", 
      * 
    FROM 
      "BlocklistHash", 
      "Block" 
    WHERE 
      "BlocklistHash"."Hash" = "Block"."Hash" 
      AND "Block"."VolumeID" = 28497
  ) A, 
  "BlocksetEntry" B, 
  "Block" C 
WHERE 
  "B"."BlocksetID" = "A"."BlocksetID" 
  AND "B"."Index" >= ("A"."Index" * 3200) 
  AND "B"."Index" < (
    ("A"."Index" + 1) * 3200
  ) 
  AND "C"."ID" = "B"."BlockID" 
ORDER BY 
  "A"."BlocksetID", 
  "B"."Index"

It would be interesting (if you’re willing) to see if you can get a slow run on a copy of your actual database.
DB Browser for SQLite has an “Execute SQL” tab. It even tells you how long the execution took, I believe.

If slow run is reproducible in that environment, then DB experts (if we find volunteers) can try to dissect it.
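
One cheap dissection step (a sketch of the idea, not a promised cure) is to prefix the slow statement with EXPLAIN QUERY PLAN in that same Execute SQL tab; rows showing SCAN on the big tables, rather than SEARCH using an index, would be a hint for whoever digs in. Using the compaction query quoted above:

-- Ask SQLite how it plans to run the slow compaction query quoted above.
EXPLAIN QUERY PLAN
SELECT "A"."Hash", "C"."Hash"
FROM (SELECT "BlocklistHash"."BlocksetID", "Block"."Hash", *
      FROM "BlocklistHash", "Block"
      WHERE "BlocklistHash"."Hash" = "Block"."Hash" AND "Block"."VolumeID" = 28497) A,
     "BlocksetEntry" B, "Block" C
WHERE "B"."BlocksetID" = "A"."BlocksetID"
  AND "B"."Index" >= ("A"."Index" * 3200)
  AND "B"."Index" < (("A"."Index" + 1) * 3200)
  AND "C"."ID" = "B"."BlockID"
ORDER BY "A"."BlocksetID", "B"."Index";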

I usually try to reverse-map SQL queries to the source just by searching for chunks of them, doubling the quotes (that’s how they appear in the C# source).

It might be the below. Someone can try to trace how that method gets used. Maybe the test log will show one.

Ran for 2.5 days and then this:

2021-10-02 08:10:09 -04 - [Error-Duplicati.Library.Main.Operation.BackupHandler-FatalError]: Fatal error
System.Exception: Unexpected number of remote volumes marked as deleted. Found 0 filesets, but 1 volumes
  at Duplicati.Library.Main.Database.LocalDeleteDatabase+<DropFilesetsFromTable>d__5.MoveNext () [0x0027e] in <e60bc008dd1b454d861cfacbdd3760b9>:0 
  at System.Collections.Generic.LargeArrayBuilder`1[T].AddRange (System.Collections.Generic.IEnumerable`1[T] items) [0x00046] in <d22af090bceb4be792f53595cf074724>:0 
  at System.Collections.Generic.EnumerableHelpers.ToArray[T] (System.Collections.Generic.IEnumerable`1[T] source) [0x0003c] in <d22af090bceb4be792f53595cf074724>:0 
  at System.Linq.Enumerable.ToArray[TSource] (System.Collections.Generic.IEnumerable`1[T] source) [0x0002c] in <d22af090bceb4be792f53595cf074724>:0 
  at Duplicati.Library.Main.Operation.DeleteHandler.DoRun (Duplicati.Library.Main.Database.LocalDeleteDatabase db, System.Data.IDbTransaction& transaction, System.Boolean hasVerifiedBackend, System.Boolean forceCompact, Duplicati.Library.Main.BackendManager sharedManager) [0x0018d] in <e60bc008dd1b454d861cfacbdd3760b9>:0 
  at Duplicati.Library.Main.Operation.BackupHandler.CompactIfRequired (Duplicati.Library.Main.BackendManager backend, System.Int64 lastVolumeSize) [0x000a5] in <e60bc008dd1b454d861cfacbdd3760b9>:0 
  at Duplicati.Library.Main.Operation.BackupHandler.RunAsync (System.String[] sources, Duplicati.Library.Utility.IFilter filter, System.Threading.CancellationToken token) [0x00da9] in <e60bc008dd1b454d861cfacbdd3760b9>:0 

The expected 7T of backup files are in place tho.

Thoughts?

A rather rare mystery.

Unexpected number of deleted filesets 1 vs 0 #1583 has a similar report on a new backup, followed by a pointer to some code, and some explanation. It’s comparing two values that “should” give the same number.

This is especially surprising on a new backup, with everything running well until the post-backup deletions.
It’s rather rare that a profiling log catches one of these errors, so lines from around that time might be helpful.

EDIT 1:

You might also want to copy the database in case it’s useful later. Maybe someone can connect it to code.

EDIT 2:

Do you have any log output like below? From other code, it looks like all this “should” be after all uploads:

EDIT 3:

Debian 11 offers sqlitebrowser, if you want to look at the DB yourself, e.g. to see if the Fileset table has one row with a reasonable UNIX epoch date for your backup. If so, the RemoteVolume table should have a Files row with the dlist file for that version. If its State got updated to Deleting by the SQL, the question is why?
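
For reference, the check described above boils down to something like this in sqlitebrowser’s Execute SQL tab; the "Timestamp" and "Name" columns are assumptions about the table layout, while the other identifiers appear in the logged SQL later in this thread:

-- Sketch of the manual check described above; run against a copy of the job database.
-- "Timestamp" and "Name" are assumed column names.
SELECT "ID", "VolumeID", "Timestamp" FROM "Fileset";   -- expect one row with the backup's epoch date
SELECT "ID", "Name", "Type", "State"
FROM "RemoteVolume"
WHERE "Type" = "Files";                                -- the dlist row; its State should not be Deleting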

Here’s the last 200 lines of my log, out of 404,915. If you want more let me know.

Poking around in Explore mode in my DB with a SQL browser seems unlikely to be productive. But if you have any particular queries you want me to try, I will do that.

It’s not SQL queries, and not mere poking. I’m looking for specific data near the error.
Below is a new backup that I did, to use as an example of what I’m talking about here.

[screenshot: the Fileset table in a DB browser, showing its single row]

I’m not using any filtering in the above because I expect only one fileset (backup version).
1633204774 (by https://www.epochconverter.com/) is October 2, 2021 7:59:34 PM.
The ID here is 1, which tells me it’s the first one. VolumeID of 1 points to the table below.
I used a Files filter on the Type column because this is the type of file relevant to your error:

The State here is Verified. I’m not sure what yours is, but it looks like it went to Deleting, which is bad.

2021-10-02 08:10:03 -04 - [Profiling-Timer.Begin-Duplicati.Library.Main.Database.ExtensionMethods-ExecuteNonQuery]: Starting - ExecuteNonQuery: UPDATE "RemoteVolume" SET "State" = "Deleting" WHERE "Type" = "Files" AND "State" IN ("Uploaded", "Verified", "Temporary") AND "ID" NOT IN (SELECT "VolumeID" FROM "Fileset") 
2021-10-02 08:10:09 -04 - [Profiling-Timer.Finished-Duplicati.Library.Main.Database.ExtensionMethods-ExecuteNonQuery]: ExecuteNonQuery: UPDATE "RemoteVolume" SET "State" = "Deleting" WHERE "Type" = "Files" AND "State" IN ("Uploaded", "Verified", "Temporary") AND "ID" NOT IN (SELECT "VolumeID" FROM "Fileset")  took 0:00:00:06.171
2021-10-02 08:10:09 -04 - [Error-Duplicati.Library.Main.Operation.BackupHandler-FatalError]: Fatal error
System.Exception: Unexpected number of remote volumes marked as deleted. Found 0 filesets, but 1 volumes

If you actually got the duplicati-20210929T233834Z.dlist.zip.aes at the destination, it might be possible to continue with the original Recreate test. Sorry for the unplanned interruption, but this was something that wasn’t supposed to happen. Rather than save the database or look at it yourself, you can run Creating a bug report and get a link to it posted or sent somehow, to let somebody else examine it.

Although this issue is rare, it occurs enough that it would be great if somebody could look into it more.

EDIT:

And it might be timing-related, which would be worse, but you can see that the errant UPDATE needs

"ID" NOT IN (SELECT "VolumeID" FROM "Fileset")

which is why the Fileset table is relevant, and also needs

"Type" = "Files" AND "State" IN ("Uploaded", "Verified", "Temporary")

in RemoteVolume, which is why that table is relevant. Thanks for that log too. It seemingly had no versions to delete (thus the 0 on the filesets side of your message), yet changed a remote volume (thus your 1).
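
As a sketch, the same conditions can be turned into a harmless SELECT against a copy of the database, to see which rows would currently be caught; the conditions come straight from the logged UPDATE, while "Name" is an assumed column added for readability:

-- Read-only version of the errant UPDATE's conditions, for inspection on a copy of the DB.
-- "Name" is an assumed column; the other identifiers appear in the logged UPDATE.
SELECT "ID", "Name", "Type", "State"
FROM "RemoteVolume"
WHERE "Type" = "Files"
  AND "State" IN ("Uploaded", "Verified", "Temporary")
  AND "ID" NOT IN (SELECT "VolumeID" FROM "Fileset");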

2021-10-02 08:09:51 -04 - [Information-Duplicati.Library.Main.Operation.DeleteHandler:RetentionPolicy-BackupList]: Backups to consider: 9/29/2021 7:32:19 PM

2021-10-02 08:09:51 -04 - [Information-Duplicati.Library.Main.Operation.DeleteHandler:RetentionPolicy-AllBackupsToDelete]: All backups to delete:

I will get the sqlitebrowser later this morning (EDT) and check this.

FWIW I do have that duplicati-20210929T233834Z.dlist.zip file so I will proceed to try and recreate.

I will also file a bug report.

It took three tries to get the backup going. The first time, the network volume containing the source had gotten unmounted, so we backed up an empty dir. The second time, the /tmp/ volume was too small. I enlarged it and the third time was (almost?) successful.

Here is the bugreport.zip

Thanks. That might be relevant. Maybe this issue only happens in messier situations than a straight run?
We’ll probably have some evidence for or against that on future runs, which likely will have fewer hiccups.

Also thanks for that bug report. I’ll have a look (which may or may not solve it, but there’s actually a theory).

I’ll post some technical notes as I explore. I’m kind of wondering if the Temporary file from the second backup caused your 1 value at the end of the third, and then hid its Deleting State evidence due to transaction rollback.

When a database-using program ends cleanly, it does a commit to write a consistent transaction into the DB.
When it doesn’t (perhaps here), it doesn’t want to keep half-done changes, so the next open rolls them back.

2021-10-02 08:10:09 -04 - [Profiling-Timer.Finished-Duplicati.Library.Main.Database.ExtensionMethods-ExecuteNonQuery]: ExecuteNonQuery: UPDATE "RemoteVolume" SET "State" = "Deleting" WHERE "Type" = "Files" AND "State" IN ("Uploaded", "Verified", "Temporary") AND "ID" NOT IN (SELECT "VolumeID" FROM "Fileset") took 0:00:00:06.171

Being Temporary would pass the first part of the above, and its VolumeID is not in Fileset (which has only the first and third backups), so it passes the second part too, and the number of updated rows winds up as 1, as you saw.


The Temporary state was not always included in that UPDATE; it was added in the “Stop now” fix here.
The general issue goes way back. I’m not saying this is the only case, but we might have a lead on one.
Possibly your log will come in handy again, if we need to figure out how things failed when /tmp filled up.

Thanks for the info!

I guess my main question right now is: should I start a raw create-from-backup? Or wait?