Database recreate performance

And improve recreate too :smiley:

I tried recreate recently and the time was so poor that I deleted the backup instead and redid it in far less time. Definitely needs an exceedingly large improvement there (or the feature dropped, in my opinion, as it's that poor; if it takes longer than a redo then it's almost worthless).

I’ve been complaining about (and proving) this problem for years. What I’ve noticed recently is that the recreate process gets progressively slower, and it slows down everything else on my MacBook as well. I recently upgraded to an M2 MacBook with 96GB of RAM, and my biggest rebuild is still taking weeks.

Meanwhile, performance of everything else, including the Chrome browser, is getting worse by the day, even though the macOS Activity Monitor shows little CPU or memory pressure.

For example, when I started the current rebuild a week ago, the GETs from the GDrive remote storage were taking about 15 seconds. Now they’re taking about one minute for the same size blocks. Chrome is taking a long time to load pages, some of which are timing out. Yet Speedtest shows that my gigabit internet (wired) is still performing at full speed.

(Rant on)

What kills me is that the database is so darn delicate!

Time Machine tries to back up the database while Duplicati is running a task: database corrupted, delete and recreate. Crash/reboot the machine while Duplicati is running a task: database corrupted, delete and recreate. The cat walks over the laptop and disconnects the external source hard drives while Duplicati is running a task: database corrupted, delete and recreate.

Simple database repair is rarely successful (“database repair was attempted but did not complete, please delete and recreate the database”).

And… there is no checkpoint feature (that I know of) to allow me to pause ANY operation (backup, rebuild, etc.), reboot my machine, and resume the operation. I haven’t been able to run Time Machine for weeks while waiting for the current delete/recreate to complete.

I am a long-time user and early adopter of Duplicati, and have worked closely with the developers on occasion to test bug fixes. I truly believe in Duplicati as a way to ensure that if my on-site storage is destroyed or corrupted, I’ll always have an off-site backup. I am deeply indebted to the volunteers who develop and improve Duplicati, and hope you can forgive my frustration.

(Rant off)

Thanks for listening… Steve

@StevenKSanford

re: database prone to corruption: any database can be corrupted; what is specific to SQLite is that it is an embedded database, so if you kill the application, the database engine is killed with it and no database transaction can roll back the changes cleanly.
The only workaround is to avoid trouble, really - I don’t want to give my thoughts about the cat, this being a family forum after all.

That said, Duplicati has code for repairing the database without recreating it. I have tried it a few times and it has mostly worked. The code can always be made more robust, but a recreate is not always needed.

re: recreate slow
Recreate is slow because some queries are not well optimized for big databases. The kill factor is not so much the size of the data to back up, it's the number of blocks (the size of the data divided by the deduplication unit, 100 KB by default). So people with big backups, usually over 1 TB (though the true unit of size in Duplicati is the block count), raise the dedup block size.
If with a 100 GB database you have, say, 3 million blocks, it will begin to be seriously slow, but if you raise the block size by a factor of 10 (to 1 MB), you have only 300,000 blocks and it's tolerably fast, the price being that you may get a bit less deduplication. Disk speed obviously matters too, since SSDs are at least 10 times faster. If you raise the block size, you obviously have to recreate your backup.
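To make the block arithmetic concrete, here's a tiny back-of-the-envelope sketch in Python (purely illustrative; nothing here is Duplicati code, and real counts also depend on compression and how well the data deduplicates):

```python
# Back-of-the-envelope block counts (illustrative only). The database size
# and query times scale with the block count, not directly with source size.

def block_count(source_bytes: int, blocksize_bytes: int) -> int:
    """Approximate number of deduplication blocks for a given source size."""
    return -(-source_bytes // blocksize_bytes)  # ceiling division

KB, MB, GB, TB = 1024, 1024**2, 1024**3, 1024**4

for blocksize in (100 * KB, 1 * MB):
    for source in (100 * GB, 1 * TB, 10 * TB):
        print(f"{source / TB:5.2f} TB source at {blocksize // KB:>4} KB blocksize"
              f" -> ~{block_count(source, blocksize):,} blocks")
```

The jump from roughly one million blocks at 100 GB to over ten million at 1 TB (with the default blocksize) is why the rough rule-of-thumb advice in this thread kicks in around those sizes.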

I have pushed experimental code for better database recreation toward the Github project:

but so far no one has tried it; obviously using code written by someone who is not a Duplicati expert is a risk. I have tested it and it seems to work, but there are a lot of unknown unknowns :slight_smile:

I suppose I would test it, but I'd rather work around the problem by not recreating that way at all. Just create a new backup instead and the problem is solved. Starting new is a better way to go, as it's a protection.

If you ask me, Duplicati is too complicated and code should be removed. Remove the old recreate stuff and just stick in a call to create new. Simplified: the project is less complicated, there's less to fix, fewer problems. Done.

Do you mean zap all the backups, empty the backend, and create a new backup?

Have you tested Exclude files from a Time Machine backup on Mac (macOS User Guide)?
Note Apple’s note about APFS local snapshots, but I see you’re talking about backup here.

That might be an SSD downgrade, although it might depend on what model you got.
I don’t know all the Apple model details, but you can search for yours. One example:

512GB version of the new MacBook Pro has a slower SSD than the Mac it replaces
“Apple is using fewer chips in M2 Macs to provide the same amount of storage.”

View disk activity in Activity Monitor on Mac sounds tricky to interpret, but any clues?
I’d prefer queue length or a load average including I/O wait. I’m not a macOS expert.

Duplicati effects on other things come from something, so keep on looking if you like.
You can also make Duplicati more polite, yielding to other demands but slowing itself.

use-background-io-priority would be the one for drive contention, not yet investigated.
thread-priority may help CPU; however, the breadth of the slowdown suggests CPU is OK,
except maybe for programs which can't use multiple cores easily, such as Duplicati's
SQLite database, meaning it could be going flat out on one core while showing only 10% load across 10 cores.

Big backups (e.g. when over 100 GB) such as some of yours need a larger blocksize,
because without that some of the SQL queries get really slow. You can watch them in
About → Show log → Live → Profiling if you like. Also see the @gpatel-fr comments.

That’s likely RAM resident. If so it wouldn’t notice an issue if the SSD was being slow.
Chrome and pretty much any browser will write the download to the drive as a cache.

Assuming that’s Duplicati Get, first make sure this is the Started to Completed time.

2022-10-17 16:52:32 -04 - [Information-Duplicati.Library.Main.BasicResults-BackendEvent]: Backend event: Get - Started: duplicati-b570acf7705434570b871528c607064ff.dblock.zip.aes (45.16 MB)
2022-10-17 16:52:39 -04 - [Profiling-Duplicati.Library.Main.BackendManager-DownloadSpeed]: Downloaded and decrypted 45.16 MB in 00:00:06.6162567, 6.83 MB/s
2022-10-17 16:52:39 -04 - [Information-Duplicati.Library.Main.BasicResults-BackendEvent]: Backend event: Get - Completed: duplicati-b570acf7705434570b871528c607064ff.dblock.zip.aes (45.16 MB)

File does get written to the drive. Above “politeness” options can slow your download.
I’m not sure if any decryption slowness (per middle line) will delay the Completed line.

If you want to try a more pure download test, Export As Command-line and run get in
Duplicati.CommandLine.BackendTool.exe and see if that also takes a minute to finish.

Any breakage needs a detailed description. To just say it “corrupted” leads us nowhere.
Feel free to cite previous topics if you think we’ve beaten against some of those cases.
I’m pretty sure I haven’t been in one involving the cat, but that’s especially odd, since a
source interruption (while maybe risky to the source drive) is very far removed from DB.

Ideally description has steps reproducible on any OS, thus allowing more people to test.
If need be, we could probably ask @JimboJones who I think actually has macOS to run.

I sure hope not. Some people like their file history. We try hard to let them keep it…
Beyond that, if a database recreate isn’t there, it impedes disaster recoveries (drive loss).
Other ways such as Duplicati.CommandLine.RecoveryTool.exe are for emergencies only.

In the current case, especially with any large backup, blocksize change needs fresh start.
I heard talk of 12 TB of backups, so keep the current 100 GB rough rule of thumb in mind.

Please change default blocksize to at least 1MB #4629 can boost 100 GB advice to 1 TB.
Despite years of SQL improving, Duplicati still slows down when holding too many blocks.
Solving that is simple (bigger blocksize) if one knows in advance or is willing to fresh-start.

I understand the reasoning on why someone wouldn't want that. That's why they're not the same. But there are pros and cons to both. At the same time, waiting days, weeks, or months for recreate to hopefully finish isn't a good thing either.

The big con is that Duplicati is too big a project, too slow to fix everything, too much to fix, etc. It really should be slimmed down. But that's just my viewpoint, and not that it would happen.

You're welcome to say nooooo NOOOOO lol. I'd axe it right out instantly.

Sure thing, let me know what you need. I don't have any M1/M2s, but they are up-to-date versions of macOS.

would maybe not be coming up if, for example, there was only CLI (but people would Control-C more).
Do CLI-only backup programs attempt real-time control I wonder? My guess is no, but might be wrong.
You give people a GUI with buttons that work in some situations, not in others, and you get complaints.

The pause issue is explained elsewhere in the forum. At least currently, it mainly stops scheduled work.

Back to Control-C or whatever equivalent one’s OS uses to rudely kill some CLI process that’s running:

SQLite puts up with this quite well as far as I see, but application also needs to handle commit/rollback.
There are a couple of open issues on that which I found by accident then confirmed with lots of killing…

My current work is to try to find out how to make recreate get into its dreaded dblock downloads. Without those, it's not so bad. Until then, we rely on working with the user community to get their details.

Maybe you could take a try at seeing if you can get Time Machine interference with the Duplicati database, though a really quick Internet search wasn't showing me impacts on SQLite, which might be possible. Both programs might want exclusive use of the database at the same time, and the OS would not allow it.

If so, maybe you can test the exclusion instead of risking actual slow operation @StevenKSanford is in. Consulting on macOS would also be helpful. How and using what tools can one examine performance?

Some of the other issues may be cross-platform, but they pretty much all are too vague to comment on. Suggestion might be to either open or locate relevant topics. This one is already off-topic, per topic title.

“Database recreate performance” is related to hardware, backup size, blocksize, damage to destination causing dblock downloads (from 70% and beyond on progress bar, and can also be seen in the live log), SQL query code, internal code design which might reuse general code rather than recreate-specific, etc.

@StevenKSanford how about detailing exactly and accurately (quoted message doesn’t exist) the steps leading to any database corruption? If you can do it, maybe someone else (on macOS or not) could test. Even after you know the right message, you might be able to search for some relevant prior forum notes.

You might be trying to say “The database was attempted repaired, but the repair did not complete. This database may be incomplete and the repair process is not allowed to alter remote files as that could result in data loss.” which could be looked up in code to see what it means. Or maybe you saw something else.

the real problem I think is that for each dblock, Duplicati runs these 2 queries that I have tried to optimize - and their performance seems to decrease exponentially with the block count. That's what I hope to fix; if performance decreases proportionally instead, it's as good as it can get.

Yes, it may seem like that, but as in all things it's necessary to look at the problems and evaluate them to solve the more pressing ones.
In my opinion, the biggest short-term Duplicati problem is by far that it does not scale well. I think that my 2 PRs fix one of the big performance roadblocks (database list), and make a good advance toward solving the other major performance problem (database recreate).

I don't think that the Duplicati code is unmaintainable. The file LocalDatabaseRecreate.cs is mostly fine; the only nit is that the main routine should be subdivided into 3 sub-functions for the 3 steps needed to recreate the database. It's too big, and that's what makes it difficult to understand.

Now, the function I touched in LocalDatabase.cs, that's another story. I don't rewrite a function instead of fixing it for nothing. There are some dark parts that were probably done in bad conditions while time-starved; they are not easy to read at all and it's probably better to redo them.

Some people describe it as a cliff. If it can degrade more slowly with blocks, that'd be great; however, there's also quite a bit of download time if it winds up downloading all dblocks, especially if that doesn't find enough. My SQL skills are not quite up to digging deeply into the three phases. The third is the worst.

Both directions may help, and there are also the general (somewhat off topic here) corruption mysteries (some, and some others are known with issues filed, waiting for work). Maybe we’ll hear what’s up here.

Thanks for digging in.

why would Duplicati download all dblocks? You have to understand what it is doing. From the dlist files Duplicati gets a description of the existing files, with either the first block or a pointer to a list of blocks. The problem is then to find where in the dblock files these blocks are. This is solved by the dindex files, which provide the map of blocks (the ~100 KB elemental units) to dblocks (the 50 MB files). Duplicati needs to download dblocks when it can't find the available data for some files. If all the index files are healthy, there is no need to download dblocks, and Duplicati doesn't download any in this case.
So downloading dblocks happens when the backend is damaged. Why the backend is damaged can have many causes: bad hardware, network glitches, client computer crashes, bugs, and (maybe the worst of all) operator errors such as misconfigurations, mixing databases, or manual cleaning of the backend.
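To illustrate that description, here is a toy sketch only; the names, hashes, and data structures are invented for illustration and are not Duplicati's actual code, file formats, or schema:

```python
# Toy model of why healthy dindex files let a recreate skip dblock downloads.
# All names and hashes below are made up.

# dlist view: each backed-up file is an ordered list of block hashes.
dlist = {"video.mp4": ["h1", "h2", "h3"], "notes.txt": ["h4"]}

# dindex view: map of block hash -> the dblock volume that stores it.
dindex = {
    "h1": "duplicati-b01.dblock.zip.aes",
    "h2": "duplicati-b01.dblock.zip.aes",
    "h3": "duplicati-b02.dblock.zip.aes",
    # "h4" has no mapping, e.g. because its dindex file is missing or damaged
}

unmapped = {h for blocks in dlist.values() for h in blocks if h not in dindex}
if unmapped:
    # Only now must recreate download and scan dblock volumes to locate the
    # unmapped blocks, which is the slow phase everyone dreads.
    print("blocks with no dindex mapping:", sorted(unmapped))
else:
    print("all blocks mapped; no dblock downloads needed")
```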

Basically, doing the only thing that can be done at the Duplicati level - fixing bugs - is extremely difficult based on these forum postings, which are most often half-baked. So the really useful thing to do is to remediate, that is, make the time to recover less painful.

if the crash is happening because the application crashes 'cleanly', that is, ends abnormally but exits according to the operating system rules, yes. If the task is killed by, say, something like kill -9 under Linux, or the operating system itself abends, it's very doubtful that SQLite can always roll back the transaction. Not to mention the case where the transactions are not programmed correctly, of course.

Thank you all for listening. I’ll try to address your questions…

re: error message, here are the specific error messages I’ve seen, drawn from the logs; each is followed by paragraphs of code traces:

“Attempt to write a read-only database”

this one I'm pretty sure is because Time Machine is trying to back up the database, and locks it. Suspending Time Machine while running Duplicati has made this one go away.

“The database was attempted repaired, but the repair did not complete. This database may be incomplete and the backup process cannot continue. You may delete the local database and attempt to repair it again.”

this one appears when I try to backup to or rebuild a database whose previous rebuild was interrupted, usually because I had to reboot my machine or kill Duplicati.

“Some kind of disk I/O error occurred”

this one happens when I lose contact with the source (external) drive because of a USB error (all of the external USB disks disconnecting simultaneously) or the Thunderbolt 2 cable being disturbed by me or the cat :frowning: . This should happen less often with the newer USB-C connectors, which are less easily disturbed. However, I cannot keep my (two) cats out of my home office.

re: blocksize

years ago I decided to use a 1GB blocksize because I have a lot of video files, and because I determined from testing that Duplicati target file (…aes) processing was not so much dependent on blocksize as on the number of files (blocks), so I opted for larger blocks. I now have gigabit fiber-optic internet from my ISP, and am wired directly into the router. I am backing up source folders that are in the 1-6 TB range.

re: hardware, I’m on a brand new (2023) M2 Max MacBook Pro, running macOS Ventura, with 96GB of RAM and 4TB SSD.

this is considerably faster than my previous (2014) Intel i7 MacBook Pro with 16GB RAM and a 1TB SSD, but it still suffers from overall slowdown on rebuilds lasting more than a day.

re: pause ANY operation

this means that I rebooted, crashed, or killed the mono-sgen64 process in Activity Monitor. My experience with the GUI “Stop Now” or “Stop after current file” features has been that they do not stop Duplicati, at least not in the timeframe I'm willing to wait, particularly if my whole system is creeping along. Once Duplicati is killed, performance returns to normal.

re: waiting weeks or months for recreate to hopefully finish

I've waited as long as six weeks for a recreate to complete, but usually the impact (slowdown) on my system causes something else to break, requiring a reboot, or at least killing the recreate. For example, after my post yesterday, the macOS Finder app stopped responding and could not be relaunched. I let it run overnight, but the next morning the machine was frozen, so I was forced to reboot.

re: the database is so darn delicate

by “corrupted”, I mean that I cannot continue the previous operation, and Duplicati recommends that I rebuild the database. The rebuild then fails, and advises me to recreate the database. Or I get the infamous count-mismatch error (sorry, I don't have one to paste, so this is from memory), “…in version ##, found xxxxxxx blocks, expected yyyyyy blocks”, for which the best solution I found (in the fora) is to delete the version and try again, which usually finds a count-mismatch in another version, so I delete versions until I get a clean run.

re: “Gets … were taking 15 seconds”

looking at the log with the “explicit” filter, I see a message that says (from memory), “the remote GET operation took 00:00:0x.xx.xx”

re: SSD

this is a brand new MacBook Pro with a 4TB internal SSD. I keep the Duplicati databases on the internal SSD.

re: Speedtest

Speedtest tells me if there is an issue with my ISP. I've recently switched to a fiber-optic provider, so upload and download have similar speeds in the 900 Mbps range. My previous ISP was cable, with 800 Mbps download speeds but only 10-30 Mbps upload, and subject to local network congestion, which lowered the transfer speeds.

re: Activity Monitor

CPU, Memory, Disk, and Network loading do not seem to indicate any bottlenecks, and are consistent with normal use of the system

re: Exclude Duplicati files from backup

This is an option, of course, but I'd rather have the Duplicati DBs backed up to Time Machine, as a rule. I could use it as a temporary workaround when doing long-running rebuilds, I guess.

re: kill the DB and start over

I have been using Duplicati for years, and am trying to retain my backup history, rather than starting over, but that seems to be unavoidable now.

re: some background on what’s going on here:

Last Spring, I was advised that my GDrive (provided by my University) was going away, and that I should migrate to a personal GDrive. I had to migrate about 20TB of data, most of which is Duplicati blocks. Over the last year, through many iterations and issues with Google Drive support, I've managed to get most of my Duplicati target folders moved. One of my workarounds was to download the target folders to a local drive, then upload to the new GDrive. Various errors occurred, and the local DBs have had to be rebuilt multiple times, using either the local or remote copy of the target blocks. Usually a simple rebuild failed for the reasons discussed above, and I've had to recreate the DB. Of the half-dozen Duplicati backup sets that I've migrated, I'm down to the largest target folder (4TB), which has been uploaded, but getting the DB to work without throwing the aforementioned errors has been problematic. When my machine froze this morning, it had been rebuilding since Feb 11th. Once (last Fall) I was able to rebuild this DB using the local copy of the target blocks in “only” a single six-week stretch without interruptions (on a slower laptop). My ISP customer service told me that I hold the all-time record for data volume over several months.

Again, thank you for your patience and perseverance as I struggle with (and vent about) this experience.

are you talking about this parameter:

–blocksize = 100kb
The block size determines how files are fragmented.

or this one:

–dblock-size = 50mb
This option can change the maximum size of dblock files.

if the former, I am afraid that it could make your blocks larger than the dblock… I don't think that this is a normal case, or that it could even work; and if it works by some miracle, it could be a bigger problem than anything. A block size of 1 GB seems, well, extremely dubious to me.

If the latter, it will not fix anything about the database queries. Database query performance depends almost entirely on the block size, not so much on the dblock size.
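As a rough illustration of the difference between the two options (illustrative numbers only, not measurements): blocksize determines how many block rows the database must track, while dblock-size only determines how many remote volume files exist, and a 1 GB block would not even fit inside a default 50 MB dblock.

```python
# Illustrative comparison of the two options for 1 TB of source data.
KB, MB, TB = 1024, 1024**2, 1024**4

source      = 1 * TB
blocksize   = 100 * KB   # --blocksize: deduplication unit (DB rows scale with this)
dblock_size = 50 * MB    # --dblock-size: remote volume ("Remote volume size" in the GUI)

print("blocks tracked in the database: ~", f"{source // blocksize:,}")    # ~10.7 million
print("remote dblock volumes:          ~", f"{source // dblock_size:,}")  # ~21 thousand
print("a 1 GB block fits in a 50 MB dblock?", 1024 * MB <= dblock_size)   # False
```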

In addition to those, damage can come from software handling files wrong. Those causes are more controllable.
Damage is also in the eye of the software. In one open issue, an interrupted compact got confused.
It deletes a dindex file (as it should), gets interrupted before commit, so the next run sees a “missing” file.
This is a fairly harmless nuisance though. I'd like to know if the software can currently lose a dindex file.

Another approach, in addition to speeding up the worst case, might be detecting pending recreate risk. Yesterday I was working on a Python script heading towards sanity checking dblocks versus dindex.
The step after detection might be Duplicati code to recreate any missing dindex files before it's too late…
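For anyone curious what such a check could look like, here is a minimal sketch (not the actual script mentioned above; it assumes you have decrypted local copies of the destination files, and it assumes each dindex zip names the dblock it covers under a "vol/" entry, which is an assumption to verify before trusting the output):

```python
# Minimal sketch of a dblock-versus-dindex sanity check (not the real script).
# Assumes decrypted *.dindex.zip / *.dblock.zip copies in one local folder and
# assumes each dindex lists its covered dblock as a "vol/<dblock-name>" entry.
import glob
import os
import zipfile

dest = "/path/to/local/destination/copy"   # hypothetical path, adjust as needed

covered = set()
for dindex_path in glob.glob(os.path.join(dest, "*.dindex.zip")):
    with zipfile.ZipFile(dindex_path) as z:
        covered.update(os.path.basename(n) for n in z.namelist()
                       if n.startswith("vol/"))

dblocks = {os.path.basename(p) for p in glob.glob(os.path.join(dest, "*.dblock.zip"))}

for name in sorted(dblocks - covered):
    print("dblock with no covering dindex:", name)
print("checked", len(dblocks), "dblocks against", len(covered), "dindex entries")
```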

No argument with any of that. Just wanting to say this may be difficult, just as fixing bugs is difficult…

How To Corrupt An SQLite Database File

If an application crash, or an operating-system crash, or even a power failure occurs in the middle of a transaction, the partially written transaction should be automatically rolled back the next time the database file is accessed.

SQLite is Transactional

The claim of the previous paragraph is extensively checked in the SQLite regression test suite using a special test harness that simulates the effects on a database file of operating system crashes and power failures.

is SQLite's claim anyway. I know some systems must have something in them that goes wrong, as there are times when SQLite flat out can't open its database – a step worse than bad data inside.

It's not that it's unmaintainable. It's that it's a lot, and apparently too much. It's been many years on some things. For a backup application where stability is vital, it's too slow at fixing things. The only two ways around that are more time spent fixing things, or slimming it down.

If you're fine with various things being broken for years, then it's fine. I'm also fine with it, as I'm not experiencing any issues atm, and the issues I know about I know how to avoid.

But, I’d still instantly axe a bunch of things to make it easier to maintain. I will do that with my own projects where I need to.

Of course, recreate might be worth keeping here, assuming it receives enough improvements. It's a valid viewpoint to say it's necessary. Personally, I'd never wait more than a day on it. I'd find another way of doing things, or focus on its performance until it can be made fast enough to be happy with.

I would not send PRs to fix them if I were fine with it.
It's not right, these days, to have dog-slow performance while recreating a DB for a data size of 500 GB. In my tests I have seen performance begin to degenerate at 1.2 M blocks (equivalent to 120 GB of data with the standard block size). Having to change the block size with 10 TB of data or more could be expected; many users would ask themselves whether backing up 10 TB needs special consideration before starting to configure a backup. Not many will with 500 GB.

Try not backing up the database. APFS local snapshots are probably invisibly low-level.
Windows NTFS ones are that way at least, but they do cause a brief suspension of I/O.

So the situation is just what the message says. It's not really corruption, just unfinished work.
The way to avoid this is to avoid intentional interruption. That may be hard to do with it being so slow…

Is the database on that drive? If so, don’t do that if the drive is prone to being unplugged.
If the database is on the permanent drive, can source disconnect reliably break test DB?
How exactly is it messaged? Source drive errors usually are caught and just do warning.

Agree with @gpatel-fr question. Maybe you’re thinking of Options Remote volume size.

(screenshot: the “Remote volume size” option on the backup's Options screen)

There’s a link on the GUI screen there to click, or direct link is Choosing sizes in Duplicati.
Remote volume size is a simpler-but-still-quite-confusing term for the dblock-size option.

It’s got to be from something. I don’t know macOS much, so can’t give step by step advice.
Doing Google search for troubleshooting macos performance claims 43 million hits though.

This is interesting. For me, it reliably stops after finishing what it’s doing and uploading data.
I’m on Windows. Any one of the Duplicati folks want to see if they can reproduce this issue?

It’s pretty clear some resources are being exhausted, so keep looking for what that could be.
Do you know how to measure the size of processes, such as the mono ones doing Duplicati?
One Linux user was observing memory growth although we don’t know exactly the operation.

A little too vague, though I have some guesses of similar ones.

looks like (on mine) the same as the Started to Completed time I posted. It still leaves open the
question of whether there’s some other work such as decryption that might be pushing time up.

2022-10-17 16:52:39 -04 - [Profiling-Timer.Finished-Duplicati.Library.Main.BackendManager-RemoteOperationGet]: RemoteOperationGet took 0:00:00:06.617

If that's true of the external drive, then how a source error damages a DB gets more mysterious,
although the definition of “corrupted” DB being used here isn't the usual one.

as does disk congestion, which is why I’ve been asking. There’s a CLI-based test I described too.

Make sure you understand my comment on CPU cores, but if normal use means not straining for
database recreate or something, are you saying that slows things down but all monitors look fine?

If you somehow back it up during a backup, it goes instantly obsolete, as it's changing during the backup.
If you back it up while idle and restore a stale copy later, it mismatches, and Repair hurts the destination.
Database and destination must always match; e.g. you can copy it with a Duplicati script after backup.
Configuration data in Duplicati-server.sqlite is less active, but usually you Export and save the config.
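A minimal sketch of the copy-after-backup idea (assuming it is hooked in via something like the run-script-after advanced option; the paths and database name below are hypothetical, and Duplicati's script environment variables are not used here):

```python
# Minimal sketch: copy the job database once the backup has finished, so the
# saved copy always matches the destination. Paths and names are hypothetical.
import shutil
from datetime import datetime
from pathlib import Path

db = Path.home() / ".config/Duplicati/ABCDEFGHIJ.sqlite"   # hypothetical job DB
dest_dir = Path("/Volumes/Backup/duplicati-db-copies")     # hypothetical target

dest_dir.mkdir(parents=True, exist_ok=True)
stamp = datetime.now().strftime("%Y%m%d-%H%M%S")
shutil.copy2(db, dest_dir / f"{db.stem}-{stamp}{db.suffix}")
print("copied", db, "to", dest_dir)
```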

To soften the impact a little, if space allows and the old one is still intact, save it in case you need older files.
A newer one, with a better blocksize and who knows what other old problems removed, may work better.
Sometimes hidden damage may be possible and reveal itself in a few ways, e.g. dblock downloading.

I can think of an ISP action that would be a lot worse than that. It’s nice that yours wasn’t very upset…