Duplicati database rebuild - 2gb client database for 750gb of backup files

JonathanM · January 10, 2019, 1:20am

Hi Everyone,

I have been using Duplicati for a while and for different purposes, but its Achilles heel has always been the client-side database it requires for correct operation.

My desktop’s current stats are:
Source: 83.44 GB (but has been as large as 520 GB)
Backup: 766.26 GB / 17 Versions
Files: ~72666 (varies)
Database: 2GB
Options: auto-vacuum, auto-cleanup

I was concerned about what happens if I need to rebuild the database because it is corrupted or I have had to reinstall, so I wanted to validate how Duplicati would handle this (I’m worried about the viability of using it for Windows backups).

I aborted the rebuild after 10+ hours, with 4.5 GB of data transferred from the origin (only about 900 MB of index and list data) and with my SSD (Samsung 850 Pro 1 TB) spending most of that time between 50 MB/s and 200 MB/s.

I’m in the process of replicating an environment where I can leave it rebuilding until complete and get a better picture of what that would mean.

However, something is very wrong with database rebuild at the moment (possibly just in my scenario, but it has happened on other instances), as I can see that its:

cpu
disk throughput
transfer from the source

are much larger than I would have expected for the relatively simple scenario I have been using it in.

I see others have had similar issues. Are there any tricks to help avoid a massive rebuild time?
Have any of the recent thread improvements also brought any tricks?

Thoughts?

PS

The backup was working like clockwork until I decided to delete the existing db and rebuild it.

JonathanM · January 15, 2019, 5:32am

Well, the rebuild finished, taking 4 days, 2 hours and 33 minutes.
In the end it transferred 13.7 GB from the source folder.

Neither of the above was what I was expecting, and either would be a showstopper if I actually needed the data restored from a backup in the event the disk itself fails completely (or the backup database is corrupted and unusable).
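
For a rough sense of scale, the two figures above can be turned into an average download rate. This is only a back-of-the-envelope sketch in Python, assuming decimal units (1 GB = 10^9 bytes) and that the 13.7 GB was spread evenly over the whole run; the variable names are made up for illustration.

```python
from datetime import timedelta

# Figures reported in the post above; decimal units are an assumption.
rebuild_time = timedelta(days=4, hours=2, minutes=33)
transferred_bytes = 13.7e9

# Average rate over the whole rebuild, in kilobytes per second.
avg_rate_kb_s = transferred_bytes / rebuild_time.total_seconds() / 1000
print(f"average download rate: {avg_rate_kb_s:.1f} KB/s")
```

An average in the tens of KB/s suggests the download itself was probably not what kept the rebuild busy for four days.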

johnvk · January 15, 2019, 3:37pm

I’d like to highlight something you said, cuz I didn’t realize it til you said it.

If, say, your local disk fails and you want to restore from a (Duplicati) backup, then your local .sqlite db is also gone, and Duplicati will have to rebuild that db from the remote files.

And if that rebuild takes 4 days, then it is 4 days until you can get your files restored.

Compare that to a straight copy of, say, 100GB over a slow network connection of, say, 5MB/s. That would take 20K seconds, or about 5.5 hours.
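
The arithmetic behind that comparison, as a small Python sketch. It assumes decimal units (1 GB = 1000 MB) and a constant transfer rate; `transfer_seconds` is just an illustrative helper, not anything from Duplicati.

```python
def transfer_seconds(size_gb: float, rate_mb_per_s: float) -> float:
    """Seconds needed to copy size_gb gigabytes at a steady rate_mb_per_s."""
    return size_gb * 1000 / rate_mb_per_s


seconds = transfer_seconds(100, 5)  # 100 GB at 5 MB/s
hours = seconds / 3600
print(f"{seconds:.0f} s is about {hours:.1f} h")
```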

So, again, to be clear, a local drive failure requires a database rebuild/repair, which currently takes a long time.

johnvk · January 15, 2019, 4:57pm

Jumping into this topic for my own related question:

I am repairing because of an issue reported here
My repair has been running for 12 hours.
I left the Duplicati GUI running in Chrome overnight with the live log on, at Profiling level.
I think Chrome ate up 12 GB of virtual memory. I am not sure, cuz it crashed and memory usage fell from 14GB to 2GB, so I didn’t actually see it. Something else may have also crashed, releasing memory.
So Windows ran out of memory and things are acting strangely and I would normally reboot.
But, the repair is running…

This backup has 22GB source files and 13GB remote store.
Remote store has 2 or 3 versions only.

The green progress bar in the Gui has been at around 90% for 10 of the last 12 hours.
Here is an excerpt from the live log, profile level.
Can anyone tell how close it is to finishing? I assume it’s doing useful work because others have had their repair take days, but it might be nice to check that.

Enclosed: excerpt from live log file, a zipped .txt file (5.4 KB)

ts678 · January 15, 2019, 7:19pm

Restoring files if your Duplicati installation is lost is likely a faster path because it only builds what it calls a “partial temporary database”, presumably tailored to the request, but not worrying about all versions of all files. Database recreate is a known slow spot. There are some timing measurements going on right now in “Database recreate performance”, although I’m not sure where it will head. A rewrite of some of this area has also begun, but I’m not sure whether it will bring increased effectiveness in actually getting issues fixed, increased performance, or maybe both.

I wonder if Profiling log info builds up? If so, possibly using --log-file and --log-file-log-level would avoid this.

Aside from download time (which is somewhat unavoidable, but there might also be issues hanging around that cause dblock files to be downloaded when actually only the smaller dlist and dindex files are required), the lengthy operations I found were all SQL INSERT operations. The Block and BlocksetEntry tables track deduplication blocks, which are quite small (100KB by default). The article “Choosing sizes in Duplicati” covers the settings and tradeoffs. I’ve got no specific values to suggest based on testing, but at least one user seems to have experimented.
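
To give a feel for why block size matters here, the sketch below estimates how many blocks a recreate might have to track at different block sizes. It is a rough order-of-magnitude illustration only: it assumes decimal GB, takes the 766.26 GB backup size from the first post as if it were raw block data, and ignores compression, deduplication and metadata. `estimated_blocks` is a made-up helper, not a Duplicati function.

```python
import math


def estimated_blocks(data_bytes: float, blocksize_bytes: int) -> int:
    """Rough count of fixed-size blocks needed to hold data_bytes."""
    return math.ceil(data_bytes / blocksize_bytes)


backup_bytes = 766.26e9  # backup size from the first post (assumed decimal GB)

default_count = estimated_blocks(backup_bytes, 100 * 1024)   # 100KB blocks
larger_count = estimated_blocks(backup_bytes, 1024 * 1024)   # 1MB blocks, for comparison

print(f"100KB blocks: ~{default_count / 1e6:.1f} million")
print(f"1MB blocks:   ~{larger_count / 1e6:.2f} million")
```

Under these assumptions the default works out to several million blocks, and a block size ten times larger cuts that by about a factor of ten. The size-choice tradeoffs mentioned above still apply.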

There are other ugly answers. Some people back up the database in a secondary backup (seems perhaps overkill for 22GB source, but people with TBs of source might want it). Some decide to start a fresh backup. That’s what I did once, but I don’t really care about old versions. Other people really want to preserve them.

I think the situation is not what it should be (so for now the things above may be needed), but it’s not being ignored.

ts678 · January 15, 2019, 7:27pm

This is also an option sometimes suggested for a hybrid approach, where the main need is to get restored. Maybe the next need is to get backups going again, and after that one works on a complete return-to-usual.

Sometimes increasing things like --blocksize can help (as mentioned earlier). Sometimes keeping versions trimmed through retention policy can help. Then there were the ugly solutions such as a database backup.

Wim_Jansen · January 15, 2019, 9:14pm

If you see it’s downloading dblock files, you could experiment as suggested here. Only do this after taking a backup of your backup and database. No guarantees. And it takes a very (very) long time to recreate the dindex files.