Slow DB Recreation On OneDrive

Hiya,

I have a backup task of around ~2TB that is to an external hard drive that is okay (I don’t think this is relevant but including it just in case).
On the same Duplicati server I have a backup task of ~500GB using the OneDrive2 API for authentication.

Recently I had issues with the ~500GB OneDrive2 task so I used the “Delete and Recreate” option on the DB.
However, the DB recreation is running very slowly (~4 minutes each file).
There are ~25000 files on OneDrive which means it will take approximately 70 uninterupted days to complete this task (the actual backup originaly took less than 7 days to complete).

My internet speed is ~100Mb down and ~5Mb up. So I thought it could be that there is some sort of bottleneck with the upload speed.
However I checked the logs the other day and as you can see from the below that a “Get” event starts and within the same minute (live monitoring shows it happens almost imediatly) the “Get” event completes:
Oct 15, 2019 4:28 PM: Backend event: Get - Completed: duplicati-b4b4bbc1bea744e419f46ee10d762861b.dblock.zip.aes (49.99 MB)
Oct 15, 2019 4:28 PM: Backend event: Get - Started: duplicati-b4b4bbc1bea744e419f46ee10d762861b.dblock.zip.aes (49.99 MB)
Oct 15, 2019 4:23 PM: Backend event: Get - Completed: duplicati-b4b03c48f641d461691dc96f52f0e430b.dblock.zip.aes (49.91 MB)
Oct 15, 2019 4:23 PM: Backend event: Get - Started: duplicati-b4b03c48f641d461691dc96f52f0e430b.dblock.zip.aes (49.91 MB)
Oct 15, 2019 4:19 PM: Backend event: Get - Completed: duplicati-b4b02e70b947b409b916a9f1fa3d5b99c.dblock.zip.aes (49.99 MB)
Oct 15, 2019 4:19 PM: Backend event: Get - Started: duplicati-b4b02e70b947b409b916a9f1fa3d5b99c.dblock.zip.aes (49.99 MB)
Oct 15, 2019 4:15 PM: Backend event: Get - Completed: duplicati-b4b1c23109e4e4d66b7a4450c68627754.dblock.zip.aes (49.98 MB)
Oct 15, 2019 4:15 PM: Backend event: Get - Started: duplicati-b4b1c23109e4e4d66b7a4450c68627754.dblock.zip.aes (49.98 MB)
Oct 15, 2019 4:10 PM: Backend event: Get - Completed: duplicati-b4b00f3b8af9549c0a7d563f87f76862c.dblock.zip.aes (49.92 MB)
Oct 15, 2019 4:10 PM: Backend event: Get - Started: duplicati-b4b00f3b8af9549c0a7d563f87f76862c.dblock.zip.aes (49.92 MB)
Oct 15, 2019 4:05 PM: Backend event: Get - Completed: duplicati-b4b0c7b1673da4ae2a3a91edfb79fc915.dblock.zip.aes (49.99 MB)
Oct 15, 2019 4:05 PM: Backend event: Get - Started: duplicati-b4b0c7b1673da4ae2a3a91edfb79fc915.dblock.zip.aes (49.99 MB)
Oct 15, 2019 4:01 PM: Backend event: Get - Completed: duplicati-b4b0b7165ea354dedaceaece6860f9919.dblock.zip.aes (49.95 MB)
Oct 15, 2019 4:00 PM: Backend event: Get - Started: duplicati-b4b0b7165ea354dedaceaece6860f9919.dblock.zip.aes (49.95 MB)
Oct 15, 2019 3:56 PM: Backend event: Get - Completed: duplicati-b04af651365b0476f9e8a767cf1aff508.dblock.zip.aes (49.98 MB)

Is it possible to speed the recreation task up? As from a lamens point of view it seems to spend a lot of time doing nothing (although in reality its probably doing something that I am not aware of).

Thanks. :slight_smile:

If the progress bar is past 70%, it’s probably trying to search all dblocks for information that’s missing. There is a bug in versions before v2.0.4.18-2.0.4.18_canary_2019-05-12 where an empty source file causes searching erroneously. What Duplicati version are you running? Unfortunately, there’s no fixed beta release yet, and even the canary channel has some substantial bugs now (hoping for a fix soon).

Is your main need to do a restore? Everything or a little? Or is the main need to continue with backup?

Yea the progress bar is over 90% if anything.

Version is:
2.0.4.23_beta_2019-07-14

My main need is to continue with the backup.

Thanks.

There’s a bit of a problem about which Duplicati to use because v2.0.4.16-2.0.4.16_canary_2019-03-28 added a regression that damages backups. This was fixed recently and should be OK in today’s canary v2.0.4.31-2.0.4.31_canary_2019-10-19 (thanks to all who helped!) whose main issue is it’s really new…

I take it that something happened to your old database that led to the Recreate. Waiting 70 days doesn’t sound very attractive, although I think the math might be off – about 50% of the files are tiny dindex files. What happens when all is working right is only those and some dlist files are enough to do the Recreate.

Do you have another computer that isn’t now being backed up by Duplicati? You could let it leisurely run the Recreate while starting a new backup so you have a backup 7 days later based on prior experience.

You could instead let it run the Recreate on 2.0.4.31 canary to see how fast it goes. It “should” be faster. Problem with this method is that the 2.0.4.31 database format is newer, so 2.0.4.23 beta can’t handle it. Basically, it’s a forward path, and there might be an Experimental then a Beta sometime soon. I hope…

Thanks for your help.

Just to update, I have deleted the backups and am starting again.

@SilvaNights

The “doing nothing” step is usually the case where single thread is maxed out. So depending what kind of status you’ll use to monitor what the system is doing, it might look like doing nothing. Yet it’s working on “full speed”. With larger data sets like yours, that can take a week or longer. Also the CPU performance naturally affects that a lot. - At least that’s my personal experience in several different environments. With latest canary with Windows & Linux platforms.

Yea I thought this could be the case initally before remoting the server, but its running on a server with a 1650 V2 Xeon (12 threads), 32GB RAM, DB’s sitting on SSD’s.
The system shows no activity and neither does the local router to it.

What does this mean and, how does it relate to the topic? If this server where Duplicati is running the latest Canary (if not, what), with router just monitoring whether it’s talking to OneDrive, and the server looking really idle on all Task Manager Performance tab metrics? If so, that does seem rather strange.

You can get a closer look at Duplicati with About --> Show log --> Live --> Profiling

You can get an even closer look with Sysinternals Process Monitor

The router look is just a high-traffic look, right? You can certainly get a close peek with packet capture.

The post above you will see that someone has highlighted that this issue is usually due to a single thread normally being maxed out, I was explaining this was not the case.
The local router shows how much traffic congestion is on that network segment (as other devices can be maxing out upload).

The original provided logs was pulled from the live logs on Duplicati, thanks.

I had a look with procmon and couldnt see much activity between its “Gets”.

I might given WireShark a run when I get a chance, but at the moment I have fired up Duplicati on another server and its doing a duplicate backup to the original one I had issues with.

The current plan is to wipe the original backup once this has completed as I want to move Duplicati away from that server anyway so I will have to just take the hit on rerunning the backups anyway (as there are understandably some issues with moving backup sets from Windows to Linux).

I just today had to rebuild db with JottaCloud and in my case slowest part of the recreation was getting all the indexfiles. High number of small files and the API latency between requests. Probably nothing much Duplicati can do about that, unless requests are parallelized.

I’m also using Duplicati with OneDrive but indirectly, where I rclone stuff to OneDrive after first updating the backup set(s) locally. I’ve noticed that they’ll easily trigger ratelimits, if trying to transfer small files using high number of parallel connections. Which of course will make the process slower.

Ref: Mono update (JottaCloud TLS issue) -> Database rebuild - Why?