Stuck on "Starting …"

BTW: A good way of showing your appreciation for a post is to like it: just press the :heart: button under the post.

If you asked the original question, you can also mark the reply that solved your problem as the accepted answer, using the tick-box button you see under each reply.

All of this also helps the forum software distinguish interesting from less interesting posts when compiling summary emails.

I also edited some of your posts to improve the formatting. (Just added ~~~ before and after the output you pasted, see here for details.)
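For example:

~~~
(pasted console output goes here)
~~~

This renders the output as a fixed-width block instead of letting the forum reflow it.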


I don’t want to wander too much off topic, but I did press the :heart: button too, and I genuinely say thank you to people who help me. Is that a bad thing here?

Anyhow, thank you (hope that’s okay saying that) for the editing (& tips for next time), I will try to use them from now on.

I thought I didn’t see any likes but I may have missed them. In that case my hint was of course superfluous. In any case, it was not meant as a criticism of saying thank you. So:

No, not at all. It’s much appreciated! Thank you for participating!

Great! :slight_smile:

I had :heart:'ed the first few posts from yesterday, but not today’s (until you mentioned it). :slight_smile:

So, back to the topic, if anyone has the slightest idea of why “Verifying backend data …” is taking so long on a Ubuntu VM with G Suite/GDrive as backend, all input is greatly appreciated!

Thanks for the :heart:s!

NoLooseEnds, am I correct in thinking the original “Stuck on starting” issue was actually related to the slow verification times? If so, maybe changing the topic title would make sense.

It sounds like you’re using a cloud provider for the backend (sorry - kinda skimming here). If that’s the case, do you have the time/space to bring the backup files to a local machine or LAN destination and try the verification process pointing at that local destination? That would help remove cloud and internet (and potentially LAN) bottlenecks from the scenario.

Alternatively, you could try setting up a pair of smaller test backups to the same cloud provider as your primary AND to somewhere local and seeing if the “verify backend data” step takes about the same time regardless of the “distance” to the destination.


This looks like the source of the problems. Duplicati has detected that this file was not uploaded completely for some reason. It is then deleting the remote file and rebuilding it.

Did you kill the backup the last time it ran, or is there some issue with files getting corrupted on the remote end?


@JonMikelV
I do not have the HDD space to do a local backup to another machine on the LAN, but I could try with a smaller backup set.

@kenkendk
The backup was stopped, not killed, but I selected not to wait for the upload to complete. Does the “removing incomplete file …” step take such a long time?

Anyhow, the VM ran out of space over the weekend due to the log file (and maybe tmp), so now I have pointed the log-file location and the TMPDIR variable at an NFS share. Hopefully this will remove all the space issues.
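(For reference, if Duplicati runs as a systemd service, the environment can be set with a small drop-in along these lines. This is a sketch: the service name and path are assumptions, so adjust them to your setup and run systemctl daemon-reload afterwards.)

~~~
# Hypothetical drop-in: /etc/systemd/system/duplicati.service.d/tmpdir.conf
[Service]
Environment=TMPDIR=/mnt/nfs/duplicati-tmp
~~~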

After setting --check-filetime-only, Duplicati gets to “Verifying backend data …” much faster. The remaining problems are:

  • Taking a long time (2+ hours) to verify the backend
  • Pretty slow going through existing files.

I made a video of the web interface when it’s processing files, and would appreciate if you could take a look and tell me if this is “normal” or extremely slow.

That operation should only delete the remote file(s) and as such be quite fast. It may take some time if one of the files to delete references local data; in that case there is some collateral data in the database that needs cleaning up.

That looks very slow, but it seems to “stop”, so I am assuming it just keeps at it while the message stays the same.

I have uploaded a new canary build: Releases · duplicati/duplicati · GitHub

This new version has a new advanced option, --use-block-cache, which can be used to speed up the lookups that are slow in your case. This does require some memory (>300 MB of RAM per TB of data, so roughly 1 GB for a 3 TB backup), but if you have a non-SSD disk, I think it could give some results.


Thank you, I’ll give it a go after the initial backup is complete.

The VM is (unfortunately) not running from SSD. The datapool (that it connects to via internal ESXi network) consists of 6 x 4 TB in raidz2 (1 hot-spare) and a 120 GB SSD cache. It is starting to become a bit old, but it’s still chugging away happily.

kenkendk, in a “standard” backup report email I see some file stats like DeletedFiles, SizeOfModifiedFiles, etc., but the only time-related ones are EndTime, BeginTime, and Duration.

Are there task specific performance stats available in some way?

I’m thinking of file processing stuff I’ve written that includes stats like time spent reading folder/file lists, time spent on specific SQL tasks, etc. I’ve even worked on some code where I know what “normal” task times look like and only include stats in the logs when things are outside that time.

I know there’s potential overhead associated with all this monitoring but it might help in identifying specific slow parts of the code when run in certain situations (aka on other people’s hardware).
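As a rough Python sketch of that last idea (the task names and “normal” durations are made up for illustration, not anything Duplicati actually tracks):

~~~
import time

# Illustrative "normal" durations in seconds; not real Duplicati numbers.
NORMAL_SECONDS = {"enumerate-files": 5.0, "compare-sql": 30.0}

def run_timed(name, fn):
    """Run fn() and log its duration only when it exceeds the expected time."""
    start = time.perf_counter()
    result = fn()
    elapsed = time.perf_counter() - start
    expected = NORMAL_SECONDS.get(name)
    if expected is not None and elapsed > expected:
        print(f"[perf] {name} took {elapsed:.1f}s (expected <= {expected}s)")
    return result

# Stand-in task; a real caller would wrap folder scans, SQL statements, etc.
run_timed("enumerate-files", lambda: sum(range(1_000_000)))
~~~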


Actually, it would be really great if Duplicati could provide such detailed internal profiling data for all the stages of the operation… Just getting elapsed time would not incur any real overhead…

I like the idea, but it is not implemented currently. There are some general timing measures spread out in the code, but not like you suggest. I think this would be very easy to implement in the concurrent_processing branch, as each “process” there can time its “active” time and report that when it finishes.
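Something along these lines, as a minimal Python sketch of the idea that each process times its own active sections and reports the total when it finishes (all names are illustrative, not Duplicati’s actual code):

~~~
import time
from collections import defaultdict
from contextlib import contextmanager

active_time = defaultdict(float)  # accumulated "active" seconds per process name

@contextmanager
def active(name):
    """Count time spent inside the block towards the named process."""
    start = time.perf_counter()
    try:
        yield
    finally:
        active_time[name] += time.perf_counter() - start

# Hypothetical worker: only the real work is counted, not the idle waiting.
with active("block-hasher"):
    time.sleep(0.1)  # stand-in for hashing a block

print(f"block-hasher was active for {active_time['block-hasher']:.2f}s")
~~~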

So today was the day that the backup completed, I think. I got an I/O error upon completion. Not sure why, but at least I know that it has uploaded a couple of TB of data first.

I updated to the canary build mentioned above by @kenkendk, and added --use-block-cache (true) to the config. I also restarted both VMs involved (OmniOS serving the data, and a Ubuntu Server doing the backing up) for good measure.

Anyhow, I started the backup again – and it’s been sitting on “Verifying backend data …” for a few hours. Currently it is consuming 100 % CPU of one of the virtual CPU cores in the VM.

Also:

  • Logging is set to “Warning”, but I have “profiling” on in my browser – and it seems like the log file is getting the profiling information, because it is growing fast when stuff is happening. I have a 12 GB log file from the previous backup.
  • the “QZDJRHUBWM.sqlite” file is 5 GB
  • If I figured this out correctly, there are 1,982,566 files in the directory that is being backed up (minus a few that are excluded from the backup)

Is there a way I can try to debug why it is taking so long? Can I provide some specific data? Do a test? etc?

(I have been running this on an alternative system with a similar setup as well, but against a different backend – JottaCloud. A lot less data – a few hundred GB, not multiple TB. But I have had no problems, and it’s a lot faster.)

EDIT:
It’s done verifying the backend, and now I see a lot of:

Oct 2, 2017 11:32 AM: Checking file for changes /path/to/file.psd, new: True, timestamp changed: True, size changed: False, metadatachanged: False, 1/8/2013 12:30:04 PM vs 1/1/0001 12:00:00 AM

It says that the timestamp has changed, but it has not. It is comparing the timestamp to “1/1/0001 12:00:00 AM”, and that makes Duplicati think that everything is new.

Both the Ubuntu VM (via NFS) and the OmniOS VM itself say the file date is “Jan 8 2013”. The time/date on both systems is correct.

Does this make sense to any of you?

(Status as of now, reported by Duplicati, is: 273034 files (2.73 TB) to go at 1.06 KB/s – that is about a quarter of the progress bar)

EDIT2:
I’ve tweaked my NFS mount, so I now get ~450 MB/s write and around 1 GB/s read. I tested the write/read speeds with “dd”, following this blog post.

Write:

$ sync; dd if=/dev/zero of=tempfile bs=1M count=1024; sync
1024+0 records in
1024+0 records out
1073741824 bytes (1.1 GB, 1.0 GiB) copied, 2.40283 s, 447 MB/s

Read:

$ dd if=tempfile of=/dev/null bs=1M count=1024
1024+0 records in
1024+0 records out
1073741824 bytes (1.1 GB, 1.0 GiB) copied, 1.03813 s, 1.0 GB/s

NFS mount options:

/datapool/Share from 192.168.20.15:/datapool/Share
Flags:	ro,noatime,vers=3,rsize=1048576,wsize=1048576,namlen=255,acregmin=1800,acregmax=1800,acdirmin=1800,acdirmax=1800,hard,nolock,proto=tcp,timeo=600,retrans=2,sec=sys,mountaddr=192.168.20.15,mountvers=3,mountport=47674,mountproto=udp,local_lock=all,addr=192.168.20.15

EDIT 3
After having a few issues with the Duplicati service crashing, NFS locking up, etc., I finally completed a successful backup.

  • It took quite a while, and it spent most of the time setting timestamps. Duplicati did it in bursts of 10-30 files at a time, then paused a bit (maybe a CPU limit?), and continued.
  • I’m not sure what caused my NFS issues, but I solved them by tweaking rsize/wsize and setting the MTU of the internal network to 9000.
  • The Duplicati crashes are random, not sure what is causing those. But when it works it works.

The key is the “new” part. Duplicati thinks this file is new (it has not seen this path before), which is why it does not have a timestamp. Normally this only happens if the file has been moved or renamed.
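Conceptually the check behaves something like this Python sketch: when a path has no entry in the local database, the stored timestamp falls back to a minimum sentinel, which .NET renders as “1/1/0001 12:00:00 AM” (the names here are illustrative, not Duplicati’s actual code):

~~~
from datetime import datetime

MIN_TIME = datetime.min  # what .NET prints as "1/1/0001 12:00:00 AM"

def check_for_changes(path, current_mtime, known_files):
    """Treat paths missing from the local database as new files."""
    stored_mtime = known_files.get(path, MIN_TIME)  # no record -> sentinel
    is_new = path not in known_files
    timestamp_changed = current_mtime != stored_mtime  # always True when new
    return is_new, timestamp_changed

# A path that was never recorded compares against the sentinel:
print(check_for_changes("/path/to/file.psd", datetime(2013, 1, 8, 12, 30, 4), {}))
# -> (True, True), matching the "new: True, timestamp changed: True" log line
~~~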

Did anything like that happen?

No, it’s old files (2013), so they have been in the same structure since then.

Only thing I can think of must be that it did not complete properly due to I/O error (probably due to some NFS issue). That seems to be resolved now, and Duplicati no longer spends time checking “new old files”.

AFAIK everything seems to work fine. I had a few unexplained crashes, but upgraded to the latest canary build and hope that solves it. I also set the service to restart on crash.
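(If it helps anyone else: under systemd, restart-on-crash can be a small drop-in like the sketch below. The service name is an assumption, so adjust it to your install and run systemctl daemon-reload afterwards.)

~~~
# Hypothetical drop-in: /etc/systemd/system/duplicati.service.d/restart.conf
[Service]
Restart=on-failure
RestartSec=30
~~~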

A backup now takes approx. 2 hours with few modified files to upload. Not sure if that is fast or slow.

Since this seems to be the only related topic, I’m having the same issue here. Running the latest canary, Linux, 2 GB RAM, 2-core CPU.

I’m “stuck on Starting” as well whenever anything beyond the first backup tries to run. Running logs at the “Profiling” level, it’s hanging on this query forever:

SELECT "CalcLen", "Length", "A"."BlocksetID", "File"."Path"
FROM (
    SELECT
        "A"."ID" AS "BlocksetID",
        IFNULL("B"."CalcLen", 0) AS "CalcLen",
        "A"."Length"
    FROM
        "Blockset" A
    LEFT OUTER JOIN
    (
        SELECT
            "BlocksetEntry"."BlocksetID",
            SUM("Block"."Size") AS "CalcLen"
        FROM
            "BlocksetEntry"
        LEFT OUTER JOIN
            "Block"
        ON
            "Block"."ID" = "BlocksetEntry"."BlockID"
        GROUP BY "BlocksetEntry"."BlocksetID"
    ) B
    ON
        "A"."ID" = "B"."BlocksetID"
) A, "File"
WHERE "A"."BlocksetID" = "File"."BlocksetID" AND "A"."CalcLen" != "A"."Length"

…I stopped Duplicati and opened up the SQLite DB manually, just to see if the query could run from the sqlite command line, and it just hangs there forever (or at least as long as my patience would last :slight_smile:)

High CPU during this process, so it looks like this is a pretty long-running query.
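For what it’s worth, one cheap way to see how SQLite plans to execute it (and whether any join is missing an index) is EXPLAIN QUERY PLAN, which returns immediately instead of running the query. Here is a minimal sketch using Python’s built-in sqlite3 module; the database filename is just the example name from earlier in the thread, so point it at your own job’s .sqlite file:

~~~
import sqlite3

QUERY = """
SELECT "CalcLen", "Length", "A"."BlocksetID", "File"."Path"
FROM (
    SELECT
        "A"."ID" AS "BlocksetID",
        IFNULL("B"."CalcLen", 0) AS "CalcLen",
        "A"."Length"
    FROM "Blockset" A
    LEFT OUTER JOIN (
        SELECT
            "BlocksetEntry"."BlocksetID",
            SUM("Block"."Size") AS "CalcLen"
        FROM "BlocksetEntry"
        LEFT OUTER JOIN "Block"
            ON "Block"."ID" = "BlocksetEntry"."BlockID"
        GROUP BY "BlocksetEntry"."BlocksetID"
    ) B
        ON "A"."ID" = "B"."BlocksetID"
) A, "File"
WHERE "A"."BlocksetID" = "File"."BlocksetID" AND "A"."CalcLen" != "A"."Length"
"""

con = sqlite3.connect("QZDJRHUBWM.sqlite")  # example name; use your job's DB
for row in con.execute("EXPLAIN QUERY PLAN " + QUERY):
    print(row)
~~~

Output rows showing a full scan of “BlocksetEntry” or “Block” without an index would point at where the time is going.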

Any help on what’s going on here?

Update on the above - I ran the query manually using sqlite against the DB, and after 2 days it’s still “chewing” on it. This is a fairly large backup (36 GB email system, lots of files).

Yikes! Just so we can try to compare to other systems, do you know how many files is “lots”?

Sure - in this particular backup:

FileCount = ~500k
Total Backup Size = ~38GB
Destination = Google Drive
Network B/W = 1Gig in/out

The query, run manually, finally completed after 3 days. At that rate, I’d need to have at least that (so probably 4 or 5 days total) per backup run. Wouldn’t work if I wanted a daily backup!

Whatever query is being run there definitely needs some optimization of some kind.

I may split this off to another topic to gather some user stats, but for now here’s a first stab at trying to track potentially performance-related stats (rounded up):

| Src Cnt | Src Size | Backup Size | Versions | Update Duration | Max Bandwidth | Destination | Notes |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 34K | 99 GB | 166 GB | 53 | 25 min | 54 Mbps | NAS via SFTP | |