Backup stops towards the end

I am having an issue with one of my backups - a large set of photos (1.5 TB).

It always gets to “0 bytes to go” but never finishes (I have 4 other backups to the same destination that work fine; they are smaller, around 0.7 TB).

I looked at the forum and tried to run the ANALYZE command in the DB, and I tried deleting the local DB and repairing from the remote data - nothing.

The live log (Profiling) shows this on the latest attempt:
Jan 8, 2021 10:23 PM: ExecuteNonQuery: UPDATE "Block" SET "VolumeID" = 58636 WHERE "Hash" = "BYmqH7qGdietKJrIRUENu3nrGTa0PF7/IGg0zskVU+I=" AND "Size" = 102400 AND "VolumeID" = 58634 took 0:00:00:00.000

So we are here 14 hours later and no bueno. Any ideas what else to try?

Thx!

Another try, another failure, this time at about two-thirds of the way through. The last log entry was 8 hours ago:

  • Jan 10, 2021 1:07 AM: CommitAddBlockToOutputFlush took 0:00:00:00.209

  • Jan 10, 2021 1:07 AM: Starting - CommitAddBlockToOutputFlush

What is the Storage Type on the Destination screen? Let’s see what retries a finished backup gets:

<job> → Show log → (pick a run) → Complete log → BackendStatistics → RetryAttempts should be low.

Is this Windows with the basic install from the .msi, without running as a service or with special Administrator privileges?

If so, PowerShell runs of ls $env:LOCALAPPDATA\Temp\dup-* | sort -Property LastWriteTime would let you keep an eye on how many 50 MB (the default remote volume size) volumes have queued up. The default limit is 4, and once that is reached, processing of additional ones can stop until upload congestion eases.
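
A slightly expanded sketch of that check (same default Temp location; the calculated MB column is just for readability):

  # snapshot the queued dup-* temp volumes, oldest first, with sizes in MB
  Get-ChildItem "$env:LOCALAPPDATA\Temp\dup-*" |
      Sort-Object LastWriteTime |
      Select-Object LastWriteTime, Name, @{ n = 'MB'; e = { [math]::Round($_.Length / 1MB, 1) } }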

You can see uploads at About → Show log → Live → Retry and look for progress, retries, and issues.

Thank you - Retry is 0 in both locations

The queue is quite a bit longer, with significant time between each record. The remote storage is SFTP (but it’s on a local gigabit network at the moment):

Mode LastWriteTime Length Name


-a---- 1/6/2021 3:07 PM 10894023 dup-1b1b22e6-56ea-4111-94a3-a4ba8e054668
-a---- 1/7/2021 3:47 AM 2254697 dup-a2083fc5-9215-4e20-b834-7b3b3900801d
-a---- 1/7/2021 3:36 PM 11143506 dup-b427634a-fcf3-4bb7-ad0c-53a86d7ff7d5
-a---- 1/7/2021 11:05 PM 0 dup-fcc1170b-a931-4af4-a213-3bee77bfbdb8
-a---- 1/7/2021 11:05 PM 156 dup-b67c3597-4902-4c63-8d2f-9f030555dd1e
-a---- 1/8/2021 2:49 PM 156 dup-4824ec68-efc8-4681-a671-4f1372f92f36
-a---- 1/8/2021 2:49 PM 0 dup-d09b569d-1318-4446-a754-7c19595f4d8b
-a---- 1/8/2021 7:00 PM 52426733 dup-45d2db26-b098-4313-8977-dc7ad0381b58
-a---- 1/8/2021 8:27 PM 52326925 dup-0d336e97-e95a-4b6d-9193-6cfbb428c18b
-a---- 1/8/2021 10:22 PM 46492667 dup-42601c6a-a242-4b93-be75-2432f47c7fe1
-a---- 1/8/2021 10:23 PM 52351629 dup-557aea04-4f79-46e2-bfdf-a13a23235bf7
-a---- 1/8/2021 10:23 PM 2334952 dup-38924cc9-e6d1-48ad-9691-824132ce7163
-a---- 1/8/2021 10:23 PM 2334893 dup-0fc55923-f81f-415b-8d6f-0059dcd57484
-a---- 1/9/2021 4:07 PM 156 dup-4d920842-bcb3-4049-aa3f-9f9de2527cd8
-a---- 1/9/2021 4:07 PM 0 dup-3d5bbb96-8a6b-472d-a6bd-45b8c64928fe
-a---- 1/9/2021 4:57 PM 52371597 dup-104775da-00cf-4e5b-9669-306297f66df3
-a---- 1/10/2021 12:03 AM 52326749 dup-c31dd01a-0266-4974-a9ca-f225d7c430aa
-a---- 1/10/2021 1:01 AM 52373885 dup-5a636b68-68d4-4cae-9df4-f60b4a40550c
-a---- 1/10/2021 1:07 AM 52340269 dup-4e20b105-4bf7-4b3d-a2a0-622673cbf776

The last log entry from 8 hours ago could be related to some of the temporary files from around that time, such as the last few in the listing above.

It’s a guess what the files are, but ones near 50 MiB (52,428,800 bytes) are likely future dblock files.
The last few may be smaller: near the end of a backup, changed data runs out and the remainder gets collected.
There can be parallel processes early on, and a SpillCollectorProcess runs after those finish.
Leftover blocks get assigned to a different volume, e.g. your log shows a move from VolumeID 58634 to 58636.
“Backup hangs at the end of Backup_ProcessingFiles” points to code I think is doing that.

It’s too bad the live log doesn’t show seconds (unlike the log-file version), which would clarify the order.
For example, are the short files the leftovers being collected, or the results of the collection?

One thing you could do is to try opening the last few temporary files with a .zip program.
A dblock file will contain many small (default 100 KiB) files, one per block, each named by its hash.
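
If you’d rather do that from PowerShell, a sketch (assuming 7-Zip in its default install folder; the file name is just the newest one from your listing):

  # list the contents of one temp volume; a readable dblock candidate shows many ~100 KiB
  # entries named by block hash, while an encrypted or truncated one will fail to open
  & "$env:ProgramFiles\7-Zip\7z.exe" l "$env:LOCALAPPDATA\Temp\dup-4e20b105-4bf7-4b3d-a2a0-622673cbf776"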

“How the backup process works” explains these rather low-level (sorry) details I’m giving.

Another is to see if the SFTP server has any logs. Even without logs, the uploaded files should have dates.
Duplicati not doing retries means it’s not seeing errors, but whether data is flowing quickly is unknown.
The Duplicati log would also show that, but the server might be a nice one-stop history.

Using DB Browser for SQLite to look in the database is an option. It’s just tables, and I’d guide you.
Creating a bug report can be done, but not in the middle of a backup. How do you stop these?
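
If you want to try the database route, a sketch for finding it first (this assumes the usual per-user location for a non-service install; work on a copy so DB Browser isn’t reading the file the backup has open, and XXXXXXXXXX is a placeholder for whatever random name your job database has):

  # job databases are the randomly named .sqlite files next to Duplicati-server.sqlite
  Get-ChildItem "$env:LOCALAPPDATA\Duplicati\*.sqlite" | Sort-Object LastWriteTime
  # copy the one for this job (placeholder name) and open the copy in DB Browser
  Copy-Item "$env:LOCALAPPDATA\Duplicati\XXXXXXXXXX.sqlite" "$env:TEMP\duplicati-job-copy.sqlite"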

The ideal is easily reproducible steps that developers can dig into, but that’s often hard to get.
Was this backup always a problem, or did it start at some point? Any clues besides the size?

We could attempt to work around it by changing things, but it’d be even nicer to get a good look first.
It depends a lot on what you’re up for, but solutions are difficult without someone looking…

I tried to open a few of the dup files but I’m getting an error: can’t open the archive. I tried 7-Zip and WinZip. I think this is because they are encrypted. Should I try the backup without encryption?

From one of the files when opened in a text editor: AES !CREATED_BY SharpAESCrypt v1.3.3.0

The SFTP server does have logs, here is what I see as a last entry:
2021-01-09 18:14:36,691 [9] DEBUG SolarWinds.Sftp.Service.Listener - User XYZ from IP 192.168.2.2 uploading file to “/Photos/duplicati-i0cf354d6f78e47d69eefeaabe534a8b5.dindex.zip.aes”

The way I stop the stuck backup is to click cancel within the web GUI and then restart the service - should I do it differently?

I am happy to take any troubleshooting steps, just need hints on what to look for / try.

I need this fixed - the only change here was that the backup grew in December by about 400 GB.

I think some of them are encrypted, and some aren’t. Usually I find them unencrypted. This might be significant as to where in the process things got stuck, so probably the best next thing is to try decrypting a file copy.

AES Crypt is the easiest way, but if you prefer the Duplicati CLI, there’s a SharpAESCrypt.exe in its install folder.
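
A sketch of that, assuming the default Duplicati 2 install path and the usual SharpAESCrypt argument order (e or d, then password, then input and output files); YOUR-PASSPHRASE stands in for the backup’s encryption passphrase, and it’s best to run this against a copy of the temp file:

  # d = decrypt; if this produces a .zip that a zip tool can open, the volume was written completely
  & "$env:ProgramFiles\Duplicati 2\SharpAESCrypt.exe" d "YOUR-PASSPHRASE" `
      "$env:LOCALAPPDATA\Temp\dup-4e20b105-4bf7-4b3d-a2a0-622673cbf776" `
      "$env:TEMP\dup-decrypted.zip"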

Does it show “uploaded” at the end? You could also check for partial files by decrypting them; truncated files won’t decrypt.

There’s unfortunately no nice way to stop when stuck. When not stuck, “Stop after current file” is the best.

This is much appreciated. Problems that happen outside of developer systems really need help from the user.

Developers rely (I think) on detailed logging, the heavier the better. You’ve seen how noisy profiling can get, but that was the clue that led me to a guess of where it was at its last profiling log output. Figuring out how it stops on a repeat run (provided the problem holds) might not need quite such heavy logging. Even at the Information level one sees messages like

Jan 12, 2021 8:43 AM: Backend event: Put - Completed: duplicati-ia48f05ebb38a4f279f3e5f6ea68e06be.dindex.zip.aes (1.79 KB)
Jan 12, 2021 8:43 AM: Backend event: Put - Started: duplicati-ia48f05ebb38a4f279f3e5f6ea68e06be.dindex.zip.aes (1.79 KB)
Jan 12, 2021 8:43 AM: Backend event: Put - Completed: duplicati-20210112T134000Z.dlist.zip.aes (41.58 KB)
Jan 12, 2021 8:43 AM: Backend event: Put - Completed: duplicati-ba62726bbbd0f4ff4ac1f76cebe8e9654.dblock.zip.aes (770.62 KB)
Jan 12, 2021 8:43 AM: Backend event: Put - Started: duplicati-20210112T134000Z.dlist.zip.aes (41.58 KB)
Jan 12, 2021 8:43 AM: Backend event: Put - Started: duplicati-ba62726bbbd0f4ff4ac1f76cebe8e9654.dblock.zip.aes (770.62 KB)
Jan 12, 2021 8:43 AM: Backend event: Put - Started: duplicati-ba9c93961b78c42e8b2764131e4e78d33.dblock.zip.aes (49.95 MB)

as a live log, in reverse chronological order, whereas the log-file version is chronological and a little nicer. Transfers overlap, governed by

  --asynchronous-concurrent-upload-limit (Integer): The number of concurrent
    uploads allowed
    When performing asynchronous uploads, the maximum number of concurrent
    uploads allowed. Set to zero to disable the limit.
    * default value: 4

which you could cut to 1 to simplify the view. Or you could do the server log and file checks first to see if an upload hang is possibly what’s clogging up the processing pipeline. I’m not sure what other markers to look for. Ideally there would be some signs that don’t require going all the way to profiling level to get another clue.
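
For what it’s worth, I think that option text can be pulled up locally too - a sketch, assuming the default install path and that the help command accepts an option name:

  # prints the description of the advanced option quoted above; in the GUI, the same
  # thing is added under the job's Advanced options as asynchronous-concurrent-upload-limit=1
  & "$env:ProgramFiles\Duplicati 2\Duplicati.CommandLine.exe" help asynchronous-concurrent-upload-limit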

There are possibly some developers or other experts who have ideas. If so, feel free to chime in.

I don’t know if these temp files are still sitting there days later, but whenever it’s hanging, you can check whether any process still has them open.
That would be one way of knowing whether Duplicati is still processing them, or if they’re just waiting in a queue.
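
A sketch of that check (fuser would be the tool on Linux, but this box is Windows, so this assumes Sysinternals handle.exe has been downloaded and is on the PATH; it usually wants an elevated prompt):

  # lists any process that still has a dup-* temp file open in its handle table;
  # no matches suggests the files are just queued rather than actively being read
  handle.exe dup-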

I’ve been looking at the file flow for a few days, and I think asynchronous-upload-folder would be a way to have the original unencrypted .zip files land in a folder of your choosing. The encrypted files would still upload from Temp.

This split of temp files might make it a little easier to understand the flows and where things are stopping.
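
A sketch of how that might look (the staging path is just an example; you’d set the advanced option asynchronous-upload-folder to point at it on the job):

  # create the staging folder, then watch what collects there during the next run
  New-Item -ItemType Directory -Path 'D:\DuplicatiStaging' -Force | Out-Null
  Get-ChildItem 'D:\DuplicatiStaging' | Sort-Object LastWriteTime |
      Select-Object LastWriteTime, Length, Name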

There’s a chance that a netstat command, maybe netstat -an | findstr :22, might catch the SSH connection state; however, there are still some open questions about the server log and whether the last few files uploaded fully…
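
A PowerShell-native version of the same check (this assumes the SFTP server is on the standard port 22):

  # a long-lived ESTABLISHED connection with nothing moving (per the server log) would point at a hung upload
  Get-NetTCPConnection -RemotePort 22 -ErrorAction SilentlyContinue |
      Select-Object State, LocalAddress, LocalPort, RemoteAddress, OwningProcess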

Unfortunately I didn’t find direct information on SpillCollectorProcess activity. It can sort of be inferred from profiling output, such as the line in your original post. Portions of a big log and a DB bug report would be a path, but there are some other test proposals still outstanding. Sorry, it’s difficult to know where it stopped.

Thanks for your help and suggestions!

I was able to get the backup working via the following steps (just in case someone else is running into something similar):

  • I could not decrypt some of the temp files - I always got an error, so I deleted EVERYTHING in that folder
  • I also deleted the backup job, including the database data, but I left the files on the remote storage (SFTP)
  • I imported the backup job from a previously saved config file, including the “save metadata” option
  • The import triggered a “DB failed” error and I selected “rebuild”
  • I ran the job manually and it was able to finish
  • Since then, there have been 2 automatically scheduled runs and I added new files prior to each; the job is running fine now

I am hoping it’s going to stay this way - got a question about remote sizes, but will create a new thread for that. Again thanks for the help!
