I am having an issue with one of my backups - a large set of photos (1.5 TB).
It always gets to “0 bytes to go” but never finishes (I have 4 other backups to the same destination that work fine; they are smaller, around 0.7 TB).
I looked at the forum and tried running the ANALYZE command on the DB, and I tried deleting the local DB and repairing from the remote data - nothing.
The live log (Profiling) shows this on the latest attempt: Jan 8, 2021 10:23 PM: ExecuteNonQuery: UPDATE "Block" SET "VolumeID" = 58636 WHERE "Hash" = "BYmqH7qGdietKJrIRUENu3nrGTa0PF7/IGg0zskVU+I=" AND "Size" = 102400 AND "VolumeID" = 58634 took 0:00:00:00.000
So we are here 14 hours later and no bueno. Any ideas what else to try?
What is the Storage Type on the Destination screen? Let’s see what retries a finished backup gets:
<job> --> Show log --> (pick) --> Complete log --> BackendStatistics --> RetryAttempts should be low.
Is this Windows with the basic install from .msi without service or special Administrators privilege use?
If so, periodic PowerShell runs of ls $env:LOCALAPPDATA\Temp\dup-* | sort -Property LastWriteTime would let you keep an eye on how many 50 MB volumes (the default remote volume size) have queued up. The default limit is 4; at that point, preparation of additional volumes can pause until upload congestion eases.
You can see uploads at About --> Show log --> Live --> Retry and look for progress, retries, and issues.
It could be related to some of the temporary files from around that time, such as these:
It’s a guess what the files are, but ones near 50 MiB (52428800 bytes) are likely future dblock files.
The last few may be smaller. Near the end of a backup, changed data runs out and the leftovers get collected.
There can be parallel processes early on, and a SpillCollectorProcess after those finish.
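If you want to script the Temp-folder check rather than eyeball it, here is a minimal Python sketch. The dup-* pattern and the 50 MiB default are from this thread; the temp location and the 5% tolerance are my assumptions, so adjust as needed:

```python
# Hypothetical sketch: list Duplicati temp volumes (dup-*) oldest-first and
# flag files near the 50 MiB default remote volume size as likely dblocks.
import glob
import os
import tempfile

DBLOCK_SIZE = 50 * 1024 * 1024  # default remote volume size (52428800 bytes)

def list_dup_temp_files(temp_dir=None):
    """Return (path, size, mtime) tuples for dup-* files, oldest first."""
    temp_dir = temp_dir or tempfile.gettempdir()
    entries = []
    for path in glob.glob(os.path.join(temp_dir, "dup-*")):
        st = os.stat(path)
        entries.append((path, st.st_size, st.st_mtime))
    return sorted(entries, key=lambda e: e[2])

def likely_dblock(size, tolerance=0.05):
    """A file within 5% of the 50 MiB default is probably a full dblock."""
    return abs(size - DBLOCK_SIZE) <= DBLOCK_SIZE * tolerance

if __name__ == "__main__":
    for path, size, _ in list_dup_temp_files():
        tag = "likely dblock" if likely_dblock(size) else ""
        print(f"{size:>12}  {path}  {tag}")
```

Running it a few times during a backup shows whether the queue of full-size volumes is growing or draining.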
Leftover blocks are assigned to a different volume, e.g. your log shows a move from 58634 to 58636. “Backup hangs at the end of Backup_ProcessingFiles” points to code I think is doing that.
It’s too bad the live log doesn’t show seconds (unlike the log-file version) to clarify order.
For example, are the short files the leftovers being collected or results of the collection?
One thing you could do is to try opening the last few temporary files with a .zip program.
A dblock file will have small (default 100 KiB) files, each a block, and named by its hash.
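To make the “open with a .zip program” check scriptable, here is a short sketch, assuming the file is an unencrypted dblock (i.e. a plain zip archive):

```python
# Hypothetical sketch: peek inside an unencrypted dblock file, which is a
# zip archive whose members are data blocks named by their Base64 hash.
import zipfile

def list_blocks(dblock_path):
    """Return (member name, size) pairs from an unencrypted dblock zip."""
    with zipfile.ZipFile(dblock_path) as zf:
        return [(info.filename, info.file_size) for info in zf.infolist()]
```

Members at 102400 bytes would be full blocks under the default 100 KiB blocksize; smaller ones are the tails of files or of the backup.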
Another is to see if the SFTP server has any logs. If there are no logs, the files should at least have dates.
Duplicati not doing retries means it’s not seeing errors, but whether data is flowing quickly is unknown.
The Duplicati log would also show data, but the server might be a nice one-stop history.
I tried to open a few of the dup files but I am getting an error: can’t open an archive. I tried 7zip and WinZip. I think this is because they are encrypted. Should I try the backup without encryption?
From one of the files when opened via text: AES !CREATED_BY SharpAESCrypt v188.8.131.52
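That “AES” signature at the start of the file is exactly why .zip tools fail on it. A hedged sketch of a magic-byte check (AES Crypt containers start with the ASCII bytes “AES”; zip archives start with “PK”):

```python
# Hypothetical sketch: classify a Duplicati temp/remote file by magic bytes.
def detect_format(path):
    """Return 'aescrypt', 'zip', or 'unknown' based on the file header."""
    with open(path, "rb") as f:
        magic = f.read(3)
    if magic == b"AES":
        return "aescrypt"   # AES Crypt container: decrypt before unzipping
    if magic[:2] == b"PK":
        return "zip"        # plain zip: a .zip tool should open it directly
    return "unknown"        # possibly truncated, partial, or something else
```

Running this over the temp files would separate the encrypted ones from the plain zips without guessing from open errors.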
The SFTP server does have logs, here is what I see as a last entry:
2021-01-09 18:14:36,691  DEBUG SolarWinds.Sftp.Service.Listener - User XYZ from IP 192.168.2.2 uploading file to "/Photos/duplicati-i0cf354d6f78e47d69eefeaabe534a8b5.dindex.zip.aes"
The way I stop the stuck backup is to click cancel in the web GUI and then restart the service - should I do it differently?
I am happy to take any troubleshooting steps, just need hints on what to look for / try.
Need this fixed - the only change here was that the backup size grew in Dec by about 400 GB.
I think some of them are encrypted, and some aren’t. Usually I find unencrypted. This might be significant as to where in the process things got stuck, so probably the best next thing is to try decrypting a file copy.
AES Crypt is easiest way, but if you prefer Duplicati CLI, there’s a SharpAESCrypt.exe in its install folder.
Does it show “uploaded” at the end? You could also check for partial files by decrypting copies; truncated files won’t decrypt.
There’s unfortunately no nice way to stop when stuck. When not stuck, “Stop after current file” is the best.
This is much appreciated. Problems that happen outside of developer systems really need help from users.
Developers rely (I think) on detailed logging, the heavier the better. You’ve seen how noisy profiling can get, but that was the clue that led me to a guess of where it was at its last profiling log output. Figuring out how it stopped on a repeat run (provided the problem persists) might not need quite such heavy log output. Even at Information level one sees messages like
Jan 12, 2021 8:43 AM: Backend event: Put - Completed: duplicati-ia48f05ebb38a4f279f3e5f6ea68e06be.dindex.zip.aes (1.79 KB)
Jan 12, 2021 8:43 AM: Backend event: Put - Started: duplicati-ia48f05ebb38a4f279f3e5f6ea68e06be.dindex.zip.aes (1.79 KB)
Jan 12, 2021 8:43 AM: Backend event: Put - Completed: duplicati-20210112T134000Z.dlist.zip.aes (41.58 KB)
Jan 12, 2021 8:43 AM: Backend event: Put - Completed: duplicati-ba62726bbbd0f4ff4ac1f76cebe8e9654.dblock.zip.aes (770.62 KB)
Jan 12, 2021 8:43 AM: Backend event: Put - Started: duplicati-20210112T134000Z.dlist.zip.aes (41.58 KB)
Jan 12, 2021 8:43 AM: Backend event: Put - Started: duplicati-ba62726bbbd0f4ff4ac1f76cebe8e9654.dblock.zip.aes (770.62 KB)
Jan 12, 2021 8:43 AM: Backend event: Put - Started: duplicati-ba9c93961b78c42e8b2764131e4e78d33.dblock.zip.aes (49.95 MB)
as a live log, reverse chronological, whereas the log-file version is chronological and a little nicer. Transfers overlap, limited by
--asynchronous-concurrent-upload-limit (Integer): The number of concurrent
When performing asynchronous uploads, the maximum number of concurrent
uploads allowed. Set to zero to disable the limit.
* default value: 4
which you could cut to 1 to simplify the view. Or you could do the server log and file checks first, to see if an upload hang is possibly what’s clogging up the processing pipeline. I’m not sure what other markers to look for. Ideally there would be some signs that don’t require going all the way to profiling to get some other clue.
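If you do try serializing uploads, the option can be supplied as an extra argument. A sketch of a CLI invocation, where the target URL and source path are placeholders, not values from this thread (in the GUI it would go under Advanced options instead):

```shell
Duplicati.CommandLine.exe backup "ssh://server//Photos" "C:\Photos" --asynchronous-concurrent-upload-limit=1
```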
There are possibly some developers or other experts who have ideas. If so, feel free to chime in.
I don’t know if these are still where it sits days later, but whenever it’s hanging, you can try fuser on them.
That would be one way of maybe knowing if Duplicati is still processing them, or if they’re just in a queue.
I’ve been looking at the file flow for a few days, and I think --asynchronous-upload-folder would be a way to have the original unencrypted .zip files in a folder of your choosing. Encrypted files would still upload from Temp.
This split of temp files might make it a little easier to understand the flows and where things are stopping.
There’s a chance that a netstat command, maybe netstat -an | findstr :22, might catch the ssh state; however, there are still some open questions on the server log and whether the last few files uploaded fully…
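For a slightly friendlier version of that netstat check, here is a hedged Python sketch that just filters for port-22 lines so you can watch for ESTABLISHED vs CLOSE_WAIT while the backup hangs:

```python
# Hypothetical sketch: show only the SSH/SFTP (port 22) lines of netstat
# output, to see the connection state while the backup appears stuck.
import subprocess

def port22_lines(netstat_output):
    """Return stripped netstat lines that mention a :22 endpoint."""
    return [line.strip() for line in netstat_output.splitlines()
            if ":22" in line]

if __name__ == "__main__":
    # Run netstat -an and print only the SSH-related lines.
    out = subprocess.run(["netstat", "-an"],
                         capture_output=True, text=True).stdout
    for line in port22_lines(out):
        print(line)
```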
Unfortunately I didn’t find direct information on SpillCollectorProcess activity. It can sort of be inferred from profiling output, such as the line in your original post. Portions of a big log and a DB bug report would be a path, but there are some other test proposals still outstanding. Sorry it’s difficult to know where it stopped.