Yet another Duplicati stall

Google says: DeviceIoControl. Duplicati uses this kind of API only when USN is enabled.

This is the real-time antivirus. Maybe try disabling it for this directory.

Morning. New testing done, although it was done so early in the morning that I forgot to record results. If this data would actually be useful, I'm sure I can re-run and re-record the tests.

I ran Duplicati without the Snapshot Policy just to get back to the original conditions. Ran with and without Windows Defender running, as mentioned before. It still stalls.

I did close the stalled Duplicati through End Task at the end of the day. The computer entered standby instead of shutting down.

This morning I ran a backup operation on my Desktop folder just to see the results, and it didn't stall. I almost ran out of HDD space though, so I cancelled it when it was 80% done. I ran it again later, and it worked perfectly.

I don't know what else I can do to diagnose the original Documents folder problem, though. At least, not without putting a debugger on Duplicati, as suggested before, and I don't know what would be useful on that front.

I guess you can put the big Process Monitor file somewhere and PM a link. Or look at it yourself. Tomorrow my schedule is a little uncertain, plus we're at the point where the study may get deep.

One question I have is whether Duplicati stopped looking at files. We know it kept enumerating…
The Verbose log showed that, but either it found nothing changed later (do you expect any changes there?), or it didn't look.

Silence isn't too informative, but Process Monitor should be nice and loud. A look involves things such as the timestamp, and metadata such as security attributes. My re-backup with no changes said:

2:41:20.7175998 PM Duplicati.GUI.TrayIcon.exe 34156 31100 QuerySecurityFile C:\tree\000000\000000\000005\000008\000002\000005.txt SUCCESS Information: Owner, Group, DACL

which might be getting the security attributes from NTFS. There were a lot of queries looking like:

2:41:20.7141987 PM Duplicati.GUI.TrayIcon.exe 34156 143180 QueryNetworkOpenInformationFile C:\tree\000000\000000\000005\000008\000002\000005.txt SUCCESS CreationTime: 3/10/2021 12:25:26 PM, LastAccessTime: 3/10/2021 12:25:26 PM, LastWriteTime: 3/10/2021 12:25:26 PM, ChangeTime: 3/10/2021 12:25:26 PM, AllocationSize: 0, EndOfFile: 0, FileAttributes: A

but this is below where Duplicati devs usually probe. It’s below .NET Framework, deep in Windows.

If you want to look, you don't need to know the detail either; just look for names seen in the Verbose log.
The above was from a Process Monitor filter much like the regex below, but you can also just scroll and study names.

Filter approach went well with my profiling log, which is another long-output option we could attempt:

^2023-12-12.*C:\\tree\\000000\\000000\\000005\\000008\\000002\\000005.txt

That regular expression in a log viewer (glogg in this case, but there are other ways to view files) found this:

2023-12-12 14:41:16 -05 - [Verbose-Duplicati.Library.Main.Operation.Backup.FileEnumerationProcess-IncludingPath]: Including path as no filters matched: C:\tree\000000\000000\000005\000008\000002\000005.txt
2023-12-12 14:41:20 -05 - [Verbose-Duplicati.Library.Main.Operation.Backup.FileEnumerationProcess-IncludingPath]: Including path as no filters matched: C:\tree\000000\000000\000005\000008\000002\000005.txt
2023-12-12 14:41:20 -05 - [Verbose-Duplicati.Library.Main.Operation.Backup.FilePreFilterProcess.FileEntry-SkipCheckNoMetadataChange]: Skipped checking file, because no metadata was updated C:\tree\000000\000000\000005\000008\000002\000005.txt

The Channel Pipeline is a developer-level view of the work, but you can see from the timings that enumeration is getting a few seconds ahead of the deeper peek. It's also a little strange to see the file enumerated twice.
This is just a random file I picked out of a re-backup that would look through all files and find no changes.

For your backup, the verbose log shows files after the seeming stall, but are there changes in those?

If it weren't for the count not going down, staying on the last file with changes is quite a normal "view".
My backup sits still like that all the time, but its count goes down while it's looking for the next change…

Tomorrow I'll spend a good chunk of the day onsite, so no deep exploration of backup mechanics from me.

Checking Process Monitor, this is the last time Duplicati accesses any files.

TIME 19:56:26.7453141
PROCESS Duplicati.GUI.TrayIcon.exe
PID 20928
OPERATION CloseFile
PATH \Device\HarddiskVolumeShadowCopy3\Users\Kobayen\Documents(Book Pile)\3DMax Literature\index.html
RESULT SUCCESS

You'll notice this is the file where it stops enumerating in the Verbose log.

After that, it records a bunch of operations related to C:\Users\Kobayen\AppData\Local\Duplicati\Duplicati-server.sqlite, mostly LockFile, UnlockFile, CreateFile, ReadFile and UnlockFile.

After that, it's a bunch of registry checking and writing.
And then, the stall.

Which of the above does it do before its CloseFile? I showed two specific items.
Knowing that may add some clues about what's still moving, or not…

I didn't post all of the (remarkably long) sequence before, but it looked like this:

and if your last file looked anything like that, then maybe time and metadata got checked for all the files. That raises the question of why the count didn't go down, but answering it will probably need additional code study.

Sorry, I'll try to answer everything I think I understood.

whether Duplicati stopped looking
Files after stall

As far as I know, the only files that were modified in that folder were the Kaiju and Marvel savegames; they show in the Verbose log as Processed. Duplicati keeps enumerating other files after processing them, so I assume it stalls during enumeration?

The entry I posted before is the last file to be enumerated, and according to the progress bar from before, there were still files to be enumerated after that, somehow? This is also on the edge of my understanding and English ability; I don't know how else I can answer your question or how to interpret the data I sent.

It’s hard to say when everything is enumerated, as I don’t have source folders to check, but

agrees with the end of the Verbose log, except it might have been the afternoon run

2023-12-11 12:56:38 -03 - [Verbose-Duplicati.Library.Main.Operation.Backup.FileEnumerationProcess-IncludingPath]: Including path as no filters matched: C:\Users\Kobayen\Documents(Book Pile)\3DMax Literature\index.html

and the Process Monitor file (thanks for the info) looks like this (last two events):

10:56:38.7585853 AM Duplicati.GUI.TrayIcon.exe 15624 18916 QueryNetworkOpenInformationFile \Device\HarddiskVolumeShadowCopy2\Users\Kobayen\Documents(Book Pile)\3DMax Literature\index.html SUCCESS CreationTime: 7/12/2017 5:04:33 PM, LastAccessTime: 9/26/2023 2:24:37 AM, LastWriteTime: 7/4/2008 2:05:12 PM, ChangeTime: 8/22/2019 11:54:13 PM, AllocationSize: 4096, EndOfFile: 2178, FileAttributes: A

10:56:38.7585902 AM Duplicati.GUI.TrayIcon.exe 15624 18916 CloseFile \Device\HarddiskVolumeShadowCopy2\Users\Kobayen\Documents(Book Pile)\3DMax Literature\index.html SUCCESS

The CloseFile isn't too interesting, but the QueryNetworkOpenInformationFile is possibly the timestamp check (except it's done many times, and I'm not sure which one Duplicati actually looks at).

What's missing is what I suspect is the metadata collection. A search for the final QuerySecurityFile found:

12/11/2023 10:54:44.5357326 AM

as the last such access. I don't know if Windows caches might have satisfied the rest, though. I'm not a Windows expert.

What's definitely interesting is the start, where it begins reading files in the same tree twice, as if it was somehow requested that way (although it's odd that some of the guards against that didn't react).

2023-12-11 12:22:54 -03 - [Information-Duplicati.Library.Main.Controller-StartingOperation]: The operation Backup has started
2023-12-11 12:32:10 -03 - [Information-Duplicati.Library.Main.BasicResults-BackendEvent]: Backend event: List - Started:  ()
2023-12-11 12:32:11 -03 - [Information-Duplicati.Library.Main.BasicResults-BackendEvent]: Backend event: List - Completed:  (952 bytes)
2023-12-11 12:32:12 -03 - [Verbose-Duplicati.Library.Main.Operation.Backup.FileEnumerationProcess-IncludingSourcePath]: Including source path: C:\Users\Kobayen\Documents\
2023-12-11 12:32:13 -03 - [Verbose-Duplicati.Library.Main.Operation.Backup.FileEnumerationProcess-IncludingSourcePath]: Including source path: C:\Users\Kobayen\Documents\
2023-12-11 12:32:14 -03 - [Verbose-Duplicati.Library.Main.Operation.Backup.FileEnumerationProcess-IncludingPath]: Including path as no filters matched: C:\Users\Kobayen\Documents\(Book Pile)\
2023-12-11 12:32:14 -03 - [Verbose-Duplicati.Library.Main.Operation.Backup.FileEnumerationProcess-IncludingPath]: Including path as no filters matched: C:\Users\Kobayen\Documents\(Book Pile)\

Could you check whether you can see any sign of a double request? It might be visible at the bottom of the Source Data tree, on the Commandline screen in Commandline arguments, or in Export As Command-line in its long line.

EDIT:

I wound up attempting to use Linux sort -u to find the unique files seen. I'm not certain about my results, but it looked like there were 160373 unique files out of the 165418 mentioned in the verbose log enumeration. Possibly you can still let File Explorer count the files (but I think you have some excludes), to compare.

The bottom line is that it's tough to tell if it enumerated everything, but it's easy to see it got far past the last file shown as being examined (and 100% complete); this could also just mean that nothing else needed examining.

Since there are appeals here to 'Duplicati developers' and I could possibly be suspected to be one, I'd state that this 'yet another Duplicati stall' is not ringing any bell at all as far as source scanning is concerned. I have never seen this, and I don't remember a single forum post making a convincing case for this happening for real. I can't even imagine how it could happen.

That's not to say that Duplicati never stalls; I have seen quite a few occurrences myself, but it's always in the process of uploading data to the backend, usually when a network glitch occurs. Reading back through this thread, it's unclear when the stalling occurs. Unfortunately, at the moment I have no clear idea how to make sure that this is indeed, as I suspect, an upload stall problem. To confirm it, the 'Profiling' log level could be useful; this would mean adding a file log to the backup job instead of the default log, because a profiling log level means huge amounts of data.
See this code part for more information on what to look for in the log, first ‘BackupMainOperation’:

and this one for UploadNewFileset (the beginning of the upload, if any):

These two keywords would be the first thing I'd look for when grepping a profiling log for a stalling job.
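In case it helps, here is a minimal sketch of that kind of filter as a small C# program (the two keywords are the ones named above; the log path as the first argument is just my assumption, and plain grep on the two strings does the same job):

    // Minimal sketch: print profiling-log lines that mention either keyword above.
    using System;
    using System.IO;
    using System.Linq;

    class ProfilingLogScan
    {
        static void Main(string[] args)
        {
            string[] keywords = { "BackupMainOperation", "UploadNewFileset" };
            // Stream the (potentially huge) log and keep only matching lines.
            foreach (var line in File.ReadLines(args[0]).Where(l => keywords.Any(l.Contains)))
                Console.WriteLine(line);
        }
    }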

Although it means relying on Duplicati (and our understanding of it), the status bar shows no speed, suggesting that uploads have not begun. Confirmation suggestion: lastPgEvent "BackendSpeed":-1.

Generally one can tell when uploads began but stalled by looking at the counts seen there (later…).
In a normal backup, one can see BackendFileSize grow, and BackendFileProgress try to follow.

Basically one is studying a more complete view of what the status bar and job status summarize in the GUI.
Job progress says steam_autocloud.vdf was 100% read, but lastPgEvent reveals that it was 51 bytes.

The savegames might be tiny files, and there might not be any later changed files, so the stall might be somewhere around the backup finish, where the spill collector packs blocks from partially filled temporary files into dblocks ready to upload. The verbose log shows little of this, but the profiling log will capture much more; it doesn't show the actual file data, but it does show pathnames, so keep it private, as usual.
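For a rough mental model of that packing step, here is a conceptual sketch only, not Duplicati's actual SpillCollector code; the 50 MB figure is just the default remote volume size, and the block sizes are made up:

    // Conceptual sketch: leftover blocks from partially filled temporary volumes
    // get repacked into full-size volumes before upload.
    using System;
    using System.Collections.Generic;

    class SpillPackingSketch
    {
        const int VolumeTarget = 50 * 1024 * 1024;   // assumed ~50 MB dblock target

        static List<List<int>> Repack(IEnumerable<int> leftoverBlockSizes)
        {
            var volumes = new List<List<int>>();
            var current = new List<int>();
            int used = 0;
            foreach (var size in leftoverBlockSizes)
            {
                if (used + size > VolumeTarget && current.Count > 0)
                {
                    volumes.Add(current);            // this volume is full enough; start a new one
                    current = new List<int>();
                    used = 0;
                }
                current.Add(size);
                used += size;
            }
            if (current.Count > 0)
                volumes.Add(current);                // the final, possibly small, volume
            return volumes;
        }

        static void Main()
        {
            // e.g. 75 MB worth of leftover 100 KB blocks repacks into two volumes
            var leftovers = new List<int>();
            for (int i = 0; i < 750; i++) leftovers.Add(100 * 1024);
            Console.WriteLine($"Repacked into {Repack(leftovers).Count} volume(s)");
        }
    }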

There have been issues, and probably posts, about stalls near the end of a backup where the "where" is unclear.
Backup job hangs in the middle of the work #3683 has several different lastPgEvent situation reports:

"BackendAction":"Put","BackendPath":"duplicati-b4a9297e64a1d62d63d732e220613f5b3.dblock.zip.aes","BackendFileSize":893,"BackendFileProgress":0,"BackendSpeed":-1

versus

"BackendAction":"List","BackendPath":null,"BackendFileSize":0,"BackendFileProgress":0,"BackendSpeed":-1

which looks more like the case here.

I'm making a case that this is different. I got a PM with the Verbose log, and the only BackendEvent in it is:

2023-12-11 12:22:54 -03 - [Information-Duplicati.Library.Main.Controller-StartingOperation]: The operation Backup has started
2023-12-11 12:32:10 -03 - [Information-Duplicati.Library.Main.BasicResults-BackendEvent]: Backend event: List - Started:  ()
2023-12-11 12:32:11 -03 - [Information-Duplicati.Library.Main.BasicResults-BackendEvent]: Backend event: List - Completed:  (952 bytes)

which is probably what the lastPgEvent backend data posted above shows as "BackendAction":"List"

and the last line is:

2023-12-11 12:56:38 -03 - [Verbose-Duplicati.Library.Main.Operation.Backup.FileEnumerationProcess-IncludingPath]: Including path as no filters matched: C:\Users\Kobayen\Documents\(Book Pile)\3DMax Literature\index.html

but the verbose log is just not very talkative about the wrap-up of the backup. The profiling log will be…

Regarding older requests: I'd still like to hear about the backup configuration, to explain the doubled scan work, because it seems abnormal, and we might as well try to get things as normal as possible in this backup.

The stall is the code having an issue during the WhenAll code loops. It's extremely difficult to figure out. The code keeps running and the application becomes stuck; it can't exit the WhenAll loops because they do not exit unless everything in them exits. It's all tied into the UI, so everything gets stuck.
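To illustrate the mechanics, here is a minimal standalone sketch (not Duplicati's actual pipeline code; the worker names are made up): Task.WhenAll only completes when every task completes, so a single worker that never finishes leaves everything awaiting it stuck.

    // Minimal sketch: one task that never finishes keeps Task.WhenAll from ever completing.
    using System;
    using System.Threading;
    using System.Threading.Tasks;

    class WhenAllStallSketch
    {
        static async Task Worker(string name, bool hangs)
        {
            Console.WriteLine($"{name} started");
            if (hangs)
                await Task.Delay(Timeout.Infinite);  // stands in for a loop that never exits
            Console.WriteLine($"{name} finished");
        }

        static async Task Main()
        {
            // The healthy worker finishes, but WhenAll still waits forever for the stuck one,
            // so whatever awaits this (job status, UI) looks hung.
            await Task.WhenAll(Worker("enumerate", hangs: false),
                               Worker("upload", hangs: true));
            Console.WriteLine("never reached");
        }
    }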

This is why I stopped using Duplicati: my new backup location is on my phone, which will travel outside wifi and trigger this every time Duplicati happens to be running a backup at that moment. That got way too annoying.

I've also seen it on one computer, where it happened with the Duplicati db files while another computer didn't have the problem. Excluding those files fixed it in that situation, as a non-code workaround. The I/O error it throws is one situation where the code can't exit the WhenAll loops and gets stuck.

If you want to debug it, then all you should need to do is see the error Duplicati is seeing (where the code goes off track right before the stuck situation starts) and then fix the code issue so that it continues or exits properly instead of being stuck. This should be done regardless of whether the issue is on the computer side or the code side, as the application breaks when it doesn't have to. The problem with that, however, is that there are unknown errors that will still exist and trigger it, so it's not necessarily the way to fix it either.

It's really a nightmare code situation. I haven't been willing to fix it.

That should help you.

You can also confirm it's related to the code running inside WhenAll via the force-exit timer, as gpatel-fr shows above. You can also confirm it by throwing certain errors inside the code running under WhenAll at certain places, and it will cause this situation.

I should state that ALL STALLS ANYONE SEES will very likely be this. There shouldn't be any need to find it again. If you want to try to find every possible code situation where the loops don't exit or continue properly, then that is something else, but it should be at or close to the error seen in the logs, with or without more verbose logging.

Personally, today I'd probably just throw a static global cancel bool inside all loops to stop any of this forever-stuck situation, and also get the cancel button actually working and button up the backup with the cancel and exit. But I still don't like how the UI thread is tied into the WhenAll use, so I have other side issues here.

And of course, I have reservations about the WhenAll use, as I haven't found a working way of force-exiting it. Errors that are unknown and so not dealt with should force an exit, so the UI doesn't stop working, etc. Backups would be corrupted, sure, but these are unknown errors, not known or dealt with, and they are going to happen anyway. Force-exiting the WhenAll would ideally be the fix, in my opinion. But I couldn't find anything that worked (the timer is not a good way here due to unpredictable backup times), and it's not how WhenAll is meant to work. At the same time, throwing a cancel bool into every loop should in theory get the WhenAlls to exit, and hence why I'd likely go that way.
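A rough sketch of that idea (my own illustration, not proposed Duplicati code; names like s_cancelAll and the worker loops are made up): a shared cancel flag checked inside every worker loop lets each task exit on its own, so the WhenAll can complete.

    // Sketch: a global cancel flag that every loop checks, so Task.WhenAll can finish.
    using System;
    using System.Threading.Tasks;

    static class CancelFlagSketch
    {
        static volatile bool s_cancelAll;   // a real fix would more likely use CancellationToken

        static async Task WorkerLoop(string name)
        {
            while (!s_cancelAll)
            {
                await Task.Delay(100);      // stands in for one unit of real work
            }
            Console.WriteLine($"{name} exited cleanly");
        }

        static async Task Main()
        {
            var all = Task.WhenAll(WorkerLoop("enumerate"), WorkerLoop("hash"), WorkerLoop("upload"));
            await Task.Delay(500);
            s_cancelAll = true;             // e.g. set by a Cancel button or a fatal error handler
            await all;                      // now completes instead of waiting forever
            Console.WriteLine("all loops exited, WhenAll completed");
        }
    }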

There’s a lot to unpack on the issue. I try to make it short but can’t :slight_smile:

You can do what you do with that info.

@Xavron

When it's impossible to stop a job with the 'stop' feature, it can't come from the main WhenAll loop. The cancellation token should handle that. The problem comes from the fact that the upload part uses a different cancellation token, generated in BackendUploader.cs, and this token is never touched by the Stop function. The general idea (my guess!) is that an upload should never be interrupted, the risk of corruption being too high. That's why I think a stall always comes from an upload problem.

Usually the root cause is that an upload should always cancel itself through a timeout, but by default some backends have an infinite timeout. I have not found a stall that is not fixed by a proper timeout added to the backend. The file backend is particularly problematic, since there is no real timeout when a disk stalls.
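To illustrate the shape of the problem (the class, field names, and timeout value here are illustrative only, not Duplicati's actual code): if the uploader runs on its own CancellationTokenSource that the Stop path never cancels, the only thing that can break a dead upload is a timeout on that source.

    // Illustrative sketch: Stop cancels one token, but the upload waits on a different
    // token; a timeout (CancelAfter) on the upload token is what breaks the hang.
    using System;
    using System.Threading;
    using System.Threading.Tasks;

    class UploadCancellationSketch
    {
        readonly CancellationTokenSource _stopCts = new CancellationTokenSource();    // used by Stop
        readonly CancellationTokenSource _uploadCts = new CancellationTokenSource();  // used only by uploads

        public void Stop() => _stopCts.Cancel();   // note: does nothing to _uploadCts

        public async Task UploadAsync(Func<CancellationToken, Task> backendPut)
        {
            // Without this line, a backend that never responds would be waited on forever.
            _uploadCts.CancelAfter(TimeSpan.FromSeconds(10));   // a real timeout would be minutes
            await backendPut(_uploadCts.Token);
        }

        static async Task Main()
        {
            var sketch = new UploadCancellationSketch();
            sketch.Stop();   // has no effect on the simulated hung upload below
            try
            {
                // Simulates a put that never returns unless its own token fires.
                await sketch.UploadAsync(ct => Task.Delay(Timeout.Infinite, ct));
            }
            catch (OperationCanceledException)
            {
                Console.WriteLine("upload gave up after the timeout instead of stalling forever");
            }
        }
    }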

One challenge is that the “You” are different people. If we had reproducible test steps for developers, progress would be easier, but AFAIK we don’t (do you have one?). Lacking that, we try indirect ways.

and I'm showing evidence to the contrary. Maybe if we can see a profiling log, we can get better data. If nothing comes back from this and the other requests, it will probably mean we have little left to pursue.

That's only if you think of cancel/stop as being for that use. It also has to happen on unknown or major code errors, or forever loops are going to happen. Forever loops must not happen under any condition.

From what you state, I view that as the proper condition. But improper ones will happen. Bugs exist and will continue to :slight_smile:

If the OP here is looking into the code, then it applies to them, as I stated. If they're not, then I gave a non-code workaround too, so that applies to them.

That's for them to do. They apparently are spending lots of time trying to figure out what to do, so I gave the needed info to do it :slight_smile:

I think it’s necessary to have one to be really sure of what you say.

Well, if it can't be reproduced, it might just as well not exist. But in your case, as you are talking about a phone, my guess is that it's a backend problem too. Connections between a phone and a computer are very unreliable in my experience. Was it via drive mapping? Or with something that can be managed with a timeout, like an HTTP or FTP connection?

It depends on the code; however, I assume (I haven't dug) that it says

[Information-Duplicati.Library.Main.BasicResults-BackendEvent]: Backend event: Put - Started:

before it starts, and it didn't say that. Do you know the uploading code well?
I don't, but let me look at the last dblock (per job → Show log → Remote):

2023-12-18 07:36:47 -05 - [Profiling-Timer.Begin-Duplicati.Library.Main.Database.ExtensionMethods-ExecuteNonQuery]: Starting - ExecuteNonQuery: UPDATE "Block" SET "VolumeID" = 3404 WHERE "Hash" = "j0km7Aeakz8YoRPmyEpj1gtok0Jo0O4rVSum2zxX1Lw=" AND "Size" = 102400 AND "VolumeID" = 3405 
2023-12-18 07:36:47 -05 - [Profiling-Timer.Finished-Duplicati.Library.Main.Database.ExtensionMethods-ExecuteNonQuery]: ExecuteNonQuery: UPDATE "Block" SET "VolumeID" = 3404 WHERE "Hash" = "j0km7Aeakz8YoRPmyEpj1gtok0Jo0O4rVSum2zxX1Lw=" AND "Size" = 102400 AND "VolumeID" = 3405  took 0:00:00:00.000
2023-12-18 07:36:49 -05 - [Information-Duplicati.Library.Main.BasicResults-BackendEvent]: Backend event: Put - Started: duplicati-b24fdec92d7cd452bb7a16640f8f00456.dblock.zip.aes (49.91 MB)
2023-12-18 07:36:49 -05 - [Profiling-Timer.Begin-Duplicati.Library.Main.Database.ExtensionMethods-ExecuteScalarInt64]: Starting - ExecuteScalarInt64: INSERT INTO "Remotevolume" ("OperationID", "Name", "Type", "State", "Size", "VerificationCount", "DeleteGraceTime") VALUES (1056, "duplicati-b145f024d5c9d404a8c58f77ec852529d.dblock.zip.aes", "Blocks", "Temporary", -1, 0, 0); SELECT last_insert_rowid();
2023-12-18 07:36:49 -05 - [Profiling-Timer.Finished-Duplicati.Library.Main.Database.ExtensionMethods-ExecuteScalarInt64]: ExecuteScalarInt64: INSERT INTO "Remotevolume" ("OperationID", "Name", "Type", "State", "Size", "VerificationCount", "DeleteGraceTime") VALUES (1056, "duplicati-b145f024d5c9d404a8c58f77ec852529d.dblock.zip.aes", "Blocks", "Temporary", -1, 0, 0); SELECT last_insert_rowid(); took 0:00:00:00.000
2023-12-18 07:36:49 -05 - [Profiling-Timer.Begin-Duplicati.Library.Main.Database.ExtensionMethods-ExecuteNonQuery]: Starting - ExecuteNonQuery: UPDATE "Block" SET "VolumeID" = 3406 WHERE "Hash" = "T5r/8aMqErCMSuOJkT3mxj4ZsfLz4e9P+ESYhI7Ot6Q=" AND "Size" = 102400 AND "VolumeID" = 3405 
2023-12-18 07:36:49 -05 - [Profiling-Timer.Finished-Duplicati.Library.Main.Database.ExtensionMethods-ExecuteNonQuery]: ExecuteNonQuery: UPDATE "Block" SET "VolumeID" = 3406 WHERE "Hash" = "T5r/8aMqErCMSuOJkT3mxj4ZsfLz4e9P+ESYhI7Ot6Q=" AND "Size" = 102400 AND "VolumeID" = 3405  took 0:00:00:00.000
...
2023-12-18 07:38:26 -05 - [Information-Duplicati.Library.Main.BasicResults-BackendEvent]: Backend event: Put - Started: duplicati-b145f024d5c9d404a8c58f77ec852529d.dblock.zip.aes (42.23 MB)
2023-12-18 07:38:26 -05 - [Profiling-Timer.Begin-Duplicati.Library.Main.Database.ExtensionMethods-ExecuteScalarInt64]: Starting - ExecuteScalarInt64: INSERT INTO "Remotevolume" ("OperationID", "Name", "Type", "State", "Size", "VerificationCount", "DeleteGraceTime") VALUES (1056, "duplicati-i3118f8e94b7e45dfa31d24bc604c69d5.dindex.zip.aes", "Index", "Temporary", -1, 0, 0); SELECT last_insert_rowid();
2023-12-18 07:38:26 -05 - [Profiling-Timer.Finished-Duplicati.Library.Main.Database.ExtensionMethods-ExecuteScalarInt64]: ExecuteScalarInt64: INSERT INTO "Remotevolume" ("OperationID", "Name", "Type", "State", "Size", "VerificationCount", "DeleteGraceTime") VALUES (1056, "duplicati-i3118f8e94b7e45dfa31d24bc604c69d5.dindex.zip.aes", "Index", "Temporary", -1, 0, 0); SELECT last_insert_rowid(); took 0:00:00:00.000

The Block table updates are likely SpillCollector moving blocks around while async uploading occurs.

And it looks like even at profiling level, SpillCollector evidence is still indirect. Note that I also use the

  --profile-all-database-queries (Boolean): Activates logging of all database
    queries
    To improve performance of the backups, frequent database queries are not
    logged by default. Enable this option to log all database queries, and
    remember to set either --console-log-level=Profiling or
    --log-file-log-level=Profiling to report the additional log data
    * default value: false

which I think is what shows the Block operations, and thereby makes the large profiling log even larger.
I guess I'll leave that for the OP to turn on, or we'll see if there are enough clues without resorting to that.

I suppose the OP could try this, but if the two files that change are excluded, that's not much of a backup.
Still, it might be useful to exclude the second one that's processed, to see if the last mention is of the first.

Or stick another small file at the end of the source list to see if that just happens to be the last shown, instead of indicating a problem with the last shown file. It did seem to process quite a bit further down.

Basically try some variations on the backup (if destination space allows) to see what works or doesn’t.

EDIT:

For those who prefer to study code, we can start at the output of SpillCollector into BackendRequest:

and then pick up the Input where it’s read in BackendUploader:

and from there, it possibly (not certain) goes through code like

so one awkward test might be to run a backup under dry-run and watch a log at DryRun level for output.
Watching the RemoteOperation table in the database might also work. Basically, is it trying to upload?
The actual upload call for a specific backend looks like it comes a little later than the clue sources above.

I'm kind of wondering if the last two lines produce the log lines and lastPgEvent fields that aren't happening.

Yes, however the code doesn't actually log straight away; it is doing:

m_statwriter.SendEvent(BackendActionType.Put, BackendEventType.Started, item.RemoteFilename, item.Size);

I'd trust it more if profiling were enabled; in this case it would log in BackendManager, and it would actually be written to the log before the send is attempted:

using (new Logging.Timer(LOGTAG, string.Format("RemoteOperation{0}", item.Operation), string.Format("RemoteOperation{0}", item.Operation)))
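The relevant point is that this kind of disposable timer writes its "Starting" line when it is constructed, i.e. before the remote operation is attempted, and a "took …" line when disposed. A minimal sketch of that pattern (my own illustration, not the actual Duplicati Logging.Timer class):

    // Sketch of the timer pattern: log on construction, log elapsed time on Dispose.
    using System;
    using System.Diagnostics;

    sealed class ProfilingTimerSketch : IDisposable
    {
        readonly string _what;
        readonly Stopwatch _sw = Stopwatch.StartNew();

        public ProfilingTimerSketch(string what)
        {
            _what = what;
            Console.WriteLine($"Starting - {_what}");   // written before the wrapped work runs
        }

        public void Dispose() => Console.WriteLine($"{_what} took {_sw.Elapsed}");
    }

    class Demo
    {
        static void Main()
        {
            using (new ProfilingTimerSketch("RemoteOperationPut"))
            {
                // the upload attempt would happen here; its Starting line is already in the log
            }
        }
    }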

I added some more notes to my post while you were editing, but I agree that a profiling log may help.
I’m still also waiting to hear how source data was configured, as the double-config may be relevant.