Issues to address to get out of beta

No blockers I can think of.

An important release item would be for users to have mono installations of at least version 5 and maybe add a link to the mono-project install instructions.

We should probably also note that the database schema has changed to inform users that a later downgrade may be difficult.

Is this something we can/should check when starting? Or is it unable to start due to framework dependencies?

Yes, that should be prominently listed in the upgrade message.

We need to test if we are able to run that check before the application exits when the framework is old.

Has anyone else tested it in any way? I started testing with 2.0.4.29 yesterday, showing that Duplicati file verify across aftp that failed on 2.0.4.22 worked on 2.0.4.29. Unfortunately uploads are now complaining:

2019-09-18 22:36:52 -04 - [Information-Duplicati.Library.Main.BasicResults-BackendEvent]: Backend event: Put - Started: duplicati-20190919T023652Z.dlist.zip (751 bytes)
2019-09-18 22:36:58 -04 - [Retry-Duplicati.Library.Main.Operation.Backup.BackendUploader-RetryPut]: Operation Put with file duplicati-20190919T023652Z.dlist.zip attempt 1 of 5 failed with message: The file duplicati-20190919T023652Z.dlist.zip was uploaded but the returned size was 752 and it was expected to be 751
Duplicati.Library.Interface.UserInformationException: The file duplicati-20190919T023652Z.dlist.zip was uploaded but the returned size was 752 and it was expected to be 751
   at Duplicati.Library.Backend.AlternativeFTP.AlternativeFtpBackend.<PutAsync>d__37.MoveNext()

which has been there, but whose code got rearranged lately for a different problem that I hadn’t spotted:

poc fix for aftp (from @BlueBlock)

The other large deal I’m seeing is Duplicati is using 65% of a quad-core CPU all by itself while nothing is going on that Process Monitor can see (usually mainly file accesses or network activity), however going through what Process Explorer can see finds .NET Framework (updated 7 days ago) busy with memory.

Symptomatically, the Duplicati web UI would barely respond when I first tried to look at live logs to see the error. There’s also been a Profiling log running, and nothing is going to it, yet something is eating the CPU.

EDIT 1:

Attaching a debugger, first clue is what seems like way too many timeouts. Here’s one tenth of a second:

08:30:06.001 Additional information: The operation has timed out.
08:30:06.014 Exception thrown: 'System.TimeoutException' in mscorlib.dll
08:30:06.016 Additional information: The operation has timed out.
08:30:06.028 Exception thrown: 'System.TimeoutException' in mscorlib.dll
08:30:06.030 Additional information: The operation has timed out.
08:30:06.042 Exception thrown: 'System.TimeoutException' in mscorlib.dll
08:30:06.044 Additional information: The operation has timed out.
08:30:06.056 Exception thrown: 'System.TimeoutException' in mscorlib.dll
08:30:06.058 Additional information: The operation has timed out.
08:30:06.070 Exception thrown: 'System.TimeoutException' in mscorlib.dll
08:30:06.072 Additional information: The operation has timed out.
08:30:06.084 Exception thrown: 'System.TimeoutException' in mscorlib.dll
08:30:06.087 Additional information: The operation has timed out.
08:30:06.100 Exception thrown: 'System.TimeoutException' in mscorlib.dll

Anyone good at tracking such things down? My next attempt might be to see if I can get a stack from one.
Problem is that if I’m looking at this I can’t also be looking at what looks like a different aftp translation bug. Attempting to track down the numbers it was giving using tcpdump on server side or DB information from RemoteOperation table list results (for those wondering why DB grows, there’s another big consumer) wasn’t showing me anything but the correct size. Yet somehow Duplicati complains of size being larger…

EDIT 2:

Breaking at System.TimeoutException got the following. Does anybody recognize what this might mean?

|>|mscorlib.dll!System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw() (IL≈0x000C, Native=0x00007FFE8360C3C0+0x1F)|
|---|---|
| |mscorlib.dll!System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(System.Threading.Tasks.Task task) (IL≈0x0028, Native=0x00007FFE82C7DEF0+0x3C)|
| |CoCoL.dll!CoCoL.Channel<Duplicati.Library.UsageReporter.ReportItem>.ReadAsync(CoCoL.ITwoPhaseOffer offer) (IL≈0x0589, Native=0x00007FFE51A26850+0xCBF)|
| |mscorlib.dll!System.Threading.ExecutionContext.RunInternal(System.Threading.ExecutionContext executionContext, System.Threading.ContextCallback callback, object state, bool preserveSyncCtx) (IL≈0x0079, Native=0x00007FFE82C35950+0x163)|
| |mscorlib.dll!System.Threading.ExecutionContext.Run(System.Threading.ExecutionContext executionContext, System.Threading.ContextCallback callback, object state, bool preserveSyncCtx) (IL=epilog, Native=0x00007FFE82C35930+0x14)|
| |mscorlib.dll!System.Runtime.CompilerServices.AsyncMethodBuilderCore.MoveNextRunner.Run() (IL≈0x0024, Native=0x00007FFE82C81250+0x6C)|
| |mscorlib.dll!System.Threading.Tasks.AwaitTaskContinuation.RunOrScheduleAction(System.Action action, bool allowInlining, ref System.Threading.Tasks.Task currentTask) (IL≈0x001C, Native=0x00007FFE82C88050+0x62)|
| |mscorlib.dll!System.Threading.Tasks.Task.FinishContinuations() (IL=0x0085, Native=0x00007FFE82C039F0+0x3D7)|
| |mscorlib.dll!System.Threading.Tasks.Task.Finish(bool bUserDelegateExecuted) (IL=epilog, Native=0x00007FFE82C7E840+0x50)|
| |mscorlib.dll!System.Threading.Tasks.Task<System.__Canon>.TrySetException(object exceptionObject) (IL=0x002A, Native=0x00007FFE82C80090+0x71)|
| |mscorlib.dll!System.Threading.Tasks.TaskCompletionSource<Duplicati.Library.UsageReporter.ReportItem>.TrySetException(System.Exception exception) (IL≈0x000E, Native=0x00007FFE82C87BA0+0x1D)|
| |mscorlib.dll!System.Threading.ExecutionContext.RunInternal(System.Threading.ExecutionContext executionContext, System.Threading.ContextCallback callback, object state, bool preserveSyncCtx) (IL≈0x0079, Native=0x00007FFE82C35950+0x163)|
| |mscorlib.dll!System.Threading.ExecutionContext.Run(System.Threading.ExecutionContext executionContext, System.Threading.ContextCallback callback, object state, bool preserveSyncCtx) (IL=epilog, Native=0x00007FFE82C35930+0x14)|
| |mscorlib.dll!System.Threading.QueueUserWorkItemCallback.System.Threading.IThreadPoolWorkItem.ExecuteWorkItem() (IL=epilog, Native=0x00007FFE82C75AD0+0x73)|
| |mscorlib.dll!System.Threading.ThreadPoolWorkQueue.Dispatch() (IL=0x00A4, Native=0x00007FFE82C74CD0+0x152)|
| |[Native to Managed Transition]|

EDIT 3 try again expanding tabs to spaces. Tabs seem to break the forum formatting even inside the ~~~:

>       mscorlib.dll!System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw() (IL˜0x000C, Native=0x00007FFE8360C3C0+0x1F)
        mscorlib.dll!System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(System.Threading.Tasks.Task task) (IL˜0x0028, Native=0x00007FFE82C7DEF0+0x3C)
        Duplicati.Library.UsageReporter.dll!Duplicati.Library.UsageReporter.EventProcessor.<>c__DisplayClass8_0.<Run>b__0(<>f__AnonymousType0`2<CoCoL.IReadChannel<Duplicati.Library.UsageReporter.ReportItem>, CoCoL.IWriteChannel<string>> self) (IL˜0x0288, Native=0x00007FFE51C30570+0x514)
        mscorlib.dll!System.Threading.ExecutionContext.RunInternal(System.Threading.ExecutionContext executionContext, System.Threading.ContextCallback callback, object state, bool preserveSyncCtx) (IL˜0x0079, Native=0x00007FFE82C35950+0x163)
        mscorlib.dll!System.Threading.ExecutionContext.Run(System.Threading.ExecutionContext executionContext, System.Threading.ContextCallback callback, object state, bool preserveSyncCtx) (IL=epilog, Native=0x00007FFE82C35930+0x14)
        mscorlib.dll!System.Runtime.CompilerServices.AsyncMethodBuilderCore.MoveNextRunner.Run() (IL˜0x0024, Native=0x00007FFE82C81250+0x6C)
        mscorlib.dll!System.Threading.Tasks.AwaitTaskContinuation.RunOrScheduleAction(System.Action action, bool allowInlining, ref System.Threading.Tasks.Task currentTask) (IL˜0x001C, Native=0x00007FFE82C88050+0x62)
        mscorlib.dll!System.Threading.Tasks.Task.FinishContinuations() (IL=0x0085, Native=0x00007FFE82C039F0+0x3D7)
        mscorlib.dll!System.Threading.Tasks.Task.Finish(bool bUserDelegateExecuted) (IL=epilog, Native=0x00007FFE82C7E840+0x50)
        mscorlib.dll!System.Threading.Tasks.Task<System.__Canon>.TrySetException(object exceptionObject) (IL=0x002A, Native=0x00007FFE82C80090+0x71)
        mscorlib.dll!System.Runtime.CompilerServices.AsyncTaskMethodBuilder<Duplicati.Library.UsageReporter.ReportItem>.SetException(System.Exception exception) (IL˜0x0029, Native=0x00007FFE82C80660+0x48)
        CoCoL.dll!CoCoL.Channel<Duplicati.Library.UsageReporter.ReportItem>.ReadAsync(CoCoL.ITwoPhaseOffer offer) (IL˜0x059D, Native=0x00007FFE51A26850+0xDB6)
        mscorlib.dll!System.Threading.ExecutionContext.RunInternal(System.Threading.ExecutionContext executionContext, System.Threading.ContextCallback callback, object state, bool preserveSyncCtx) (IL˜0x0079, Native=0x00007FFE82C35950+0x163)
        mscorlib.dll!System.Threading.ExecutionContext.Run(System.Threading.ExecutionContext executionContext, System.Threading.ContextCallback callback, object state, bool preserveSyncCtx) (IL=epilog, Native=0x00007FFE82C35930+0x14)
        mscorlib.dll!System.Runtime.CompilerServices.AsyncMethodBuilderCore.MoveNextRunner.Run() (IL˜0x0024, Native=0x00007FFE82C81250+0x6C)
        mscorlib.dll!System.Threading.Tasks.AwaitTaskContinuation.RunOrScheduleAction(System.Action action, bool allowInlining, ref System.Threading.Tasks.Task currentTask) (IL˜0x001C, Native=0x00007FFE82C88050+0x62)
        mscorlib.dll!System.Threading.Tasks.Task.FinishContinuations() (IL=0x0085, Native=0x00007FFE82C039F0+0x3D7)
        mscorlib.dll!System.Threading.Tasks.Task.Finish(bool bUserDelegateExecuted) (IL=epilog, Native=0x00007FFE82C7E840+0x50)
        mscorlib.dll!System.Threading.Tasks.Task<System.__Canon>.TrySetException(object exceptionObject) (IL=0x002A, Native=0x00007FFE82C80090+0x71)
        mscorlib.dll!System.Threading.Tasks.TaskCompletionSource<Duplicati.Library.UsageReporter.ReportItem>.TrySetException(System.Exception exception) (IL˜0x000E, Native=0x00007FFE82C87BA0+0x1D)
        mscorlib.dll!System.Threading.ExecutionContext.RunInternal(System.Threading.ExecutionContext executionContext, System.Threading.ContextCallback callback, object state, bool preserveSyncCtx) (IL˜0x0079, Native=0x00007FFE82C35950+0x163)
        mscorlib.dll!System.Threading.ExecutionContext.Run(System.Threading.ExecutionContext executionContext, System.Threading.ContextCallback callback, object state, bool preserveSyncCtx) (IL=epilog, Native=0x00007FFE82C35930+0x14)
        mscorlib.dll!System.Threading.QueueUserWorkItemCallback.System.Threading.IThreadPoolWorkItem.ExecuteWorkItem() (IL=epilog, Native=0x00007FFE82C75AD0+0x73)
        mscorlib.dll!System.Threading.ThreadPoolWorkQueue.Dispatch() (IL=0x00A4, Native=0x00007FFE82C74CD0+0x152)
        [Native to Managed Transition]

EDIT 4

Ignoring the CPU loop for now (partly because I’m not sure how to go about debugging), I tested 2.0.4.26.
aftp problem appears to be the assumption of the poc fix that polls file size, which underneath gets SIZE:

https://tools.ietf.org/html/rfc3659#page-11

File SIZE

The FTP command, SIZE OF FILE (SIZE), is used to obtain the transfer size of a file from the server-FTP process. This is the exact number of octets (8 bit bytes) that would be transmitted over the data connection should that file be transmitted. This value will change depending on the current STRUcture, MODE, and TYPE of the data connection or of a data connection that would be created were one created now. Thus, the result of the SIZE command is dependent on the currently established STRU, MODE, and TYPE parameters.

12:02:09.977814 IP 192.168.0.108.52133 > 192.168.0.113.ftp: Flags [P.], seq 235:278, ack 571, win 2050, length 43: FTP: SIZE duplicati-20190919T160203Z.dlist.zip
        0x0000:  4500 0053 8ca7 4000 8006 ebcf c0a8 006c  E..S..@........l
        0x0010:  c0a8 0071 cba5 0015 27de e367 d117 783f  ...q....'..g..x?
        0x0020:  5018 0802 521a 0000 5349 5a45 2064 7570  P...R...SIZE.dup
        0x0030:  6c69 6361 7469 2d32 3031 3930 3931 3954  licati-20190919T
        0x0040:  3136 3032 3033 5a2e 646c 6973 742e 7a69  160203Z.dlist.zi
        0x0050:  700d 0a                                  p..
12:02:09.978115 IP 192.168.0.113.ftp > 192.168.0.108.52133: Flags [P.], seq 571:580, ack 278, win 229, length 9: FTP: 213 527
        0x0000:  4500 0031 16cb 4000 4006 a1ce c0a8 0071  E..1..@.@......q
        0x0010:  c0a8 006c 0015 cba5 d117 783f 27de e392  ...l......x?'...
        0x0020:  5018 00e5 8251 0000 3231 3320 3532 370d  P....Q..213.527.
        0x0030:  0a                                       .

End says transfer size is 527. Actual size at FTP remote is 525. Linux-to-Windows would convert Linux end-of-line newline characters (two present) to CR NL (a.k.a. CR LF) before sending due to ASCII mode.

Poll loop gets upset because it never sees 525, decides there’s a length problem, throws the exception.

Quite possibly the FTP server has to read the entire file through to figure out transfer size after translation. Alternatives might include trying to poll size by directory listing (which returns 525), or setting binary mode.
RemoteOperation table view of the file:

[
{"Name":"duplicati-20190919T155719Z.dlist.zip","LastAccess":"0001-01-01T00:00:00","LastModification":"2019-09-19T11:57:21-04:00","Size":525,"IsFolder":false},
{"Name":"duplicati-20190919T155720Z.dlist.zip","LastAccess":"0001-01-01T00:00:00","LastModification":"2019-09-19T11:57:36-04:00","Size":525,"IsFolder":false},
{"Name":"duplicati-b78ecce6aadd84e86874ff7c691911c65.dblock.zip","LastAccess":"0001-01-01T00:00:00","LastModification":"2019-09-19T11:57:36-04:00","Size":679,"IsFolder":false},
{"Name":"duplicati-bc09e57d2b9004fb18e157a4ba777f360.dblock.zip","LastAccess":"0001-01-01T00:00:00","LastModification":"2019-09-19T11:57:21-04:00","Size":679,"IsFolder":false}
]

https://github.com/robinrodricks/FluentFTP#api shows some of the things to pick from, if poc fix is actually needed. Was it needed under System.Net.FtpClient, I wonder, or did it become needed due to FluentFTP?

Hmm, I just noticed it using ~30% with no jobs in progress. (quad core + hyperthreading)
I quit the TrayIcon and restarted it. Back at 0%. Something is definitely amiss… (this is with 2.0.4.29)

Checked on of my Linux systems, Duplicati has no jobs in progress but it’s consuming 40% CPU time:

And the docker container on my NAS, same thing… no running jobs right now:

Capture

So this issue is affecting multiple platforms.

EDIT: 5-10 minutes after restarting Duplicati, CPU usage goes up again.

Not sure why this is happening again, but it smells like the usage reporter going bonkers. It could be related to the updated CoCoL library, which may have tinkered just a bit with how the timeout exceptions are thrown so they are no longer correctly detected.

That sounds like the usage reporter. It waits a little after starting before running the report submission code.

Going to test your theory by setting USAGEREPORTER_Duplicati_LEVEL=none in docker container environment. Will report back in about 10 minutes. :slight_smile:

Looking good…

Capture

I have fixed the problem in CoCoL. It was a case where infinite timeout (aka wait forever) was treated as immediate timeout (aka do not wait at all).

I will update the references in Duplicati when the build is in Nuget and send out a new canary.

1 Like

@ts678 can you provide me with instructions for testing this?
It seems that it is at least server dependant, but maybe also source+destination OS dependent.

What OS do I need to run Duplicati on, and what FTP server and OS do I need to test against to replicate your findings?

I this is the last blocker for a new beta release, I will try to dedicate a day during next week to fix it.

The upcoming beta milestone also lists the pending Google Drive access restriction as a possible blocker. Did we get any more clarity on that, or are the details of the limitations still somewhat unclear?

Purge operations result in broken dlist files #3924 is IMO a show-stopper/backup-breaker and repair can also cause it. Fortunately, there is some progress on it, so possibly this issue will leave the blockers list…

Regarding the new mono requirement of v5, I hope that at least the release notice publicizes that. Canary test has seen about 4 incidents of older mono, with 2 having Duplicati crash, and 2 needing a newer TLS.

Although I think package maintainers should update dependencies, and ideally autoupdater would test too, maybe we can survive without that. I don’t like it, but decided above to yield and find out how Canary fared.

“FTP (Alternative)” breakage is a regression I have a hard time quantifying, but aftp is somewhat rare and “FTP” can often substitute for it. In addition I think the SIZE command getting in trouble requires the server to be non-compliant with the File Transfer Protocol standard in RFC 959 transfer parameters requirement:

4.1.2.  TRANSFER PARAMETER COMMANDS

         All data transfer parameters have default values, and the
         commands specifying data transfer parameters are required only
         if the default parameter values are to be changed.  The default
         value is the last specified value, or if no value has been
         specified, the standard default value is as stated here.  This
         implies that the server must "remember" the applicable default
         values.

Specifically Windows 10 Duplicati aftp talking to inetutils-ftpd on Linux was given an ASCII size even though the transfer parameters had previously been set to image. I tested vsftpd and it gave a binary size, which made me hope that most modern FTP servers would do OK. But there are lots, I tested no further, and there’s no telling what sort of old crufty FTP might be in a NAS that’s still also using old crufty SMBv1. Needing a size-check workaround at all bothers me, e.g. why is it needed and does transfer time matter?

I can try to do better on instructions, but it’s easy. I think I backed up a 1 byte and 4 byte file, no encryption.

On release process matters, there’s also a Duplicati version numbering proposal with some good ideas, extending my Release: 2.0.4.23 (beta) 2019-07-14 idea of leaving some number space for beta patching.

I discovered the following issue yesterday:

Basically, there are times when a purge command doesn’t remove the specified file. Instead, it removes all other files. I’m surprised this hasn’t come up before. Are my expectations simply incorrect? If not, then I think this is definitely a blocker for the next beta.

The which-files-to-purge issue seems to be uncovering old problems. It’s odd they hadn’t gotten noticed.

I just looked at a problem which unfortunately doesn’t seem like a 2.0.4.5 problem. Anyone run with logs? What seems to have started happening some time post-2.0.4.5 is a backend put retry of a dblock keeps original dblock name in the dindex vol, which I think means broken backup for Recreate or direct restore.

is probably what was supposed to happen, but at some point it stopped happening. Because retries are regular network events, I hate to think they could produce this sort of hidden damage in the backup files.

2019-09-21 13:03:55 -04 - [Retry-Duplicati.Library.Main.Operation.Backup.BackendUploader-RetryPut]: Operation Put with file duplicati-b6ea0f72338224eaf85cd78e14130567b.dblock.zip.aes attempt 1 of 5 failed with message: 503 - service_unavailable: c001_v0001101_t0008 is too busy
System.Exception: 503 - service_unavailable: c001_v0001101_t0008 is too busy

2019-09-21 13:04:06 -04 - [Information-Duplicati.Library.Main.Operation.Backup.BackendUploader-RenameRemoteTargetFile]: Renaming “duplicati-b6ea0f72338224eaf85cd78e14130567b.dblock.zip.aes” to “duplicati-bf56b73cbbfde4001ab923d7a1b94aa4a.dblock.zip.aes”

If anyone kept logs, using DB RemoteVolume and IndexBlockLinks tables can help study and file-pairing.

I don’t know how often people use the purge command. I only just started experimenting with it a couple weeks ago.

Nice work troubleshooting, looks like you may be onto something…

Issue on the dindex file corruption is now filed. Seems odd that this has been broken-but-hidden since v2.0.4.16-2.0.4.16_canary_2019-03-28 however if this is as bad as it seems, I think it’s a beta blocker.

Does no other developer or person seeing this keep logs? You might find this issue in your backup too however you might need to work a little harder to decrypt your dindex zip file with AES Crypt to study it.

1 Like

I had a look at this. Unlike the other aftp issue, the SIZE issue arises from code in Duplicati.
Given that this is a non-compliant behavior, I think we can require the user to deal with it.

My idea is to add an option like --aftp-upload-delay=0s. This option is disabled (set to 0s) by default, giving the current behavior. If this option is set to a positive timespan, it will simply wait that amount of time instead of checking the remote size. This allows users to disable the failing SIZE check, and also get the upload delay if required, and adjustable to their network speed.

EDIT: Turns out there was already a --disable-upload-verify option meant to disable the SIZE check, but it was not doing anything. This means that the --aftp-upload-delay option only adds a wait delay, and users on non-compliant FTP servers need to set:

--disable-upload-verify=true
--aftp-upload-delay=5s

Any objections to that approach? @ts678 @BlueBlock

EDIT2: A proposed PR is here: Feature/aftp update by kenkendk · Pull Request #3956 · duplicati/duplicati · GitHub

Alternative FTP failure? seems like a good place to discuss because I think we might be talking about a retry on the PR that tried to workaround the issue, and the “get out (of|a) beta” topic here is getting long.

Summary of my current thoughts on proposed PR retry is it’s an improvement, but if the issue is actually premature finish from FluentFTP, and not some bug in Duplicati’s async code – and I’m NOT an expert, then I’d prefer trying to get to the bottom of it rather than, for example, have to tune the upload delay for dblock even if a much smaller file is uploaded. Note that use of UploadAsync was added for concurrent uploads, but exposed the FluentFTP bug that did get fixed, and might have added premature finish bug. There’s also the question of whether a poll workaround should be SIZE based, or LIST/MLST as before.