My main Linux desktop rig has 96 GB of RAM. My backup is roughly 200 GB in size.
Unless I set asynchronous-concurrent-upload-limit and asynchronous-upload-limit both to 2, I cannot complete a backup to Google Drive. If I try the default values, or raise either of them above 2, Duplicati aborts the backup claiming there is no space left on the /tmp/ drive.
df reports that I’m using around 10 GB of space on /tmp. free reports that I’m using 22 GB of RAM altogether and have 74 GB of RAM free. I’ve monitored the amount of free space while the backup is running and usage never exceeds these values. Nor does the RAM usage of mono/Duplicati ever go beyond a few GB.
I’ve tried manually raising the tmpfs size in fstab to 64 GB, which doesn’t help: it still claims there is no space left on /tmp/ even though there is plenty of space left.
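For reference, a tmpfs size override in /etc/fstab typically looks like this (the mount options besides size= are just common defaults, not what I necessarily used):

```
tmpfs  /tmp  tmpfs  rw,nosuid,nodev,size=64G  0  0
```

After editing, `sudo mount -o remount /tmp` applies the new size without a reboot.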
I dimly remember (I have no personal interest in having a separate /tmp) seeing a poster with exactly this problem a few weeks (or months? time flies…) ago. IIRC the key was that when you delete an open file on Linux, it can be kept open for some time; its size is counted towards the ‘free’ space, but in fact it is not yet available, leading to perplexing results.
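The deleted-but-open behaviour is easy to reproduce on any Linux box; a minimal sketch in a throwaway directory:

```shell
# reproduce the deleted-but-open behaviour in a throwaway directory
cd "$(mktemp -d)"
dd if=/dev/zero of=fill bs=1M count=8 2>/dev/null
exec 3<fill            # keep the file open on fd 3
rm fill                # unlink it while still open
ls -l /proc/$$/fd/3    # shows ".../fill (deleted)"
exec 3<&-              # closing the fd is what actually frees the blocks
```

While fd 3 is still open, df keeps counting those blocks as used even though ls shows nothing, so different tools disagree until the last descriptor closes.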
Hmm, I did not find it (did not search very hard), but there is a reference from a long time ago already:
Those seem reasonable. For a 200 GB backup, the default 100 KB blocksize is a bit small; overhead builds up.
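To put a rough number on that overhead (back-of-envelope only, not a Duplicati-specific formula): a 200 GB source at the 100 KB default works out to about two million blocks to track in the local database:

```shell
# back-of-envelope block count at the default blocksize
backup_gb=200
blocksize_kb=100
blocks=$(( backup_gb * 1024 * 1024 / blocksize_kb ))
echo "${blocks} blocks"   # 2097152, i.e. roughly 2 million entries to track
```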
100 MB for dblock-size is just a slight multiple of the 50 MB default. What is Remote volume size set to? Typically that’s the one that gets set huge by accident, so huge files try to go through /tmp and don’t fit.
You could look in your destination to see if you have any files bigger than you expect from that 100 MB.
Is this by any chance NUMA, probably implying multiple CPU sockets? tmpfs limitations there may vary, meaning the free amount of RAM might not all be available, apparently depending on memory zone use, although this sounds pretty obscure.
I actually have NUMA turned off by a kernel parameter, so nope, it can’t be that.
I got that one under control too.
For what it’s worth, my issue is gone now. I don’t know why or what I did that could possibly change this, other than updating to kernel 6.1.x. But I’m pretty sure that it’s not related to any kernel changes. I just hope the error won’t come back again.
The error is only gone as long as the backup doesn’t grow in size beyond some unknown limitation being set. I suspect it’s mono that has some limit set up somewhere, but I know just about nothing about mono or .net so I don’t know where to start.
I’ve checked and made sure it’s not a limitation of inodes, open files and things like that. And yep, I still have over 80 GB of free RAM and over 40 GB free on /tmp/.
I suppose you could try filling /tmp with a big dd from /dev/zero to see if it can hold what it claims. Per Tmpfs (kernel.org), a badly configured tmpfs may deadlock, if that’s what you use – you originally mentioned tmpfs in fstab.
Do you have enough real drive space to see if moving tempdir away from /tmp solves space error?
This is especially important if /tmp is tmpfs now, although I can’t find any df oddity documented…
I did test it with a deleted-but-still-open file, and it wasn’t fooled by that. Free space remained down.
```
dd if=/dev/zero bs=1024 count=1000 of=fill
tail -f fill           # in one window
df -k . ; rm fill      # in another window
# Ctrl-C the tail to see the free space increase
```
I did this on my /tmp which seems to be on the regular VM drive space, same as / is.
Was there an actual error message or stack trace (even better) posted here to see how it got to that?
Possibly you’ll have to look in About → Show log → Stored (and click line) to get needed information.
How are you sure that it’s size? Other things run only occasionally, such as the compact (as needed).
This is in the regular job log, but log-file=<path> and log-file-log-level=information gives a better view.
That would also make it easier to see when in the process it fails. Do you have any descriptions now?
It’s also odd that you use so much /tmp space. Do you see more 100 MB files in there than expected?
This ignores any invisible files, but if you want you could probably see those with lsof per other topic.
Hmm, the original poster wrote that moving the temp directory somewhere other than /tmp/ didn’t help. However, I’ll try that out if (when?) this error occurs again. At the bottom of this post I wrote what I did to hopefully work around this error.
Yes, I tried that and also copied a few large files to /tmp/, almost filling it up. Everything looks correct even if I use 60 out of 48 GB of space (I have a 32 GB swapfile set up), and I can keep using the computer with no issues or crashes.
This isn’t a deadlock issue. I had a faulty gfx card years ago which caused deadlocks under heavy load. A deadlock means the computer completely locks up, and you cannot shut it down or reboot by any means other than the physical reset button or power-cycling. It’s one degree worse than a kernel panic, where you can actually reboot and get an error displayed.
Thank you for pointing me to this, it was a wee bit more comprehensive:
System.IO.IOException: Disk full. Path /tmp/dup-406eb4a4-3dcf-488e-aec5-cf0729d2d345
at System.IO.FileStream.FlushBuffer () [0x00081] in <282c4228012f4f3d96bdf0f2b2dea837>:0
at System.IO.FileStream.Flush () [0x00018] in <282c4228012f4f3d96bdf0f2b2dea837>:0
at System.IO.StreamWriter.Flush (System.Boolean flushStream, System.Boolean flushEncoder) [0x00096] in <282c4228012f4f3d96bdf0f2b2dea837>:0
at System.IO.StreamWriter.Flush () [0x00006] in <282c4228012f4f3d96bdf0f2b2dea837>:0
at Newtonsoft.Json.JsonTextWriter.Flush () [0x00000] in :0
at Duplicati.Library.Main.Volumes.FilesetVolumeWriter.AddFilelistFile () [0x0000b] in :0
at Duplicati.Library.Main.Volumes.FilesetVolumeWriter.Close () [0x00008] in :0
at Duplicati.Library.Main.Volumes.FilesetVolumeWriter.Dispose () [0x00000] in :0
at Duplicati.Library.Main.Operation.BackupHandler.RunAsync (System.String sources, Duplicati.Library.Utility.IFilter filter, System.Threading.CancellationToken token) [0x01048] in :0
at CoCoL.ChannelExtensions.WaitForTaskOrThrow (System.Threading.Tasks.Task task) [0x00050] in <9a758ff4db6c48d6b3d4d0e5c2adf6d1>:0
at Duplicati.Library.Main.Operation.BackupHandler.Run (System.String sources, Duplicati.Library.Utility.IFilter filter, System.Threading.CancellationToken token) [0x00009] in :0
at Duplicati.Library.Main.Controller+<>c__DisplayClass14_0.b__0 (Duplicati.Library.Main.BackupResults result) [0x0004b] in :0
at Duplicati.Library.Main.Controller.RunAction[T] (T result, System.String& paths, Duplicati.Library.Utility.IFilter& filter, System.Action`1[T] method) [0x0026f] in :0
at Duplicati.Library.Main.Controller.Backup (System.String inputsources, Duplicati.Library.Utility.IFilter filter) [0x00074] in :0
at Duplicati.Server.Runner.Run (Duplicati.Server.Runner+IRunnerData data, System.Boolean fromQueue) [0x00349] in <156011ea63b34859b4073abdbf0b1573>:0
Though I’ve no clue what any of this means other than the top line “Disk full” message
That’s the thing, I’m not using that much /tmp space. I’ve carefully monitored memory and /tmp/ usage right up to the moment the error appears, and it never comes even close to using up all the available space.
I restarted the backup, it’s been running all night and so far - no errors. What I changed is:
asynchronous-concurrent-upload-limit, from 4 to 2
asynchronous-upload-limit, from 4 to 2
concurrency-compressors, from 6 to 2
The downside is that my upload speed is now half of what it was, down to around 25 MB/s.
I can understand how these options force Duplicati to use less /tmp/ storage, but why am I somehow prevented from fully using all my available /tmp/ space and RAM? If 2 concurrent threads use 8 GB of storage space in /tmp/, I should be able to use 10 times as many if I want to, since I have 96 GB of RAM. And yet something prevents me from using more than 2 threads, hence my suspicion that it’s a configurable limitation in mono.
The line you quoted was not about the original issue but about the stress of the space test in the previous quote, because that test might have stopped your system (assuming deadlock is only a risk under memory stress).
It seemed like it would be worth warning about the risk of the suggestion.
Does that file still exist? If Duplicati didn’t clean it up, how big is it and what’s inside, e.g. using less?
How the backup process works refers to a filelist.json which eventually goes into a dlist file to describe source files from some backup version. The Fileset term also means a set of source files.
That AddFilelistFile frame is (I think – any C# developers care to look?) trying to copy a /tmp/dup-* file into the filelist.json that’s inside a .zip file when the mono library gets told (presumably by Linux) that the disk became full.
Below that, one could strace to look for the system call that reported the disk as full, but that’s deep digging…
Because a copy is happening, there might be two copies there at once, which adds to the space stress. How large is a typical dlist file in your destination? If small, having several might not add much stress.
I’m still interested in /tmp/dup-* file behavior and leftovers. Maybe even poll ls -lhrt /tmp/dup*.
Poll speed for this and df depends on how fast you think a file copy could maybe temporarily fill /tmp.
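A bounded poll along those lines (interval and sample count are arbitrary; tighten them to however fast you think /tmp could fill):

```shell
# sample /tmp usage and dup-* temp files a few times, one second apart
for i in 1 2 3; do
  date '+%H:%M:%S'
  ls -lhrt /tmp/dup* 2>/dev/null | tail -n 3   # newest temp files, if any
  df -k /tmp | tail -n 1                       # current usage of /tmp
  sleep 1
done
```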
Maybe I’m not doing the math correctly? How do you calculate the theoretical memory usage when the remote volume size is set to 4 gb with the number of threads set to 3, 3 and 3?
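My own back-of-envelope attempt, which is pure guesswork about which stages can each hold a full volume at once (queue, in-flight uploads, compressors) and whether the plaintext and encrypted copies briefly coexist:

```shell
# guessed worst case, not confirmed Duplicati behaviour
vol_gb=4          # remote volume size
queue=3           # asynchronous-upload-limit
uploads=3         # asynchronous-concurrent-upload-limit
compressors=3     # concurrency-compressors
single=$(( vol_gb * (queue + uploads + compressors) ))
echo "one copy each: ${single} GB"               # 36 GB
echo "plain + encrypted: $(( single * 2 )) GB"   # 72 GB, more than a 48 GB /tmp
```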
What’s new compared to yesterday is that the backup keeps running despite the disk-full error. Right now it’s using 42 out of 48 GB in /tmp/; perhaps I managed to time it perfectly this time and will watch it crash completely within a few minutes? I think you were right that it only fills up for mere fractions of a second and then deletes all the dup-* files, making the space available again.
Thank you @ts678 for all your help and support, it’s VERY appreciated and I realize my issue is far more complicated than what most others would indulge themselves in trying to solve.
Duplicati is so far removed from that that it’s probably not possible. It’s all automatic as mono wishes.
Since tmpfs lives completely in the page cache and on swap
according to the kernel doc, there’s a very fuzzy line between memory and drive that I hope doesn’t worsen the puzzle here, but if you mean /tmp usage, I think it’s supposed to be as in Duplicati docs.
Earlier on, math was simpler because there was not a concurrent uploader, and just one queue per:
--asynchronous-upload-limit (Integer): The number of volumes to create ahead
When performing asynchronous uploads, Duplicati will create volumes that
can be uploaded. To prevent Duplicati from generating too many volumes,
this option limits the number of pending uploads. Set to zero to disable
* default value: 4
The initial block collection is an unencrypted .zip file, and encryption is done as a separate step.
I’m not sure if the queue is encrypted or not, but you can run file on any dup-* file for its guess.
Later came concurrent uploads, adding upload speed and muddying the queues question somewhat:
--asynchronous-concurrent-upload-limit (Integer): The number of concurrent
When performing asynchronous uploads, the maximum number of concurrent
uploads allowed. Set to zero to disable the limit.
* default value: 4
I certainly hope each of the uploaders doesn’t have its own queue of default size 4. You can test it.
This is why I had suggested ls -lhrt /tmp/dup*. You can watch future dblocks flowing through.
--concurrency-compressors (Integer): Specify the number of concurrent
Use this option to set the number of processes that perform compression of
* default value: 2
It “looks” like AsynchronousUploadLimit is handled by the backend code, which is after the above.
The pre-generated volumes will be placed into the temporary folder by default; this option can set a different folder for placing the temporary volumes. Despite the name, this also works for synchronous runs.
which is another control beyond tempdir to either help in your analysis, or maybe help space usage.
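If it comes to that, relocating both temp areas to a disk-backed path would look roughly like this in the job’s advanced options (the path is only an example, and I’m going from memory on the asynchronous-upload-folder name, which is the option whose help text is quoted above):

```
--tempdir=/var/tmp/duplicati
--asynchronous-upload-folder=/var/tmp/duplicati
```

That takes tmpfs out of the picture entirely, at the cost of real disk I/O.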
I’m pretty sure the upload is done from the encrypted version of the dblock file. I’m not sure when the unencrypted version is deleted, though; you might have both at once (the encrypted one being a little larger). There won’t be any tmp file with a dblock name – that name is assigned when the file is handed to the remote.
file can probably identify .zip files. Programs like less can get clues too, usually near the file start.
If you really want to count and trace files, have at it, and please tell us what files are flowing where…
Alternatively, using smaller remote volumes would probably go a long way towards solving space lack.