No, I don’t use containers, just a KVM virtual machine.
Initially I checked, but don’t have a .gdbinit file. I’ll create one and add the settings this evening. I need to get some work done now!
Sadly, I don’t know what I’m doing here. I don’t know gdb at all.
“/root/.config/gdb” doesn’t exist, so I created a new folder and file “/root/.config/gdb/gdbinit” and added “set auto-load safe-path /” to it. I also hard-linked that to “/root/.gdbinit”, but it still gives the same error message.
A strange behaviour is as follows:
Are my two large jobs failing because of too many files?
Sorry, I had copy-pasted your error message, and reading it again I don’t understand why there is a #011 at the beginning of the instruction. It should not be there, IMO. So your /root/.gdbinit should contain something like
add-auto-load-safe-path /usr/bin/mono-sgen-gdb.py
However, I think that to have this file you should install mono-devel. Wait, on my test install I have mono-devel and I don’t have the file. Ah yes, you should install mono-dbg.
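A minimal sketch of putting that line in place (writing to a temporary path here rather than /root, so it is safe to try anywhere; on the real system the file would be /root/.gdbinit or /root/.config/gdb/gdbinit):

```shell
# Create a .gdbinit that whitelists mono's gdb helper script for
# auto-loading. The temporary path below is illustrative only.
GDBINIT="$(mktemp -d)/gdbinit"
cat > "$GDBINIT" <<'EOF'
add-auto-load-safe-path /usr/bin/mono-sgen-gdb.py
EOF
cat "$GDBINIT"
```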
About your backup, do you use LVM? IIRC it’s not used with Fedora, but I may be wrong.
Yes, I do use LVM for my OS files, and /root. My /home folder is on another disk, and is not LVM.
There isn’t a mono-dbg package. Here is what I have:
# dnf list mono\*
Last metadata expiration check: 2:16:03 ago on Mon 15 Aug 2022 03:45:00 PM SAST.
Installed Packages
mono-addins.x86_64 1.3.3-1.fc36 @fedora
mono-complete.x86_64 6.12.0-6.fc36 @fedora
mono-core.x86_64 6.12.0-6.fc36 @fedora
mono-data.x86_64 6.12.0-6.fc36 @fedora
mono-data-oracle.x86_64 6.12.0-6.fc36 @fedora
mono-data-sqlite.x86_64 6.12.0-6.fc36 @fedora
mono-devel.x86_64 6.12.0-6.fc36 @fedora
mono-extras.x86_64 6.12.0-6.fc36 @fedora
mono-locale-extras.x86_64 6.12.0-6.fc36 @fedora
mono-mvc.x86_64 6.12.0-6.fc36 @fedora
mono-reactive.x86_64 6.12.0-6.fc36 @fedora
mono-wcf.x86_64 6.12.0-6.fc36 @fedora
mono-web.x86_64 6.12.0-6.fc36 @fedora
mono-winforms.x86_64 6.12.0-6.fc36 @fedora
mono-winfx.x86_64 6.12.0-6.fc36 @fedora
monodoc.x86_64 6.12.0-6.fc36 @fedora
Available Packages
mono-addins-devel.i686 1.3.3-1.fc36 fedora
mono-addins-devel.x86_64 1.3.3-1.fc36 fedora
mono-basic.x86_64 4.7-9.fc36 fedora
mono-basic-devel.i686 4.7-9.fc36 fedora
mono-basic-devel.x86_64 4.7-9.fc36 fedora
mono-bouncycastle.x86_64 1.8.10-2.fc36 fedora
mono-cecil.x86_64 0.10.4-6.fc36 fedora
mono-cecil-devel.i686 0.10.4-6.fc36 fedora
mono-cecil-devel.x86_64 0.10.4-6.fc36 fedora
mono-cecil-flowanalysis.x86_64 0.1-0.40.20110512svn100264.fc36 fedora
mono-cecil-flowanalysis-devel.i686 0.1-0.40.20110512svn100264.fc36 fedora
mono-cecil-flowanalysis-devel.x86_64 0.1-0.40.20110512svn100264.fc36 fedora
mono-core.i686 6.12.0-6.fc36 fedora
mono-devel.i686 6.12.0-6.fc36 fedora
mono-mvc-devel.i686 6.12.0-6.fc36 fedora
mono-mvc-devel.x86_64 6.12.0-6.fc36 fedora
mono-reactive-devel.i686 6.12.0-6.fc36 fedora
mono-reactive-devel.x86_64 6.12.0-6.fc36 fedora
mono-reactive-winforms.x86_64 6.12.0-6.fc36 fedora
mono-reflection.i686 0.1-0.25.20110613git304d1d.fc36 fedora
mono-reflection.x86_64 0.1-0.25.20110613git304d1d.fc36 fedora
mono-reflection-devel.i686 0.1-0.25.20110613git304d1d.fc36 fedora
mono-reflection-devel.x86_64 0.1-0.25.20110613git304d1d.fc36 fedora
mono-tools.x86_64 4.2-25.fc36 fedora
mono-tools-devel.i686 4.2-25.fc36 fedora
mono-tools-devel.x86_64 4.2-25.fc36 fedora
mono-tools-gendarme.x86_64 4.2-25.fc36 fedora
mono-tools-monodoc.x86_64 4.2-25.fc36 fedora
mono-web-devel.i686 6.12.0-6.fc36 fedora
mono-web-devel.x86_64 6.12.0-6.fc36 fedora
mono-zeroconf.x86_64 0.9.0-33.fc36 fedora
mono-zeroconf-devel.i686 0.9.0-33.fc36 fedora
mono-zeroconf-devel.x86_64 0.9.0-33.fc36 fedora
monobristol.x86_64 0.60.3.1-21.fc36 fedora
monochrome-icon-theme.noarch 16.10-13.20180421bzr625.fc36 fedora
monocypher.i686 3.1.2-3.fc36 fedora
monocypher.x86_64 3.1.2-3.fc36 fedora
monocypher-devel.i686 3.1.2-3.fc36 fedora
monocypher-devel.x86_64 3.1.2-3.fc36 fedora
monodevelop.x86_64 5.10.0-22.fc36 fedora
monodevelop-debugger-gdb.x86_64 5.0.1-11.fc36 fedora
monodevelop-devel.i686 5.10.0-22.fc36 fedora
monodevelop-devel.x86_64 5.10.0-22.fc36 fedora
monodoc-devel.i686 6.12.0-6.fc36 fedora
monodoc-devel.x86_64 6.12.0-6.fc36 fedora
monosim.x86_64 1.5.2-25.fc36 fedora
monotone.x86_64 1.1-42.fc36 fedora
monotone-server.x86_64 1.1-42.fc36 fedora
[root@fedora ~]#
Mono 6.12.0 Release Notes says that’s a November 2020 version. Maybe try a newer download?
Although I couldn’t find a relevant mono project issue, one never knows. Technically it died in libc.
getpwuid_r might be trying to fetch names (reverse-lookup from number) for the owner and group.
Looking at a sample block of metadata from a Linux dblock file, its format does store such names.
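That uid-to-name lookup can also be reproduced from the shell: getent resolves through the same NSS stack (files, sss, ldap, …) that getpwuid_r uses, so it is a quick sanity check on the lookup itself. A sketch, nothing Duplicati-specific assumed:

```shell
# Resolve the current uid to its passwd entry via NSS, the same lookup
# path getpwuid_r takes (nsswitch.conf decides: files, sss, ldap, ...).
getent passwd "$(id -u)"

# The first field is the user name that backup metadata would store:
getent passwd 0 | cut -d: -f1
```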
Aug 9 20:59:31 fedora duplicati-server[259594]: #011 at Mono.Unix.Native.Syscall:sys_getpwuid_r <0x000a8>
Aug 9 20:59:31 fedora duplicati-server[259594]: #011 at Mono.Unix.Native.Syscall:getpwuid_r <0x00077>
Aug 9 20:59:31 fedora duplicati-server[259594]: #011 at Mono.Unix.UnixUserInfo:.ctor <0x00087>
Aug 9 20:59:31 fedora duplicati-server[259594]: #011 at Mono.Unix.UnixFileSystemInfo:get_OwnerUser <0x0006b>
Aug 9 20:59:31 fedora duplicati-server[259594]: #011 at FileInfo:.ctor <0x000b3>
Aug 9 20:59:31 fedora duplicati-server[259594]: #011 at UnixSupport.File:GetUserGroupAndPermissions <0x00057>
Aug 9 20:59:31 fedora duplicati-server[259594]: #011 at Duplicati.Library.Common.IO.SystemIOLinux:GetMetadata <0x001eb>
Aug 9 20:59:31 fedora duplicati-server[259594]: #011 at Duplicati.Library.Snapshots.NoSnapshotLinux:GetMetadata <0x0005a>
Aug 9 20:59:31 fedora duplicati-server[259594]: #011 at Duplicati.Library.Main.Operation.Backup.MetadataGenerator:GenerateMetadata <0x00121>
I don’t know why a lookup would fail though. Perhaps ltrace could confirm the call where it aborts.
That’s too bad; otherwise I’d wonder if it found a file that killed it, and ask you to narrow the area down.
There’s still a chance it’s file-specific. You could run with a verbose log-file to see how random the deaths are.
Channel Pipeline tries to explain the internals. In the source code, GenerateMetadata is used in these files:
https://github.com/duplicati/duplicati/blob/master/Duplicati/Library/Main/Operation/Backup/FilePreFilterProcess.cs
https://github.com/duplicati/duplicati/blob/master/Duplicati/Library/Main/Operation/Backup/MetadataPreProcess.cs
So there are some debug ideas to try to find why mono is dying. Of course it’s not supposed to die…
Aug 9 20:59:33 fedora systemd[1]: duplicati.service: Main process exited, code=exited, status=134/n/a
From a Google search, 134 may be what mono does when it dies this way. That’s not a Duplicati code.
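Concretely, that status follows the shell convention of 128 + signal number, and SIGABRT is signal 6, so 134 points at mono calling abort() rather than at any Duplicati exit code. A quick way to confirm the convention:

```shell
# A process killed by SIGABRT (signal 6) reports exit status 128 + 6 = 134,
# matching the status=134 that systemd logged for duplicati.service.
sh -c 'kill -ABRT $$'
echo $?   # 134
```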
I think I am having a closely related problem. My file counting never gets past a couple of thousand; then it hangs with a “connecting to server” error, aborts, and recycles the process in an endless loop.
Reading your thread prompted me to check journalctl. I found this line at the beginning of each loop iteration:
mono-sgen: pthread_mutex_lock
I also just came across this in the log, not sure if it is meaningful or not:
abrt-server[21904]: Blacklisted package ‘mono-core’
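For context: that abrt message means crash reports for mono-core are being discarded rather than saved. On Fedora, mono-core ships on abrt’s default package blacklist (likely because mono’s runtime routinely raises signals that look like crashes). An illustrative excerpt, assuming the stock config location; exact contents vary by release:

```
# /etc/abrt/abrt-action-save-package-data.conf (excerpt, illustrative)
# Packages on this list never get an abrt crash report saved for them.
# Removing mono-core here (and restarting abrtd) would let abrt capture
# the mono crashes instead of discarding them.
BlackList = nspluginwrapper, valgrind, strace, mono-core
```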
It seems the packages don’t have the same names as on Ubuntu.
Do you have a /usr/bin/mono-sgen-gdb.py file or not?
Re: LVM, I was thinking that a snapshot could cause problems, but it’s logical that your backup runs in non-snapshot mode, since your backup set includes a non-LVM device.
Do you use LDAP? I don’t quite see how an access to the passwd file could cause random problems unless some weird security layer such as SELinux is involved (in your place I’d check sestatus anyway), but if a network access is involved, it could be a reason.
About the problem appearing after an update: it could be that the update forced a reboot, and that reboot caused some earlier change to take effect. Wild ideas, I know, but debugging an intermittent problem remotely is challenging. It’s challenging enough when dealing with it directly.
The randomness of it all makes me think it’s read errors on your source. If nothing else actually changed, that’s my bet.
Can you try to back up/copy your important data using another method? If you can, chances are your source is fine and there is a problem with Duplicati; if not, then you know where your problem lies.
So I’ve tried a few things:
@ts678 suggested upgrading mono, which I did, but it made no difference. I then ran the backup logging to a file at verbose level. I ran the job twice; the first time it got further than the second time. It just stops. No error is recorded in the log itself, only the crash in /var/log/messages. The 134 code seems to show up only when running as a service, not when running from the CLI, so I think it’s a systemd exit code.
@gpatel-fr I do have /usr/bin/mono-sgen-gdb.py; what do I do with that? I don’t use LDAP. I have a standalone Linux workstation with a lot of data on it. I run Duplicati on it and back up to an old PC with Fedora 36 installed and a 12 TB drive, using SFTP/SSH as the connection method.
To try to eliminate some options, I did the following test: I used the Fedora ISO to create a new KVM VM on Fedora 36. I didn’t patch it. I installed Duplicati. I mounted the three folders I back up onto the VM with SSHFS, so they appear as part of the local file system. I then backed that up to the same destination server successfully, four times. So I trust the source 100%. My latest experiment: I renamed the job’s database files and renamed the folder on the destination, so now I am effectively running the job as if it were a first-time backup. It’s going to take some time; I will report back. If it succeeds, it means my Duplicati backup (destination) files somehow got corrupted.
Half an hour ago, the process crashed. It had been running for 5 hours. Again, this is a different file to before.
What I did notice when actually LOOKING at /var/log/messages instead of just grepping it is that it wrote a systemd-coredump for the mono-sgen process. I have an 18 MB file in /var/lib/systemd/coredump. Would that be any use to anyone in the thread?
(It in fact wrote a number of them over the last two weeks.)
Alternatively, can someone guide me in what to do to analyze the problem? I don’t mind doing the grunt work, I just don’t know what to do.
My problems started after I did a dnf upgrade. It updated a few dozen packages that I have installed.
By any chance did your problems also start after upgrading? If so, maybe we can compare lists of what packages were affected.
Thanks
As I understand it, when the application crashes it would allow using gdb, but I admit that if it does not start automatically, I’m not sure how to force it :-/
About the crash dumps: you are supposed to open them with gdb, so it’s actually useful to know about them. Basically you learn the executable name by using ‘file core’, and then ‘gdb executable_name core’. After that you are in gdb.
Edit: thinking more about it, using gdb would be productive only for a specific class of users, mostly developers. Outside of that, it would not get more info than a crash report such as the one you showed in your first post. The only thing is, I would have liked a complete trace. Why? Because it could possibly have exposed a badly managed access problem between threads. As some threads are not shown, it’s not possible to say. If you still have the complete trace, you could check whether getpwuid appears in two threads. In that case it could show a problem in Duplicati, or Mono. Not very likely, but who knows.
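Since systemd-coredump is storing the dumps, coredumpctl is the easiest way in; the steps above can be sketched as a small helper (collected in a function so nothing runs by accident; the mono-sgen path and the -1 “newest dump” selector are assumptions based on this thread):

```shell
# Reference sketch: run inspect_core on the affected host
# (needs systemd-coredump, file, and gdb installed).
inspect_core() {
    coredumpctl list                      # recent dumps
    coredumpctl -1 info                   # details of the newest one
    coredumpctl -1 dump --output=core     # extract the newest core to ./core
    file core                             # confirm which executable produced it
    gdb /usr/bin/mono-sgen core -batch -ex 'thread apply all bt'
}
```

Calling inspect_core then prints a backtrace for every thread, which is the complete trace mentioned above.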
List was posted earlier.
Not much in common. Can anybody comment if these could be the culprit? I would doubt the rpm items.
Upgrade koji-1.29.1-1.fc35.noarch -> koji-1.29.1-3.fc35.noarch (@updates)
Upgrade python3-koji-1.29.1-1.fc35.noarch -> python3-koji-1.29.1-3.fc35.noarch (@updates)
Upgrade python3-rpm-4.17.1-2.fc35.x86_64 -> python3-rpm-4.17.1-3.fc35.x86_64 (@updates)
Upgrade rpm-4.17.1-2.fc35.x86_64 -> rpm-4.17.1-3.fc35.x86_64 (@updates)
Upgrade rpm-build-4.17.1-2.fc35.x86_64 -> rpm-build-4.17.1-3.fc35.x86_64 (@updates)
Upgrade rpm-build-libs-4.17.1-2.fc35.x86_64 -> rpm-build-libs-4.17.1-3.fc35.x86_64 (@updates)
Upgrade rpm-libs-4.17.1-2.fc35.x86_64 -> rpm-libs-4.17.1-3.fc35.x86_64 (@updates)
Upgrade rpm-plugin-selinux-4.17.1-2.fc35.x86_64 -> rpm-plugin-selinux-4.17.1-3.fc35.x86_64 (@updates)
Upgrade rpm-plugin-systemd-inhibit-4.17.1-2.fc35.x86_64 -> rpm-plugin-systemd-inhibit-4.17.1-3.fc35.x86_64 (@updates)
Upgrade rpm-sign-libs-4.17.1-2.fc35.x86_64 -> rpm-sign-libs-4.17.1-3.fc35.x86_64 (@updates)
All these packages are related to rpm; koji is the Fedora rpm-building utility. It should not have any impact on Duplicati. Normally. Well, given that one of these packages is an SELinux plugin, I’d say there is a caveat here: I have seen reports of bad or badly configured SELinux plugins having unfortunate side effects even in permissive mode.
Anyway, the mere presence of these packages on your system is a mark of a customized setup. These systems are not out-of-the-box distro Linux, and as such can exhibit very particular behaviour that is difficult to debug. If this kind of problem doesn’t happen on a developer’s box, the only chance of solving it may be to uninstall things more or less randomly until it works.
Using coredumpctl -1 info, I see getpwuid in only the following:
Module libcom_err.so.2 with build-id c980b4303c51332b16b98f799a158445eae81aa0
Stack trace of thread 269982:
#0 0x00007fcfe148ec4c __pthread_kill_implementation (libc.so.6 + 0x8ec4c)
#1 0x00007fcfe143e9c6 raise (libc.so.6 + 0x3e9c6)
#2 0x00007fcfe14287f4 abort (libc.so.6 + 0x287f4)
#3 0x000056385fa2de0e mono_post_native_crash_handler (mono-sgen + 0x2de0e)
#4 0x000056385fa6b923 mono_handle_native_crash (mono-sgen + 0x6b923)
#5 0x000056385fabcbeb sigabrt_signal_handler (mono-sgen + 0xbcbeb)
#6 0x00007fcfe143ea70 __restore_rt (libc.so.6 + 0x3ea70)
#7 0x00007fcfe148ec4c __pthread_kill_implementation (libc.so.6 + 0x8ec4c)
#8 0x00007fcfe143e9c6 raise (libc.so.6 + 0x3e9c6)
#9 0x00007fcfe14287f4 abort (libc.so.6 + 0x287f4)
#10 0x00007fcfe142871b __assert_fail_base.cold (libc.so.6 + 0x2871b)
#11 0x00007fcfe1437576 __assert_fail (libc.so.6 + 0x37576)
#12 0x00007fcfe14902b0 __pthread_mutex_lock@GLIBC_2.2.5 (libc.so.6 + 0x902b0)
#13 0x00007fcfe1bd8736 sss_nss_mc_get_ctx (libnss_sss.so.2 + 0x3736)
#14 0x00007fcfe1bd91fe sss_nss_mc_getpwuid (libnss_sss.so.2 + 0x41fe)
#15 0x00007fcfe1bdb036 _nss_sss_getpwuid_r (libnss_sss.so.2 + 0x6036)
#16 0x00007fcfe14dc631 getpwuid_r@@GLIBC_2.2.5 (libc.so.6 + 0xdc631)
#17 0x00007fcfd541b905 Mono_Posix_Syscall_getpwuid_r (libMonoPosixHelper.so + 0x1b905)
#18 0x00000000413c5c79 n/a (n/a + 0x0)
#19 0x00000000413c4418 n/a (n/a + 0x0)
#20 0x0000000041500fab n/a (n/a + 0x0)
#21 0x00000000414fffa0 n/a (n/a + 0x0)
#22 0x00000000414ff3bc n/a (n/a + 0x0)
#23 0x00000000414fef73 n/a (n/a + 0x0)
#24 0x00000000414fca20 n/a (n/a + 0x0)
#25 0x00007fcfd792b6a1 System_Runtime_CompilerServices_AsyncMethodBuilderCore_MoveNextRunner_InvokeMoveNext_object (mscorlib.dll.so + 0x32b6a1)
#26 0x00007fcfd7790fca System_Threading_ExecutionContext_RunInternal_System_Threading_ExecutionContext_System_Threading_ContextCallback_object_bool (mscorlib.dll.so + 0x190fca)
#27 0x00007fcfd7790dd3 System_Threading_ExecutionContext_Run_System_Threading_ExecutionContext_System_Threading_ContextCallback_object_bool (mscorlib.dll.so + 0x190dd3)
#28 0x00007fcfd792b54a System_Runtime_CompilerServices_AsyncMethodBuilderCore_MoveNextRunner_Run (mscorlib.dll.so + 0x32b54a)
#29 0x00007fcfd77cdf1b System_Threading_Tasks_AwaitTaskContinuation_RunOrScheduleAction_System_Action_bool_System_Threading_Tasks_Task_ (mscorlib.dll.so + 0x1cdf1b)
#30 0x00007fcfd77c4fa3 System_Threading_Tasks_Task_FinishContinuations (mscorlib.dll.so + 0x1c4fa3)
#31 0x00007fcfd77c3541 System_Threading_Tasks_Task_FinishStageThree (mscorlib.dll.so + 0x1c3541)
#32 0x00007fcfd77b79e6 System_Threading_Tasks_Task_1_TResult_REF_TrySetResult_TResult_REF (mscorlib.dll.so + 0x1b79e6)
#33 0x00007fcfd7929ca3 System_Runtime_CompilerServices_AsyncTaskMethodBuilder_1_TResult_REF_SetResult_TResult_REF (mscorlib.dll.so + 0x329ca3)
#34 0x00000000411cc0db n/a (n/a + 0x0)
#35 0x00007fcfd792b6a1 System_Runtime_CompilerServices_AsyncMethodBuilderCore_MoveNextRunner_InvokeMoveNext_object (mscorlib.dll.so + 0x32b6a1)
#36 0x00007fcfd7790fca System_Threading_ExecutionContext_RunInternal_System_Threading_ExecutionContext_System_Threading_ContextCallback_object_bool (mscorlib.dll.so + 0x190fca)
#37 0x00007fcfd7790dd3 System_Threading_ExecutionContext_Run_System_Threading_ExecutionContext_System_Threading_ContextCallback_object_bool (mscorlib.dll.so + 0x190dd3)
#38 0x00007fcfd792b54a System_Runtime_CompilerServices_AsyncMethodBuilderCore_MoveNextRunner_Run (mscorlib.dll.so + 0x32b54a)
#39 0x00007fcfd77cdf1b System_Threading_Tasks_AwaitTaskContinuation_RunOrScheduleAction_System_Action_bool_System_Threading_Tasks_Task_ (mscorlib.dll.so + 0x1cdf1b)
#40 0x00007fcfd77c4fa3 System_Threading_Tasks_Task_FinishContinuations (mscorlib.dll.so + 0x1c4fa3)
#41 0x00007fcfd77c3541 System_Threading_Tasks_Task_FinishStageThree (mscorlib.dll.so + 0x1c3541)
#42 0x00007fcfd77b79e6 System_Threading_Tasks_Task_1_TResult_REF_TrySetResult_TResult_REF (mscorlib.dll.so + 0x1b79e6)
#43 0x00007fcfd77a4b0f System_Threading_Tasks_TaskCompletionSource_1_TResult_REF_TrySetResult_TResult_REF (mscorlib.dll.so + 0x1a4b0f)
#44 0x00007fcfd77a4b5f System_Threading_Tasks_TaskCompletionSource_1_TResult_REF_SetResult_TResult_REF (mscorlib.dll.so + 0x1a4b5f)
#45 0x0000000041306593 n/a (n/a + 0x0)
#46 0x00007fcfd7790dd3 System_Threading_ExecutionContext_Run_System_Threading_ExecutionContext_System_Threading_ContextCallback_object_bool (mscorlib.dll.so + 0x190dd3)
#47 0x00007fcfd7799738 System_Threading_QueueUserWorkItemCallback_System_Threading_IThreadPoolWorkItem_ExecuteWorkItem (mscorlib.dll.so + 0x199738)
#48 0x00007fcfd77978ba System_Threading_ThreadPoolWorkQueue_Dispatch (mscorlib.dll.so + 0x1978ba)
#49 0x00007fcfd779955d System_Threading__ThreadPoolWaitCallback_PerformWaitCallback (mscorlib.dll.so + 0x19955d)
#50 0x000000004123e137 n/a (n/a + 0x0)
#51 0x000056385fa3850a mono_jit_runtime_invoke (mono-sgen + 0x3850a)
#52 0x000056385fc3b06c do_runtime_invoke (mono-sgen + 0x23b06c)
#53 0x000056385fc68aa9 try_invoke_perform_wait_callback (mono-sgen + 0x268aa9)
#54 0x000056385fcb9dcb worker_thread (mono-sgen + 0x2b9dcb)
#55 0x000056385fc6793b start_wrapper_internal (mono-sgen + 0x26793b)
#56 0x00007fcfe148ce2d start_thread (libc.so.6 + 0x8ce2d)
#57 0x00007fcfe15121b0 __clone3 (libc.so.6 + 0x1121b0)
Can we escalate this to somebody who can read coredumps or stack traces? We do not seem to be getting any traction here, and I have already been a week without a backup.
Any chance Python 2 was removed? The update to Python 3 occurred for both users, and the timing fits.
Possibly similar (though certainly a different manifestation) to what happened with macOS after 12.3.
@jimkd1yv your issue has been escalated to the top of the forum’s list (for the moment), which is as good as it gets BTW.
Nope, both still on my Fedora 36 box:
# ls -l /usr/bin/python[23]
lrwxrwxrwx. 1 root root 9 Jun 17 04:38 /usr/bin/python2 -> python2.7
lrwxrwxrwx. 1 root root 10 Aug 3 13:20 /usr/bin/python3 -> python3.10
I also booted to single user mode, unmounted my user folders and forced a disk check on each of them. The very next backup attempt failed within seconds.
Also still on my Fedora 35 systems
ls -l /usr/bin/python[23]
lrwxrwxrwx. 1 root root 9 Jun 16 21:54 /usr/bin/python2 -> python2.7
lrwxrwxrwx. 1 root root 10 Jun 9 20:07 /usr/bin/python3 -> python3.10