icinga 2.11 exits without errors on FreeBSD 11.3 #7539
Comments
@bsdlme can you confirm that behaviour please? I'm not sure how FreeBSD handles the umbrella process and reloads here. Or maybe it is a problem with boost asio & context on BSD specifically. |
@nielsk: does dmesg(1) show a SIGBUS error for the icinga2 process? |
I have the same problem; I described it in a bit more detail in FreeBSD #240812. After running dmesg I discovered that icinga2 was dying with a SIGBUS. |
Is there a difference if you omit |
Nope. SIGSEGV in my case (at least I see a lot of signal 11 in my dmesg, thus this should be from my experiments getting it to work)
|
@bsdlme which boost versions are provided with 11.2 & 3? |
1.71 is in ports |
I'm not a FreeBSD user, what else differs between 11.3 and 12 in terms of compiler versions, cmake, build flags, openssl versions, etc. in specific regard to Icinga dependencies? |
FreeBSD 11.2 (@mat813): clang 6.0.0, OpenSSL 1.0.2o. 11.3 was released after 12.0, which is why it has a newer clang version. CMake is not in base but is installed from ports. Ports have the same version for all FreeBSD versions. The latest CMake in ports is cmake-3.15.3, probably used by all of us. CFLAGS are:
|
Thanks. As far as I can see, 11.x is still supported. https://www.freebsd.org/security/#sup @bsdlme How difficult is it for you to spin up 11.3 and test this? |
I can create a VM at Azure with 11.3. If you like I can give you the login credentials, so you can play around yourself. |
I would, but unfortunately I have no time atm. I'm mainly interested in whether you can reproduce this yourself and do the backtrace dance. I don't remember whether FreeBSD has gdb or lldb though. |
Now I set up a 11.3 amd64 VM and installed 2.11 using packages. |
So, what can I do to debug this further?
|
I have now tried switching icinga2 back to the freebsd-pkg-repo instead of our own, and it still crashes with the same output as above. |
You could try by using the sample config and adding more and more of your config and see when it starts to crash. |
It would be easier to set up a new server with an officially supported linux-distribution and migrate the config than doing this. I have to speak with my team about it. |
Or upgrade to 12.0-RELEASE. |
How? I am using 11.3. You cannot upgrade from 11.3 to 12.0 because of the new zfs-features in 11.3.
|
I don't have much time atm. One thought is how the Boost libraries are compiled on your system. There could be specific hardening compiler flags which cause trouble here, or specific stack guard patches which conflict with the way Boost Coroutine and Context work. See my analysis for the Nessus scan crashes in #7431. Since it always crashes on TLS connection start, this would be the place where I'd start debugging. Maybe the OpenSSL version/linkage on FreeBSD also causes trouble here. |
Oh, I see. Then you could upgrade to 12.1-BETA2 or wait for 12.1-RC1, which will be released on Oct 11. |
Chiming in with the same problem on FreeBSD 11.3 here. I just upgraded to 2.11.0 from 2.10.5, and now I'm also getting this SIGSEGV (11), and I get the same thing from truss:
I'm using clang 8.0.0 but LibreSSL 2.9.2. This might at least point away from SSL being the culprit. I can confirm that this crash is only related to being a master, since one of my satellites is running the exact same build but hasn't crashed yet. @bsdlme - did you have any satellites connected to your test? I suspect that might be necessary so you can see the crash |
LibreSSL is something we don't support as the syscalls/APIs may behave differently. We only test OpenSSL. Is this a thing on FreeBSD to set via the ports package? I'm not sure how to interpret truss, but given that CLD_DUMPED leads to the real error here, is there a possibility to follow child forks? https://vegdave.wordpress.com/2006/10/23/an-example-on-running-truss/ says so. It may also help to attach gdb/lldb and follow the fork. |
You can trace child processes with "truss -f". |
LibreSSL is usually a drop in replacement for OpenSSL. We can set a knob when building packages to use that instead of openssl; I can provide more gory details on request. Note that I do not use the normal ports methodology of LibreSSL works 98% of the time; I've built 100s of packages with LibreSSL that work just fine with it including perl, php, nginx, and icinga2. Specific to icinga2, I have it running just fine with LibreSSL at two different sites for the past two years. That being said, there are a few edge case packages that do not build correctly with LibreSSL and these issues are (to my knowledge) handled by the ports system. I don't think the issue is the LibreSSL api because other users are using the stock OpenSSL api and having the same crash. So I just backed out to 2.10.5 because I needed it working. I can make some time to try 2.11 again if you are patient with me. :) |
Yes please. I'm not able to fix it, @dnsmichi is ENOTIME and we should really try to find the cause. Thanks in advance! |
So, I updated a 11.2 / i386 box to 12.0, and icinga crashes in the same way :( |
One minor sidenote, which probably doesn't apply to your implementation (I don't know what std::atomic_flag uses under the hood, but probably not the pthread_spin_* family): POSIX states that a pthread_spin_unlock called by a thread not owning the lock results in undefined behaviour[0] and could just as easily cause an abort, similar to what pthread_mutex_unlock does on OpenBSD. [0] https://pubs.opengroup.org/onlinepubs/9699919799/functions/pthread_spin_unlock.html |
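To illustrate the distinction drawn in the comment above: a lock built directly on std::atomic_flag records no owner, so clearing the flag from another thread is well-defined as far as the C++ standard is concerned, unlike pthread_spin_unlock or pthread_mutex_unlock. The following is only a minimal sketch under that assumption. The names SpinLock, unlock_from_other_thread and run_spinlock_example are hypothetical and are not taken from the Icinga 2 source.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical minimal spinlock for illustration; NOT Icinga's implementation.
class SpinLock {
public:
    void lock() noexcept {
        // test_and_set returns the previous value; spin while it was already set.
        while (flag_.test_and_set(std::memory_order_acquire)) {
            std::this_thread::yield();
        }
    }

    bool try_lock() noexcept {
        return !flag_.test_and_set(std::memory_order_acquire);
    }

    void unlock() noexcept {
        // No owner is recorded, so any thread may clear the flag.
        flag_.clear(std::memory_order_release);
    }

private:
    std::atomic_flag flag_ = ATOMIC_FLAG_INIT;
};

// Locks on the calling thread, unlocks on a second thread, then reports
// whether the lock could be taken again afterwards.
bool unlock_from_other_thread() {
    SpinLock sl;
    sl.lock();
    std::thread other([&sl] { sl.unlock(); });
    other.join();
    bool reacquired = sl.try_lock();
    if (reacquired) {
        sl.unlock();
    }
    return reacquired;
}

// Example check; call this from main() in a real program.
void run_spinlock_example() {
    assert(unlock_from_other_thread());
}
```

This says nothing about locks that do track ownership (std::mutex, pthread mutexes, pthread spin locks), where unlocking from a non-owning thread remains undefined behaviour.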
After a couple of days it still seems to run as expected. There's one minor issue where after some time icinga fails to exit when sending it a SIGTERM via: pkill -T "0" -xf "/usr/local/lib/icinga2/sbin/icinga2 daemon.*" as per OpenBSD's rc-framework. This however seems not directly related to this diff, since I can restart icinga just after a config-update has been pushed. I'll investigate further and if I find something useful I'll place it on an appropriate ticket. |
Is the patch included in the latest 2.12.1 release? |
#8308? Yes. |
@bsdlme With icinga2-2.12.1, it absolutely still crashes on startup on i386 boxes. |
Did you try i386 or x64? I just want to be sure before I do my test. |
Well, the answer is in the comment you are responding to, i386. I never had any problems on amd64. |
Thanks. I just wanted to be sure because I have seen people using i386 and x64 interchangeably. |
@bsdlme I tried to build it today on my poudriere (with a FreeBSD 11.4-jail) and it fails. I created a bug in the FreeBSD-bugzilla. |
@nielsk Yes, but you seem to have a local patch that can't be applied correctly. |
I could now update -- I had to upgrade to 11.4 because 11.3 is not supported anymore.
|
Too bad. So we're back at the beginning. |
Random thought I just had about what might be causing this (I did not investigate this further, just writing it down so I don't forget): Icinga 2.11 changed the network stack to use Boost.Asio and executes coroutines on multiple worker threads. AFAIK Boost.Asio may schedule these coroutines on arbitrary worker threads, so if a coroutine holds a mutex while it performs a yield operation, the mutex might be unlocked on a different thread. |
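The hazard described in the comment above can be sketched without Boost. The code below is an illustrative simulation, not Boost.Asio code and not taken from Icinga 2: FakeCoroutine, simulate_migration and run_migration_example are hypothetical names. It only records which thread ran each half of a "coroutine" and never actually unlocks a mutex on the wrong thread, since doing so with std::mutex is undefined behaviour.

```cpp
#include <cassert>
#include <thread>

// Hypothetical stand-in for a coroutine that is suspended at a yield point
// and resumed later, possibly by a different worker thread.
struct FakeCoroutine {
    std::thread::id locked_on;   // thread that ran the part before the yield
    std::thread::id resumed_on;  // thread that ran the part after the yield

    // Imagine a mutex being locked here.
    void before_yield() { locked_on = std::this_thread::get_id(); }

    // Imagine the same mutex being unlocked here.
    void after_yield() { resumed_on = std::this_thread::get_id(); }

    // True if the imagined unlock would run on a thread that does not own
    // the mutex, which is undefined behaviour for std::mutex.
    bool would_unlock_on_wrong_thread() const {
        return locked_on != resumed_on;
    }
};

// Runs the first half on the calling thread and the second half on a worker.
// The calling thread stays alive during the worker's run, so the two thread
// IDs are guaranteed to be distinct.
bool simulate_migration() {
    FakeCoroutine co;
    co.before_yield();
    std::thread worker([&co] { co.after_yield(); });
    worker.join();
    return co.would_unlock_on_wrong_thread();
}

// Example check; call this from main() in a real program.
void run_migration_example() {
    assert(simulate_migration());
}
```

A check of this kind (compare the locking thread ID with the current one before unlocking) is one way such a migration could be detected in a debug build. Whether this is what actually happens in Icinga 2 on FreeBSD is exactly the open question in this thread.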
Confirming this bug is still here after upgrading to 12.2 and 2021Q1 packages:
How can I tell icinga not to use multiple threads as a workaround?
|
No threads at all won't work, the best you could try would be to reduce the size of thread pools to 1, however that's not configurable at runtime. You'd have to patch the following line to say icinga2/lib/base/io-engine.cpp Line 87 in 2cb995e
And icinga2/lib/base/threadpool.hpp Line 39 in 2cb995e
|
That looks potentially dangerous. Still, I may try it in desperation. |
So I tried both of the changes suggested above and icinga2 still crashes. |
@nielsk And 12.x+? |
I don't know. I can't update the server because of this issue, I think I even tried once 12.x and had the same problem but it is quite a while ago. And I am migrating currently to checkmk. |
To all FreeBSD users subscribed here: Please could you test a recent Icinga 2 version on a recent FreeBSD version and report:
Icinga 2 version
FreeBSD version
Architecture
Whether #7539 occurs
|
Hello,
I am running icinga2 version r2.13.1-1 (from packages) in a FreeBSD jail (12.2p11) without problems.
henning
… On 19.01.2022 at 11:26, Alexander Aleksandrovič Klimov ***@***.***> wrote:
To all FreeBSD users subscribed here:
Please could you test a recent Icinga 2 version on a recent FreeBSD version and report:
Icinga 2 version
FreeBSD version
Architecture
Whether #7539 occurs
|
icinga2-2.13.2, FreeBSD 12.2-STABLE, amd64. I am running icinga2 at a remote site where it works and has never crashed. I am using the patch to the boost libraries from this bug: https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=260143 which was thankfully pointed out to me in #9174. This installation is now having a swap space exhaustion issue, but that is probably unrelated to this issue. I could no longer use the icinga2 site that crashed before, so it has been retired. |
Describe the bug
After upgrading from icinga 2.10.5 to 2.11 on FreeBSD 11.3-p3, icinga2 daemon -C shows that the configuration is correct, but the daemon starts and immediately exits when the api feature is enabled. It works without the api feature.
After re-running the api setup I got it working, but it crashed when I tried to send a notification.
output from running truss icinga2 daemon -x debug before 'api setup'
crash
Your Environment
Include as many relevant details about the environment you experienced the problem in
icinga2 --version: r2.11.0-1
icinga2 feature list: api checker command ido-mysql mainlog notification
icinga2 daemon -C:
Additional context
I opened a thread on the community Discourse where I may have written more:
https://community.icinga.com/t/problems-with-upgrading-icinga-2-10-5-to-2-11-on-freebsd/2325