Crash on subsequent API calls with permission filters #6874
#6875 looks the same to me.
Something is wrong indeed. We noticed that sending many requests to
Another (callees trimmed):
Which HTTP libs are you guys using for your scripts? Do you or the libs send the
@lippserd My tests were using cURL without explicit parameters to keep the connection alive, and there was one request per program invocation, so the connection would be closed after one request anyway. I also tried using ab, which wasn't successful in crashing Icinga, though since this was a local test system, that may be down to the small workload compared to the staging machine where I gathered the stack traces. The Icinga version in question is r2.10.2-1 from the Ubuntu Xenial package version 2.10.2-1.xenial.
@lippserd
I'd suggest waiting for v2.10.3 and re-testing with that version.
@Al2Klimov What makes you believe that 2.10.3 would bring an improvement? |
v2.10.3 will bring a bunch of improvements, maybe there's also something for you. |
Could you guys test our snapshot packages? This may be fixed already. |
@lippserd do you have some sort of guide for installing the snapshot version?
Go to https://icinga.com/download/ and select your distro, see the info under the 'Snapshot Builds' section. |
Unfortunately Icinga2 v2.10.2-160-g1c772aa installed on Ubuntu Xenial still crashes, this time in
Same goes for Bionic (2.10.2+160.g1c772aac5.2019.01.12+1.bionic-0): I still get the same issue as in the original post. Are there any steps you want me to take now?
@lippserd I have installed 2.9.2-1.bionic and this seems to work without problems. Edit: our API calls do not work in 2.10.0, so I could not properly test that, but 2.10.1 is also crashing.
My shot-in-the-dark guess is that it has something to do with changes introduced in #6596 and the fact that frame.Self does not get set within the else condition at https://github.com/Icinga/icinga2/pull/6596/files#diff-a500132058c49e3d780348743914943fR271 |
For clarity: I am a teammate of @jottekop, so we are dealing with the same issue. My shot in the dark was wrong. I have compiled a debug build of the master branch on a test VM and ran it in gdb to get backtraces with some more info. I reconstructed part of the config in a test setup with just one API user, and I could hammer it endlessly in a bash while loop with curl calls, in the same sequence as the dashboard that crashes the Icinga API, without triggering the crash. The only way I could trigger this segfault was to transfer my debug binary to the prod box and run it in gdb there, where we have a fair volume of API calls coming in from various users. The api-users config looks like this:
It looks like two threads are deleting the same object at the same time. Threads 3123 and 3121 are both deleting an Icinga object at 0x7fff88021b80 with a refcount of 0. My gdb session:
Attached is the full thread backtrace.
Thanks, that helps. Such a scenario must not happen, though: both intrusive_ptrs deciding to release the object. Namespaces were added recently; this is a programming error somewhere or a compiler optimization. 34de810 plays a role here, fixing a filter regression. Next week my schedule is full, but I'll try to look into it afterwards.
@dnsmichi, do you have any update on this issue? |
No, unfortunately not. I am dealing with other customer issues this week. |
We're a little short on resources at the moment, but we'll try to help you as soon as possible.
This would be totally ok, but for some reason the object is already deleted.
This originates from the namespace introduction in 2.10 in 34de810, where the frame was changed from a dictionary to a namespace. It may also be influenced by the changes from 7f7e81d. I'm off for now; I'll deal with this in CW11 at the soonest. Cheers,
It's fairly easy to reproduce for me, so whatever info you need, let me know.
I assume this is related to same root causes as #6785 .
Hello @hrak, we need all of them: which OS(es), what the zone tree looks like, all of your monitoring objects (per zone), all users (you seem to have already posted that) and the queries you're firing against the API. Best,
If there's a lot to set up, I also accept a Dockerfile/docker-compose.yml (I scrolled over your public projects) which accepts self-built Icinga 2 packages. |
ref/NC/602611 |
ref/IP/13853 |
We're currently testing the snapshot 2.10.4+626 and are experiencing strange issues with API permission filters which basically block us from testing the snapshot version against our actual use-case (querying a lot of objects and creating downtimes via API):
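For reference, the two kinds of calls this use-case exercises presumably look something like the following sketch. The endpoint, credentials, object names and timestamps are hypothetical placeholders (the filtered object query and the schedule-downtime action are standard Icinga 2 REST API endpoints); `|| true` just keeps the sketch runnable without a live master.

```shell
API="https://localhost:5665"
AUTH="dashboard:secret"   # hypothetical ApiUser

# Query host objects; the server additionally restricts the result by the
# ApiUser's permission filter.
curl -k -s -u "$AUTH" -H 'Accept: application/json' \
  -H 'X-HTTP-Method-Override: GET' -X POST \
  -d '{ "filter": "match(\"web*\", host.name)" }' \
  "$API/v1/objects/hosts" || true

# Schedule a downtime for one host via the actions endpoint.
curl -k -s -u "$AUTH" -H 'Accept: application/json' -X POST \
  -d '{ "type": "Host", "filter": "host.name==\"web01\"", "author": "dashboard", "comment": "maintenance", "start_time": 1558000000, "end_time": 1558003600 }' \
  "$API/v1/actions/schedule-downtime" || true
```

With restrictive permission filters in play, the first call is where a filter regression shows up (objects missing or wrongly returned), while the second exercises filter evaluation under write permissions.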
With 2.10.4 we get the host object(s) which match the filter.
It looks like the
I'm already debugging it. |
Aah, I see. I should have tested the patch a bit more deeply.
Thanks again. I'm not yet convinced that only the
Here's my analysis: the PR is sane and puts everything into the current scope wherever needed. Snapshot packages will be available during the night.
I have just tested the snapshot packages. I had to roll back to 2.9.2 again, however, because triggering a deployment in Director would hang up the master API. The master process seems to keep doing all its other tasks, but calls to the master API hang indefinitely (e.g. clicking on 'Deployments' in Icinga Web results in a gateway timeout). I tested a curl to the master like
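A simple way to tell a hung API from a slow one is to bound the request with a timeout. A minimal sketch, assuming the default API port and a hypothetical root:icinga ApiUser:

```shell
# If the API has hung, curl hits the 10s timeout instead of returning JSON.
if curl -k -s --max-time 10 -u root:icinga \
     "https://localhost:5665/v1/status" > /dev/null; then
  status="responding"
else
  status="hung-or-unreachable"
fi
echo "master API: $status"
```

The `/v1/status` endpoint is cheap to serve, so a timeout here points at the API layer itself rather than a slow query.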
Thanks for noticing, I see that too after pulling the snapshot packages in Vagrant. Will investigate, I have an idea already. |
-> Regression from #7150, I'll work on a PR fix. |
Fixed it. @hrak, you're in the same timezone and likely home already, so snapshot packages will be built in roughly 6h.
Tested with v2.10.4-658-g81075088f, it's looking good now! Eagerly awaiting 2.11 :)
I'm waiting for customer feedback, from my tests and yours I consider this being fixed. |
Thank you for all the effort. Looking forward to the release of 2.11.
We decided to move this into 2.10.5 some minutes ago, unless @marcofl reports otherwise. |
…y to allow proper parallel execution

* fixes issue #6785 where permission checks get wrong results because permission checks are done within a shared namespace without using only unique keys
* mitigates issue #6874 where segmentation faults occur because of concurrent access to non-thread-safe parts of Namespace (a fix for thread safety of namespaces, which would be an alternative approach to get rid of these segfaults, is out of scope of this fix, as #6785 needs to be fixed anyway and this is the straightforward way to fix it)
* do the same for eventqueue (not certain whether events can be processed in parallel, but I expect that is the case)

(cherry picked from commit 1e7cd4a)
I was unable to reproduce a crash; my dev environment is probably not large enough. My test involved a few ApiUsers with host-group-based permissions and Hosts with more than 10 groups, some matching, some not, with them simultaneously deleting, inserting and querying those Hosts via a few forked curls. All tests were done with debug builds.
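A bounded sketch of that kind of stress test, with hypothetical host, credentials and object names (not the actual setup from this report); `|| true` keeps the loops running when a request fails:

```shell
#!/usr/bin/env bash
# Insert, query and delete Hosts from several forked curl loops at once,
# to provoke concurrent access to the config object namespace.
API="https://localhost:5665"

churn() {
  local user="$1" host="$2"
  for i in $(seq 1 10); do
    # create a dummy host object
    curl -k -s -u "${user}:secret" -X PUT \
      -d '{ "attrs": { "check_command": "dummy" } }' \
      "${API}/v1/objects/hosts/${host}" > /dev/null || true
    # query it back
    curl -k -s -u "${user}:secret" \
      "${API}/v1/objects/hosts/${host}" > /dev/null || true
    # delete it again
    curl -k -s -u "${user}:secret" -X DELETE \
      "${API}/v1/objects/hosts/${host}?cascade=1" > /dev/null || true
  done
}

# One forked worker per ApiUser, each churning its own object.
for n in 1 2 3; do
  churn "apiuser${n}" "testhost${n}" &
done
wait
echo "all workers finished"
```

Running each worker against the same object name instead of per-user names would sharpen the race on a single object's refcount.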
Expected Behavior
We expect that icinga2 keeps running when we do 40-ish API requests a second.
Current Behavior
Currently we run the icinga2 client without problems. But as soon as we start doing GET requests on the master API to get host and service status, with 9 different users on 2 different endpoints (so 18 connections), the service fails with a segfault, shown below.
There is no crash report or anything shown in the logs. If I turn on the debug log, it shows "segmentation fault".
Please let me know if you need more information.
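For illustration, that load pattern can be sketched roughly as follows. Endpoint hosts, user names and the password are placeholders (the real setup has 9 ApiUsers against 2 endpoints), and the loops are bounded here so the sketch terminates; `|| true` keeps a worker going when a request fails.

```shell
#!/usr/bin/env bash
# One concurrent connection per (endpoint, user) pair, all polling
# host and service status at the same time.
ENDPOINTS="https://master1:5665 https://master2:5665"
USERS="user1 user2 user3"

poll() {  # one worker = one user polling one endpoint
  local api="$1" user="$2"
  for i in $(seq 1 20); do
    curl -k -s --max-time 5 -u "${user}:secret" \
      "${api}/v1/objects/hosts"    > /dev/null || true
    curl -k -s --max-time 5 -u "${user}:secret" \
      "${api}/v1/objects/services" > /dev/null || true
  done
}

workers=0
for api in $ENDPOINTS; do
  for user in $USERS; do
    poll "$api" "$user" &
    workers=$((workers + 1))
  done
done
wait
echo "ran $workers parallel workers"
```

Because each ApiUser carries its own permission filter, every concurrent worker evaluates a different filter expression, which is what makes this pattern harsher than many requests from a single user.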
Possible Solution
No solution yet.
Steps to Reproduce (for bugs)
Context
We are trying to set up an icinga2 cluster and have created our own dashboard to view open issues on some monitors in our office.
We have 10 API users configured to make this happen, all with separate filters.
Your Environment
icinga2 --version: 2.10.2-1
icinga2 feature list: api checker debuglog ido-mysql mainlog
icinga2 daemon -C: succeeds without problems.
We run a master with 14 satellites with around 200 nodes. We also run Icinga Web on a separate server, and ido-mysql on a separate server as well.