Sporadic KeyError: 'runner' in reactor #61416
Comments
Similar to #52961 (salt 3000)
Looking at this, I'm almost positive that the issue, for some reason, is that the reaction_type isn't in It's entirely possible, though, that there's something else at play here with the interplay of more than one runner, the time it takes to actually run that runner, etc. If someone has some spare cycles to try and track that down, that would be .... oh nope, just figured it out! In salt/utils/reactor.py, it calls With my 2s timer, I added this:
100% of the time I see this error. So it looks like what's happening is between grabbing One potential fix would be to retrieve the runner from Anyway - thanks again for the report!
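For anyone following along, the kind of race described above can be illustrated with a minimal Python sketch. This is not Salt's actual code: `racy_lookup`, `safe_lookup` and the `between` hook are hypothetical names, and a plain dict stands in for whatever cache holds the runner client. The assumption is that the cached entry can be dropped after it is populated but before it is read, which is what the added delay appears to provoke.

```python
def racy_lookup(cache, key, factory, between=lambda: None):
    """Populate the cache, then read the entry back in a second step."""
    if key not in cache:
        cache[key] = factory()
    # Hypothetical hook: stands in for anything that can run at this point
    # (another thread, a cache expiry, a slow runner, ...).
    between()
    # If the entry vanished in the meantime, this raises KeyError.
    return cache[key]


def safe_lookup(cache, key, factory, between=lambda: None):
    """Keep a local reference so a later eviction cannot break the caller."""
    client = cache.get(key)
    if client is None:
        client = factory()
        cache[key] = client
    between()
    # The local name still points at the object even if the cache was cleared.
    return client


if __name__ == "__main__":
    cache = {}
    try:
        # Clearing the cache between populate and read mimics the race.
        racy_lookup(cache, "runner", object, between=cache.clear)
    except KeyError as exc:
        print("racy lookup failed with KeyError:", exc)

    cache = {}
    client = safe_lookup(cache, "runner", object, between=cache.clear)
    print("safe lookup returned a client:", client is not None)
```

The second function shows the general idea behind the suggested fix: take the object from the cache once and hold on to it, rather than going back to the cache a second time.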
We're seeing the same thing in v3002.2:
On average, I'd say 1% of the reactor calls run into this race condition. In our case, that's mission critical. Given there's no follow-up on this issue I can only assume it isn't fixed in v3005, though we will upgrade to the latest version soon. FWIW, I've not been able to reproduce it anymore after enabling
Looking at the reactor code I would say that's purely coincidental, but we've not hit the exception anymore after a few thousand test runs.
Started seeing this 2 days ago and now I have 900+ log messages, each throwing the following:
I have several events configured to run orchestrate runners and this happens with all of them.
Also seen on 3004.2:
[ERROR ] Reactor 'handle_stats' failed to execute runner 'salt.cmd'
It is happening all the time. I also tried the retry approach from above, with no observable effect.
@garethgreenaway This appears similar to #52961. I have additional logs available if required to dig into this further.
I was able to fix this using @waynew's suggestion above. Here's a patch:
This should be expanded to include other file_clients as well. I would submit a PR, but I still don't understand how salt's testing stack works. Maybe somebody can run with this.
Description
Salt 3002 master on RHEL7. We have a reactor watching and alerting via email on failures. The reactor generally works and has generated 7 valid alerts and 2 errors today. On Jan 01 it produced 3 valid alerts and one error.
Python trace below
Setup
Reactor config (email addresses sanitized)
RHEL7 on-prem VM.
Steps to Reproduce the behavior
Expected behavior
No python errors reported in salt master log
Screenshots
Versions Report