Notification not sent for object leaving downtime and already being in a hard not-ok state #6787
Comments
Where exactly is the downtime event, and where is the missing notification? I don't know where to look in the provided logs. A test case with the debug log enabled would be especially interesting, as it shows more details.
I'll extract the requested lines from the log above:
The notification that was sent results from
In the following screenshots you can see the notifications on Monday, when no downtime was involved. Two different notifications are sent: Operator (blue) and Automic (red). I'll see if I can set up a test case on the dev cluster and supplement this with debug logs.
Getting debug logs from prod turns out to be tricky, but I set up a fresh environment and tried to reproduce the behaviour. Surprisingly, the notifications behaved differently.
Here's the Icinga Web 2 history:
The configuration:
The complete debug log after I removed all IdoMysqlConnection, InfluxdbWriter and ApiListener events:
Edit: It's weird that two events with "This notification was not sent out to any contact" were fired, as if Icinga thought there had been problem notifications and now needed to send the recovery but couldn't find a matching contact.
Thanks, that works better with a shorter debug log. The flow is like this ... Here's the downtime being added, when the service is in an OK state.
Then an external check result changes state to SOFT critical.
Then there's another one which moves this to HARD critical.
At this timestamp
Note: The notification
At some point, the downtime is removed.
Then a recovery is forced via a HARD OK state change. This happens at a later point, but since no one was notified before, those recoveries won't reach any users.
Final conclusion:
Given the current implementation and design, in which a downtime casts a "don't let notifications be sent" window over a specified period of time, this works as expected. I do understand the requirement that
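The flow walked through above can be condensed into a small simulation. This is a simplified sketch, not Icinga 2's actual code: the `Service` class, the `simulate` function, the event names and the timestamps are all hypothetical. The model assumes only the two rules stated in this comment: a problem notification on a HARD state change is dropped while a downtime is active, and a recovery only goes to users who were notified of the problem.

```python
from dataclasses import dataclass, field


@dataclass
class Service:
    in_downtime: bool = False
    hard_state: str = "OK"
    notified_users: set = field(default_factory=set)


def simulate(events, users):
    """Replay (timestamp, kind, value) events and return the notifications sent.

    Toy model of the flow discussed in this thread (an assumption, not the
    real Icinga 2 implementation). SOFT state changes never notify here.
    """
    svc = Service()
    sent = []
    for ts, kind, value in events:
        if kind == "downtime":
            svc.in_downtime = value  # True: downtime starts, False: it ends
        elif kind == "hard_state":
            previous, svc.hard_state = svc.hard_state, value
            if value != "OK" and previous == "OK":
                # Problem notification: dropped while a downtime is active.
                if not svc.in_downtime:
                    for user in users:
                        sent.append((ts, "PROBLEM", user))
                        svc.notified_users.add(user)
            elif value == "OK" and previous != "OK":
                # Recovery: only users who saw the problem are notified.
                for user in sorted(svc.notified_users):
                    sent.append((ts, "RECOVERY", user))
                svc.notified_users.clear()
    return sent


# Hypothetical timeline mirroring the walkthrough above.
timeline = [
    (0, "downtime", True),            # downtime added while the service is OK
    (30, "soft_state", "CRITICAL"),   # SOFT critical, ignored by the model
    (60, "hard_state", "CRITICAL"),   # HARD critical inside the downtime
    (300, "downtime", False),         # downtime removed, still critical
    (600, "hard_state", "OK"),        # forced recovery
]

print(simulate(timeline, ["operator", "automic"]))
```

Under these assumptions nobody is notified: the problem notification is swallowed by the downtime, and the later recovery finds no previously notified user.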
What you've concluded makes sense to me and explains why the Automic notification is missing, as it has an interval of 3600. If the service stayed critical for an additional 45 minutes and hit the next 3600 interval, it would be sent. It also explains why no notifications are sent at all in the test setup. The Operator notification in my original post, on the other hand, is sent after the downtime ends but has an interval of 0. That shouldn't happen, right?
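The 45-minute remark can be checked with a little arithmetic. The sketch below assumes, as a simplification of the explanation in this thread, that reminder notifications fall on multiples of the interval counted from the HARD state change, and that any slot inside the downtime is skipped. The function name `next_reminder_after` and the downtime ending 15 minutes after the state change are hypothetical, chosen only to illustrate the numbers mentioned here.

```python
import math


def next_reminder_after(hard_change, interval, downtime_end):
    """Return the first reminder slot at or after downtime_end, or None.

    Slots are hard_change + k * interval for k = 0, 1, 2, ...
    An interval of 0 is treated as "no reminders", so None is returned.
    All times are in seconds. This is a model, not Icinga 2's scheduler.
    """
    if interval <= 0:
        return None
    elapsed = max(0, downtime_end - hard_change)
    k = math.ceil(elapsed / interval)
    return hard_change + k * interval


# Assumed values: HARD critical at t=0, downtime ends 15 minutes later.
slot = next_reminder_after(hard_change=0, interval=3600, downtime_end=900)
print(slot, (slot - 900) // 60)  # next slot and minutes left to wait
print(next_reminder_after(hard_change=0, interval=0, downtime_end=900))
```

With an interval of 3600 the next slot is 45 minutes after the downtime ends, whereas in this model an interval of 0 yields no later slot at all, which is why the Operator notification arriving after the downtime looks surprising.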
I wasn't able to fetch those debug logs yet, but I'm working on it. In the meantime I found another situation which looks like it might be related.
This gets even worse when you use
We have several levels of escalations like the following 1st and 2nd level:
Depending on the timing of the downtime, we end up with a notification to our 2nd level before our 1st level. After reading the explanation of @dnsmichi I understand why this is happening, but I think it really should not happen this way.
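The ordering problem can be illustrated with a toy model. It assumes that each escalation level is only eligible inside a time window measured from the HARD state change, and that the first notification to get past the downtime goes to whichever level's window is open at that moment. The function `first_level_notified`, the level names and the window values are hypothetical and are not taken from the configuration in this comment.

```python
def first_level_notified(levels, first_send_time):
    """Return the level whose window contains first_send_time, or None.

    levels maps a level name to (begin, end) in seconds since the HARD
    state change; end=None means the window never closes.
    first_send_time is when the first notification escapes the downtime.
    """
    for name, (begin, end) in levels.items():
        if begin <= first_send_time and (end is None or first_send_time < end):
            return name
    return None


# Assumed windows: 1st level for the first hour, 2nd level afterwards.
levels = {
    "1st level": (0, 3600),
    "2nd level": (3600, None),
}

print(first_level_notified(levels, 600))   # downtime over after 10 minutes
print(first_level_notified(levels, 4000))  # downtime covered the first hour
```

If the downtime covers the whole 1st-level window, the first notification that goes out lands in the 2nd-level window, so the 2nd level hears about the problem before the 1st level does.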
@dnsmichi Do you still need feedback?
This one is a dup of #5919, isn't it?
@Al2Klimov Sort of, @lippserd & myself decided to leave this open as it provides quite a few more details.
Implemented as part of #7270 and now merged to git master.
It seems there is a problem in Icinga 2 or somewhere in my configuration that leads to some notifications not being sent if a downtime ends and the affected object, in this case a service, is already in a hard not-ok state.
Expected Behavior
All contacts receive a notification if the object is in a not-ok hard state and the downtime ends.
Current Behavior
Only one of multiple contacts receives the notification.
Steps to Reproduce (for bugs)
Notification configuration ("Operator 24x7 Service Notification" is sent, "Automic 24x7 Service Notification" is not). Outside this scenario both notifications work as expected.
Relevant parts of the log file:
Your Environment
icinga2 --version: r2.10.1-1
icinga2 feature list: api checker ido-mysql influxdb mainlog notification
icinga2 daemon -C:
zones.conf file (or icinga2 object list --type Endpoint and icinga2 object list --type Zone) from all affected nodes.