Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Discard likely duplicate problem notifications via Notification#last_notified_state_per_user #9893

Merged

Conversation

Al2Klimov
Copy link
Member

fixes #4503

Commits

  1. Whenever a state notification is sent: track user.name=>checkable.state in Notification#last_notified_state_by_user
  2. On Notification#last_notified_state_by_user change: sync that over cluster
  3. Among the other checks like is-acked, matches-filter, etc. in Notification#BeginExecuteNotification(): in case of the same non-reminder problem already notified before drop the eff. dupl. notification
  4. On recovery: clear that state

Test

Tested with https://github.com/Al2Klimov/twintowers and following config in volumes/icinga2/master1/etc/icinga2/zones.d/master/my.conf

 for (i in range(100)) {
    object Host i {
        check_command = "dummy"
        enable_active_checks = false
        #check_interval = "dummy"
        #vars.dummy_state = {{ Math.random() < 0.01 ? 2 : 0 }}
        #vars.dummy_text = "-"
    }
}

 for (i in range(10)) {
apply Service i {
        check_command = "dummy"
        enable_active_checks = false
    assign where true
}
}

object User "me" {
    states = [ OK, Critical, Unknown ]
}

object NotificationCommand "noop" {
    command = [ "true" ]
}

apply Notification "test" to Host {
    command = "noop"
    users = [ "me" ]
    assign where true
}

apply Notification "test" to Service {
    command = "noop"
    users = [ "me" ]
    assign where true
}
  • 👍 Initial (passive) warning filtered out as configured
  • 👍 Initial critical notified as configured
  • 👍 Follow-up warning filtered out as configured
  • 👍 Follow-up critical filtered out due to this change
  • 👍 Happened while one master was down on 10/10 services, so HA works

@Al2Klimov Al2Klimov added the consider backporting Should be considered for inclusion in a bugfix release label Nov 7, 2023
@Al2Klimov Al2Klimov added this to the 2.15.0 milestone Nov 7, 2023
@cla-bot cla-bot bot added the cla/signed label Nov 7, 2023
@icinga-probot icinga-probot bot added area/configuration DSL, parser, compiler, error handling area/notifications Notification events enhancement New feature or request needs-sponsoring Not low on priority but also not scheduled soon without any incentive labels Nov 7, 2023
Copy link
Member

@yhabteab yhabteab left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Apart from the comments below, please also document the new cluster events here.

lib/icinga/notification.ti Outdated Show resolved Hide resolved
lib/icinga/notification.cpp Outdated Show resolved Hide resolved
lib/icinga/clusterevents.cpp Outdated Show resolved Hide resolved
@yhabteab yhabteab removed the needs-sponsoring Not low on priority but also not scheduled soon without any incentive label Nov 22, 2023
@Al2Klimov Al2Klimov force-pushed the do-not-re-notify-if-filtered-states-don-t-change-4503 branch from 40244e6 to 4b51ec9 Compare November 24, 2023 12:41
test/icinga-notification.cpp Outdated Show resolved Hide resolved
Application::GetTP().Start();
helper.n->BeginExecuteNotification(NotificationRecovery, nullptr, false, false, "", "");
Application::GetTP().Stop();
BOOST_CHECK(helper.called == 1u);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might also be worth adding this test here:

BOOST_CHECK(helper.n->GetLastNotifiedStatePerUser()->Get(helper.u->GetName()) == Empty);

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Redundant IMAO. It's only the bridge to (not) sending a notification, the latter is important and tested here.

@Al2Klimov Al2Klimov force-pushed the do-not-re-notify-if-filtered-states-don-t-change-4503 branch from 4b51ec9 to 5adefce Compare November 24, 2023 16:22
Copy link
Member

@yhabteab yhabteab left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you please also deduplicate the remaining test cases.

test/icinga-notification.cpp Outdated Show resolved Hide resolved
Comment on lines 588 to 589
notification->GetLastNotifiedStatePerUser()->Set(params->Get("user"), state);
Notification::OnLastNotifiedStateOfUserUpdated(notification, params->Get("user"), state, origin);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The attributes and signals are named inconsistently ("per" vs. "of"). For what it's worth, my preference would be unifying towards "per".

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Guilty. To my defence: The whole thing contains all states, separately per user. But sometimes I want to update only the value of a particular user.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But wouldn't it make so much more sense if everything that's specifically related to syncing that attribute was also named like that attribute and not just somewhat similar? Actually, this affects OnLastNotifiedStatesCleared as well.

lib/icinga/notification.cpp Show resolved Hide resolved
lib/icinga/notification.cpp Outdated Show resolved Hide resolved
test/CMakeLists.txt Outdated Show resolved Hide resolved
test/icinga-notification.cpp Outdated Show resolved Hide resolved
test/icinga-notification.cpp Outdated Show resolved Hide resolved
test/icinga-notification.cpp Outdated Show resolved Hide resolved
test/icinga-notification.cpp Outdated Show resolved Hide resolved
test/icinga-notification.cpp Outdated Show resolved Hide resolved
@Al2Klimov Al2Klimov force-pushed the do-not-re-notify-if-filtered-states-don-t-change-4503 branch 2 times, most recently from 1055f29 to d72906b Compare December 7, 2023 14:18
@Al2Klimov Al2Klimov force-pushed the do-not-re-notify-if-filtered-states-don-t-change-4503 branch from d72906b to c1fe30b Compare December 11, 2023 10:31
doc/19-technical-concepts.md Outdated Show resolved Hide resolved
lib/icinga/clusterevents.cpp Show resolved Hide resolved
@Al2Klimov Al2Klimov force-pushed the do-not-re-notify-if-filtered-states-don-t-change-4503 branch from c1fe30b to 77a2494 Compare December 12, 2023 14:34
doc/19-technical-concepts.md Outdated Show resolved Hide resolved
@Al2Klimov Al2Klimov force-pushed the do-not-re-notify-if-filtered-states-don-t-change-4503 branch from 77a2494 to b3b86a6 Compare December 12, 2023 14:55
yhabteab
yhabteab previously approved these changes Dec 12, 2023
Copy link
Member

@yhabteab yhabteab left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Warning ⚠️:

# curl -k -s -S -i -u root:icinga -H 'Accept: application/json' \
 -X POST 'https://localhost:5666/v1/actions/process-check-result' \
-d '{ "type": "Service", "filter": "service.name==\"test\"", "exit_status": 1, "plugin_output": "PING CRITICAL - Packet loss = 100%", "performance_data": [ "rta=5000.000000ms;3000.000000;5000.000000;0.000000", "pl=100%;80;100;0" ], "pretty": true }'

[2023-12-12 16:17:21 +0100] notice/Notification: Attempting to send notifications of type 'Problem' for notification object 'icinga2!test!notify'.
[2023-12-12 16:17:21 +0100] notice/Notification: Not sending notifications for notification object 'icinga2!test!notify and user 'yhabteab': state 'Warning' does not match state filter: Critical, OK and Unknown.

Critical:

# curl -k -s -S -i -u root:icinga -H 'Accept: application/json' \
 -X POST 'https://localhost:5666/v1/actions/process-check-result' \
-d '{ "type": "Service", "filter": "service.name==\"test\"", "exit_status": 2, "plugin_output": "PING CRITICAL - Packet loss = 100%", "performance_data": [ "rta=5000.000000ms;3000.000000;5000.000000;0.000000", "pl=100%;80;100;0" ], "pretty": true }'

[2023-12-12 16:18:35 +0100] notice/Notification: Attempting to send notifications of type 'Problem' for notification object 'icinga2!test!notify'.
[2023-12-12 16:18:35 +0100] information/Notification: Sending 'Problem' notification 'icinga2!test!notify' for user 'yhabteab'
[2023-12-12 16:18:35 +0100] information/Notification: Completed sending 'Problem' notification 'icinga2!test!notify' for checkable 'icinga2!test' and user 'yhabteab' using command 'noop'.

Critical -> Warning -> Critical:

# curl -k -s -S -i -u root:icinga -H 'Accept: application/json' \
 -X POST 'https://localhost:5666/v1/actions/process-check-result' \
-d '{ "type": "Service", "filter": "service.name==\"test\"", "exit_status": 1, "plugin_output": "PING CRITICAL - Packet loss = 100%", "performance_data": [ "rta=5000.000000ms;3000.000000;5000.000000;0.000000", "pl=100%;80;100;0" ], "pretty": true }'

# curl -k -s -S -i -u root:icinga -H 'Accept: application/json' \
 -X POST 'https://localhost:5666/v1/actions/process-check-result' \
-d '{ "type": "Service", "filter": "service.name==\"test\"", "exit_status": 2, "plugin_output": "PING CRITICAL - Packet loss = 100%", "performance_data": [ "rta=5000.000000ms;3000.000000;5000.000000;0.000000", "pl=100%;80;100;0" ], "pretty": true }'

[2023-12-12 16:21:48 +0100] notice/Notification: Attempting to send notifications of type 'Problem' for notification object 'icinga2!test!notify'.
[2023-12-12 16:21:48 +0100] notice/Notification: Notification object 'icinga2!test!notify': We already notified user 'yhabteab' for a Critical problem. Likely after that another state change notification was filtered out by config. Not sending duplicate 'Critical' notification.

Recovery -> Critical:

# curl -k -s -S -i -u root:icinga -H 'Accept: application/json' \
 -X POST 'https://localhost:5666/v1/actions/process-check-result' \
-d '{ "type": "Service", "filter": "service.name==\"test\"", "exit_status": 0, "plugin_output": "PING CRITICAL - Packet loss = 100%", "performance_data": [ "rta=5000.000000ms;3000.000000;5000.000000;0.000000", "pl=100%;80;100;0" ], "pretty": true }'

[2023-12-12 16:24:00 +0100] information/Notification: Completed sending 'Recovery' notification 'icinga2!test!notify' for checkable 'icinga2!test' and user 'yhabteab' using command 'noop'.


# curl -k -s -S -i -u root:icinga -H 'Accept: application/json' \
 -X POST 'https://localhost:5666/v1/actions/process-check-result' \
-d '{ "type": "Service", "filter": "service.name==\"test\"", "exit_status": 2, "plugin_output": "PING CRITICAL - Packet loss = 100%", "performance_data": [ "rta=5000.000000ms;3000.000000;5000.000000;0.000000", "pl=100%;80;100;0" ], "pretty": true }'

[2023-12-12 16:24:20 +0100] information/Notification: Completed sending 'Problem' notification 'icinga2!test!notify' for checkable 'icinga2!test' and user 'yhabteab' using command 'noop'.

n->SetUsersRaw(new Array({"jdoe"}), true);
n->SetTypeFilter(typeFilter);
n->SetStateFilter(stateFilter);
((ConfigObject*)n.get())->OnAllConfigLoaded(); // link Service
Copy link
Contributor

@julianbrost julianbrost Dec 13, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the purpose of this cast? ConfigObject::OnAllConfigLoaded() is virtual, so shouldn't this just call n->OnAllConfigLoaded() anyways?

And if a cast is actually necessary, we have static_pointer_cast/dynamic_pointer_cast.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Config::OnAllConfigLoaded() is virtual, so shouldn't this just call n->OnAllConfigLoaded() anyways?

I suppose it's because Notification::OnAllConfigLoaded() isn't publicly accessible.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And if a cast is actually necessary, we have static_pointer_cast/dynamic_pointer_cast.

I don't get the point of these. I'm casting to base, not to derived.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there any good reason why Notification::OnAllConfigLoaded() shouldn't be public if it implements a public virtual method of a base class? So unless there's a good reason, I'd just change the visibility and avoid strange casts altogether.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Absolutely! IMAO always the overriding method of the most derived class shall be called from external, none of the base methods. Protecting all ex. the base virtual one enforces this. OK, admittedly Notification is the most derived class. But if this changes for Notification (or any other class) in the future, one had to remember these rules and make all methods protected the new derived class overrides. To make some already public methods protected!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Either I'm getting you wrong or what you're describing can't happen.

https://en.cppreference.com/w/cpp/language/virtual#In_detail:

If some member function vf is declared as virtual in a class Base, and some class Derived, which is derived, directly or indirectly, from Base, has a declaration for member function with the same [...]
Then this function in the class Derived is also virtual (whether or not the keyword virtual is used in its declaration) and overrides Base::vf (whether or not the word override is used in its declaration).

So Notification::OnAllConfigLoaded() is virtual automatically and if there was another derived class overriding it, it would still be used.

@@ -92,6 +93,8 @@ class Notification final : public ObjectImpl<Notification>
static String NotificationHostStateToString(HostState state);

static boost::signals2::signal<void (const Notification::Ptr&, const MessageOrigin::Ptr&)> OnNextNotificationChanged;
static boost::signals2::signal<void (const Notification::Ptr&, const String&, uint_fast8_t, const MessageOrigin::Ptr&)> OnLastNotifiedStatePerUserUpdated;
static boost::signals2::signal<void (const Notification::Ptr&, const MessageOrigin::Ptr&)> OnLastNotifiedStatesCleared;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd also rename this to OnLastNotifiedStatePerUserCleared. Why? Because a simple git grep LastNotifiedStatePerUser will then lead you to all the places that affect this attribute.

@Al2Klimov Al2Klimov force-pushed the do-not-re-notify-if-filtered-states-don-t-change-4503 branch from ef153bc to 97cd05d Compare December 13, 2023 12:21
@julianbrost julianbrost changed the title Discard likely duplicate problem notifications via Notification#last_notified_state_by_user Discard likely duplicate problem notifications via Notification#last_notified_state_per_user Dec 13, 2023
@Al2Klimov Al2Klimov merged commit 953eeba into master Dec 13, 2023
25 checks passed
@Al2Klimov Al2Klimov deleted the do-not-re-notify-if-filtered-states-don-t-change-4503 branch December 13, 2023 15:13
@yhabteab yhabteab added the backported Fix was included in a bugfix release label Dec 18, 2023
@Al2Klimov Al2Klimov removed the consider backporting Should be considered for inclusion in a bugfix release label May 14, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/configuration DSL, parser, compiler, error handling area/notifications Notification events backported Fix was included in a bugfix release cla/signed enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add support to track filtered/unfiltered notifications and do not re-notify if filtered states don't change
4 participants