times. begin/end ignored in HA cluster setup #4995

Closed
darmagan opened this issue Feb 10, 2017 · 13 comments
Labels: area/distributed, area/notifications, bug

Comments

@darmagan

darmagan commented Feb 10, 2017

Hi,

In an HA setup, the times (begin and end) in a notification object are ignored: I get the notification as soon as the checkable changes to a hard state (see the logs). I should also get service notifications at a 60s interval, but I only get one.

Config Master:

notifications.conf:

apply Notification "mail-icingaadmin" to Host {
  import "mail-host-notification"

  user_groups = host.vars.notification.mail.groups
  times = {
    begin = 6m
  }

  assign where host.vars.notification.mail
}

apply Notification "mail-icingaadmin" to Service {
  import "mail-service-notification"

  user_groups = host.vars.notification.mail.groups
  times = {
    begin = 6m
  }
  if (match("ssh*", service.name)) {
    interval = 30s
  } else {
    interval = 60s
  }

  assign where host.vars.notification.mail
}

Icinga2 Logs:

[2017-02-08 14:53:17 +0100] information/IdoMysqlConnection: Query queue items: 0, query rate: 3.15/s (189/min 941/5min 2674/15min);
[2017-02-08 14:53:32 +0100] information/IdoMysqlConnection: Query queue items: 0, query rate: 3.13333/s (188/min 940/5min 2717/15min);
[2017-02-08 14:53:39 +0100] information/ApiListener: New client connection from [127.0.0.1]:37714 (no client certificate)
[2017-02-08 14:53:39 +0100] information/HttpServerConnection: Request: POST /v1/actions/process-check-result (from [127.0.0.1]:37714, user: root)
[2017-02-08 14:53:47 +0100] information/IdoMysqlConnection: Query queue items: 0, query rate: 3.16667/s (190/min 945/5min 2772/15min);
[2017-02-08 14:53:53 +0100] information/Notification: Sending reminder 'Problem' notification 'demo-host!mail-icingaadmin for user 'icingaadmin'
[2017-02-08 14:53:53 +0100] information/Notification: Completed sending 'Problem' notification 'demo-host!mail-icingaadmin' for checkable 'demo-host' and user 'icingaadmin'.
[2017-02-08 14:54:02 +0100] information/IdoMysqlConnection: Query queue items: 0, query rate: 3.21667/s (193/min 947/5min 2820/15min);
[2017-02-08 14:54:17 +0100] information/IdoMysqlConnection: Query queue items: 0, query rate: 3.21667/s (193/min 945/5min 2867/15min);
[2017-02-08 14:54:32 +0100] information/IdoMysqlConnection: Query queue items: 0, query rate: 3.21667/s (193/min 936/5min 2910/15min);
[2017-02-08 14:54:36 +0100] information/ApiListener: New client connection from [127.0.0.1]:37718 (no client certificate)
[2017-02-08 14:54:36 +0100] information/HttpServerConnection: Request: POST /v1/actions/process-check-result (from [127.0.0.1]:37718, user: root)
[2017-02-08 14:54:47 +0100] information/IdoMysqlConnection: Query queue items: 0, query rate: 3.2/s (192/min 948/5min 2964/15min);
[2017-02-08 14:55:02 +0100] information/ConfigObject: Dumping program state to file '/var/lib/icinga2/icinga2.state'
[2017-02-08 14:55:02 +0100] information/IdoMysqlConnection: Query queue items: 0, query rate: 3.16667/s (190/min 940/5min 2978/15min);
[2017-02-08 14:55:17 +0100] information/IdoMysqlConnection: Query queue items: 0, query rate: 3.16667/s (190/min 947/5min 2986/15min);
[2017-02-08 14:55:32 +0100] information/IdoMysqlConnection: Query queue items: 0, query rate: 3.16667/s (190/min 950/5min 2991/15min);
[2017-02-08 14:55:47 +0100] information/IdoMysqlConnection: Query queue items: 0, query rate: 3.11667/s (187/min 948/5min 2996/15min);
[2017-02-08 14:56:02 +0100] information/IdoMysqlConnection: Query queue items: 0, query rate: 3.15/s (189/min 950/5min 2993/15min);
[2017-02-08 14:56:17 +0100] information/IdoMysqlConnection: Query queue items: 0, query rate: 3.16667/s (190/min 950/5min 2967/15min);
[2017-02-08 14:56:32 +0100] information/IdoMysqlConnection: Query queue items: 0, query rate: 3.15/s (189/min 949/5min 2971/15min);
[2017-02-08 14:56:47 +0100] information/IdoMysqlConnection: Query queue items: 0, query rate: 3.16667/s (190/min 948/5min 2977/15min);

/var/mail/icinga:

***** Icinga  *****

Notification Type: PROBLEM

Service: random-002
Host: demo-host
Address: 127.0.0.1
State: WARNING

Date/Time: 2017-02-08 14:46:08 +0100

Additional Info: Hello from icinga2b

Comment: [] =

From icinga@icinga2a.localdomain  Wed Feb  8 14:47:08 2017
Return-Path: <icinga@icinga2a.localdomain>
X-Original-To: icinga@localhost
Delivered-To: icinga@localhost.localdomain
Received: by icinga2a.localdomain (Postfix, from userid 997)
	id 41D36802EC72; Wed,  8 Feb 2017 14:47:08 +0100 (CET)
Date: Wed, 08 Feb 2017 14:47:08 +0100
To: icinga@localhost.localdomain
Subject: PROBLEM - demo-host - random-002 is WARNING
User-Agent: Heirloom mailx 12.5 7/5/10
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: quoted-printable
Message-Id: <20170208134708.41D36802EC72@icinga2a.localdomain>
From: icinga@icinga2a.localdomain (icinga)

***** Icinga  *****

Notification Type: PROBLEM

Service: random-002
Host: demo-host
Address: 127.0.0.1
State: WARNING

Date/Time: 2017-02-08 14:47:08 +0100

Additional Info: Hello from icinga2b

Comment: [] =

From icinga@icinga2a.localdomain  Wed Feb  8 14:47:08 2017
Return-Path: <icinga@icinga2a.localdomain>
X-Original-To: icinga@localhost
Delivered-To: icinga@localhost.localdomain
Received: by icinga2a.localdomain (Postfix, from userid 997)
	id 45B0C81F7ABB; Wed,  8 Feb 2017 14:47:08 +0100 (CET)
Date: Wed, 08 Feb 2017 14:47:08 +0100
To: icinga@localhost.localdomain
Subject: PROBLEM - demo-host - dns icinga.org is CRITICAL
User-Agent: Heirloom mailx 12.5 7/5/10
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: quoted-printable
Message-Id: <20170208134708.45B0C81F7ABB@icinga2a.localdomain>
From: icinga@icinga2a.localdomain (icinga)

***** Icinga  *****

Notification Type: PROBLEM

Service: dns icinga.org
Host: demo-host
Address: 127.0.0.1
State: CRITICAL

Date/Time: 2017-02-08 14:47:08 +0100

Additional Info: DNS CRITICAL - expected '185.11.254.83; 127.0.0.1' but=
 got '185.11.254.90'

Comment: [] =

From icinga@icinga2a.localdomain  Wed Feb  8 14:53:53 2017
Return-Path: <icinga@icinga2a.localdomain>
X-Original-To: icinga@localhost
Delivered-To: icinga@localhost.localdomain
Received: by icinga2a.localdomain (Postfix, from userid 997)
	id 9D01681F7ABB; Wed,  8 Feb 2017 14:53:53 +0100 (CET)
Date: Wed, 08 Feb 2017 14:53:53 +0100
To: icinga@localhost.localdomain
Subject: PROBLEM - demo-host is DOWN
User-Agent: Heirloom mailx 12.5 7/5/10
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: quoted-printable
Message-Id: <20170208135353.9D01681F7ABB@icinga2a.localdomain>
From: icinga@icinga2a.localdomain (icinga)

***** Icinga  *****

Notification Type: PROBLEM

Host: demo-host
Address: 127.0.0.1
State: DOWN

Date/Time: 2017-02-08 14:53:53 +0100

Additional Info: trtgrt

Comment: [] =

Regards

@dnsmichi
Contributor

As discussed offline, more details such as the debug log entries are required. The secondary notification which triggers a reminder notification would obviously mean that the initial notification was skipped somehow.

@dnsmichi added the needs feedback, area/notifications, and area/distributed labels Feb 10, 2017
@dnsmichi
Contributor

Any updates?

@gunnarbeutner added the bug label Feb 14, 2017
@darmagan
Author

After I enabled debug mode and reduced the notification interval to 1m (it was on the default, not 60s), the notifications worked fine, so I cannot reproduce the problem anymore. I think I will close the issue, but I need to do some more testing to be sure.

@dnsmichi
Contributor

Please check whether this affects a checkable object that is checked on the left side while the notification object is triggered and executed on the right side (feedback from @lippserd).

@dnsmichi reopened this Feb 14, 2017
@darmagan
Author

darmagan commented Feb 15, 2017

I tested like this:

I have one host object for both masters (.10/.20). I set icinga2a to DOWN in Icinga Web 2 (on node icinga2b), and in the debug logs I get:
"Not sending reminder notifications for notification object 'icinga2a-host!mail-icingaadmin': before specified begin time (2 minutes)"

debug logs:
debug.log.tar.gz

I started testing the notifications at ~15:23.

Setup:
icinga2a config master 192.168.33.10
icinga2b 192.168.33.20

#---zones.conf (both nodes)---#

object Endpoint "icinga2a" {
  host = "192.168.33.10"
}

object Endpoint "icinga2b" {
  host = "192.168.33.20"
}

object Zone "master" {
  endpoints = [ "icinga2a","icinga2b"]
}

/*
 * Global zone for templates
 */
object Zone "global-templates" {
  global = true
}

#---global-templates/notifications.conf---#

apply Notification "mail-icingaadmin" to Host {
  import "mail-host-notification"

  user_groups = host.vars.notification.mail.groups
  interval = 0
  times = {
    begin = 6m
  }

  assign where host.vars.notification.mail
}

apply Notification "mail-icingaadmin" to Service {
  import "mail-service-notification"

  user_groups = host.vars.notification.mail.groups
  interval = 0
  times = {
    begin = 6m
  }
  if (match("ssh*", service.name)) {
    interval = 30s
  } else {
    interval = 60s
  }

  assign where host.vars.notification.mail
}

@dnsmichi
Contributor

Please use code blocks for better formatting (3 backticks each).

Can you please summarize what your tests now mean for the reported problem?

@darmagan
Author

The problem is that I don't get any notification for the down host. Icinga 2 waits for the 2 minutes, as it states in the logs: "Not sending reminder notifications for notification object 'icinga2a-host!mail-icingaadmin': before specified begin time (2 minutes)". But after that message I don't get any notification or log message regarding that host.

Without the times.begin lines, the notifications work fine.

I've received some comments regarding this issue stating that the notifications work again if you change the notification name, but that's not the case, at least for version 2.6.1. I've tested that.

I can reproduce the Issue, so if you need more info or a test with a modified setup let me know.

@darmagan
Author

I now know why I got no notifications: I wasn't testing correctly. The host recovered before the 2 minute limit. I set the check_interval to 3 minutes and it's working now. I have also tested it with the notification module disabled on one node, and that works too: the other node does the sending.

So I'm in the same situation as before, I'm not able to reproduce the problem. I will close this Issue.
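
The timing effect described above (a host that recovers before `times.begin` has elapsed never produces a problem notification) can be illustrated with a small sketch. This is a simplified model for reasoning about test timing, not Icinga 2's actual scheduling code; the function name `first_problem_notification` and the example durations are assumptions made for illustration.

```python
from typing import Optional


def first_problem_notification(problem_start: float,
                               recovery: Optional[float],
                               begin: float) -> Optional[float]:
    """Return when the first problem notification becomes due.

    Simplified model: the notification is due `begin` seconds after the
    hard problem state started, and is never sent if the checkable
    recovers before that point. Returns None in that case.
    """
    due = problem_start + begin
    if recovery is not None and recovery < due:
        return None
    return due


# Host goes DOWN at t=0 and recovers after 90s with times.begin = 2m:
# nothing becomes due.
print(first_problem_notification(0, 90, 120))

# The same outage lasting 180s: the notification becomes due at t=120.
print(first_problem_notification(0, 180, 120))
```

When testing delayed notifications, the outage has to last longer than `times.begin`, which is why a longer check_interval made the test behave as expected.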

@darmagan reopened this Mar 28, 2017
@darmagan
Author

Hi,

I must reopen this Issue. I've tested with two new fresh VMs and I'm able to reproduce the problem now.

How to reproduce:
- set up a master HA cluster
- create a host object in the master zone
- define the notification object as shown below
- test

I've tested with different combinations (with times.begin, without the default assign rule and so on). There is a scenario where custom notifications sent via Icinga Web 2 are not sent but still appear in the logs. Once I rename the notification object, the notifications work again.

You can see the irregularities if you compare the mail log with the debug log (attached).

root@box17:/etc/icinga2/zones.d/global-templates# lsb_release -a
No LSB modules are available.
Distributor ID:	Ubuntu
Description:	Ubuntu 14.04.5 LTS
Release:	14.04
Codename:	trusty
root@box17:/etc/icinga2/zones.d/global-templates# icinga2 --version
icinga2 - The Icinga 2 network monitoring daemon (version: r2.6.2-1)

notifications.conf (last state)

/**
 * The example notification apply rules.
 *
 * Only applied if host/service objects have
 * the custom attribute `notification` defined
 * and containing `mail` as key.
 *
 * Check `hosts.conf` for an example.
 */

apply Notification "mail-icingaadmin-01" to Host {
  import "mail-host-notification"

  user_groups = host.vars.notification.mail.groups
  users = host.vars.notification.mail.users
  times = {
     begin = 1m
  }
  interval = 0
  states = [ Down ]
  types = [ Problem ]


#  assign where host.vars.notification.mail
  assign where host.name == "master_box17" ||  host.name == "master_box18"
}

apply Notification "mail-icingaadmin" to Service {
  import "mail-service-notification"

  user_groups = host.vars.notification.mail.groups
  users = host.vars.notification.mail.users
  interval = 0
  assign where host.vars.notification.mail
}

zones.conf

/*
 * Endpoint and Zone configuration for a cluster setup
 * This local example requires `NodeName` defined in
 * constants.conf.
 */

object Endpoint "box17.int.netways.de" {
  host = "192.168.33.27"
}
object Endpoint "box18.int.netways.de" {
  host = "192.168.33.28"
}

object Zone "master" {
  endpoints = [ "box18.int.netways.de", "box17.int.netways.de"]
}

/*
 * Defines a global zone containing templates,
 * etc. synced to all nodes, if they accept
 * configuration. All remote nodes need
 * this zone configured too.
 */

object Zone "global-templates" {
  global = true
}

box17.tar.gz
box18.tar.gz

@dnsmichi
Contributor

Compared to the original issue description, this one seems a little bit different.

It would be nice if you could extract your observations from the logs and post them here as well in the future.
It is hard to follow a specific notification object, especially one that has been renamed in between, if you don't mention its name.

box17

debug.box17:[2017-03-28 13:22:17 +0200] notice/Notification: Not sending  notifications for notification object 'master_box18!mail-icingaadmin': type 'Recovery' does not match type filter: Problem.
debug.box17:[2017-03-28 13:22:28 +0200] notice/Notification: Attempting to send  notifications for notification object 'master_box18!mail-icingaadmin'.
debug.box17:[2017-03-28 13:22:28 +0200] information/Notification: Sending 'Problem' notification 'master_box18!mail-icingaadmin for user 'icingaadmin'
debug.box17:[2017-03-28 13:22:28 +0200] information/Notification: Completed sending 'Problem' notification 'master_box18!mail-icingaadmin' for checkable 'master_box18' and user 'icingaadmin'.
debug.box17:[2017-03-28 13:23:33 +0200] notice/Notification: Attempting to send  notifications for notification object 'master_box18!mail-icingaadmin'.
debug.box17:[2017-03-28 13:23:33 +0200] notice/Notification: Not sending  notifications for notification object 'master_box18!mail-icingaadmin': type 'Recovery' does not match type filter: Problem.
debug.box17:[2017-03-28 13:24:00 +0200] notice/Notification: Attempting to send  notifications for notification object 'master_box18!mail-icingaadmin'.
debug.box17:[2017-03-28 13:24:00 +0200] notice/Notification: Not sending  notifications for notification object 'master_box18!mail-icingaadmin': before specified begin time (1 minute)
debug.box17:[2017-03-28 13:27:04 +0200] notice/Notification: Attempting to send  notifications for notification object 'master_box18!mail-icingaadmin'.
debug.box17:[2017-03-28 13:27:04 +0200] notice/Notification: Not sending  notifications for notification object 'master_box18!mail-icingaadmin': type 'Recovery' does not match type filter: Problem.
debug.box17:[2017-03-28 13:30:49 +0200] notice/Notification: Attempting to send  notifications for notification object 'master_box18!mail-icingaadmin'.
debug.box17:[2017-03-28 13:30:49 +0200] notice/Notification: Not sending  notifications for notification object 'master_box18!mail-icingaadmin': before specified begin time (1 minute)
debug.box17:[2017-03-28 13:31:58 +0200] notice/Notification: Attempting to send  notifications for notification object 'master_box18!mail-icingaadmin'.
debug.box17:[2017-03-28 13:31:58 +0200] notice/Notification: Not sending  notifications for notification object 'master_box18!mail-icingaadmin': type 'Custom' does not match type filter: Problem.
debug.box17:[2017-03-28 13:33:15 +0200] notice/NotificationComponent: Attempting to send reminder notification 'master_box18!mail-icingaadmin-01'
debug.box17:[2017-03-28 13:33:15 +0200] notice/Notification: Attempting to send reminder notifications for notification object 'master_box18!mail-icingaadmin-01'.
debug.box17:[2017-03-28 13:33:15 +0200] information/Notification: Sending reminder 'Problem' notification 'master_box18!mail-icingaadmin-01 for user 'icingaadmin'
debug.box17:[2017-03-28 13:33:15 +0200] information/Notification: Completed sending 'Problem' notification 'master_box18!mail-icingaadmin-01' for checkable 'master_box18' and user 'icingaadmin'.
debug.box17:[2017-03-28 13:33:16 +0200] notice/Notification: Attempting to send  notifications for notification object 'master_box18!mail-icingaadmin-01'.
debug.box17:[2017-03-28 13:33:16 +0200] notice/Notification: Not sending  notifications for notification object 'master_box18!mail-icingaadmin-01': type 'Custom' does not match type filter: Problem.
debug.box17:[2017-03-28 13:34:06 +0200] notice/Notification: Attempting to send  notifications for notification object 'master_box18!mail-icingaadmin-01'.
debug.box17:[2017-03-28 13:34:06 +0200] notice/Notification: Not sending  notifications for notification object 'master_box18!mail-icingaadmin-01': type 'Custom' does not match type filter: Problem.
debug.box17:[2017-03-28 13:36:08 +0200] notice/Notification: Attempting to send  notifications for notification object 'master_box18!mail-icingaadmin-01'.
debug.box17:[2017-03-28 13:36:08 +0200] notice/Notification: Not sending  notifications for notification object 'master_box18!mail-icingaadmin-01': type 'Recovery' does not match type filter: Problem.

Sending custom notifications as a test case is invalid here, as they are forced and ignore the type and state filters (as well as times and other filters).

13:22:17 until 13:31:58 contains a mix of custom notifications and recovery notifications which are filtered away. Nothing which proves a bug here.
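
To make a mix like this easier to read, the skipped notifications can be grouped by notification object and reason. The following is a minimal sketch that assumes the debug log line format shown in this thread; the helper name `summarize_skips` and the regular expression are illustrative and not part of Icinga 2.

```python
import re
from collections import Counter

# Matches lines such as:
#   ... Not sending  notifications for notification object '<name>': <reason>
# The optional "reminder" covers the reminder variant of the same message.
SKIP_RE = re.compile(
    r"Not sending\s+(?:reminder\s+)?notifications for notification object "
    r"'(?P<name>[^']+)': (?P<reason>.+)$"
)


def summarize_skips(lines):
    """Count skipped notifications per (notification object, reason)."""
    counts = Counter()
    for line in lines:
        match = SKIP_RE.search(line)
        if match:
            reason = match.group("reason").strip().rstrip(".")
            counts[(match.group("name"), reason)] += 1
    return counts


sample = [
    "debug.box17:[2017-03-28 13:22:17 +0200] notice/Notification: Not sending  notifications for notification object 'master_box18!mail-icingaadmin': type 'Recovery' does not match type filter: Problem.",
    "debug.box17:[2017-03-28 13:23:33 +0200] notice/Notification: Not sending  notifications for notification object 'master_box18!mail-icingaadmin': type 'Recovery' does not match type filter: Problem.",
    "debug.box17:[2017-03-28 13:24:00 +0200] notice/Notification: Not sending  notifications for notification object 'master_box18!mail-icingaadmin': before specified begin time (1 minute)",
]

for (name, reason), count in summarize_skips(sample).items():
    print(f"{count}x {name}: {reason}")
```

Feeding the full debug log of each node through such a summary shows at a glance whether any skip reason other than the type filter or the begin time occurs.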

Then the notification object is renamed, and Icinga 2 is therefore restarted at 13:33:15. The following log line looks suspicious: there must have been an event beforehand to initially attempt to send notifications. Renaming the notification object should have caused a new notification which

debug.box17:[2017-03-28 13:33:15 +0200] notice/NotificationComponent: Attempting to send reminder notification 'master_box18!mail-icingaadmin-01'

There's also a mismatch between box17 and box18 here, as the logs for box18 introduce a new notification object called 'master_box18!mail-icingaadmin-02'.

debug.box18:[2017-03-28 13:11:42 +0200] information/Notification: Completed sending 'Custom' notification 'master_box18!mail-icingaadmin-02' for checkable 'master_box18' and user 'icingaadmin'.
debug.box18:[2017-03-28 13:13:42 +0200] notice/Notification: Attempting to send  notifications for notification object 'master_box18!mail-icingaadmin-02'.
debug.box18:[2017-03-28 13:13:42 +0200] information/Notification: Sending 'Custom' notification 'master_box18!mail-icingaadmin-02 for user 'icingaadmin'
debug.box18:[2017-03-28 13:13:42 +0200] information/Notification: Completed sending 'Custom' notification 'master_box18!mail-icingaadmin-02' for checkable 'master_box18' and user 'icingaadmin'.
debug.box18:[2017-03-28 13:14:04 +0200] notice/Notification: Attempting to send  notifications for notification object 'master_box18!mail-icingaadmin-02'.
debug.box18:[2017-03-28 13:14:04 +0200] information/Notification: Sending 'Custom' notification 'master_box18!mail-icingaadmin-02 for user 'icingaadmin'
debug.box18:[2017-03-28 13:14:04 +0200] information/Notification: Completed sending 'Custom' notification 'master_box18!mail-icingaadmin-02' for checkable 'master_box18' and user 'icingaadmin'.
debug.box18:[2017-03-28 13:14:21 +0200] notice/Notification: Attempting to send  notifications for notification object 'master_box18!mail-icingaadmin-02'.
debug.box18:[2017-03-28 13:14:21 +0200] information/Notification: Sending 'Problem' notification 'master_box18!mail-icingaadmin-02 for user 'icingaadmin'
debug.box18:[2017-03-28 13:14:21 +0200] information/Notification: Completed sending 'Problem' notification 'master_box18!mail-icingaadmin-02' for checkable 'master_box18' and user 'icingaadmin'.

Timestamps do not match box17, so it is unclear what exactly happened here in your test setup.

TL;DR - please explain in detail step by step

  • the current behaviour, with log entries from each timestamp, step by step
  • the expected behaviour

Furthermore query the REST API endpoint /v1/objects/notifications for 'master_box18!mail-icingaadmin-01' on both nodes and extract the object attributes. "paused" highlights which box feels responsible for triggering notifications.
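
One way to compare the two answers is to diff the `attrs` of the same notification object as returned by each node. The sketch below works on already fetched response bodies (for example saved curl output), so it does not assume anything about an HTTP client. The helper names `attrs_of` and `differing_attrs` are hypothetical, and the sample values are trimmed illustrations modeled on the API output posted further down in this thread.

```python
import json


def attrs_of(response_text):
    """Extract the attribute dict of the first result from an API response body."""
    return json.loads(response_text)["results"][0]["attrs"]


def differing_attrs(attrs_a, attrs_b):
    """Return {attribute: (value_a, value_b)} for every attribute that differs."""
    keys = sorted(set(attrs_a) | set(attrs_b))
    return {
        key: (attrs_a.get(key), attrs_b.get(key))
        for key in keys
        if attrs_a.get(key) != attrs_b.get(key)
    }


# Trimmed sample responses for the same notification object on two nodes.
box17 = '{"results": [{"attrs": {"paused": false, "package": "_etc", "last_problem_notification": 1490880966.9609}}]}'
box18 = '{"results": [{"attrs": {"paused": true, "package": "_cluster", "last_problem_notification": 1490880966.9609}}]}'

diff = differing_attrs(attrs_of(box17), attrs_of(box18))
for name, (a, b) in diff.items():
    print(f"{name}: box17={a!r} box18={b!r}")
```

The node that reports `"paused": false` is the one that considers itself responsible for the object; any other differing attribute is worth a closer look.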

And please avoid renaming objects to provide a clear strategy on how to reliably reproduce the issue.

@darmagan
Author

Hi,

ok, that explains why I couldn't send custom notifications.

I have tested again and I get no notifications.

Infos:

extracted entries from box17 debug.log (no send entries):

[2017-03-30 15:41:47 +0200] notice/Checkable: State Change: Checkable master_box17 hard state change from UP to DOWN detected.
[2017-03-30 15:41:47 +0200] information/Checkable: Checking for configured notifications for object 'master_box17'
[2017-03-30 15:41:47 +0200] debug/Checkable: Checkable 'master_box17' has 1 notification(s).
[2017-03-30 15:41:47 +0200] notice/ApiListener: Relaying 'event::SetNextCheck' message
[2017-03-30 15:41:47 +0200] notice/ApiListener: Relaying 'event::CheckResult' message
[2017-03-30 15:41:47 +0200] notice/ApiListener: Relaying 'event::SendNotifications' message
[2017-03-30 15:41:47 +0200] notice/ApiListener: Sending message 'event::SendNotifications' to 'box18.int.netways.de'
[2017-03-30 15:41:47 +0200] notice/ApiListener: Relaying 'event::SetForceNextNotification' message
[2017-03-30 15:41:47 +0200] notice/ApiListener: Sending message 'event::SetForceNextNotification' to 'box18.int.netways.de'
[2017-03-30 15:41:47 +0200] notice/JsonRpcConnection: Received 'event::SetForceNextNotification' message from 'box18.int.netways.de'
[2017-03-30 15:41:47 +0200] notice/ApiListener: Relaying 'event::SetForceNextNotification' message

/etc/icinga2/zones.d/global-templates/notifications.conf

/**
 * The example notification apply rules.
 *
 * Only applied if host/service objects have
 * the custom attribute `notification` defined
 * and containing `mail` as key.
 *
 * Check `hosts.conf` for an example.
 */

apply Notification "mail-icingaadmin" to Host {
  import "mail-host-notification"

  user_groups = host.vars.notification.mail.groups
  users = host.vars.notification.mail.users
  times = {
     begin = 1m
  }
  interval = 0
  states = [ Down ]
  types = [ Problem ]


#  assign where host.vars.notification.mail
  assign where host.name == "master_box17" ||  host.name == "master_box18"
}

apply Notification "mail-icingaadmin" to Service {
  import "mail-service-notification"

  user_groups = host.vars.notification.mail.groups
  users = host.vars.notification.mail.users
  interval = 0
  assign where host.vars.notification.mail
}

/etc/icinga2/zones.d/master/hosts.conf

object Host "master_box17" {
  import "generic-host"

  address = "192.168.33.27"
  check_command = "hostalive"
 vars.notification["mail"] = {
    /* The UserGroup `icingaadmins` is defined in `users.conf`. */
    groups = [ "icingaadmins" ]
  }

}

apply Service "master_box17_dummy" {
  import "generic-service"

  check_command = "dummy"

  assign where host.name == "box17.int.netways.de"

}

object Host "master_box18" {
  import "generic-host"

  address = "192.168.33.28"
  check_command = "hostalive"
 vars.notification["mail"] = {
    /* The UserGroup `icingaadmins` is defined in `users.conf`. */
    groups = [ "icingaadmins" ]
  }

}

apply Service "master_box18_dummy" {
  import "generic-service"

  check_command = "dummy"

  assign where host.name == "box18.int.netways.de"

}

"paused" is flagged true on box18:
box18:

root@box18:~# curl -k -s -u root:root 'https://localhost:5665/v1/objects/notifications/master_box18!mail-icingaadmin' | python -m json.tool
{
    "results": [
        {
            "attrs": {
                "__name": "master_box18!mail-icingaadmin",
                "active": true,
                "command": "mail-host-notification",
                "command_endpoint": "",
                "ha_mode": 0.0,
                "host_name": "master_box18",
                "interval": 0.0,
                "last_notification": 1490880966.9609,
                "last_problem_notification": 1490880966.9609,
                "name": "mail-icingaadmin",
                "next_notification": 1490880966.960638,
                "no_more_notifications": true,
                "notification_number": 0.0,
                "notified_problem_users": [
                    "icingaadmin"
                ],
                "original_attributes": null,
                "package": "_cluster",
                "paused": true,
                "period": "24x7",
                "service_name": "",
                "states": [
                    "Down"
                ],
                "templates": [
                    "mail-icingaadmin",
                    "mail-host-notification"
                ],
                "times": {
                    "begin": 60.0
                },
                "type": "Notification",
                "types": [
                    "Problem"
                ],
                "user_groups": [
                    "icingaadmins"
                ],
                "users": null,
                "vars": null,
                "version": 0.0,
                "zone": "master"
            },
            "joins": {},
            "meta": {},
            "name": "master_box18!mail-icingaadmin",
            "type": "Notification"
        }
    ]
}
root@box18:~# curl -k -s -u root:root 'https://localhost:5665/v1/objects/notifications/master_box17!mail-icingaadmin' | python -m json.tool
{
    "results": [
        {
            "attrs": {
                "__name": "master_box17!mail-icingaadmin",
                "active": true,
                "command": "mail-host-notification",
                "command_endpoint": "",
                "ha_mode": 0.0,
                "host_name": "master_box17",
                "interval": 0.0,
                "last_notification": 1490880691.456017,
                "last_problem_notification": 1490880691.456017,
                "name": "mail-icingaadmin",
                "next_notification": 1490796147.462089,
                "no_more_notifications": true,
                "notification_number": 0.0,
                "notified_problem_users": [
                    "icingaadmin"
                ],
                "original_attributes": null,
                "package": "_cluster",
                "paused": false,
                "period": "24x7",
                "service_name": "",
                "states": [
                    "Down"
                ],
                "templates": [
                    "mail-icingaadmin",
                    "mail-host-notification"
                ],
                "times": {
                    "begin": 60.0
                },
                "type": "Notification",
                "types": [
                    "Problem"
                ],
                "user_groups": [
                    "icingaadmins"
                ],
                "users": null,
                "vars": null,
                "version": 0.0,
                "zone": "master"
            },
            "joins": {},
            "meta": {},
            "name": "master_box17!mail-icingaadmin",
            "type": "Notification"
        }
    ]
}

box17:

root@box17:/etc/icinga2/zones.d/master# curl -k -s -u root:root 'https://localhost:5665/v1/objects/notifications/master_box18!mail-icingaadmin' | python -m json.tool
{
    "results": [
        {
            "attrs": {
                "__name": "master_box18!mail-icingaadmin",
                "active": true,
                "command": "mail-host-notification",
                "command_endpoint": "",
                "ha_mode": 0.0,
                "host_name": "master_box18",
                "interval": 0.0,
                "last_notification": 1490880966.9609,
                "last_problem_notification": 1490880966.9609,
                "name": "mail-icingaadmin",
                "next_notification": 1490880966.960638,
                "no_more_notifications": true,
                "notification_number": 0.0,
                "notified_problem_users": [
                    "icingaadmin"
                ],
                "original_attributes": null,
                "package": "_etc",
                "paused": false,
                "period": "24x7",
                "service_name": "",
                "states": [
                    "Down"
                ],
                "templates": [
                    "mail-icingaadmin",
                    "mail-host-notification"
                ],
                "times": {
                    "begin": 60.0
                },
                "type": "Notification",
                "types": [
                    "Problem"
                ],
                "user_groups": [
                    "icingaadmins"
                ],
                "users": null,
                "vars": null,
                "version": 0.0,
                "zone": "master"
            },
            "joins": {},
            "meta": {},
            "name": "master_box18!mail-icingaadmin",
            "type": "Notification"
        }
    ]
}
root@box17:/etc/icinga2/zones.d/master# curl -k -s -u root:root 'https://localhost:5665/v1/objects/notifications/master_box18!mail-icingaadmin' | python -m json.tool
{
    "results": [
        {
            "attrs": {
                "__name": "master_box18!mail-icingaadmin",
                "active": true,
                "command": "mail-host-notification",
                "command_endpoint": "",
                "ha_mode": 0.0,
                "host_name": "master_box18",
                "interval": 0.0,
                "last_notification": 1490880966.9609,
                "last_problem_notification": 1490880966.9609,
                "name": "mail-icingaadmin",
                "next_notification": 1490880966.960638,
                "no_more_notifications": true,
                "notification_number": 0.0,
                "notified_problem_users": [
                    "icingaadmin"
                ],
                "original_attributes": null,
                "package": "_etc",
                "paused": false,
                "period": "24x7",
                "service_name": "",
                "states": [
                    "Down"
                ],
                "templates": [
                    "mail-icingaadmin",
                    "mail-host-notification"
                ],
                "times": {
                    "begin": 60.0
                },
                "type": "Notification",
                "types": [
                    "Problem"
                ],
                "user_groups": [
                    "icingaadmins"
                ],
                "users": null,
                "vars": null,
                "version": 0.0,
                "zone": "master"
            },
            "joins": {},
            "meta": {},
            "name": "master_box18!mail-icingaadmin",
            "type": "Notification"
        }
    ]
}

Regards

debug.logs.tar.gz

@dnsmichi
Contributor

dnsmichi commented Mar 30, 2017

It boils down to this: both instances have already exchanged the details about sending a problem notification to the "icingaadmin" user, at least somehow.

box17

                "__name": "master_box18!mail-icingaadmin",
                "active": true,
                "interval": 0.0,
                "last_notification": 1490880966.9609,
                "last_problem_notification": 1490880966.9609,
                "next_notification": 1490880966.960638,
                "no_more_notifications": true,
                "notification_number": 0.0,
                "notified_problem_users": [
                    "icingaadmin"
                ],
                "package": "_cluster",
                "paused": true,
                "states": [
                    "Down"
                ],
                "times": {
                    "begin": 60.0
                },

box18

                "__name": "master_box18!mail-icingaadmin",
                "active": true,
                "interval": 0.0,
                "last_notification": 1490880966.9609,
                "last_problem_notification": 1490880966.9609,
                "next_notification": 1490880966.960638,
                "no_more_notifications": true,
                "notification_number": 0.0,
                "notified_problem_users": [
                    "icingaadmin"
                ],
                "paused": false,
                "states": [
                    "Down"
                ],
                "times": {
                    "begin": 60.0
                },

last_problem_notification has the same timestamp on both nodes; I'd verify that in the logs.

notified_problem_users holds icingaadmin as value, so there must have been a problem notification beforehand. Maybe that got filtered away by the strict type filter your notification has.

no_more_notifications is set to true, so box17 thinks that it already sent a notification and therefore won't send any more.

box17 feels responsible for sending notifications, paused is set to false.
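To compare the two excerpts above side by side, something like the following sketch could help. The dictionaries only copy values from the excerpts (the `package` attribute is left out because it appears in just one of them); `diff_attrs` is a hypothetical helper name, not part of Icinga 2.

```python
# Selected attributes copied from the two excerpts above.
box17 = {
    "last_problem_notification": 1490880966.9609,
    "next_notification": 1490880966.960638,
    "no_more_notifications": True,
    "notified_problem_users": ["icingaadmin"],
    "paused": True,
}
box18 = {
    "last_problem_notification": 1490880966.9609,
    "next_notification": 1490880966.960638,
    "no_more_notifications": True,
    "notified_problem_users": ["icingaadmin"],
    "paused": False,
}


def diff_attrs(a, b):
    """Return {key: (a_value, b_value)} for every key whose values differ."""
    keys = sorted(set(a) | set(b))
    return {k: (a.get(k), b.get(k)) for k in keys if a.get(k) != b.get(k)}


differences = diff_attrs(box17, box18)
print(differences)
```

For these two excerpts, only `paused` differs; the notification state itself is identical on both sides.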

So, the details above and below are what I initially expected in your issue report.

michi@mbmif ~/Downloads/debug.logs $ grep -r 'master_box18!mail-icingaadmin' box17.debug
box17.debug:[2017-03-30 15:45:31 +0200] notice/Notification: Attempting to send  notifications for notification object 'master_box18!mail-icingaadmin'.
box17.debug:[2017-03-30 15:45:31 +0200] notice/Notification: Not sending  notifications for notification object 'master_box18!mail-icingaadmin': before specified begin time (1 minute)
box17.debug:[2017-03-30 15:48:35 +0200] notice/Notification: Attempting to send  notifications for notification object 'master_box18!mail-icingaadmin'.
box17.debug:[2017-03-30 15:48:35 +0200] notice/Notification: Not sending  notifications for notification object 'master_box18!mail-icingaadmin': type 'Recovery' does not match type filter: Problem.

There is no further attempt to send a notification, which would indicate that the first attempt already set no_more_notifications as well as notified_problem_users. That seems fairly impossible, since the notification was skipped beforehand. I guess that happens because "Recovery" notifications would normally reset those values, but since you filter them away, there's no such reset.
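The guess above can be illustrated with a toy model. This is only a sketch of the hypothesis and not Icinga 2's actual implementation: the class `ToyNotification`, its `handle` method, and the assumption that a filtered-out notification leaves all state untouched are invented here for illustration.

```python
class ToyNotification:
    """Hypothetical, simplified model of the guess above; not Icinga 2 code."""

    def __init__(self, types):
        self.types = set(types)
        self.no_more_notifications = False
        self.notified_problem_users = []

    def handle(self, notification_type, user="icingaadmin"):
        """Return True if the notification passes the type filter."""
        if notification_type not in self.types:
            # Assumption: a filtered notification leaves all state untouched.
            return False
        if notification_type == "Problem":
            self.notified_problem_users.append(user)
            # Assumption: with interval = 0 only one problem notification is sent.
            self.no_more_notifications = True
        elif notification_type == "Recovery":
            # Assumption: a delivered Recovery resets the problem state.
            self.notified_problem_users = []
            self.no_more_notifications = False
        return True


strict = ToyNotification(["Problem"])
strict.handle("Problem")
strict.handle("Recovery")  # filtered away, so nothing is reset

relaxed = ToyNotification(["Problem", "Recovery"])
relaxed.handle("Problem")
relaxed.handle("Recovery")  # passes the filter and resets the state

print(strict.no_more_notifications, relaxed.no_more_notifications)
```

In this model the strict filter keeps `no_more_notifications` set forever, while the relaxed filter clears it on recovery.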

I'd try again with saner default filters, like types = [ Problem, Recovery ] and states = [ Down, Up ].

box18 does nothing in this regard.

Remember the timestamp for last_problem_notification?

michi@mbmif ~/Downloads/debug.logs $ date -r 1490880966
Thu Mar 30 15:36:06 CEST 2017
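The `date -r <epoch>` form above is the BSD/macOS syntax. A portable way to do the same conversion is sketched below; the fixed +02:00 offset named `CEST` is an assumption that matches the shell output above.

```python
from datetime import datetime, timedelta, timezone

# Assumption: a fixed UTC+2 offset, matching the CEST shown in the shell output.
CEST = timezone(timedelta(hours=2), "CEST")

last_problem_notification = 1490880966.9609

# Drop the fractional part, as the shell command above did.
local = datetime.fromtimestamp(int(last_problem_notification), tz=CEST)
formatted = local.strftime("%a %b %d %H:%M:%S %Z %Y")
print(formatted)
```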

Log entries for that specific notification are not available, so it is hard to guess.

One thing which seems odd is the next_notification attribute being smaller than the last_notification timestamp.
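A quick check of that observation, using the two values from the API output. The variable names simply mirror the attribute names.

```python
# Values copied from the API output for 'master_box18!mail-icingaadmin'.
last_notification = 1490880966.9609
next_notification = 1490880966.960638

# Negative means the next notification is scheduled before the last one was sent.
delta = next_notification - last_notification

print(next_notification < last_notification)
print(f"difference: {delta:.6f} s")
```

The gap is well under a millisecond, so this may only be two clock reads taken shortly after each other rather than a real scheduling error.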

Please re-check and do the following:

  • adjust your notification apply rule to allow Recovery notification types (types = [ Problem, Recovery ] and states = [ Down, Up ])
  • query the REST API notifications endpoint before the first notification attempt, afterwards and after the recovery message is sent (add the selected details from above and explain them)
  • verify that notified_problem_users is reset on Recovery notification
  • look for next_notification and last_notification and compare the values (hint: maybe next_notification isn't properly initialized and prevents further notifications)
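For the repeated API queries, a small script can pull out just the attributes of interest. This is a sketch under assumptions: the helper names `select_attrs` and `fetch_notification` are made up, the `root`/`root` credentials and port 5665 are taken from the curl calls in this thread, and certificate verification is disabled to mimic `curl -k`.

```python
import base64
import json
import ssl
import urllib.parse
import urllib.request

SELECTED = (
    "last_notification",
    "last_problem_notification",
    "next_notification",
    "no_more_notifications",
    "notified_problem_users",
    "paused",
)


def select_attrs(response, keys=SELECTED):
    """Map each result's name to the selected attributes of a parsed response."""
    return {
        result["name"]: {k: result["attrs"].get(k) for k in keys}
        for result in response.get("results", [])
    }


def fetch_notification(name, host="localhost", port=5665, user="root", password="root"):
    """Fetch one Notification object; TLS verification is off, like `curl -k`."""
    url = f"https://{host}:{port}/v1/objects/notifications/" + urllib.parse.quote(name, safe="")
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    request = urllib.request.Request(
        url, headers={"Authorization": f"Basic {token}", "Accept": "application/json"}
    )
    context = ssl.create_default_context()
    context.check_hostname = False
    context.verify_mode = ssl.CERT_NONE
    with urllib.request.urlopen(request, context=context) as reply:
        return json.load(reply)


# Offline example: a trimmed response with values copied from this thread.
sample = {
    "results": [
        {
            "attrs": {
                "last_notification": 1490880966.9609,
                "next_notification": 1490880966.960638,
                "no_more_notifications": True,
                "notified_problem_users": ["icingaadmin"],
                "paused": False,
            },
            "name": "master_box18!mail-icingaadmin",
            "type": "Notification",
        }
    ]
}
print(json.dumps(select_attrs(sample), indent=4))
```

On a master node, `select_attrs(fetch_notification("master_box18!mail-icingaadmin"))` would then return the same selection for the live object, which can be captured before the first notification attempt, afterwards, and after the recovery.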

@darmagan
Author

before notification:

root@box17:/etc/icinga2/zones.d/global-templates# curl -k -s -u root:root 'https://localhost:5665/v1/objects/notifications/master_box18!mail-icingaadmin' | python -m json.tool
{
    "results": [
        {
            "attrs": {
                "__name": "master_box18!mail-icingaadmin",
                "active": true,
                "command": "mail-host-notification",
                "command_endpoint": "",
                "ha_mode": 0.0,
                "host_name": "master_box18",
                "interval": 0.0,
                "last_notification": 1490880966.9609,
                "last_problem_notification": 1490880966.9609,
                "name": "mail-icingaadmin",
                "next_notification": 1490880966.960638,
                "no_more_notifications": true,
                "notification_number": 0.0,
                "notified_problem_users": [
                    "icingaadmin"
                ],
                "original_attributes": null,
                "package": "_etc",
                "paused": false,
                "period": "24x7",
                "service_name": "",
                "states": [
                    "Down",
                    "Up"
                ],
                "templates": [
                    "mail-icingaadmin",
                    "mail-host-notification"
                ],
                "times": {
                    "begin": 60.0
                },
                "type": "Notification",
                "types": [
                    "Problem",
                    "Recovery"
                ],
                "user_groups": [
                    "icingaadmins"
                ],
                "users": null,
                "vars": null,
                "version": 0.0,
                "zone": "master"
            },
            "joins": {},
            "meta": {},
            "name": "master_box18!mail-icingaadmin",
            "type": "Notification"
        }
    ]
}
root@box17:/etc/icinga2/zones.d/global-templates# curl -k -s -u root:root 'https://localhost:5665/v1/objects/notifications/master_box17!mail-icingaadmin' | python -m json.tool
{
    "results": [
        {
            "attrs": {
                "__name": "master_box17!mail-icingaadmin",
                "active": true,
                "command": "mail-host-notification",
                "command_endpoint": "",
                "ha_mode": 0.0,
                "host_name": "master_box17",
                "interval": 0.0,
                "last_notification": 1490880691.456017,
                "last_problem_notification": 1490880691.456017,
                "name": "mail-icingaadmin",
                "next_notification": 1490796147.462089,
                "no_more_notifications": true,
                "notification_number": 0.0,
                "notified_problem_users": [
                    "icingaadmin"
                ],
                "original_attributes": null,
                "package": "_etc",
                "paused": true,
                "period": "24x7",
                "service_name": "",
                "states": [
                    "Down",
                    "Up"
                ],
                "templates": [
                    "mail-icingaadmin",
                    "mail-host-notification"
                ],
                "times": {
                    "begin": 60.0
                },
                "type": "Notification",
                "types": [
                    "Problem",
                    "Recovery"
                ],
                "user_groups": [
                    "icingaadmins"
                ],
                "users": null,
                "vars": null,
                "version": 0.0,
                "zone": "master"
            },
            "joins": {},
            "meta": {},
            "name": "master_box17!mail-icingaadmin",
            "type": "Notification"
        }
    ]
}

after processing check results to Down (before recovery):

root@box17:/etc/icinga2/zones.d/global-templates# curl -k -s -u root:root 'https://localhost:5665/v1/objects/notifications/master_box18!mail-icingaadmin' | python -m json.tool
{
    "results": [
        {
            "attrs": {
                "__name": "master_box18!mail-icingaadmin",
                "active": true,
                "command": "mail-host-notification",
                "command_endpoint": "",
                "ha_mode": 0.0,
                "host_name": "master_box18",
                "interval": 0.0,
                "last_notification": 1490880966.9609,
                "last_problem_notification": 1490880966.9609,
                "name": "mail-icingaadmin",
                "next_notification": 1490880966.960638,
                "no_more_notifications": true,
                "notification_number": 0.0,
                "notified_problem_users": [
                    "icingaadmin"
                ],
                "original_attributes": null,
                "package": "_etc",
                "paused": false,
                "period": "24x7",
                "service_name": "",
                "states": [
                    "Down",
                    "Up"
                ],
                "templates": [
                    "mail-icingaadmin",
                    "mail-host-notification"
                ],
                "times": {
                    "begin": 60.0
                },
                "type": "Notification",
                "types": [
                    "Problem",
                    "Recovery"
                ],
                "user_groups": [
                    "icingaadmins"
                ],
                "users": null,
                "vars": null,
                "version": 0.0,
                "zone": "master"
            },
            "joins": {},
            "meta": {},
            "name": "master_box18!mail-icingaadmin",
            "type": "Notification"
        }
    ]
}
root@box17:/etc/icinga2/zones.d/global-templates# curl -k -s -u root:root 'https://localhost:5665/v1/objects/notifications/master_box17!mail-icingaadmin' | python -m json.tool
{
    "results": [
        {
            "attrs": {
                "__name": "master_box17!mail-icingaadmin",
                "active": true,
                "command": "mail-host-notification",
                "command_endpoint": "",
                "ha_mode": 0.0,
                "host_name": "master_box17",
                "interval": 0.0,
                "last_notification": 1490880691.456017,
                "last_problem_notification": 1490880691.456017,
                "name": "mail-icingaadmin",
                "next_notification": 1490796147.462089,
                "no_more_notifications": true,
                "notification_number": 0.0,
                "notified_problem_users": [
                    "icingaadmin"
                ],
                "original_attributes": null,
                "package": "_etc",
                "paused": true,
                "period": "24x7",
                "service_name": "",
                "states": [
                    "Down",
                    "Up"
                ],
                "templates": [
                    "mail-icingaadmin",
                    "mail-host-notification"
                ],
                "times": {
                    "begin": 60.0
                },
                "type": "Notification",
                "types": [
                    "Problem",
                    "Recovery"
                ],
                "user_groups": [
                    "icingaadmins"
                ],
                "users": null,
                "vars": null,
                "version": 0.0,
                "zone": "master"
            },
            "joins": {},
            "meta": {},
            "name": "master_box17!mail-icingaadmin",
            "type": "Notification"
        }
    ]
}

after Recovery notification (12:51:06):

root@box17:/etc/icinga2/zones.d/global-templates# curl -k -s -u root:root 'https://localhost:5665/v1/objects/notifications/master_box17!mail-icingaadmin' | python -m json.tool
{
    "results": [
        {
            "attrs": {
                "__name": "master_box17!mail-icingaadmin",
                "active": true,
                "command": "mail-host-notification",
                "command_endpoint": "",
                "ha_mode": 0.0,
                "host_name": "master_box17",
                "interval": 0.0,
                "last_notification": 1490880691.456017,
                "last_problem_notification": 1490880691.456017,
                "name": "mail-icingaadmin",
                "next_notification": 1490796147.462089,
                "no_more_notifications": true,
                "notification_number": 0.0,
                "notified_problem_users": [
                    "icingaadmin"
                ],
                "original_attributes": null,
                "package": "_etc",
                "paused": true,
                "period": "24x7",
                "service_name": "",
                "states": [
                    "Down",
                    "Up"
                ],
                "templates": [
                    "mail-icingaadmin",
                    "mail-host-notification"
                ],
                "times": {
                    "begin": 60.0
                },
                "type": "Notification",
                "types": [
                    "Problem",
                    "Recovery"
                ],
                "user_groups": [
                    "icingaadmins"
                ],
                "users": null,
                "vars": null,
                "version": 0.0,
                "zone": "master"
            },
            "joins": {},
            "meta": {},
            "name": "master_box17!mail-icingaadmin",
            "type": "Notification"
        }
    ]
}
root@box17:/etc/icinga2/zones.d/global-templates# curl -k -s -u root:root 'https://localhost:5665/v1/objects/notifications/master_box18!mail-icingaadmin' | python -m json.tool
{
    "results": [
        {
            "attrs": {
                "__name": "master_box18!mail-icingaadmin",
                "active": true,
                "command": "mail-host-notification",
                "command_endpoint": "",
                "ha_mode": 0.0,
                "host_name": "master_box18",
                "interval": 0.0,
                "last_notification": 1490957466.565347,
                "last_problem_notification": 1490880966.9609,
                "name": "mail-icingaadmin",
                "next_notification": 1490957775.64703,
                "no_more_notifications": false,
                "notification_number": 0.0,
                "notified_problem_users": [],
                "original_attributes": null,
                "package": "_etc",
                "paused": false,
                "period": "24x7",
                "service_name": "",
                "states": [
                    "Down",
                    "Up"
                ],
                "templates": [
                    "mail-icingaadmin",
                    "mail-host-notification"
                ],
                "times": {
                    "begin": 60.0
                },
                "type": "Notification",
                "types": [
                    "Problem",
                    "Recovery"
                ],
                "user_groups": [
                    "icingaadmins"
                ],
                "users": null,
                "vars": null,
                "version": 0.0,
                "zone": "master"
            },
            "joins": {},
            "meta": {},
            "name": "master_box18!mail-icingaadmin",
            "type": "Notification"
        }
    ]
}

On box18, no_more_notifications switched to false after the recovery, and paused was false the whole time. So the recovery reset the no_more_notifications flag.
box17 was inactive the whole time, judging by the flags.

Regardless, I haven't received any notifications except the recovery notification on box17.
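The reset can be double-checked from the two dumps of 'master_box18!mail-icingaadmin' above. This is a minimal sketch; `recovery_reset` is a hypothetical helper name and the values are copied from the output before and after the recovery.

```python
# State of 'master_box18!mail-icingaadmin' before and after the recovery.
before = {
    "no_more_notifications": True,
    "notified_problem_users": ["icingaadmin"],
    "last_notification": 1490880966.9609,
    "next_notification": 1490880966.960638,
}
after = {
    "no_more_notifications": False,
    "notified_problem_users": [],
    "last_notification": 1490957466.565347,
    "next_notification": 1490957775.64703,
}


def recovery_reset(attrs):
    """True if the problem state is cleared: flag unset and no notified users."""
    return not attrs["no_more_notifications"] and not attrs["notified_problem_users"]


print(recovery_reset(before), recovery_reset(after))  # False True
print(after["next_notification"] > after["last_notification"])  # True
```

After the recovery, next_notification also lies after last_notification again, unlike in the earlier dumps.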

debug.logs.tar.gz

Regards

@dnsmichi dnsmichi removed the needs feedback label Apr 11, 2017
@lippserd lippserd removed their assignment Jul 1, 2019