Feature request: Add group creation time to group_by hash #3959

ccope · 2024-08-16T07:43:46Z

What did you do?

An alert fired and started flapping.
An ops person manually silenced the alert.
The ops person also acknowledged the alert in Opsgenie.

What did you expect to see?

A new batch of alerts should not be grouped into an already resolved group

What did you see instead? Under which circumstances?

The alert resolved, but Alertmanager did not send a notification to Opsgenie because the alert was silenced (see "Send resolved notification for silenced alerts", #226).
A week later, new hosts started alerting, but they were grouped into the same already-acknowledged incident in Opsgenie.
A full datacenter outage occurred because the alerts were missed.

Environment

Alertmanager version:

alertmanager, version 0.24.0 (branch: HEAD, revision: f484b17fa3c583ed1b2c8bbcec20ba1db2aa5f11)
  build user:       root@265f14f5c6fc
  build date:       20220325-09:31:33
  go version:       go1.17.8
  platform:         linux/amd64

grobinson-grafana · 2024-08-16T09:42:00Z

Hi! 👋

I think the main issue here is that Alertmanager cannot close incidents if all alerts in a group are silenced.

When silencing alerts for an active incident, you need to make sure the incident is also closed in your IRM (Opsgenie). If you leave the incident open and Alertmanager sends new alerts to the same incident, you may or may not get paged for them.

I also recommend checking your Opsgenie configuration, as it sounds like the incident might have been left open by mistake? This shouldn't happen as you should be paged at regular intervals for active incidents until they are resolved.

To answer some of your questions:

"Add group creation time to group_by hash"

This won't work, I'm afraid. Consider the case where the system clocks on two Alertmanager servers are out of sync by 1ns. Each server will record a different group creation time, so each will compute a different hash, creating duplicate incidents in your IRM.
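To make the clock-skew problem concrete, here is a minimal sketch. It is not Alertmanager's actual hashing code: the `group_key` function, the label names, and the timestamp are all hypothetical, and SHA-256 over sorted labels is only a stand-in for whatever the real implementation does.

```python
import hashlib


def group_key(group_labels, created_ns=None):
    """Hypothetical group key: a hash of the sorted group_by labels,
    optionally mixed with the group's creation time in nanoseconds."""
    parts = [f"{name}={value}" for name, value in sorted(group_labels.items())]
    if created_ns is not None:
        parts.append(f"created_ns={created_ns}")
    return hashlib.sha256("\n".join(parts).encode("utf-8")).hexdigest()


# Illustrative labels only; they do not come from the issue.
labels = {"alertname": "HostDown", "datacenter": "dc1"}

# Labels only: two replicas derive the same key independently.
key_a = group_key(labels)
key_b = group_key(labels)

# Labels plus creation time: the replicas' clocks differ by 1ns.
t = 1_723_794_226_000_000_000  # arbitrary example timestamp
skewed_a = group_key(labels, created_ns=t)
skewed_b = group_key(labels, created_ns=t + 1)

print(key_a == key_b)        # True
print(skewed_a == skewed_b)  # False
```

With labels alone, the replicas agree on the key without coordinating. Once a locally observed timestamp is part of the input, they only agree if their clocks agree exactly.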

"A new batch of alerts should not be grouped into an already resolved group"

Given it had been a week since the last alert was resolved, and I assume there were no other active alerts in the group during that time, Alertmanager would have created a new group for these new alerts. However, group keys are deterministic, and if a group is "re-opened" it will re-use the same group key. This is intentional.
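The "re-opened" behavior can be illustrated with a small sketch. Everything here is an assumption made for illustration: the `Dispatcher` class, its methods, and the `group_key` function are hypothetical and do not mirror Alertmanager's internals.

```python
import hashlib


def group_key(group_labels):
    """Hypothetical deterministic key derived only from the group_by labels."""
    parts = [f"{name}={value}" for name, value in sorted(group_labels.items())]
    return hashlib.sha256("\n".join(parts).encode("utf-8")).hexdigest()


class Dispatcher:
    """Toy dispatcher that tracks active groups by key (illustration only)."""

    def __init__(self):
        self.active = {}  # group key -> set of alerting hosts

    def add_alert(self, group_labels, host):
        key = group_key(group_labels)
        self.active.setdefault(key, set()).add(host)
        return key

    def resolve_all(self, key):
        # The group is deleted once nothing in it is firing.
        self.active.pop(key, None)


dispatcher = Dispatcher()
labels = {"alertname": "HostDown", "datacenter": "dc1"}  # illustrative labels

first_key = dispatcher.add_alert(labels, "host-1")
dispatcher.resolve_all(first_key)
group_gone = first_key not in dispatcher.active

# A week later, a different host fires with the same group_by labels.
second_key = dispatcher.add_alert(labels, "host-2")

print(group_gone)               # True
print(first_key == second_key)  # True
```

In this sketch the second group is a new object, but it carries the same key as the first. A receiver that deduplicates on that key would attach the new alerts to the earlier incident if that incident was never closed.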
