I think the main issue here is that Alertmanager cannot close incidents if all alerts in a group are silenced.
When silencing alerts for an active incident, make sure the incident is also closed in your IRM (Opsgenie). If you leave the incident open and Alertmanager later sends new alerts to the same incident, you may or may not get paged for them.
I also recommend checking your Opsgenie configuration, as it sounds like the incident might have been left open by mistake. This shouldn't happen, as you should be paged at regular intervals for active incidents until they are resolved.
To answer some of your questions:
> Add group creation time to group_by hash
This won't work, I'm afraid. Consider the case where the system clocks on two Alertmanager servers are out of sync by 1ns. Each server will record a different group creation time and therefore compute a different hash, creating duplicate incidents in your IRM.
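To make the clock-skew problem concrete, here is a minimal sketch. It is not Alertmanager's actual code: `group_key`, its hashing scheme, and the label values are all hypothetical and only illustrate why mixing a timestamp into the hash breaks agreement between replicas.

```python
import hashlib
from typing import Dict, Optional


def group_key(labels: Dict[str, str], created_ns: Optional[int] = None) -> str:
    """Hypothetical group key: a hash of the sorted group_by labels,
    optionally mixed with the group's creation time in nanoseconds."""
    parts = [f"{name}={value}" for name, value in sorted(labels.items())]
    if created_ns is not None:
        parts.append(f"created={created_ns}")
    return hashlib.sha256("\n".join(parts).encode()).hexdigest()


# Assumed example labels; any group_by label set behaves the same way.
labels = {"alertname": "HighLatency", "cluster": "prod"}

# Two replicas receive the same alert, but their clocks differ by 1ns.
t = 1_700_000_000_000_000_000
key_a = group_key(labels, created_ns=t)
key_b = group_key(labels, created_ns=t + 1)

print(key_a == key_b)                          # False: two incidents in the IRM
print(group_key(labels) == group_key(labels))  # True: labels alone always agree
```

With the timestamp included, the replicas would have to agree on the creation time to the nanosecond, which they cannot do without coordination. Hashing only the labels needs no coordination at all.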
> A new batch of alerts should not be grouped into an already resolved group
Given it had been a week since the last alert was resolved, and I assume there were no other active alerts in the group during that time, Alertmanager would have created a new group for these new alerts. However, group keys are deterministic, and if a group is "re-opened" it will re-use the same group key. This is intentional.
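The "re-opened group, same key" behaviour can be sketched with a toy dispatcher. This is an illustration under assumptions, not Alertmanager's implementation: the `Dispatcher` class, the `group_key` format, and the route and label names are all hypothetical.

```python
import hashlib
from typing import Dict, Set


def group_key(route: str, group_labels: Dict[str, str]) -> str:
    """Hypothetical deterministic key derived only from the route and labels."""
    labels = ",".join(f'{k}="{v}"' for k, v in sorted(group_labels.items()))
    return hashlib.sha256(f"{route}:{{{labels}}}".encode()).hexdigest()[:16]


class Dispatcher:
    """Toy model: a group exists only while it holds at least one alert."""

    def __init__(self) -> None:
        self.groups: Dict[str, Set[str]] = {}

    def fire(self, route: str, labels: Dict[str, str], alert: str) -> str:
        key = group_key(route, labels)
        self.groups.setdefault(key, set()).add(alert)
        return key

    def resolve(self, route: str, labels: Dict[str, str], alert: str) -> str:
        key = group_key(route, labels)
        group = self.groups.get(key, set())
        group.discard(alert)
        if not group:
            # The last alert resolved, so the group is garbage-collected.
            self.groups.pop(key, None)
        return key


d = Dispatcher()
labels = {"alertname": "DiskFull"}  # assumed example label set

first = d.fire("team-infra", labels, "alert-1")
d.resolve("team-infra", labels, "alert-1")
gone = first not in d.groups  # the original group no longer exists

# A week later a new alert with the same labels arrives.
second = d.fire("team-infra", labels, "alert-2")
print(gone, first == second)  # True True: new group, same key
```

Because the receiving system sees the same key, it may attach the new alerts to the old incident if that incident was never closed on its side.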
What did you do?
What did you expect to see?
What did you see instead? Under which circumstances?
Environment