Stack Monitoring rule types failing due to empty buckets #120111
Comments
Pinging @elastic/infra-monitoring-ui (Team:Infra Monitoring UI)
I believe this is the query that's failing to return data:
I did some investigation and found one of the Cloud instances that we (Elastic) own. I checked it and confirmed that it doesn't actually have any Stack Monitoring data in its cluster, but it does have the Stack Monitoring rules enabled. I also noticed clusters that looked like production clusters shipping their monitoring data to a separate cluster. Their monitoring clusters are not in our errors, but the production clusters are. I suspect these alerts are somehow being created either on clusters that had Stack Monitoring data at one point OR by accident (probably in Stack Management). @jasonrhodes How do you want to handle this? Maybe we should add a check in the Stack Monitoring executor that throws a "No Data" error instead of throwing the exceptions above?
Here are the steps to reproduce this:
To fix this, we need to check that data is present before we start destructuring variables off the response object. Here is an example of where the error manifests in the disk usage library function: kibana/x-pack/plugins/monitoring/server/lib/alerts/fetch_disk_usage_node_stats.ts, line 115 at commit 903e75e.
If you look at the errors in the alerts and then check the corresponding library function under https://github.com/elastic/kibana/tree/main/x-pack/plugins/monitoring/server/lib/alerts, you should be able to find the line of code where it destructures the response object. In my opinion, these alerts should simply do nothing when the data is missing, instead of throwing errors or firing "No Data" alerts.
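The check described above might look roughly like the following sketch. It assumes a simplified response shape: `SearchResponse`, `NodeBucket`, `DiskUsageStat`, and `extractDiskUsageStats` are hypothetical names, not the actual code in fetch_disk_usage_node_stats.ts. It follows the "do nothing when the data is missing" approach by returning an empty list instead of throwing.

```typescript
// Hypothetical, simplified shapes of an Elasticsearch search response;
// the real monitoring queries use different aggregation names and nesting.
interface NodeBucket {
  key: string;
  disk_usage: { value: number };
}

interface SearchResponse {
  hits?: { total?: { value: number } };
  // Optional on purpose: an empty result may come back without `aggregations`.
  aggregations?: {
    nodes?: { buckets?: NodeBucket[] };
  };
}

interface DiskUsageStat {
  nodeId: string;
  diskUsage: number;
}

// Hypothetical helper: check that the aggregation data exists before
// destructuring it, and return an empty list (no alerts) when it doesn't.
function extractDiskUsageStats(response: SearchResponse): DiskUsageStat[] {
  const buckets = response.aggregations?.nodes?.buckets;
  if (!buckets || buckets.length === 0) {
    // No monitoring data: do nothing instead of throwing a TypeError.
    return [];
  }
  return buckets.map((bucket) => {
    // Safe to destructure here: the guard above ensured the buckets exist.
    const { key, disk_usage } = bucket;
    return { nodeId: key, diskUsage: disk_usage.value };
  });
}

// Usage: an empty response yields no stats, so the rule has nothing to alert on.
console.log(extractDiskUsageStats({ hits: { total: { value: 0 } } }).length); // prints 0
```

Returning an empty list keeps the decision about whether to tell the user separate from the crash fix: the caller simply sees "no nodes to evaluate".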
The log threshold rule has a similar problem right now in #119777, and we're still debating whether and how to communicate that situation to the user.
Here is a second way this could happen: #121129
I think we should fix the null pointer exception in this bug and leave the rest of the decision (should we actually alert the user to this problem?) until later. Filling up the Kibana logs with these errors doesn't seem like a good solution, no matter what we decide about communicating with the user.
The PR I merged only makes it fail silently and gracefully, but we still need to communicate that state to the user (so they can either populate the cluster with SM data or disable/delete the rules).
…lastic#131332)
* [Stack Monitoring] Prevent exceptions in rule when no data present (elastic#120111)
* Remove optional chaining

Co-authored-by: Kibana Machine <42973632+kibanamachine@users.noreply.github.com>
I think there's already an issue for improving the "missing data" alert. Most of the other SM alerts have the same behavior: they do nothing if the data is missing.
Kibana version: 7.15.0
Describe the bug:
I'm seeing a lot of Stack Monitoring rules failing on Cloud with the following errors:
and a couple of variations on:
I'm assuming there is JS code that expects Elasticsearch to always return aggregations for these queries, but the data set is actually empty, so the aggregation is omitted from the response and the code hits a null-pointer exception.
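The suspected failure mode can be reproduced in isolation with a minimal sketch. The `MonitoringSearchResponse` type and the `clusterIdsUnsafe` function are hypothetical simplifications, not code from the Kibana repository; the sketch only assumes, as described above, that an empty result arrives without an `aggregations` key.

```typescript
// Hypothetical, simplified response shape; the real monitoring queries differ.
interface MonitoringSearchResponse {
  hits: { total: { value: number } };
  aggregations?: { clusters: { buckets: Array<{ key: string }> } };
}

// Code written on the assumption that `aggregations` is always returned.
function clusterIdsUnsafe(response: MonitoringSearchResponse): string[] {
  // The `!` non-null assertion silences the compiler, not the runtime error.
  const { buckets } = response.aggregations!.clusters;
  return buckets.map((bucket) => bucket.key);
}

// An empty data set: the aggregation is missing from the response.
const emptyResponse: MonitoringSearchResponse = { hits: { total: { value: 0 } } };

function describeFailure(): string {
  try {
    clusterIdsUnsafe(emptyResponse);
    return "no error";
  } catch (err) {
    // Reading `.clusters` on `undefined` throws a TypeError
    // (the exact message text varies between Node.js versions).
    return err instanceof TypeError ? "TypeError" : "other error";
  }
}

console.log(describeFailure()); // prints "TypeError"
```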
I've seen this happen with:
So this is likely the same problem repeating across all of these.
Acceptance Criteria
Notes: