[Monitoring][Additional-Alerting] Shard Size #74820
Pinging @elastic/stack-monitoring (Team:Monitoring) |
Apply only on the active shards - New data written in the last 7 days |
@ravikesarwani cc: @chrisronline @jasonrhodes I’m still trying to understand the UX flow and how we can provide useful feedback/notifications to the user. I know we have discussed some of these points over Zoom, but I think it would be helpful to address some of these questions here:
- Do we give people the right levers for tuning this query? I consider this alert a "per index" type of metric (whereas CPU, for example, is a "per node" type). However, each index pattern can have different tuning/ILM policies, which can result in different sharding/allocation behaviors. Because of this, I feel it's also important to define the index pattern(s) with their respective thresholds.
- What are the right defaults for the values in this alert?
This fixed default doesn't really make sense (to me). Wouldn't it depend on the overall cluster size (and also the index's tuning/policies)? I don't know what the right number should be, but what makes sense to me is something dynamic/relative. So, for example (pseudo): Should we still use the following threshold?
If so, what should be the default? Also, here is a snippet from the doc (for additional context): |
From a UI perspective, I would like the user to control two parameters:
Sharding strategy is a complex topic, with many parameters affecting how many shards are acceptable and how large they can be. As a further check, we talked about applying this alert only to active indices (new data written in the last 7 days). This ensures we don't alert on large "cold/frozen" indices that may be larger than 55 GB with the user's full knowledge. We can provide a checkbox for the user to control this. It should be checked by default, but the user can uncheck it, in which case the alert applies to all indices matching the defined index pattern. |
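The evaluation described above (a size threshold plus an optional "active indices only" check) could be sketched roughly as follows. This is only an illustration under stated assumptions, not the Kibana implementation: `IndexShardStats`, `indices_to_alert`, and especially the `last_write` field are hypothetical names, and the thread itself notes further down that a "last written" metric may not currently be available.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

GB = 1024 ** 3  # bytes per gigabyte (binary), an assumption about units


@dataclass
class IndexShardStats:
    """Hypothetical per-index input; not a real monitoring document shape."""
    index: str
    largest_shard_bytes: int
    last_write: datetime  # assumed to be available; see caveat in the lead-in


def indices_to_alert(stats, threshold_gb=55, active_only=True,
                     active_window_days=7, now=None):
    """Return names of indices whose largest shard exceeds the threshold.

    When active_only is True (the proposed default), indices with no
    writes inside the active window are skipped, so large cold/frozen
    indices do not fire the alert.
    """
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=active_window_days)
    firing = []
    for s in stats:
        if active_only and s.last_write < cutoff:
            continue  # inactive index: ignored while the checkbox is checked
        if s.largest_shard_bytes > threshold_gb * GB:
            firing.append(s.index)
    return firing
```

Unchecking the proposed checkbox corresponds to passing `active_only=False`, in which case every index matching the pattern is compared against the threshold.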
@igoristic Thanks for raising these questions. Your points are valid. If there are complexities you run into while implementing the above, let's raise them ASAP so that we can discuss and figure out alternative approaches. It would be good to get this out to customers, continue to get feedback, and improve based on the real-world scenarios and challenges they face. |
@ravikesarwani Thank you for addressing this! So, to clarify: In the drawer UI we will only have the following inputs:
And, we will only have 1x
We don't have this metric (as far as I'm aware) |
I wonder if we should support index pattern inclusion and exclusion. ES supports this (using a |
Yes
I didn't have this in my definition. I don't think "How far back we want to look" really applies for shard size.
Yes
Yes. I see the UI looking something like this (other than the normal top and Actions sections): Notify when shard size is over "55 GB" |
Supporting all the functionality around the creation of index patterns looks like a complex task. See doc. |
I don't think we need to support the creation of index patterns, but we can support a simple text field to allow the user to specify them. I also don't mind if we source the list from known index patterns in Kibana. Either way, we can support inclusion and exclusion fairly easily if it's desirable from a product perspective. |
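A simple text field with inclusion and exclusion could be handled along these lines. This is a minimal sketch under assumptions: the comma-separated input and the leading `-` for exclusion are choices made for illustration, the helper names `parse_patterns` and `matches` are hypothetical, and Python's `fnmatchcase` stands in for wildcard matching (it also accepts `?` and `[...]`, which a real implementation might want to reject). Exclusions here always win regardless of order, which is a simplification.

```python
from fnmatch import fnmatchcase


def parse_patterns(text):
    """Split a comma-separated text field into (include, exclude) lists.

    Assumption for this sketch: a pattern prefixed with '-' is an exclusion.
    """
    include, exclude = [], []
    for raw in text.split(","):
        pattern = raw.strip()
        if not pattern:
            continue  # tolerate stray commas and whitespace
        if pattern.startswith("-"):
            if pattern[1:]:
                exclude.append(pattern[1:])
        else:
            include.append(pattern)
    return include, exclude


def matches(index, text):
    """True if the index matches an include pattern and no exclude pattern."""
    include, exclude = parse_patterns(text)
    if any(fnmatchcase(index, p) for p in exclude):
        return False
    return any(fnmatchcase(index, p) for p in include)
```

For example, with the field set to `logs-*,-logs-debug*`, an index named `logs-2020.08` would be checked by the alert while `logs-debug-1` would be skipped.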
The doc starts out with...
What if they don't want to "explore" the production data/cluster they're monitoring? Seems like these are different use cases, and we should keep them separate. I think providing an index pattern field with a default value of |
I was assuming that using the index patterns already created in Stack Management would be easier. I agree that the main use described in the document is exploring data in Kibana, but it serves our purpose very well too. That UI also shows, in real time, which indices are selected as you type the filter, which is really great and removes a lot of user errors. If you think providing an edit box (include/exclude, wildcards, multiple entries ...), error checking (characters allowed/not allowed, etc.), and related functionality is easier to develop, then go for it. Hopefully you will use some existing classes/functions and not reinvent the wheel here, to keep the code simple and the test cases contained. cc: @jasonrhodes |
I imagine we'd do the same as Metrics does (as well as other observability solutions): They don't have any validation (afaik) and are just a simple text field. I'd imagine defaulting to |
Let's use the text field and get this alert coded, tested, and delivered. |
Acceptance Criteria
Current "Next step" items
.../elasticsearch/indices/${index}/advanced
(within the SM app)