-
Notifications
You must be signed in to change notification settings - Fork 793
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
collector: add tasks API collection #778
Conversation
This commit adds simple aggregation of Elasticsearch Tasks API. There are 4 new metrics; though 3 are just bookkeeping. elasticsearch_task_stats_action_total is a gague reporting the total number of tasks running for a given action. Because there are no stats endpoints available for this, this change introduces an aggregation step to group the number of tasks by action name. This metric is useful for ensuring long running actions of a specific kind stay within a specific limit. Of particular use to me is the action: 'indices:data/write/delete/byquery'. In my usecase, our ES access patterns mean we have a predefined limit of these actions running on the cluster. This change also adds two new CLI flags to manage the collection of tasks API: --es.tasks (to enable task collection) --es.tasks.actions (to filter tasks by action param) Issue prometheus-community#525 proposed addition of collection of these tasks. Signed-off-by: Aaron Delaney <apd@arista.com>
@sysadmind hey - would it be possible to review this? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We are moving towards the Collector interface for collectors. If you look at cluster_info and cluster_settings, you will see examples of the newer style. If that was the only change, I would say we could fix it up after merging, but with the other review remarks, I think it's worthwhile to do at the same time. Using the Collector interface means --es.tasks is no longer necessary. It will be --collector.tasks automatically. The --es.tasks.actions flag will still be necessary, but I don't think that the "es." prefix is necessary.
Overall, I think this is close and the logic seems to be correct.
collector/tasks_response.go
Outdated
Action string `json:"action"` | ||
} | ||
|
||
type AggregatedTaskStats struct { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure that there is a benefit to having this type. We could just pass the map instead.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think having the struct makes the code a bit more self documenting; it makes it clear we're doing aggregation and by what. If you'd prefer I can make that change though
Thanks for the prompt & useful review @sysadmind :)
I ended up moving the flag into tasks.go directly and using a global; because the collector factory cannot pass params during construction. The downside of directly gathering flags in the
I considered some alternatives:
If you have a better implementation please suggest. |
Signed-off-by: Aaron Delaney <apd@arista.com>
Signed-off-by: Aaron Delaney <apd@arista.com>
Signed-off-by: Aaron Delaney <apd@arista.com>
Hey - @sysadmind - any update here? I have addressed your most recent set of comments |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Everything looks good except the metric name. Thanks for your patience.
Signed-off-by: Aaron Delaney <apd@arista.com>
Signed-off-by: Aaron Delaney <apd@arista.com>
It's understandable - maintaining OSS is a thankless & endless task and I appreciate the reviews. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
* collector: add tasks API collection This commit adds simple aggregation of Elasticsearch Tasks API. There are 4 new metrics; though 3 are just bookkeeping. elasticsearch_task_stats_action_total is a gague reporting the total number of tasks running for a given action. Because there are no stats endpoints available for this, this change introduces an aggregation step to group the number of tasks by action name. This metric is useful for ensuring long running actions of a specific kind stay within a specific limit. Of particular use to me is the action: 'indices:data/write/delete/byquery'. In my usecase, our ES access patterns mean we have a predefined limit of these actions running on the cluster. This change also adds two new CLI flags to manage the collection of tasks API: --es.tasks (to enable task collection) --es.tasks.actions (to filter tasks by action param) Issue prometheus-community#525 proposed addition of collection of these tasks. Signed-off-by: Aaron Delaney <apd@arista.com> * collector: use collector interface for tasks Signed-off-by: Aaron Delaney <apd@arista.com> * all: fix issues reported by golangci-lint Signed-off-by: Aaron Delaney <apd@arista.com> * collector: make task structs private to package Signed-off-by: Aaron Delaney <apd@arista.com> * Fix task stats metric name Signed-off-by: Aaron Delaney <apd@arista.com> * Fix tasks test Signed-off-by: Aaron Delaney <apd@arista.com> --------- Signed-off-by: Aaron Delaney <apd@arista.com>
This reverts commit 7a98b17.
This reverts commit 7a98b17.
This commit adds simple aggregation of Elasticsearch Tasks API.
There are 4 new metrics; though 3 are just bookkeeping.
elasticsearch_task_stats_action_total is a gague reporting the total
number of tasks running for a given action. Because there are no stats
endpoints available for this, this change introduces an aggregation step
to group the number of tasks by action name.
This metric is useful for ensuring long running actions of a specific
kind stay within a specific limit. Of particular use to me is
the action: 'indices:data/write/delete/byquery'.
In my usecase, our ES access patterns mean we have a predefined limit
of these actions running on the cluster.
This change also adds two new CLI flags to manage the collection of tasks API:
Issue #525 proposed addition of collection of these tasks.