Adds pipeline metrics 🔭 #1387
Conversation
Really excited to see this coming together, thanks Hrishikesh!
After this lands let me know if you'd like me to do some of the other metrics (taskruns, etc.)
@@ -156,6 +156,7 @@ func (c *Reconciler) Reconcile(ctx context.Context, key string) error {
 		c.Logger.Errorf("Failed to update TaskRun status for PipelineRun %s: %v", pr.Name, err)
 		return err
 	}
+	_ = c.metrics.DurationAndCount(pr)
Since DurationAndCount might take a long time (see comment below), have you considered running this in a goroutine so that Reconcile is not blocked?
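A minimal sketch of the non-blocking variant, assuming the c.metrics and c.Logger fields from the diff above; the goroutine wrapper is an illustration, not the merged implementation:

```go
// Record the PipelineRun duration/count off the reconcile path so that a slow
// metrics backend cannot block Reconcile.
go func() {
	if err := c.metrics.DurationAndCount(pr); err != nil {
		c.Logger.Warnf("failed to record PipelineRun metrics: %v", err)
	}
}()
```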
exciting 🕺
Force-pushed from 12a9afe to ec8a3cf (compare)
https://thenewstack.io/opentracing-opencensus-merge-into-a-single-new-project-opentelemetry/ I wonder whether we should start right away with OpenTelemetry, or if the new API is too young (it's still alpha)?
@@ -28,7 +28,7 @@ import (

 const (
 	// ControllerLogKey is the name of the logger for the controller cmd
-	ControllerLogKey = "controller"
+	ControllerLogKey = "tekton"
Is this necessary in this change? It seems useful to key messages from this binary as coming from the controller, as opposed to "Tekton" which could mean lots of things.
A valid argument. The issue is that this name is used as a prefix in metric names, and I didn't like controller_***. Ideally we should fix the limitation in knative pkg, but I'm open to any alternative suggestions.
Oh, so this "log key" is used both for logging, and for metrics, is that the case?
Maybe just file an issue with knative/pkg to make these separate, and link that here in a TODO.
I was not aware it's used as a log key as well.
}

// IsOfAPipeline return true if taskrun is a part of pipeline
func (tr *TaskRun) IsOfAPipeline() (bool, map[string]string) {
Add a description of the returned map to the doc comment. And since it returns something more than just a simple bool it might make sense to rephrase the method name?
Maybe it should return the Pipeline name and PipelineRun name, instead of a map containing both of those strings?
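A rough sketch of that suggestion; the method name, the label keys, and the assumption that the Pipeline/PipelineRun names come from labels are illustrative, not the merged code:

```go
// Hypothetical label keys; the real constants live in the pipeline API package.
const (
	pipelineLabelKey    = "tekton.dev/pipeline"
	pipelineRunLabelKey = "tekton.dev/pipelineRun"
)

// IsPartOfPipeline reports whether the TaskRun was created by a PipelineRun,
// and if so returns the owning Pipeline and PipelineRun names.
func (tr *TaskRun) IsPartOfPipeline() (bool, string, string) {
	if tr == nil || len(tr.Labels) == 0 {
		return false, "", ""
	}
	if p, ok := tr.Labels[pipelineLabelKey]; ok {
		return true, p, tr.Labels[pipelineRunLabelKey]
	}
	return false, "", ""
}
```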
Is it better now?
if err != nil {
	r.initialized = false
	return r, err
Should this return a nil Recorder, instead of one that will error when asked to record anything?
We wanted to return nil, but then every call to the metrics methods in the controller flow would need a nil check first, right? Instead of doing it that way, we added the check inside the metrics methods themselves. The trade-off is that each method has to check whether the instance is actually initialized, which isn't great either, but it avoids scattering nil checks across the callers.
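For illustration, the trade-off described above might look roughly like this; the struct layout, import path, and method signature are assumptions based on the snippets in this review:

```go
import (
	"errors"

	"github.com/tektoncd/pipeline/pkg/apis/pipeline/v1alpha1"
)

// Recorder remembers whether metrics setup succeeded, so callers never have to
// nil-check it; an uninitialized recorder simply refuses to record.
type Recorder struct {
	initialized bool
	// stat and view handles would live here
}

// DurationAndCount records the duration and count for a finished PipelineRun.
func (r *Recorder) DurationAndCount(pr *v1alpha1.PipelineRun) error {
	if !r.initialized {
		return errors.New("ignoring the metrics recording, failed to initialize the metrics recorder")
	}
	// record the duration and bump the run count here
	return nil
}
```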
Yes @imjasonh, why not. I will address all review comments today and get this PR in shape (sorry, I was away for some time).
@afrittoli that's a valid suggestion. We discussed it in the upstream call. It turned out to be a very young project, but progressively we will make the transition over.
Force-pushed from 5764542 to 24afa9b (compare)
SGTM for me but commits will need to be squashed 😝
Looks really good, just a couple more fixes/questions.
	return errors.New("ignoring the metrics recording, failed to initialize the metrics recorder")
}

prs, err := lister.List(labels.Everything())
Let's run this in another goroutine, so that this doesn't contribute to reconciliation slowness. I'm glad it's coming from an in-memory cache, but this might still take long enough to matter, especially if some things aren't guaranteed to be in the cache.
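A sketch of what the gauge computation might look like; the method name, lister import path, and the IsDone helper are assumptions, and a caller would invoke it from a separate goroutine as suggested above:

```go
import (
	"fmt"

	"k8s.io/apimachinery/pkg/labels"

	listers "github.com/tektoncd/pipeline/pkg/client/listers/pipeline/v1alpha1"
)

// RunningPipelineRuns counts PipelineRuns that have not completed yet and
// reports the result as a gauge. Meant to be called outside the reconcile loop.
func (r *Recorder) RunningPipelineRuns(lister listers.PipelineRunLister) error {
	prs, err := lister.List(labels.Everything())
	if err != nil {
		return fmt.Errorf("failed to list pipelineruns while generating metrics: %v", err)
	}
	running := 0
	for _, pr := range prs {
		if !pr.IsDone() {
			running++
		}
	}
	// record `running` on the gauge here
	return nil
}
```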
pkg/reconciler/taskrun/metrics.go (Outdated)
	duration = tr.Status.CompletionTime.Sub(tr.Status.StartTime.Time)
}

taskName := "annonymous"
I think the default value should be "". It's possible to name a Task anonymous, which would make this misleading.
Ideally yes, but the exporter drops labels with empty values when I tried it initially (will confirm it again). We need it for when a TaskRun embeds the Task specification, for example:
kind: TaskRun
metadata:
generateName: home-volume-
spec:
taskSpec:
steps:
- name: write
image: ubuntu
command: ['bash']
...
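A small sketch of the fallback being discussed, under the assumption that the task name comes from spec.taskRef when present; the sentinel value is exactly the point of debate above:

```go
// Use the referenced Task's name as the metric label; TaskRuns that embed a
// taskSpec have no Task to reference, so fall back to a sentinel value.
taskName := "anonymous"
if tr.Spec.TaskRef != nil && tr.Spec.TaskRef.Name != "" {
	taskName = tr.Spec.TaskRef.Name
}
```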
status := "success"
if tr.Status.Conditions[0].Status == corev1.ConditionFalse {
	status = "failed"
Do you think we might want to break this out to signal timeout, cancelled, and possibly future states?
Feels like yes, we may have to 🤔
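One possible shape for that, shown only as a sketch: the condition accessor follows the knative apis package, and the reason strings here are assumptions rather than confirmed Tekton reasons:

```go
// Derive a status label from the "Succeeded" condition, leaving room for more
// outcomes than just success/failed.
status := "success"
if cond := tr.Status.GetCondition(apis.ConditionSucceeded); cond != nil && cond.IsFalse() {
	switch cond.Reason {
	case "TaskRunTimeout": // hypothetical reason string
		status = "timeout"
	case "TaskRunCancelled": // hypothetical reason string
		status = "cancelled"
	default:
		status = "failed"
	}
}
```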
Force-pushed from 7884ced to 23c9067 (compare)
Force-pushed from 23c9067 to 856601c (compare)
Often, as a developer or administrator (ops), I want some insights about pipeline behavior in terms of the time taken to execute a pipelinerun/taskrun, its success or failure ratio, pod latencies, etc. At present Tekton Pipelines has very limited ways to surface such information, and it's hard to get those details by looking at resource YAMLs. This patch exposes the above-mentioned pipeline metrics on the '/metrics' endpoint using the knative `pkg/metrics` package. Users can collect such metrics using Prometheus, Stackdriver, or another supported metrics system. To some extent it solves - tektoncd#540 - tektoncd#164
Force-pushed from bc3fbaf to 34813dc (compare)
/lgtm
[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: vdemeester

The full list of commands accepted by this bot can be found here. The pull request process is described here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment.
/test pull-tekton-pipeline-integration-tests
Looks like this got lgtm'ed and merged before addressing some of the comments. We can follow up on those later, but it would be great to have issues to track/assign that work.
status := "success"
if pr.Status.Conditions[0].Status == corev1.ConditionFalse {
	status = "failed"
Can you file an issue to track adding metrics for timeout/cancelled/etc runs?
	duration = tr.Status.CompletionTime.Sub(tr.Status.StartTime.Time)
}

taskName := "anonymous"
Can you file an issue to track a better solution for this?
In your previous comment you said the exporter drops labels with empty-string values; does that mean no data will be reported at all, or just that it will be reported without a taskName? Because the latter is probably what we want, assuming we can query for data points that don't report a taskName.
Yes @imjasonh, I'm creating the follow-up issue for it. 👍
Changes
Often, as a developer or administrator (ops), I want some insights
about pipeline behavior in terms of the time taken to execute a pipelinerun/taskrun,
its success or failure ratio, pod latencies, etc.
At present Tekton Pipelines has very limited ways to surface such information,
and it's hard to get those details by looking at resource YAMLs.
This patch exposes the above-mentioned pipeline metrics on the '/metrics'
endpoint using the knative
pkg/metrics
package. Users can collect such metrics using Prometheus, Stackdriver, or another supported metrics system.
To some extent it solves #540 and #164.
Submitter Checklist
These are the criteria that every PR should meet, please check them off as you
review them:
See the contribution guide for more details.
Release Notes