Add label propagation metrics #2076

Merged 2 commits on Apr 14, 2023

Conversation

ruixiansong
Contributor

/assign @swetharepakula

@k8s-ci-robot added the cncf-cla: yes and needs-ok-to-test labels Apr 13, 2023
@k8s-ci-robot
Contributor

Hi @songrx1997. Thanks for your PR.

I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot added the size/L label Apr 13, 2023
@ruixiansong force-pushed the label-propagation-metrics branch from 3e1ba5e to 0c140d0 April 13, 2023 21:58
@swetharepakula
Member

/ok-to-test

@k8s-ci-robot added the ok-to-test label and removed the needs-ok-to-test label Apr 13, 2023
)

const (
	ratioWithAnnotation = "ratio_endpoints_with_annotation"
Member

Instead of "ratio", let's say "percentage".

prometheus.CounterOpts{
	Subsystem: negControllerSubsystem,
	Name:      totalLabelError,
	Help:      "The total number of errors of label propagation",
Member

nit: "The total number of errors that occurred during label propagation"

prometheus.CounterOpts{
	Subsystem: negControllerSubsystem,
	Name:      labelTruncationErrorCount,
	Help:      "The number of label truncation",
Member

the number of labels truncated

Comment on lines 74 to 76
	LabelTruncationErrorCount = prometheus.NewCounter(
		prometheus.CounterOpts{
			Subsystem: negControllerSubsystem,
			Name:      labelTruncationErrorCount,
			Help:      "The number of label truncation",
		},
	)

	LabelFailureCount = prometheus.NewCounter(
		prometheus.CounterOpts{
			Subsystem: negControllerSubsystem,
			Name:      labelFailureCount,
			Help:      "The number of label truncation failures",
		},
	)
)
Member

Can these two metrics be combined into one, with a label that distinguishes truncation failures, truncated labels, and all labels added?

Comment on lines 34 to 40
	EndpointsWithAnnotations = prometheus.NewGauge(
		prometheus.GaugeOpts{
			Subsystem: negControllerSubsystem,
			Name:      endpointWithAnnotation,
			Help:      "The number of endpoints with annotations",
		},
	)
Member

Let's name this one number_of_endpoints and give it a label whose value says "with annotations".

This way we track both the total and the number with annotations with a single metric.

Comment on lines 42 to 48
	RatioWithAnnotation = prometheus.NewGauge(
		prometheus.GaugeOpts{
			Subsystem: negControllerSubsystem,
			Name:      ratioWithAnnotation,
			Help:      "The ratio between the number of endpoints with annotation and the total number",
		},
	)
Member

We don't need this metric specifically. With the one above, we can calculate the ratio after the fact. For metrics, we generally try to avoid doing any calculations when emitting or publishing them.

Comment on lines 50 to 46
	AverageLabelNumber = prometheus.NewGauge(
		prometheus.GaugeOpts{
			Subsystem: negControllerSubsystem,
			Name:      averageLabelNumber,
			Help:      "The average number of labels per endpoint",
		},
	)
Member

This one should be a distribution (histogram), and we just emit the number of labels per endpoint. That way Prometheus will do the calculation.

Comment on lines 58 to 56
	AverageAnnotationSize = prometheus.NewGauge(
		prometheus.GaugeOpts{
			Subsystem: negControllerSubsystem,
			Name:      averageAnnotationSize,
			Help:      "The average size of endpoint annotations",
		},
	)
Member

Same comment as above: change this to a distribution.

Comment on lines 33 to 52
// LabelPropagationStat contains stats related to label propagation.
type LabelPropagationStats struct {
	EndpointCount          int
	EndpointWithAnnotation int
	PodLabelCount          int
	AnnotationSize         int
	TotalLabelError        int
	LabelTruncationCount   int
	LabelFailureCount      int
}

type LabelPropagationMetrics struct {
	TotalLabelError        int
	LabelTruncationCount   int
	LabelFailureCount      int
	EndpointWithAnnotation int
	RatioWithAnnotation    float64
	AverageLabelNum        float64
	AverageAnnotationSize  float64
}
Member

the labels should go in the same file as the metrics that will use them

pkg/neg/metrics/neg_metrics_collector.go (resolved conversation)
@ruixiansong force-pushed the label-propagation-metrics branch 2 times, most recently from ee00d32 to 497126d April 13, 2023 23:56
Comment on lines 54 to 66
	LabelNumber = prometheus.NewHistogram(
		prometheus.HistogramOpts{
			Subsystem: negControllerSubsystem,
			Name:      labelNumber,
			Help:      "The number of labels per endpoint",
		},
	)

	AnnotationSize = prometheus.NewHistogram(
		prometheus.HistogramOpts{
			Subsystem: negControllerSubsystem,
			Name:      annotationSize,
			Help:      "The size of endpoint annotations per endpoint",
		},
	)
Member

We should add custom buckets. Otherwise the largest default bucket is 10, and anything above it only lands in the +Inf bucket.

For the annotation size, we should indicate that the unit is bytes.

@ruixiansong force-pushed the label-propagation-metrics branch 2 times, most recently from bef626e to 088e9b6 April 14, 2023 01:09
@ruixiansong force-pushed the label-propagation-metrics branch from 088e9b6 to 213c56c April 14, 2023 04:44
Member

@swetharepakula left a comment

/lgtm
/approve

@k8s-ci-robot added the lgtm label Apr 14, 2023
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: songrx1997, swetharepakula

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot added the approved label Apr 14, 2023
@k8s-ci-robot k8s-ci-robot merged commit 9d12b6e into kubernetes:master Apr 14, 2023
Labels
approved, cncf-cla: yes, lgtm, ok-to-test, size/L
3 participants