runtime/metrics: Delete metrics on object delete #612

Merged 1 commit on Aug 11, 2023

@darkowlzz darkowlzz commented Jul 31, 2023

The current metrics Record*() methods continue to export the metrics
of deleted objects until the program restarts.

This change deletes an object's metrics when the object is deleted. This ensures
that stale metrics about a deleted object are no longer exported.

As a result, ConditionDelete is no longer needed. Another reason
to drop ConditionDelete is that a condition status can only be one of
True, False or Unknown.

This introduces new delete methods in the low-level metrics Recorder. In
the high-level controller metrics, a list of owned finalizers is
introduced, which is used to determine if an object is being deleted.
The existing Record*() methods are updated to check if the given object
is deleted, and to record or delete its metrics accordingly. Users of this
API have to pass the finalizers they set on the objects they maintain to
the metrics helper, and record the metrics at the very end of the
reconciliation so that the final object state can be used to determine
if the metrics can be deleted safely.
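The record-or-delete decision described above can be sketched roughly as follows. This is a minimal, standard-library-only illustration, not the code from this PR: the `object` type, `markDeleted`, `isDeleted`, `action` and the finalizer name are all hypothetical stand-ins. It assumes an object counts as deleted once it has a deletion timestamp and none of the owned finalizers remain on it.

```go
package main

import (
	"fmt"
	"time"
)

// object is a hypothetical stand-in for a Kubernetes object; only the
// fields relevant to deletion are modelled.
type object struct {
	Name              string
	Finalizers        []string
	DeletionTimestamp *time.Time
}

// markDeleted returns a copy of obj with a deletion timestamp set,
// mimicking what the API server does when a delete is requested.
func markDeleted(obj object) object {
	now := time.Now()
	obj.DeletionTimestamp = &now
	return obj
}

// isDeleted reports whether obj is marked for deletion and no longer
// carries any of the finalizers owned by the controller (an assumption
// about how "deleted" is decided).
func isDeleted(obj object, owned []string) bool {
	if obj.DeletionTimestamp == nil || obj.DeletionTimestamp.IsZero() {
		return false
	}
	for _, f := range obj.Finalizers {
		for _, o := range owned {
			if f == o {
				return false
			}
		}
	}
	return true
}

// action returns which operation a Record*() style method would perform.
func action(obj object, owned []string) string {
	if isDeleted(obj, owned) {
		return "delete"
	}
	return "record"
}

func main() {
	owned := []string{"finalizers.example.com"} // hypothetical finalizer name
	live := object{Name: "podinfo", Finalizers: owned}
	terminating := markDeleted(live)                  // finalizer still present
	finalized := markDeleted(object{Name: "podinfo"}) // finalizer removed

	for _, o := range []object{live, terminating, finalized} {
		fmt.Println(action(o, owned))
	}
}
```

The terminating object still yields "record" because its finalizer has not been removed yet, which is why the metrics need to be recorded at the very end of the reconciliation, after the finalizer has been dropped.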

To allow creating multiple instances of the metrics helper, the metrics
collector registration is now done using a new function,
MustMakeRecorder(), which returns a metrics.Recorder. A metrics.Recorder
can be used to create multiple metrics helpers with different attributes
if required, all sharing the same underlying metrics recorder.
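The sharing arrangement can be pictured with the sketch below. It uses only the standard library and hypothetical names (`recorder`, `helper`, `newRecorder`, `newHelper`, `recordSuspend`); it does not reproduce the real metrics.Recorder or helper signatures. The idea it illustrates is that the recorder is created once, and each helper only adds its own attributes (here, its owned finalizers) on top of it.

```go
package main

import "fmt"

// recorder is a hypothetical, simplified stand-in for a metrics recorder.
// It stores suspend gauge values keyed by "kind/namespace/name" and is
// not safe for concurrent use.
type recorder struct {
	suspend map[string]float64
}

// newRecorder creates the single shared recorder. In a real program this
// is the point where collectors would be registered exactly once.
func newRecorder() *recorder {
	return &recorder{suspend: map[string]float64{}}
}

// helper is a hypothetical high-level wrapper with per-controller attributes.
type helper struct {
	rec             *recorder
	ownedFinalizers []string
}

// newHelper builds a helper on top of an existing recorder.
func newHelper(rec *recorder, finalizers ...string) helper {
	return helper{rec: rec, ownedFinalizers: finalizers}
}

// recordSuspend writes the suspend gauge for the given object key.
func (h helper) recordSuspend(key string, suspended bool) {
	v := 0.0
	if suspended {
		v = 1
	}
	h.rec.suspend[key] = v
}

func main() {
	rec := newRecorder()
	repoHelper := newHelper(rec, "repo.example.com/finalizer")   // hypothetical finalizer
	chartHelper := newHelper(rec, "chart.example.com/finalizer") // hypothetical finalizer

	repoHelper.recordSuspend("HelmRepository/default/podinfo", false)
	chartHelper.recordSuspend("HelmChart/default/podinfo", true)

	// Both helpers wrote into the same underlying recorder.
	fmt.Println(len(rec.suspend))
}
```

Separating creation of the recorder from creation of the helpers matters with Prometheus-style registries, where registering the same collector twice is an error.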

Before this change, for a HelmRepository and a HelmChart, the following metrics continued to be exported even after both objects were deleted:

# HELP gotk_reconcile_condition The current condition status of a GitOps Toolkit resource reconciliation.
# TYPE gotk_reconcile_condition gauge
gotk_reconcile_condition{kind="HelmChart",name="podinfo",namespace="default",status="False",type="Ready"} 0
gotk_reconcile_condition{kind="HelmChart",name="podinfo",namespace="default",status="True",type="Ready"} 1
gotk_reconcile_condition{kind="HelmChart",name="podinfo",namespace="default",status="Unknown",type="Ready"} 0
gotk_reconcile_condition{kind="HelmRepository",name="podinfo",namespace="default",status="False",type="Ready"} 0
gotk_reconcile_condition{kind="HelmRepository",name="podinfo",namespace="default",status="True",type="Ready"} 1
gotk_reconcile_condition{kind="HelmRepository",name="podinfo",namespace="default",status="Unknown",type="Ready"} 0
# HELP gotk_reconcile_duration_seconds The duration in seconds of a GitOps Toolkit resource reconciliation.
# TYPE gotk_reconcile_duration_seconds histogram
gotk_reconcile_duration_seconds_bucket{kind="HelmChart",name="podinfo",namespace="default",le="0.01"} 1
gotk_reconcile_duration_seconds_bucket{kind="HelmChart",name="podinfo",namespace="default",le="0.038363583488692544"} 1
gotk_reconcile_duration_seconds_bucket{kind="HelmChart",name="podinfo",namespace="default",le="0.1471764538093883"} 2
gotk_reconcile_duration_seconds_bucket{kind="HelmChart",name="podinfo",namespace="default",le="0.5646216173286169"} 2
gotk_reconcile_duration_seconds_bucket{kind="HelmChart",name="podinfo",namespace="default",le="2.166090855590701"} 2
gotk_reconcile_duration_seconds_bucket{kind="HelmChart",name="podinfo",namespace="default",le="8.309900738254731"} 2
gotk_reconcile_duration_seconds_bucket{kind="HelmChart",name="podinfo",namespace="default",le="31.879757075478317"} 2
gotk_reconcile_duration_seconds_bucket{kind="HelmChart",name="podinfo",namespace="default",le="122.30217221643493"} 2
gotk_reconcile_duration_seconds_bucket{kind="HelmChart",name="podinfo",namespace="default",le="469.19495946736544"} 2
gotk_reconcile_duration_seconds_bucket{kind="HelmChart",name="podinfo",namespace="default",le="1799.9999999999986"} 2
gotk_reconcile_duration_seconds_bucket{kind="HelmChart",name="podinfo",namespace="default",le="+Inf"} 2
gotk_reconcile_duration_seconds_sum{kind="HelmChart",name="podinfo",namespace="default"} 0.063555242
gotk_reconcile_duration_seconds_count{kind="HelmChart",name="podinfo",namespace="default"} 2
gotk_reconcile_duration_seconds_bucket{kind="HelmRepository",name="podinfo",namespace="default",le="0.01"} 1
gotk_reconcile_duration_seconds_bucket{kind="HelmRepository",name="podinfo",namespace="default",le="0.038363583488692544"} 1
gotk_reconcile_duration_seconds_bucket{kind="HelmRepository",name="podinfo",namespace="default",le="0.1471764538093883"} 2
gotk_reconcile_duration_seconds_bucket{kind="HelmRepository",name="podinfo",namespace="default",le="0.5646216173286169"} 2
gotk_reconcile_duration_seconds_bucket{kind="HelmRepository",name="podinfo",namespace="default",le="2.166090855590701"} 2
gotk_reconcile_duration_seconds_bucket{kind="HelmRepository",name="podinfo",namespace="default",le="8.309900738254731"} 2
gotk_reconcile_duration_seconds_bucket{kind="HelmRepository",name="podinfo",namespace="default",le="31.879757075478317"} 2
gotk_reconcile_duration_seconds_bucket{kind="HelmRepository",name="podinfo",namespace="default",le="122.30217221643493"} 2
gotk_reconcile_duration_seconds_bucket{kind="HelmRepository",name="podinfo",namespace="default",le="469.19495946736544"} 2
gotk_reconcile_duration_seconds_bucket{kind="HelmRepository",name="podinfo",namespace="default",le="1799.9999999999986"} 2
gotk_reconcile_duration_seconds_bucket{kind="HelmRepository",name="podinfo",namespace="default",le="+Inf"} 2
gotk_reconcile_duration_seconds_sum{kind="HelmRepository",name="podinfo",namespace="default"} 0.083907428
gotk_reconcile_duration_seconds_count{kind="HelmRepository",name="podinfo",namespace="default"} 2
# HELP gotk_suspend_status The current suspend status of a GitOps Toolkit resource.
# TYPE gotk_suspend_status gauge
gotk_suspend_status{kind="HelmChart",name="podinfo",namespace="default"} 0
gotk_suspend_status{kind="HelmRepository",name="podinfo",namespace="default"} 0

These metrics persist for every object for the lifetime of the program.
With this change, all these metrics are deleted once the associated objects are deleted.
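To make the effect on the condition gauge concrete, here is a small in-memory model, assuming one series per status value (True, False, Unknown) as in the output above. `conditionGauge` and its methods are hypothetical and use only the standard library. With the Prometheus Go client, the equivalent removal would be done with the metric vector's own delete methods, such as `DeleteLabelValues`.

```go
package main

import (
	"fmt"
	"sort"
)

// statuses lists the only values a condition status can take.
var statuses = []string{"True", "False", "Unknown"}

// conditionGauge is a hypothetical in-memory stand-in for a labelled gauge
// such as gotk_reconcile_condition; keys are rendered label sets.
type conditionGauge map[string]float64

// seriesKey renders the label set of one series.
func seriesKey(kind, name, namespace, status, condType string) string {
	return fmt.Sprintf("kind=%q,name=%q,namespace=%q,status=%q,type=%q",
		kind, name, namespace, status, condType)
}

// record sets the series for the current status to 1 and the others to 0.
func (g conditionGauge) record(kind, name, namespace, condType, current string) {
	for _, s := range statuses {
		v := 0.0
		if s == current {
			v = 1
		}
		g[seriesKey(kind, name, namespace, s, condType)] = v
	}
}

// remove drops every status series of the object so that nothing stale
// is left behind for it.
func (g conditionGauge) remove(kind, name, namespace, condType string) {
	for _, s := range statuses {
		delete(g, seriesKey(kind, name, namespace, s, condType))
	}
}

// export returns the remaining series as sorted exposition-style lines.
func (g conditionGauge) export() []string {
	lines := make([]string, 0, len(g))
	for k, v := range g {
		lines = append(lines, fmt.Sprintf("gotk_reconcile_condition{%s} %g", k, v))
	}
	sort.Strings(lines)
	return lines
}

func main() {
	g := conditionGauge{}
	g.record("HelmChart", "podinfo", "default", "Ready", "True")
	g.record("HelmRepository", "podinfo", "default", "Ready", "True")

	// The HelmChart is deleted: all three of its series go away.
	g.remove("HelmChart", "podinfo", "default", "Ready")

	for _, line := range g.export() {
		fmt.Println(line)
	}
}
```

After the removal only the three HelmRepository series remain, which is the behaviour this change aims for on object deletion.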

@darkowlzz darkowlzz added the area/runtime Controller runtime related issues and pull requests label Jul 31, 2023
@darkowlzz darkowlzz force-pushed the stale-metrics branch 2 times, most recently from b8e2ccb to 061ec43 Compare July 31, 2023 21:17
@darkowlzz darkowlzz force-pushed the stale-metrics branch 5 times, most recently from 9d45642 to f130fd7 Compare August 9, 2023 16:24
@darkowlzz darkowlzz marked this pull request as ready for review August 10, 2023 13:53
@darkowlzz darkowlzz force-pushed the stale-metrics branch 2 times, most recently from e5ac39e to f3f524b Compare August 10, 2023 13:56

@stefanprodan stefanprodan left a comment

LGTM

Thanks @darkowlzz


@hiddeco hiddeco left a comment

Much tidier on the other side of the implementation 🥇

Delete the object metrics when the object is deleted. This ensures
that stale metrics about a deleted object are no longer exported.

As a result, the `ConditionDelete` is no longer needed. Another reason
to not have `ConditionDelete` is that a condition can only be one of
True, False or Unknown.

This introduces new delete methods in the low-level metrics Recorder. In
the high-level controller metrics, a list of owned finalizers is
introduced, which is used to determine if an object is being deleted.
The existing Record*() methods are updated to check if the given object
is deleted, and to record or delete its metrics accordingly. Users of this
API have to pass the finalizers they set on the objects they maintain to
the metrics helper, and record the metrics at the very end of the
reconciliation so that the final object state can be used to determine
if the metrics can be deleted safely.

To allow creating multiple instances of metrics helper, the metrics
collector registration is now done using a new function in metrics
package called MustMakeRecorder() which returns a metrics.Recorder.
metrics.Recorder can be used to create multiple metrics helpers with
different attributes if required, sharing the same underlying metrics
recorder.

Signed-off-by: Sunny <darkowlzz@protonmail.com>
@darkowlzz darkowlzz merged commit bff596a into main Aug 11, 2023
13 checks passed
@darkowlzz darkowlzz deleted the stale-metrics branch August 11, 2023 13:40