I'm not using a custom entrypoint in my runner image
Controller Version
0.27.4
Helm Chart Version
No response
CertManager Version
No response
Deployment Method
Helm
cert-manager installation
Unneeded for this bug.
Checks
This isn't a question or user support case (for Q&A and community support, go to Discussions. It might also be a good idea to contract with one of the contributors or maintainers if your business is so critical that you need priority support)
I've read the release notes before submitting this issue and I'm sure it's not due to any recently introduced backward-incompatible changes
My actions-runner-controller version (v0.x.y) does support the feature
I've already upgraded ARC (including the CRDs, see charts/actions-runner-controller/docs/UPGRADING.md for details) to the latest and it didn't fix the issue
I've migrated to the workflow job webhook event (if you are using webhook-driven scaling)
Resource Definitions
Unneeded for this bug.
To Reproduce
Gather a large number of metrics and inspect the /metrics endpoint for large negative values.
Describe the bug
We see large negative values in our bucket timings. The suspected code bug is described under Additional Context.
Describe the expected behavior
The "run duration" of a job should never be a large negative value
Whole Controller Logs
.
Whole Runner Pod Logs
.
Additional Context
I'm looking at the function here:
actions-runner-controller/pkg/actionsmetrics/event_reader.go (line 226 in e0a7e14)
The code ends in this
I believe that sometimes `startedTime` or `completedTime` is never parsed and is still the zero value. When that happens, the returned value is `0-<now>`, which is very large and reports as a large negative number in the /metrics endpoint.
Proposed fix
Rather than this
Do this
And the same for `RunTime`. Then on the caller side, if `QueueTime.IsZero()` is true, don't report the zero value to the /metrics histogram.
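To make the failure mode and the proposed guard concrete, here is a minimal, self-contained sketch. It is not the code from event_reader.go: the `jobTimings` type and the `naiveTimings`, `guardedTimings`, `shouldObserve`, and `demo` functions are hypothetical names chosen for illustration, and only the `QueueTime` and `RunTime` field names come from this report. The sketch also assumes both fields are of type `time.Duration`. That type has no `IsZero` method, so the caller-side check compares against 0 instead.

```go
package main

import (
	"fmt"
	"time"
)

// jobTimings is a hypothetical stand-in for the parsed result. Only the
// field names QueueTime and RunTime come from the issue text.
type jobTimings struct {
	QueueTime time.Duration
	RunTime   time.Duration
}

// naiveTimings subtracts unconditionally. If a timestamp was never parsed
// it is still time.Time{}, and the subtraction yields a huge bogus duration.
func naiveTimings(queued, started, completed time.Time) jobTimings {
	return jobTimings{
		QueueTime: started.Sub(queued),
		RunTime:   completed.Sub(started),
	}
}

// guardedTimings computes a duration only when both of its endpoints were
// parsed; otherwise the field stays at its zero value.
func guardedTimings(queued, started, completed time.Time) jobTimings {
	var out jobTimings
	if !queued.IsZero() && !started.IsZero() {
		out.QueueTime = started.Sub(queued)
	}
	if !started.IsZero() && !completed.IsZero() {
		out.RunTime = completed.Sub(started)
	}
	return out
}

// shouldObserve is the caller-side check: skip durations that were never
// computed (zero) or that are negative.
func shouldObserve(d time.Duration) bool {
	return d > 0
}

// demo builds a job whose started timestamp was never parsed and returns
// the result of both approaches.
func demo() (naive, guarded jobTimings) {
	queued := time.Date(2023, 5, 1, 12, 0, 0, 0, time.UTC)
	var started time.Time // never parsed: still the zero value
	completed := queued.Add(90 * time.Second)
	return naiveTimings(queued, started, completed),
		guardedTimings(queued, started, completed)
}

func main() {
	naive, guarded := demo()
	fmt.Println("naive queue time is negative:", naive.QueueTime < 0)
	fmt.Println("report guarded queue time:", shouldObserve(guarded.QueueTime))
	fmt.Println("report guarded run time:", shouldObserve(guarded.RunTime))
}
```

One trade-off of this sketch is that a job with a genuinely zero-length queue or run time would also be skipped. If that case matters, the result could carry an explicit "valid" flag instead of relying on the zero value.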