[receiver/hostmetrics] The value of process.cpu.utilization may exceed 1 #31368
BinaryFissionGames added the bug (Something isn't working) and needs triage (New item requiring triage) labels on Feb 21, 2024
Pinging code owners: See Adding Labels via Comments if you do not have permissions to add labels yourself.
Removing |
This was referenced Feb 27, 2024
djaglowski added a commit that referenced this issue on Mar 12, 2024
…ss.cpu.utilization (#31378)

**Description:** When calculating the process.cpu.utilization metric, values over 1 were possible because the number of cores was not taken into account: a single process may run on multiple logical cores, which effectively multiplies the maximum amount of CPU time the process may take. This PR adds a division by the number of logical cores to the calculation for CPU utilization.

**Link to tracking Issue:** Closes #31368

**Testing:**
* Added some unit tests
* Tested locally on my system with the program I posted in the issue:

```json
{
  "name": "process.cpu.utilization",
  "description": "Percentage of total CPU time used by the process since last scrape, expressed as a value between 0 and 1. On the first scrape, no data point is emitted for this metric.",
  "unit": "1",
  "gauge": {
    "dataPoints": [
      {
        "attributes": [{ "key": "state", "value": { "stringValue": "user" } }],
        "startTimeUnixNano": "1708562810521000000",
        "timeUnixNano": "1708562890771386000",
        "asDouble": 0.8811268516953904
      },
      {
        "attributes": [{ "key": "state", "value": { "stringValue": "system" } }],
        "startTimeUnixNano": "1708562810521000000",
        "timeUnixNano": "1708562890771386000",
        "asDouble": 0.0029471002907659667
      },
      {
        "attributes": [{ "key": "state", "value": { "stringValue": "wait" } }],
        "startTimeUnixNano": "1708562810521000000",
        "timeUnixNano": "1708562890771386000",
        "asDouble": 0
      }
    ]
  }
}
```

In Activity Monitor, this process was clocking in at around 1000% to 1100% CPU on my machine, which has 12 logical cores, so the value of around 90% total utilization seems correct here.

**Documentation:** N/A

---------

Co-authored-by: Daniel Jaglowski <jaglows3@gmail.com>
DougManton pushed a commit to DougManton/opentelemetry-collector-contrib that referenced this issue on Mar 13, 2024
…ss.cpu.utilization (open-telemetry#31378)
XinRanZhAWS pushed a commit to XinRanZhAWS/opentelemetry-collector-contrib that referenced this issue on Mar 13, 2024
…ss.cpu.utilization (open-telemetry#31378)
andrzej-stencel added a commit that referenced this issue on May 6, 2024
…alizeProcessCPUUtilization` (#32502)

**Description:** Switches the `receiver.hostmetrics.normalizeProcessCPUUtilization` feature gate to Beta, making it enabled by default. This follows the schedule described in the [docs](https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/v0.98.0/receiver/hostmetricsreceiver/README.md#feature-gates).

**Link to tracking Issue:**
- #31368

Co-authored-by: Pablo Baeyens <pbaeyens31+github@gmail.com>
Component(s)
receiver/hostmetrics
What happened?
Description
The `process.cpu.utilization` metric is expected to be a value between 0 and 1, but it can be greater than 1. My guess is that `process.cpu.utilization` is not divided by the number of system cores, so the value can actually range from 0 to ${num_cores}.
Steps to Reproduce
Expected Result
No metrics exceed 1 (in fact, I'd expect the sum of all processes to be < 1)
Actual Result
I'm getting a metric of 9.5 (this process was taking ~950% cpu in my activity monitor)
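To illustrate the guess in the description, here is a minimal sketch in Go of the normalization that would bring the 9.5 reading back into the 0 to 1 range. `normalizeUtilization` is a hypothetical helper, not the receiver's actual code. The 12 logical cores are an assumption taken from the machine described in the fix's commit message, and `runtime.NumCPU` is just one possible way to obtain the core count.

```go
package main

import (
	"fmt"
	"runtime"
)

// normalizeUtilization is a hypothetical helper (not the receiver's code).
// It converts a utilization value measured in "cores' worth" of CPU time
// (range 0..logicalCores) into a fraction of total machine capacity (0..1).
func normalizeUtilization(perCore float64, logicalCores int) float64 {
	if logicalCores <= 0 {
		// Without a valid core count, return the value unchanged
		// rather than dividing by zero.
		return perCore
	}
	return perCore / float64(logicalCores)
}

func main() {
	// Assumption: 12 logical cores, as on the machine in the commit message.
	fmt.Printf("%.3f\n", normalizeUtilization(9.5, 12)) // prints 0.792

	// On the current machine the result depends on the local core count.
	fmt.Printf("%.3f\n", normalizeUtilization(9.5, runtime.NumCPU()))
}
```

With 12 logical cores, the reported 9.5 becomes roughly 0.79, which falls inside the range the metric description promises.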
Collector version
v0.94.0
Environment information
Environment
OS: macOS 14.2.1
Compiler(if manually compiled): go 1.22
OpenTelemetry Collector configuration
No response
Log output
No response
Additional context
I used this quick Go program to generate load: