Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Re-introduce per-process CPU collection #1146

Merged
merged 5 commits into from
Dec 14, 2024

Conversation

blt
Copy link
Collaborator

@blt blt commented Dec 12, 2024

What does this PR do?

This commit re-introduces per-process CPU data from /proc/{pid}/stat reusing the
same naming scheme from the cgroup sourced data, offset with stat. to avoid
summing issues in aggregations.

I have removed much of the stat code from the main procfs sample loop. The data
was either no longer used or is duplicated elsewhere. Some data could be
re-introduced if we desire by extending the poll loop in stat.rs.

I am continuing to remove more and more of our procfs crate integration. I
think, ultimately, we should be able to parse directly without the need of a
third-party dependency. That removal is near done now.

@blt blt added the no-changelog label Dec 12, 2024 — with Graphite App
@blt blt marked this pull request as ready for review December 12, 2024 22:45
@blt blt requested a review from a team as a code owner December 12, 2024 22:45
@blt blt changed the base branch from blt/don_t_coerce_counters_to_floats_avoid_metrics_handles to graphite-base/1146 December 13, 2024 02:06
This commit re-introduces per-process CPU data from /proc/{pid}/stat reusing the
same naming scheme from the cgroup sourced data, offset with stat. to avoid
summing issues in aggregations.

I have removed much of the stat code from the main procfs sample loop. The data
was either no longer used or is duplicated elsewhere. Some data could be
re-introduced if we desire by extending the poll loop in stat.rs.

I am continuing to remove more and more of our procfs crate integration. I
think, ultimately, we should be able to parse directly without the need of a
third-party dependency. That removal is near done now.

Signed-off-by: Brian L. Troutwine <brian.troutwine@datadoghq.com>
@blt blt force-pushed the blt/re-introduce_per-process_cpu_collection branch from d71e2e6 to c5f8569 Compare December 13, 2024 02:06
@blt blt force-pushed the graphite-base/1146 branch from 3b7ce8e to 4344486 Compare December 13, 2024 02:06
@blt blt changed the base branch from graphite-base/1146 to main December 13, 2024 02:06
Signed-off-by: Brian L. Troutwine <brian.troutwine@datadoghq.com>
@blt blt force-pushed the blt/re-introduce_per-process_cpu_collection branch from c5f8569 to 5001c72 Compare December 13, 2024 02:06
Signed-off-by: Brian L. Troutwine <brian.troutwine@datadoghq.com>
Signed-off-by: Brian L. Troutwine <brian.troutwine@datadoghq.com>
@@ -65,6 +77,8 @@ impl Sampler {
clippy::cast_possible_wrap
)]
pub(crate) async fn poll(&mut self) -> Result<(), Error> {
let mut proccess_info: FxHashMap<i32, ProcessInfo> = FxHashMap::default();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I might be missing something in the control flow here, but it looks like this needs to be held across poll calls.

Related, once it is held for the duration of lading, expiration may be an issue. PID reuse seems unlikely, but memory bloat is a concern in cases where there are many short lived processes.

Copy link
Collaborator Author

@blt blt Dec 13, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No I just noticed that myself. Commit b84ec4a fixes this.

Expiration was a problem previously as well that we didn't solve, so we're no worse than before. But it is a potential problem I agree.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For each poll would it be wrong to remove pid entries in the process_info map whose pids were not seen in the current poll to solve the expiration problem?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No that would work. You could do it pretty cheaply by having two hashmaps, move entries from one to the other each loop and clear out the map that was being moved from.

Signed-off-by: Brian L. Troutwine <brian.troutwine@datadoghq.com>
@blt blt merged commit 9f09b9a into main Dec 14, 2024
22 checks passed
Copy link
Collaborator Author

blt commented Dec 14, 2024

Merge activity

  • Dec 13, 7:10 PM EST: A user merged this pull request with Graphite.

@blt blt deleted the blt/re-introduce_per-process_cpu_collection branch December 14, 2024 00:10
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants