Re-introduce per-process CPU collection #1146

blt · 2024-12-12T22:44:16Z

What does this PR do?

This commit re-introduces per-process CPU data from /proc/{pid}/stat, reusing the
naming scheme of the cgroup-sourced data but offset with `stat.` to avoid
summing issues in aggregations.
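
For readers unfamiliar with this kind of data, here is a minimal sketch of how cumulative tick counters from /proc/{pid}/stat can be turned into a k8s-style millicore figure between two samples. It is not the code in this PR: the `CpuTicks` type, the `millicores` function, and the choice to pass the elapsed time as a tick count (for example one derived from /proc/uptime, as the later commit "Use /proc/uptime tick data for timing information" suggests) are all assumptions made for illustration.

```rust
/// Cumulative CPU ticks for one process (hypothetical type for this sketch).
/// `utime` and `stime` mirror the user and system time fields of
/// /proc/{pid}/stat, both measured in clock ticks.
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct CpuTicks {
    pub utime: u64,
    pub stime: u64,
}

/// Millicores consumed between two samples of the same process.
///
/// `elapsed_ticks` is the wall-clock time between the samples, expressed in
/// the same tick unit as the counters. Returns `None` when no time has
/// elapsed or when the counters appear to have gone backwards, which can
/// happen if a PID was reused between samples.
pub fn millicores(prev: CpuTicks, cur: CpuTicks, elapsed_ticks: u64) -> Option<f64> {
    if elapsed_ticks == 0 {
        return None;
    }
    let prev_total = prev.utime.saturating_add(prev.stime);
    let cur_total = cur.utime.saturating_add(cur.stime);
    // checked_sub avoids an underflow panic on a counter reset.
    let used = cur_total.checked_sub(prev_total)?;
    // One fully busy core for the whole interval is 1000 millicores.
    Some(used as f64 / elapsed_ticks as f64 * 1000.0)
}
```

Returning `Option` rather than a clamped zero keeps "no data" distinct from "idle", so a caller can skip emitting a sample instead of reporting a misleading value.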

I have removed much of the stat code from the main procfs sample loop. The data
was either no longer used or duplicated elsewhere. Some of it could be
re-introduced, if desired, by extending the poll loop in stat.rs.

I am continuing to remove more and more of our procfs crate integration. I
think, ultimately, we should be able to parse directly without needing a
third-party dependency. That removal is nearly done now.
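
As an illustration of what parsing directly could look like, the sketch below pulls `utime` and `stime` out of a single /proc/{pid}/stat line using only the standard library. The function name `parse_cpu_ticks` is hypothetical and this is not the parser in stat.rs. The field positions (14 and 15, counting from 1) follow the proc(5) man page.

```rust
/// Extract (utime, stime), in clock ticks, from the contents of
/// /proc/{pid}/stat. Hypothetical helper for illustration only.
///
/// The second field, `comm`, is wrapped in parentheses and may itself
/// contain spaces and parentheses, so we split on the LAST ')' rather than
/// on whitespace from the start of the line.
pub fn parse_cpu_ticks(stat_line: &str) -> Option<(u64, u64)> {
    let after_comm = &stat_line[stat_line.rfind(')')? + 1..];
    let mut fields = after_comm.split_ascii_whitespace();
    // The first remaining field is `state` (field 3), so field 14 (`utime`)
    // sits at index 11 of this iterator and `stime` follows immediately.
    let utime: u64 = fields.nth(11)?.parse().ok()?;
    let stime: u64 = fields.next()?.parse().ok()?;
    Some((utime, stime))
}
```

A caller would read the file with something like `std::fs::read_to_string(format!("/proc/{pid}/stat"))` and treat a `None` result, or a read error, as the process having exited between directory listing and read.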

blt · 2024-12-12T22:44:46Z

Re-introduce per-process CPU collection #1146: 2 dependent PRs (#1147, #1149) 👈
Don't coerce counters to floats, avoid metrics handles #1144
Introduce k8s style millicore CPU data #1143
Remove procfs CPU percentage #1141
Read CPU percentage from cgroup data #1140
Correct a coercion bug in capture manager #1139
Avoid potential overflows #1138
main

This stack of pull requests is managed by Graphite.

Signed-off-by: Brian L. Troutwine <brian.troutwine@datadoghq.com>

GeorgeHahn · 2024-12-13T21:12:21Z

lading/src/observer/linux/procfs.rs

@@ -65,6 +77,8 @@ impl Sampler {
        clippy::cast_possible_wrap
    )]
    pub(crate) async fn poll(&mut self) -> Result<(), Error> {
+        let mut proccess_info: FxHashMap<i32, ProcessInfo> = FxHashMap::default();


I might be missing something in the control flow here, but it looks like this map needs to be held across poll calls.

Relatedly, once it is held for the lifetime of lading, expiration may become an issue. PID reuse seems unlikely, but memory bloat is a concern when there are many short-lived processes.

blt · 2024-12-13

No, I just noticed that myself. Commit b84ec4a fixes this.

Expiration was also a problem previously, one we didn't solve, so we're no worse off than before. But I agree it is a potential problem.

cmetz100 · 2024-12-13

On each poll, would it be wrong to remove entries from the process_info map whose PIDs were not seen in the current poll? That would solve the expiration problem.

blt · 2024-12-13

No, that would work. You could do it pretty cheaply with two hashmaps: move entries from one to the other on each loop, then clear out the map that was being moved from.
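
A minimal sketch of the two-hashmap idea from this thread follows. It is not the code that landed: the `Sampler` and `ProcessInfo` shapes here are simplified stand-ins, the `polls_seen` and `last_total_ticks` fields are invented for illustration, and `std::collections::HashMap` is swapped in for the `FxHashMap` in the diff so the example has no dependencies.

```rust
use std::collections::HashMap;

/// Per-process state carried between polls (fields are hypothetical).
#[derive(Clone, Debug, Default, PartialEq)]
pub struct ProcessInfo {
    pub polls_seen: u32,
    pub last_total_ticks: u64,
}

/// Holds process state across `poll` calls and expires PIDs that vanish.
#[derive(Default)]
pub struct Sampler {
    current: HashMap<i32, ProcessInfo>,
    previous: HashMap<i32, ProcessInfo>,
}

impl Sampler {
    /// `observed` stands in for the (pid, total ticks) pairs found by
    /// walking /proc during this poll.
    pub fn poll(&mut self, observed: &[(i32, u64)]) {
        // Last poll's entries become `previous`; `current` starts out empty
        // because `previous` was cleared at the end of the prior poll.
        std::mem::swap(&mut self.current, &mut self.previous);
        for &(pid, ticks) in observed {
            // Move the entry across if the PID was known, else start fresh.
            let mut info = self.previous.remove(&pid).unwrap_or_default();
            info.polls_seen += 1;
            info.last_total_ticks = ticks;
            self.current.insert(pid, info);
        }
        // Whatever is left belongs to PIDs not seen this poll: expire it.
        // `clear` keeps the allocation, so steady-state polls do not realloc.
        self.previous.clear();
    }

    pub fn get(&self, pid: i32) -> Option<&ProcessInfo> {
        self.current.get(&pid)
    }

    pub fn tracked(&self) -> usize {
        self.current.len()
    }
}
```

Because both maps live on the struct, state persists across polls (the first concern raised above), and memory stays bounded by the number of live processes rather than by every PID ever seen.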

blt · 2024-12-14T00:10:11Z

Merge activity

Dec 13, 7:10 PM EST: A user merged this pull request with Graphite.

This was referenced Dec 12, 2024

Avoid potential overflows #1138

Merged

Correct a coercion bug in capture manager #1139

Merged

Read CPU percentage from cgroup data #1140

Merged

Remove procfs CPU percentage #1141

Merged

Introduce k8s style millicore CPU data #1143

Merged

blt mentioned this pull request Dec 12, 2024

Don't coerce counters to floats, avoid metrics handles #1144

Merged

blt added the no-changelog label Dec 12, 2024 — with Graphite App

blt marked this pull request as ready for review December 12, 2024 22:45

blt requested a review from a team as a code owner December 12, 2024 22:45

blt mentioned this pull request Dec 13, 2024

Split reading of /proc/pid/updtime into module #1147

Merged

blt changed the base branch from blt/don_t_coerce_counters_to_floats_avoid_metrics_handles to graphite-base/1146 December 13, 2024 02:06

blt force-pushed the blt/re-introduce_per-process_cpu_collection branch from d71e2e6 to c5f8569 Compare December 13, 2024 02:06

blt force-pushed the graphite-base/1146 branch from 3b7ce8e to 4344486 Compare December 13, 2024 02:06

blt changed the base branch from graphite-base/1146 to main December 13, 2024 02:06

Use /proc/uptime tick data for timing information

5001c72

Signed-off-by: Brian L. Troutwine <brian.troutwine@datadoghq.com>

blt force-pushed the blt/re-introduce_per-process_cpu_collection branch from c5f8569 to 5001c72 Compare December 13, 2024 02:06

remove accidental global state in /proc/pid/stat sampler

0674f3f

Signed-off-by: Brian L. Troutwine <brian.troutwine@datadoghq.com>

goxberry reviewed Dec 13, 2024

proccess -> process

be6cf91

Signed-off-by: Brian L. Troutwine <brian.troutwine@datadoghq.com>

GeorgeHahn reviewed Dec 13, 2024

make sure process_info persists

b84ec4a

Signed-off-by: Brian L. Troutwine <brian.troutwine@datadoghq.com>

blt mentioned this pull request Dec 13, 2024

Ensure cgroup CPU poller works with multiple cgroups #1149

Merged

GeorgeHahn approved these changes Dec 13, 2024

blt mentioned this pull request Dec 14, 2024

Allow users to configure capture metrics expiration #1150

Merged

blt merged commit 9f09b9a into main Dec 14, 2024
22 checks passed

blt deleted the blt/re-introduce_per-process_cpu_collection branch December 14, 2024 00:10
