The meaning of the "samples" count #365

Jongy · 2021-03-21T21:48:22Z

Say I run py-spy record -o output -f raw --gil -d 3 -r 10 ... on an idle process. When it finishes, py-spy happily prints:

py-spy> Wrote raw flamegraph data to 'output'. Samples: 30 Errors: 0

But the output file is actually empty. So why did it report 30 samples?

On the other hand, if I profile a multithreaded app, this time passing --idle, I get a graph with more than 30 samples.

In "flamegraphs"/"stackcollapses" jargon, a sample is a single stack from a single thread. Linux's perf also uses "samples" this way. py-spy, by contrast, uses "samples" to mean "iterations of the profiling loop", a number determined simply by duration * rate (3 s * 10 Hz = 30 in the example above).
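To make the two meanings concrete, here is a minimal sketch that contrasts them. It is not py-spy's code: ThreadTrace, loop_iterations, recorded_stacks and idle_passes are hypothetical names made up for illustration, and the "skip idle stacks" rule is reduced to a single boolean.

```rust
/// Hypothetical stand-in for one thread's stack captured in one loop pass.
struct ThreadTrace {
    active: bool,
}

/// "Samples" as iterations of the profiling loop: duration (s) * rate (Hz).
fn loop_iterations(duration_secs: u64, rate_hz: u64) -> u64 {
    duration_secs * rate_hz
}

/// "Samples" as stacks that would end up in the output: one per thread per
/// pass, minus idle stacks unless `include_idle` is set.
fn recorded_stacks(passes: &[Vec<ThreadTrace>], include_idle: bool) -> usize {
    passes
        .iter()
        .flatten()
        .filter(|t| include_idle || t.active)
        .count()
}

/// Builds `iterations` passes over a process whose `threads` threads are all idle.
fn idle_passes(iterations: u64, threads: usize) -> Vec<Vec<ThreadTrace>> {
    (0..iterations)
        .map(|_| (0..threads).map(|_| ThreadTrace { active: false }).collect())
        .collect()
}

fn main() {
    let iterations = loop_iterations(3, 10); // -d 3 -r 10
    let single = idle_passes(iterations, 1);
    let multi = idle_passes(iterations, 4);

    println!("loop iterations: {}", iterations);
    println!("1 idle thread, idle skipped: {}", recorded_stacks(&single, false));
    println!("4 idle threads, idle kept:   {}", recorded_stacks(&multi, true));
}
```

With these made-up inputs the loop always runs 30 times, while the number of stacks in the output ranges from 0 to 120 depending on the thread count and the idle setting.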

I think we should consider changing it, to be on par with the naming used in other profilers. This "samples" count is misleading: it's not what you see after the profiling is done.

I thought the fix would be as simple as:

diff --git a/src/main.rs b/src/main.rs
index 0435eb1..3358e97 100644
--- a/src/main.rs
+++ b/src/main.rs
@@ -251,7 +251,6 @@ fn record_samples(pid: remoteprocess::Pid, config: &Config) -> Result<(), Error>
             break;
         }
 
-        samples += 1;
         if let Some(max_samples) = max_samples {
             if samples >= max_samples {
                 exit_message = "";
@@ -287,6 +286,7 @@ fn record_samples(pid: remoteprocess::Pid, config: &Config) -> Result<(), Error>
             }
 
             output.increment(&trace)?;
+            samples += 1;
         }
 
         if let Some(sampling_errors) = sample.sampling_errors {

but that doesn't work out of the box, because the samples count is also used to stop the loop once max_samples is reached. So it would require a few more changes, which I'll happily make and open a PR for. Just tell me if you like this change, @benfred.
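The problem with reusing one counter can be seen in a small simulation. This is a simplified sketch, not the actual record_samples loop: passes_run and its parameters are hypothetical, and each pass is reduced to "how many stacks were recorded". As in the diff above, the stop check runs before the current pass's stacks are recorded.

```rust
/// Returns how many loop passes run before the loop stops.
/// `stacks_per_pass[i]` is the (made-up) number of stacks recorded in pass i.
/// If `stop_on_recorded` is true, the stop check uses the recorded-stack
/// counter; otherwise it uses the pass counter.
fn passes_run(stacks_per_pass: &[u64], max_samples: u64, stop_on_recorded: bool) -> u64 {
    let mut passes = 0;
    let mut recorded = 0;
    for &stacks in stacks_per_pass {
        passes += 1;
        let counter = if stop_on_recorded { recorded } else { passes };
        if counter >= max_samples {
            break;
        }
        recorded += stacks;
    }
    passes
}

fn main() {
    let idle = vec![0; 100]; // idle process: nothing recorded in any pass
    let busy = vec![4; 100]; // four recorded stacks per pass

    println!("idle, stop on passes:   {}", passes_run(&idle, 30, false));
    println!("idle, stop on recorded: {}", passes_run(&idle, 30, true));
    println!("busy, stop on recorded: {}", passes_run(&busy, 30, true));
}
```

Stopping on the recorded count means an idle process never reaches the limit (here the loop only ends because the input runs out), while a busy multithreaded one stops well before duration * rate passes. Keeping a separate pass counter for the stop condition avoids both effects.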


Jongy · 2021-03-21T22:16:47Z

diff --git a/src/main.rs b/src/main.rs
index 0435eb1..e4a5e38 100644
--- a/src/main.rs
+++ b/src/main.rs
@@ -215,6 +215,7 @@ fn record_samples(pid: remoteprocess::Pid, config: &Config) -> Result<(), Error>
     };
 
     let mut errors = 0;
+    let mut sampling_count = 0;
     let mut samples = 0;
     println!();
 
@@ -251,9 +252,9 @@ fn record_samples(pid: remoteprocess::Pid, config: &Config) -> Result<(), Error>
             break;
         }
 
-        samples += 1;
+        sampling_count += 1;
         if let Some(max_samples) = max_samples {
-            if samples >= max_samples {
+            if sampling_count >= max_samples {
                 exit_message = "";
                 break;
             }
@@ -268,6 +269,8 @@ fn record_samples(pid: remoteprocess::Pid, config: &Config) -> Result<(), Error>
                 continue;
             }
 
+            samples += 1;
+
             if config.include_thread_ids {
                 let threadid = trace.format_threadid();
                 trace.frames.push(Frame{name: format!("thread ({})", threadid),

This seems to be enough, although my naming could perhaps be improved, haha (sampling_count??)

This number is more useful to the user than "the number of sampling intervals", which is what was reported until now. It matches the meaning of "samples" in other profiling tools, e.g. flamegraphs and Linux's "perf". Samples whose recording is skipped are not counted: that is, samples from threads not holding the GIL if --gil is given, and idle samples unless --idle is given. Closes: benfred#365
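The counting rule described in that commit message can be sketched as follows. The Trace and Options types and the count_samples function are hypothetical and written only for illustration; they mirror the two skip conditions named above rather than reproduce py-spy's actual code.

```rust
/// Hypothetical view of one thread's stack in one pass.
#[derive(Clone, Copy)]
struct Trace {
    owns_gil: bool,
    active: bool,
}

/// Hypothetical subset of the command-line options.
struct Options {
    gil_only: bool,     // --gil
    include_idle: bool, // --idle
}

/// A stack is recorded (and therefore counted) only if no skip rule applies.
fn is_recorded(trace: Trace, opts: &Options) -> bool {
    if opts.gil_only && !trace.owns_gil {
        return false; // GIL-less stack skipped under --gil
    }
    if !opts.include_idle && !trace.active {
        return false; // idle stack skipped unless --idle
    }
    true
}

/// "Samples" in the new sense: the number of recorded stacks.
fn count_samples(traces: &[Trace], opts: &Options) -> usize {
    traces.iter().filter(|t| is_recorded(**t, opts)).count()
}

fn main() {
    let traces = [
        Trace { owns_gil: true, active: true },
        Trace { owns_gil: false, active: true },
        Trace { owns_gil: false, active: false },
    ];
    let gil = Options { gil_only: true, include_idle: false };
    let idle = Options { gil_only: false, include_idle: true };

    println!("--gil:  {}", count_samples(&traces, &gil));
    println!("--idle: {}", count_samples(&traces, &idle));
}
```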


Jongy mentioned this issue Mar 21, 2021

Count "samples" as the number of recorded stacks (per thread) #366

Merged

benfred closed this as completed in #366 Mar 22, 2021
