Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reading perf results of Linux perf 6.x #179

Open
hzeller opened this issue Oct 17, 2024 · 7 comments
Open

Reading perf results of Linux perf 6.x #179

hzeller opened this issue Oct 17, 2024 · 7 comments

Comments

@hzeller
Copy link
Member

hzeller commented Oct 17, 2024

Last time I used the perf_data_converter it worked well with Linux perf 5 something.

Now, the oldest version I can install on my machine is

$ perf --version
perf version 6.1.112

... but perf_to_profile is not happy with it:

$ perf_to_profile -i perf.data -o foobaz
[WARNING:src/quipper/perf_reader.cc:1351] Skipping 152 bytes of metadata: HEADER_CPU_TOPOLOGY
[INFO:src/quipper/perf_reader.cc:1340] Unsupported metadata type, skipping: HEADER_CACHE
[WARNING:src/quipper/perf_reader.cc:1351] Skipping 8148 bytes of metadata: HEADER_CACHE
[INFO:src/quipper/perf_reader.cc:1340] Unsupported metadata type, skipping: HEADER_SAMPLE_TIME
[WARNING:src/quipper/perf_reader.cc:1351] Skipping 16 bytes of metadata: HEADER_SAMPLE_TIME
[INFO:src/quipper/perf_reader.cc:1340] Unsupported metadata type, skipping: HEADER_MEM_TOPOLOGY
[WARNING:src/quipper/perf_reader.cc:1351] Skipping 56 bytes of metadata: HEADER_MEM_TOPOLOGY
[INFO:src/quipper/perf_reader.cc:1340] Unsupported metadata type, skipping: HEADER_BPF_PROG_INFO
[WARNING:src/quipper/perf_reader.cc:1351] Skipping 4 bytes of metadata: HEADER_BPF_PROG_INFO
[INFO:src/quipper/perf_reader.cc:1340] Unsupported metadata type, skipping: HEADER_BPF_BTF
[WARNING:src/quipper/perf_reader.cc:1351] Skipping 4 bytes of metadata: HEADER_BPF_BTF
[WARNING:src/quipper/perf_reader.cc:1096] Skipping unsupported event PERF_RECORD_EVENT_UPDATE
[WARNING:src/quipper/perf_reader.cc:1096] Skipping unsupported event PERF_RECORD_EVENT_UPDATE
[WARNING:src/quipper/perf_reader.cc:1096] Skipping unsupported event PERF_RECORD_CPU_MAP
[WARNING:src/quipper/perf_reader.cc:1096] Skipping unsupported event UNKNOWN_EVENT_82
[INFO:src/quipper/perf_reader.cc:1087] Number of events stored: 9916
[INFO:src/quipper/perf_parser.cc:288] Parser processed: 8 MMAP/MMAP2 events, 2 COMM events, 32 FORK events, 33 EXIT events, 9835 SAMPLE events, 9615 of these were mapped, 0 SAMPLE events with a data address, 0 of these were mapped
[INFO:src/perf_data_handler.cc:142] Using the build id found for the file name: [kernel.kallsyms], build id: c59b7d060bd4c73de067c8c7ca28fe174080d9b5.
[INFO:src/perf_data_handler.cc:142] Using the build id found for the file name: [kernel.kallsyms]_text, build id: c59b7d060bd4c73de067c8c7ca28fe174080d9b5.
[INFO:src/perf_data_handler.cc:142] Using the build id found for the file name: [kernel.kallsyms]_stext, build id: c59b7d060bd4c73de067c8c7ca28fe174080d9b5.
[INFO:src/perf_data_handler.cc:863] Guessing main mapping for PID=1 bpf_trampoline_6442493094
[WARNING:src/perf_data_handler.cc:688] stat: missing_sample_mmap 220/9835

The resulting output is not useful to pprof, e.g. top just emits an empty line.

What is the oldest supported version of perf ? Or rather: what would it take to adapt it to the latest perf format ?

@s-kanev
Copy link
Contributor

s-kanev commented Oct 21, 2024

These warnings looks fairly harmless to be fair.

This line

[INFO:src/quipper/perf_parser.cc:288] Parser processed: 8 MMAP/MMAP2 events, 2 COMM events, 32 FORK events, 33 EXIT events, 9835 SAMPLE events, 9615 of these were mapped, 0 SAMPLE events with a data address, 0 of these were mapped

suggests there should be 9k events correctly parsed and mapped.

Can you attach a perf.data file so we can reproduce and narrow it down? FWIW, internally we're using perf 6.10 and it's not causing problems.

@hzeller
Copy link
Member Author

hzeller commented Oct 21, 2024

$ perf --version
perf` version 6.6.57
# perf_to_profille compiled from head ( f76cd4dd1e85bb54d60ea3fe69f92168fdf94edb)
# pprof compiled from fa3061bff0bcf0d611f07dbdba73665bd2bbac97
# switch off  address randomization with echo 0 > /proc/sys/kernel/randomize_va_space

perf record -Fmax -- <commandline>
# perf report shows
# 15 cpu_atom/cycles:Pu/                                                                                                                                                                                 ▒
# 82K cpu_core/cycles:Pu/

command line or the program does not really matter, but here in a personal project perf report also shows a reasonable use of functions

  33.23%  bant     bant                  [.] bant::Scanner::Next()
   9.03%  bant     bant                  [.] bant::Scanner::HandleString()
   8.44%  bant     bant                  [.] bant::Parser::Impl::ParseExpression(bool)
   5.13%  bant     bant                  [.] bant::query::(anonymous namespace)::TargetFinder::VisitFunCall(bant::FunCall*)
   5.06%  bant     bant                  [.] std::_Rb_tree<bant::BazelTarget, bant::BazelTarget, std::_Identity<bant::BazelTarget>, std::less<bant::BazelTarget>, std::allocator<bant::BazelTarget> >::_
   4.15%  bant     bant                  [.] bant::query::(anonymous namespace)::TargetFinder::VisitAssignment(bant::Assignment*)
   4.09%  bant     bant                  [.] bant::Parser::Impl::ParseValueOrIdentifier(bool)
   3.48%  bant     bant                  [.] bant::Parser::Impl::ParseList(bant::List*, std::function<bant::Node* ()> const&, bant::TokenType)
   3.37%  bant     libc.so.6             [.] __memcmp_avx2_movbe
...

Yet with the same perf.data (attached as perf.data.gz ), pprof
can't seem to find anything useful.

Converting perf.data to a profile.proto... (May take a few minutes)
[WARNING:src/quipper/perf_reader.cc:1351] Skipping 152 bytes of metadata: HEADER_CPU_TOPOLOGY
[INFO:src/quipper/perf_reader.cc:1340] Unsupported metadata type, skipping: HEADER_CACHE
[WARNING:src/quipper/perf_reader.cc:1351] Skipping 8148 bytes of metadata: HEADER_CACHE
[INFO:src/quipper/perf_reader.cc:1340] Unsupported metadata type, skipping: HEADER_SAMPLE_TIME
[WARNING:src/quipper/perf_reader.cc:1351] Skipping 16 bytes of metadata: HEADER_SAMPLE_TIME
[INFO:src/quipper/perf_reader.cc:1340] Unsupported metadata type, skipping: HEADER_MEM_TOPOLOGY
[WARNING:src/quipper/perf_reader.cc:1351] Skipping 56 bytes of metadata: HEADER_MEM_TOPOLOGY
[INFO:src/quipper/perf_reader.cc:1340] Unsupported metadata type, skipping: HEADER_BPF_PROG_INFO
[WARNING:src/quipper/perf_reader.cc:1351] Skipping 4 bytes of metadata: HEADER_BPF_PROG_INFO
[INFO:src/quipper/perf_reader.cc:1340] Unsupported metadata type, skipping: HEADER_BPF_BTF
[WARNING:src/quipper/perf_reader.cc:1351] Skipping 4 bytes of metadata: HEADER_BPF_BTF
[WARNING:src/quipper/perf_reader.cc:1096] Skipping unsupported event PERF_RECORD_EVENT_UPDATE
[WARNING:src/quipper/perf_reader.cc:1096] Skipping unsupported event PERF_RECORD_EVENT_UPDATE
[WARNING:src/quipper/perf_reader.cc:1096] Skipping unsupported event PERF_RECORD_CPU_MAP
[WARNING:src/quipper/perf_reader.cc:1096] Skipping unsupported event UNKNOWN_EVENT_82
[INFO:src/quipper/perf_reader.cc:1087] Number of events stored: 82559
[INFO:src/quipper/perf_parser.cc:288] Parser processed: 8 MMAP/MMAP2 events, 2 COMM events, 0 FORK events, 1 EXIT events, 82380 SAMPLE events, 81683 of these were mapped, 0 SAMPLE events with a data address, 0 of these were mapped
[INFO:src/perf_data_handler.cc:142] Using the build id found for the file name: [kernel.kallsyms], build id: c59b7d060bd4c73de067c8c7ca28fe174080d9b5.
[INFO:src/perf_data_handler.cc:142] Using the build id found for the file name: [kernel.kallsyms]_text, build id: c59b7d060bd4c73de067c8c7ca28fe174080d9b5.
[INFO:src/perf_data_handler.cc:142] Using the build id found for the file name: [kernel.kallsyms]_stext, build id: c59b7d060bd4c73de067c8c7ca28fe174080d9b5.
[INFO:src/perf_data_handler.cc:863] Guessing main mapping for PID=1 bpf_trampoline_6442493094
File: bant
perf-version:6.6.57
perf-command:/nix/store/9sl0cb3m71m1syzirflsf1lr9jqfki0h-perf-linux-6.6.57/bin/.perf-wrapped record -Fmax -- bazel-bin/bant/bant -C ../bant-testing/ parse -v
Type: dummy:u_event
No samples were found with the default sample value type.
Try "sample_index" command to analyze different sample values.
Entering interactive mode (type "help" for commands, "o" for options)
(pprof) top
Showing nodes accounting for 0, 0% of 0 total
      flat  flat%   sum%        cum   cum%
(pprof)

Maybe it sis the difference between the counters cpu_atom/cycles (essentially none) and cpu_core/cycles (where the data is) perf records ?

15 cpu_atom/cycles:Pu/                                                                                                                                                                                 ▒
82K cpu_core/cycles:Pu/

@gmarin13
Copy link
Collaborator

15 cpu_atom/cycles:Pu/ ▒
82K cpu_core/cycles:Pu/

Is this one of the Intel hybrid CPUs? If yes, then you are probably seeing only samples from the first PMU.
In the shell try sample_index=cpu_core/cycles:Pu/_event, or sample_index=3 and see if you get what you expect when you run top.

@hzeller
Copy link
Member Author

hzeller commented Oct 21, 2024

Yes indeed, this was on my Laptop with an i7-1280P that has 'performance cores' and 'efficiency cores' (if this is what you mean with hybrid CPU).

And looks like changing the sample_index helps and I see the expected output from top.
How can this need to choose the sample index be more discoverable ? Or could it be automatic ?

@hzeller
Copy link
Member Author

hzeller commented Oct 22, 2024

Perhaps the default sample_index could be some aggregate, e.g. pseudocode sum(*/cycles:Pu/) as we're interested in what all cores were up to. That makes it easy to get the most expected and least surprising result by default, but then let's the user still dig into details looking at various sample_index contents.

@gmarin13
Copy link
Collaborator

How can this need to choose the sample index be more discoverable ? Or could it be automatic ?

When I opened your profile with the internal pprof, both CLI and web, it defaulted to sample_index 3 for me.

Perhaps the default sample_index could be some aggregate, e.g. pseudocode sum(*/cycles:Pu/) as we're interested in what all cores were up to.

This is not trivial in the general case. The perf.data file looks like it contains multiple events and it doesn't always make sense to aggregate them. For example, you could collect cycles and cache misses. You don't want to aggregate them, but you can use the sample_index to select the right metric. This is easier with the web interface where you have a drop-down selector to select from the available metrics.

Not saying it cannot be done, but we need to understand when the events are measuring the same event on different PMUs. The Intel implementation for hybrid CPUs uses different PMUs for each core type. On ARM you have a single PMU and one event with samples from all cores.

@hzeller
Copy link
Member Author

hzeller commented Oct 22, 2024

How can this need to choose the sample index be more discoverable ? Or could it be automatic ?

When I opened your profile with the internal pprof, both CLI and web, it defaulted to sample_index 3 for me.

ok, so maybe there is a piece of code in the internal version that did not make it to external perf_data_converter/pprof yet ?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants