Use llvm-symbolizer's JSON output for symbolizing #879

LeszekSwirski · 2024-07-15T08:28:01Z

In some edge cases (e.g. injected JIT symbols), function names can have
new lines. This breaks the llvm-symbolizer output parsing, and makes
pprof hang.

Conveniently, as of LLVM 13, llvm-symbolizer has a JSON output mode,
which is robust against all kinds of weirdness like new lines. We can
use this instead of the line-based parsing, and as a bonus we get much
simpler handling of multiple frames in a stack, as the JSON output
already returns these as an array.

This also requires splitting the CODE and DATA processing into separate
functions, since their JSON output is incompatible. For now, we keep the
DATA output as before, a slightly hacky but functional concatenation of
start + size, but this could be improved.

In some edge cases (e.g. injected JIT symbols), function names can have new lines. This breaks the llvm-symbolizer output parsing, and makes pprof hang. Conveniently, as of LLVM 13, llvm-symbolizer has a JSON output mode, which is robust against all kinds of weirdness like new lines. We can use this instead of the line-based parsing, and as a bonus we get much simpler handling of multiple frames in a stack, as the JSON output already returns these as an array. This also requires splitting the CODE and DATA processing into separate functions, since their JSON output is incompatible. For now, we keep the DATA output as before, a slightly hacky but functional concatenation of start + size, but this could be improved.

codecov-commenter · 2024-07-26T22:11:05Z

Codecov Report

Attention: Patch coverage is 77.08333% with 11 lines in your changes missing coverage. Please review.

Project coverage is 67.00%. Comparing base (0ed6a68) to head (9c35335).
Report is 31 commits behind head on main.

Files	Patch %	Lines
internal/binutils/addr2liner_llvm.go	77.08%	8 Missing and 3 partials ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##             main     #879      +/-   ##
==========================================
+ Coverage   66.86%   67.00%   +0.13%     
==========================================
  Files          44       44              
  Lines        9824     9798      -26     
==========================================
- Hits         6569     6565       -4     
+ Misses       2794     2780      -14     
+ Partials      461      453       -8

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

insilications · 2024-08-11T03:41:36Z

With this merged, I think #823 can be easily solved? The same JSON frame data from llvm-symbolizer that is being read can provide StartLine for Function.start_line.

If this change is welcome, I can create a PR.

LeszekSwirski force-pushed the llvm-symbolizer-json branch from 3ca09dc to 6e57ca2 Compare July 15, 2024 08:29

aalexand approved these changes Jul 26, 2024

View reviewed changes

Merge branch 'main' into llvm-symbolizer-json

9c35335

aalexand merged commit 813a5fb into google:main Jul 27, 2024
31 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Use llvm-symbolizer's JSON output for symbolizing #879

Use llvm-symbolizer's JSON output for symbolizing #879

LeszekSwirski commented Jul 15, 2024

codecov-commenter commented Jul 26, 2024

insilications commented Aug 11, 2024 •

edited

Loading

Use llvm-symbolizer's JSON output for symbolizing #879

Use llvm-symbolizer's JSON output for symbolizing #879

Conversation

LeszekSwirski commented Jul 15, 2024

codecov-commenter commented Jul 26, 2024

Codecov Report

insilications commented Aug 11, 2024 • edited Loading

insilications commented Aug 11, 2024 •

edited

Loading