
Profiling Linux Builds with Perf

Bruno-DaSilva edited this page Jul 8, 2023 · 6 revisions

For profiling running instances of spring-dedicated and spring-headless

If you want to profile spring-dedicated while it is running, one good way to do so is with the Linux perf tools, which are easy to install on most Ubuntu/Debian-based systems (linux-tools-$(uname -r) on Ubuntu, linux-perf on Debian).

Preparing for a profile capture

Find the PID of the spring-dedicated process you want to attach to. The process seems to have two threads with roughly equal shares of CPU usage. You can find it with

ps aux | grep spring

Remember the process ID. On Linux the main thread's thread ID is the same as the PID, so it is generally the smaller of the two thread IDs.
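ps aux lists processes, not threads. To see the individual threads and confirm which one is the main thread, you can read /proc directly. Below is a minimal bash sketch that assumes a Linux /proc filesystem; list_threads is a hypothetical helper name, not part of any tool.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list the threads of a process by reading /proc/<pid>/task.
# Prints "<tid> <thread name>" per thread and marks the thread whose TID equals
# the PID, which is the main thread on Linux.
list_threads() {
  local pid="$1" task tid name
  if [ ! -d "/proc/$pid/task" ]; then
    echo "no such process: $pid" >&2
    return 1
  fi
  for task in /proc/"$pid"/task/*; do
    tid="${task##*/}"                        # directory name is the TID
    name=$(cat "$task/comm" 2>/dev/null)     # thread name, may be empty if unreadable
    if [ "$tid" = "$pid" ]; then
      echo "$tid $name (main thread)"
    else
      echo "$tid $name"
    fi
  done
}
```

If only one instance is running, list_threads "$(pgrep -o -f spring-dedicated)" shows its threads. pgrep -f matches the full command line, which matters here because kernel process names are truncated to 15 characters, so an exact-name match on spring-dedicated would fail.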

Get the debug symbols

  • Create a working directory for where you will store your captures.
  • Download the debug symbols from the releases page here on GitHub (take care to select the Linux build and the version that matches the binary you are profiling).
  • Unpack the debug symbols and make sure the symbol files (e.g. spring-dedicated.dbg) end up in your working directory.

Starting the profiler

You will often need sudo permissions to perform this capture; perf will warn you if your permissions are insufficient (this depends on the kernel.perf_event_paranoid setting).

Example command:

sudo perf record --pid=3140530 --freq=1000 -m 10M -o spring-dedicated_6cg_lag1.perfoutput --stat -g --call-graph dwarf -e cpu-cycles

Arguments:

  • --pid= the process ID of spring-dedicated. You can pass several PIDs as a comma-separated list. Every thread of the given process is sampled.
  • --tid= the thread ID of an individual thread you want to sample (instead of the whole process). On Linux the main thread's TID is the same as the process's PID.
  • --freq= the number of samples per second to take. At higher frequencies or on slower machines, you may need to raise -m to give perf a larger buffer.
  • -m (# | #M) - the number of mmap data pages, or a size with an appended unit character (B/K/M/G). A page count must be a power of two, and a size is rounded up to the nearest power-of-two number of pages. This memory is a ring buffer: the kernel writes samples into it, and perf reads them out and writes them to disk. Larger values let you sample at high resolution for longer before samples are dropped.
  • -o the name of the output file. If you don't set it, perf writes to perf.data and renames any existing perf.data to perf.data.old, so older captures get overwritten.
  • -g enables call-graph recording (already implied by --call-graph below)
  • --stat records per-thread event counts. Use it with perf report -T to see the values.
  • -e eventtype1,eventtype2 - a comma-separated list of events to record. perf list shows the events supported on your system. We typically recommend hardware events such as cpu-cycles, cache-misses, and branch-misses. Be aware that the more events you specify, the more data is written to disk.
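  • --call-graph dwarf : the method perf uses to unwind call stacks. fp follows frame pointers, which only gives complete stacks if the binary was built with -fno-omit-frame-pointer (optimized builds usually omit them). dwarf saves a copy of the user stack with each sample and unwinds it afterwards using the DWARF unwind information, which works without frame pointers but produces much larger captures. In practice dwarf seems to work better here. From the perf-record man page: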

               When "dwarf" recording is used, perf also records (user) stack dump
               when sampled.  Default size of the stack dump is 8192 (bytes).
               User can change the size by passing the size after comma like
               "--call-graph dwarf,4096".
               When "fp" recording is used, perf tries to save stack entries
               up to the number specified in sysctl.kernel.perf_event_max_stack
               by default.  User can change the number by passing it after comma
               like "--call-graph fp,32".
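Two rough calculations can help when choosing -m and --freq. Since perf rounds a -m size up to a power-of-two number of pages, -m 10M on a system with 4 KiB pages is 2560 pages, which becomes 4096 pages (16 MiB). With --call-graph dwarf, each sample also carries a copy of the user stack (8192 bytes by default), so a thread that stays busy produces roughly freq * dump size bytes per second. The bash sketch below does both calculations; the function names are hypothetical, and the data-rate figure is only an estimate that ignores sample headers and saved registers.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for sizing a perf capture.

# Effective ring-buffer size in pages for a requested -m size in bytes:
# round up to whole pages, then up to the next power of two.
# $1 = requested bytes, $2 = page size (defaults to the system page size).
mmap_pages() {
  local bytes="$1" page_size="${2:-$(getconf PAGESIZE)}"
  local pages=$(( (bytes + page_size - 1) / page_size ))
  local p=1
  while [ "$p" -lt "$pages" ]; do
    p=$(( p * 2 ))
  done
  echo "$p"
}

# Rough estimate of bytes per second written with --call-graph dwarf:
# samples/s per busy thread * stack dump size * number of busy threads.
# $1 = --freq, $2 = dump size in bytes (default 8192), $3 = busy threads (default 1).
dwarf_bytes_per_sec() {
  local freq="$1" dump_size="${2:-8192}" threads="${3:-1}"
  echo $(( freq * dump_size * threads ))
}
```

For example, mmap_pages $((10 * 1024 * 1024)) 4096 prints 4096, and dwarf_bytes_per_sec 1000 8192 2 prints 16384000: about 15.6 MiB/s, or roughly 0.9 GiB per minute, for two busy threads at the settings of the example command.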

Stop the profiler:

Press Ctrl+C to stop the profiler, or, if you are scripting the capture, send the perf process a SIGINT or SIGTERM signal.
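For a scripted capture, one option is to let coreutils timeout stop perf after a fixed duration. With -s INT it sends the same signal as Ctrl+C, so perf finishes writing the file normally; timeout then exits with status 124, which the wrapper below treats as success. This is a sketch built around the example command, not an established tool: record_for, its arguments, and the PERF override (set PERF=echo to print the command instead of running it) are hypothetical. It assumes the script itself runs as root, e.g. started with sudo.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: record a running spring-dedicated for a fixed number of seconds.
# $1 = PID, $2 = duration in seconds, $3 = output file.
# PERF defaults to "perf"; set PERF=echo to print the command instead of running it.
record_for() {
  local pid="$1" seconds="$2" output="$3" status
  # Same options as the example command; SIGINT after $seconds acts like Ctrl+C.
  timeout -s INT "$seconds" "${PERF:-perf}" record \
    --pid="$pid" --freq=1000 -m 10M -o "$output" \
    --stat -g --call-graph dwarf -e cpu-cycles
  status=$?
  # timeout exits with 124 when it had to send the signal, which is the expected case here.
  if [ "$status" -eq 124 ]; then
    return 0
  fi
  return "$status"
}
```

For example, record_for 3140530 60 spring-dedicated_6cg_lag1.perfoutput would record that process for one minute.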

Displaying the results in hotspot

  1. Build or install hotspot following the steps here: https://github.com/KDAB/hotspot#getting-hotspot
  2. Run hotspot via $ hotspot
  3. Open the perf data file you saved with perf record.
  4. Enjoy!
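A capture written by sudo perf record is owned by root, and perf typically creates it with owner-only permissions, so hotspot running as your normal user may fail to open it. One way around this is to take ownership of the file first; hotspot also accepts the file path on the command line. A minimal sketch, where open_in_hotspot and the HOTSPOT override are hypothetical:

```shell
#!/usr/bin/env bash
# Hypothetical helper: make a root-owned capture readable by the current user, then open it.
# HOTSPOT defaults to "hotspot"; set HOTSPOT=echo to see what would be run.
open_in_hotspot() {
  local file="$1"
  if [ ! -e "$file" ]; then
    echo "no such file: $file" >&2
    return 1
  fi
  if [ ! -r "$file" ]; then
    # Take ownership so hotspot, running as this user, can read the file.
    sudo chown "$(id -un)" "$file" || return 1
  fi
  "${HOTSPOT:-hotspot}" "$file"
}
```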

Displaying the results in a browsable format:

sudo perf report -i spring-dedicated_6cg_lag1.perfoutput

In perf report, press '+' to expand or collapse the call graph of the selected entry ('E' expands all of them).

Displaying per-thread event counts (requires --stat when recording):

sudo perf report -T -i spring-dedicated_6cg_lag1.perfoutput
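perf report can also print its results as plain text with --stdio, which is useful for saving or sharing a summary. The per-thread table requested with -T appears to be printed only in this text mode, not in the interactive browser. A minimal sketch; report_to_text and the PERF override are hypothetical, and for a root-owned capture you would run it as root:

```shell
#!/usr/bin/env bash
# Hypothetical helper: save a text report, including per-thread counts, to a file.
# $1 = capture recorded with --stat, $2 = text file to write.
# PERF defaults to "perf"; set PERF=echo to print the command instead of running it.
report_to_text() {
  local input="$1" output="$2"
  "${PERF:-perf}" report -i "$input" --stdio -T > "$output"
}
```

For example, report_to_text spring-dedicated_6cg_lag1.perfoutput lag1_report.txt writes the report to lag1_report.txt.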