Table of Contents generated with DocToc

General Strategies to track and improve Performance
v8 Performance Profiling

General Strategies to track and improve Performance

Identify and Understand Performance Problem

watch | slide watch profiling workflow

Analyse performance only once you have a problem, and do so in a top-down manner:

ensure it's JavaScript and not the DOM
reduce testcase to pure JavaScript and run in v8 shell
collect metrics and locate bottlenecks
sample profiling to narrow down the general problem area
- at this point think about the algorithm, data structures, techniques, etc. used in this area and evaluate whether improvements are possible here, since that will most likely yield greater impact than any of the more fine-grained improvements
structural profiling to isolate the exact area i.e. function in which most time is spent
- evaluate what can be improved here again thinking about algorithm first
- only once algorithm and data structures seem optimal evaluate how the code structure affects assembly code generated by v8 and possible optimizations (small functions, try/catch, closures, loops vs. forEach, etc.)
optimize slowest section of code and repeat structural profiling
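The "reduce testcase to pure JavaScript" and "collect metrics" steps can be sketched as a tiny timing harness that runs in Node or the v8 shell. This is only an illustration: `sumOfSquares` is a hypothetical stand-in for the reduced test case, and `measure` is a made-up helper, not part of any tool mentioned here.

```javascript
// Hypothetical workload standing in for the reduced, DOM-free test case.
function sumOfSquares(n) {
  let total = 0;
  for (let i = 0; i < n; i++) total += i * i;
  return total;
}

// Use a high resolution clock when the host provides one, otherwise Date.now.
const now = (typeof performance !== 'undefined' && performance.now)
  ? () => performance.now()
  : () => Date.now();

// Run fn `runs` times and report the fastest and median wall-clock time in ms.
function measure(fn, runs) {
  const times = [];
  for (let i = 0; i < runs; i++) {
    const start = now();
    fn();
    times.push(now() - start);
  }
  times.sort((a, b) => a - b);
  return { min: times[0], median: times[Math.floor(times.length / 2)], runs };
}

const result = measure(() => sumOfSquares(1e6), 20);
console.log('min ms:', result.min, 'median ms:', result.median);
```

Numbers from a harness like this only say whether there is a problem and roughly how big it is; locating it is what the profilers below are for.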

Sampling CPU Profilers

watch watch walkthrough

at a fixed frequency the program is instantaneously paused by setting the stack size to 0, and the call stack is sampled
assumes that the sample is representative of the workload
gives no sense of flow due to gaps between samples
functions that were inlined by compiler aren't shown
collect data for longer period of time, sampling every 1ms
ensure code is exercising the right code paths

Structural CPU Profilers

watch watch walkthrough

functions are instrumented to record entry and exit times
three data points per function
- Inclusive Time: time spent in function including its children
- Exclusive Time: time spent in function excluding its children
- Call Count: number of times the function was called
data points are taken at much higher frequency than sampling
higher cost than sampling due to instrumentation
goal of optimization is to minimize inclusive time
inlined functions retain markers
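The three data points can be illustrated with a hand-rolled wrapper that records entry and exit times. This is a minimal sketch, not how any real structural profiler is implemented: `instrument`, `stats` and the injectable `clock` parameter are all hypothetical names, and recursive calls would be counted more than once in the inclusive total.

```javascript
// Collected data points, keyed by function name.
const stats = new Map();
// Stack of active instrumented calls, used to subtract child time from parents.
const active = [];

// Wrap fn so every call records entry and exit times.
// `clock` is injectable so the sketch can be driven by a fake clock.
function instrument(name, fn, clock = () => performance.now()) {
  return function instrumented(...args) {
    const frame = { childTime: 0 };
    active.push(frame);
    const start = clock();
    try {
      return fn.apply(this, args);
    } finally {
      const inclusive = clock() - start;
      active.pop();
      const parent = active[active.length - 1];
      if (parent) parent.childTime += inclusive;
      const entry = stats.get(name) || { calls: 0, inclusive: 0, exclusive: 0 };
      entry.calls += 1;
      entry.inclusive += inclusive;                   // time including children
      entry.exclusive += inclusive - frame.childTime; // time excluding children
      stats.set(name, entry);
    }
  };
}

// Demo with a fake clock: each child call "takes" 3 units, the parent adds 2.
let fakeTime = 0;
const tick = () => fakeTime;
const child = instrument('child', () => { fakeTime += 3; }, tick);
const parent = instrument('parent', () => { fakeTime += 2; child(); child(); }, tick);
parent();
console.log(stats.get('parent')); // { calls: 1, inclusive: 8, exclusive: 2 }
console.log(stats.get('child'));  // { calls: 2, inclusive: 6, exclusive: 6 }
```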

Instrumentation Techniques

watch

think about data being processed
- is one piece of data slower?
name time ranges based on data
- use variables/properties to dynamically name ranges
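A small sketch of naming time ranges from the data, using `console.time` and `console.timeEnd`. The `items` array and the `rangeName` helper are hypothetical; the point is only that the label is built from properties of the piece of data being processed, so a slow item shows up under its own name.

```javascript
// Hypothetical items; in a real app these would be the records being processed.
const items = [
  { id: 'small', values: [3, 1, 2] },
  { id: 'large', values: Array.from({ length: 50000 }, (_, i) => 50000 - i) },
];

// Build the range name from the data so each item gets its own entry.
function rangeName(item) {
  return `process:${item.id}:${item.values.length}`;
}

function processItem(item) {
  const label = rangeName(item);
  console.time(label);
  const sorted = item.values.slice().sort((a, b) => a - b);
  console.timeEnd(label); // logs e.g. "process:large:50000: <elapsed>ms"
  return sorted;
}

const results = items.map(processItem);
```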

Instrumenting vs. Sampling

watch

+--------------------------------------------------------------------------------------------+
|                                   |      Sampling          |    Structural / Instrumenting |
|-----------------------------------+------------------------+-------------------------------|
| Time                              |       Approximate      |            Exact              |
| Invocation count                  |       Approximate      |            Exact              |
| Overhead                          |       Small            |            High(er)           |
| Accuracy                          |       Good - Poor      |            Good - Poor        |
| Extra code / instrumentation      |       No               |            Yes                |
+--------------------------------------------------------------------------------------------+

need both
manual instrumentation can reduce overhead
instrumentation affects performance and may affect behavior
samples are very accurate, but inaccurate for extracting time
sampling requires no program modification

Plan for Performance

watch

each module of the app should have a time budget
sum of modules should be < 16ms for smooth client side apps
track performance daily or per commit in order to catch budget busters right away
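One way to track budgets per commit is a small check that compares measured module times against their budgets. This is a sketch under assumptions: the module names, the budget values and the `checkBudgets` helper are invented for illustration, and the measured times would come from whatever instrumentation the app already has.

```javascript
const FRAME_BUDGET_MS = 16;

// Hypothetical per-module budgets in milliseconds.
const budgets = { input: 2, physics: 5, render: 8 };

// Returns the budget total, whether it fits in a frame, and which modules
// exceeded their budget. Modules without a budget count as having 0 ms.
function checkBudgets(budgets, measured, frameBudget = FRAME_BUDGET_MS) {
  const total = Object.values(budgets).reduce((sum, ms) => sum + ms, 0);
  const busters = Object.keys(measured).filter(
    (name) => measured[name] > (budgets[name] ?? 0)
  );
  return { total, withinFrame: total < frameBudget, busters };
}

const report = checkBudgets(budgets, { input: 1.2, physics: 6.5, render: 7.9 });
console.log(report); // { total: 15, withinFrame: true, busters: [ 'physics' ] }
```

Run from a test or CI step, a non-empty `busters` list could fail the build so the regression is caught at the commit that introduced it.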

Animation Frame

watch watch walkthrough

queue up key handlers and execute inside Animation Frame
optimize for lowest common denominator that your app will run on
for mobile stay below 8-10ms since remaining time is needed for chrome to do its work, i.e. render
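Queueing key handlers and running them inside the animation frame might look like the sketch below. `createInputQueue` is a hypothetical helper; the scheduler is passed in as a parameter so the example also runs outside a browser, where `requestAnimationFrame` is not available.

```javascript
// Returns an object that queues events and handles them once per frame.
// In a browser, `schedule` would be (cb) => requestAnimationFrame(cb).
function createInputQueue(handle, schedule) {
  const pending = [];
  let scheduled = false;

  function flush() {
    scheduled = false;
    // Take a snapshot so events queued during handling wait for the next frame.
    const batch = pending.splice(0, pending.length);
    for (const event of batch) handle(event);
  }

  return {
    push(event) {
      pending.push(event);
      if (!scheduled) {
        scheduled = true;
        schedule(flush);
      }
    },
  };
}

// Demo with a fake scheduler that collects frame callbacks instead of running them.
const frames = [];
const handled = [];
const queue = createInputQueue((e) => handled.push(e.key), (cb) => frames.push(cb));
queue.push({ key: 'a' });
queue.push({ key: 'b' });
const scheduledFrames = frames.length; // both events share one frame callback
frames.shift()();                      // simulate the browser firing the frame
console.log(scheduledFrames, handled);
```

In a page the wiring would be roughly `window.addEventListener('keydown', (e) => queue.push(e))`, so the event listener itself does almost no work.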

v8 Performance Profiling

Chrome Devtools Profiler

watch

Profile Tab -> Start -> Record Sample
tree view gives idea of flow (call stack) and allows drilling into tree nodes
save profiles to load them later, e.g. for bug reports
use octane benchmark to experiment with the profiler

Chrome Tracing aka chrome://tracing

watch

access at chrome://tracing
hidden feature like chrome://memory originally designed by chrome developers for chrome developers
view into guts of what chrome is doing
timeline of what code is doing framed in larger chrome context
allows optimizing low level gpu performance

Preparation

watch

instrument code
- a) manually add calls to console.time and console.timeEnd with a unique name as argument to mark entry and exit points of an area in the code
- b) Firefox does automatic instrumentation via Firebug (Chrome's Profiler is sample based, while Firebug's is structural)
- c) use compiler/automatic tool to add calls
- d) use runtime instrumentation, similar to valgrind in C
instrumentation achieved via trace macros
- can be nested (hierarchy reflected in profiling display)
- when turned off cost at most a few dozen clocks
- when turned on cost a few thousand clocks (0.01ms)
- arguments passed to macro are only computed when macro is enabled
console.time/console.timeEnd spam the dev tools console (keep it closed)
in order to easily remove the macro in production, wrap the console.time/console.timeEnd calls
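Wrapping the `console.time` / `console.timeEnd` calls could be done along these lines. This is a sketch, assuming a hypothetical `TRACING_ENABLED` switch and a made-up `createTracer` helper; how the switch gets set (build step, environment check) is left open.

```javascript
// Hypothetical switch; a build step or environment check could set it.
const TRACING_ENABLED = false;

const noop = () => {};

// When tracing is off the wrappers are no-ops, so call sites can stay in place.
// `sink` defaults to console and only needs time() and timeEnd().
function createTracer(enabled, sink = console) {
  return {
    begin: enabled ? (name) => sink.time(name) : noop,
    end: enabled ? (name) => sink.timeEnd(name) : noop,
  };
}

const trace = createTracer(TRACING_ENABLED);

function parseConfig(text) {
  trace.begin('parseConfig');
  const value = JSON.parse(text);
  trace.end('parseConfig');
  return value;
}

console.log(parseConfig('{"retries": 3}').retries);
```

Unlike the trace macros described above, arguments at the call site are still evaluated when tracing is off, so labels should stay cheap to compute.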

Running

watch

close all other tabs in order to have the least noise caused by other tabs and thus get cleaner samples
|Record| to start recording a trace
switch to app and interact with it, limit this to 10s as buffer gets large very quickly
switch back and hit |Stop Tracing|
|Save| / |Load| trace

Evaluation

watch

data includes lots of noise since each tab/process will include activity from the following pieces:
- IO thread
- renderer thread
- compositor thread
find pid of your page via chrome://memory

Filter for Signal

watch

in order to get a nice timeline:
remove unnecessary threads and components by selecting only rows with your pid
filter by categories, v8 and webkit are most relevant for JS profiling

Inspect

watch

navigation is based on Quake-style keys (WASD) and is not mouse friendly, although it seems to be improving

                  +---+
                  | W | zoom in
+---+             +---+           +---+               +---+
| A | pan left    | S | zoom out  | D | pan right     | ? |  help (other shortcuts)
+---+             +---+           +---+               +---+

Resources

trace-viewer supports streaming trace data over web sockets
trace event format JSON format to allow interfacing with other tools
web tracing framework an alternative to the built in tracer
about:tracing
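As an illustration of interfacing with the trace event format, the sketch below turns timing spans into "complete" events (`ph: 'X'`) with timestamps and durations in microseconds, wrapped in a `traceEvents` object. The `spans` data and the `toTraceEvents` helper are hypothetical; check the format documentation before relying on fields beyond the ones shown.

```javascript
// Hypothetical measurements, e.g. collected by manual instrumentation.
const spans = [
  { name: 'parse', category: 'app', startMs: 0, durationMs: 12 },
  { name: 'layout', category: 'app', startMs: 12, durationMs: 4 },
];

// Convert spans to trace events. pid and tid are arbitrary placeholders here.
function toTraceEvents(spans, pid = 1, tid = 1) {
  return spans.map((span) => ({
    name: span.name,
    cat: span.category,
    ph: 'X',                     // complete event: start plus duration
    ts: span.startMs * 1000,     // microseconds
    dur: span.durationMs * 1000, // microseconds
    pid,
    tid,
  }));
}

const json = JSON.stringify({ traceEvents: toTraceEvents(spans) }, null, 2);
console.log(json);
```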

v8 tools

ship with v8 source code
plot-time-events: generates png showing v8 timeline
(mac|linux|windows)-tick-processor: generates table of functions sorted by time spent in them

Using Chrome

v8 timeline

Capturing

watch

Chrome --no-sandbox --js-flags="--prof --noprof-lazy --log-timer-events"

[ .. ]

tools/plot-timer-events /chrome/dir/v8.log

Analyzing

watch

Top Band

v8.GCScavenger young generation collection
v8.Execute executing JavaScript
scavenges interrupt script execution

Middle Band

shows code kind
bright green - optimized
blue/purple - unoptimized

Bottom Graph

shows pauses
lots in beginning since scripts are being parsed
no pauses when running optimized code
scavenges (top band) correlate with pause time spikes

Finding Slow Running Unoptimized Functions

watch

Chrome --no-sandbox --js-flags="--prof --noprof-lazy --log-timer-events"

[ .. ]

tools/mac-tick-processor /chrome/dir/v8.log

watch | slide

generates table of functions sorted by time spent in them
includes C++ functions
* indicates optimized functions
functions without * could not be optimized

d8

watch | slide

/v8/out/native/d8 test.js --prof

Determining why a Function was not Optimized

watch watch | slide

"/Applications/Google Chrome.app/Contents/MacOS/Google Chrome" \
  --no-sandbox --js-flags="--trace-deopt --trace-opt-verbose --trace-bailout"

[ . lots of other output. ]

[disabled optimization for xxx, reason: The Reason why function couldn't be optimized]

lots of output which is best piped into file and evaluated
especially watch out for deoptimized functions with lots of arithmetic operations

d8

watch | slide

d8 --trace-opt

Log optimizing compiler bailouts:

watch | slide

d8 --trace-bailout

Log deoptimizations:

watch | slide

d8 --trace-deopt

Improvements

don't use the construct that caused the function to be deoptimized
or move all code inside construct into separate function and call it instead
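Moving the offending construct into its own function can be sketched as follows, assuming the trace output named a try/catch as the reason (one of the constructs listed earlier). The function names are hypothetical, and whether a given construct actually prevents optimization depends on the v8 version, so the trace flags should confirm it before and after the change.

```javascript
// Before: the try/catch sits in the same function as the hot loop.
function sumParsedInline(lines) {
  let total = 0;
  for (const line of lines) {
    try {
      total += JSON.parse(line).value;
    } catch (err) {
      // Skip malformed lines.
    }
  }
  return total;
}

// After: the try/catch lives in a small helper, so only the helper
// contains the construct and the loop stays free of it.
function tryParseValue(line) {
  try {
    return JSON.parse(line).value;
  } catch (err) {
    return 0; // malformed lines contribute nothing
  }
}

function sumParsed(lines) {
  let total = 0;
  for (const line of lines) total += tryParseValue(line);
  return total;
}

const sample = ['{"value": 1}', 'oops', '{"value": 2}'];
console.log(sumParsedInline(sample), sumParsed(sample)); // 3 3
```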

Resources

video: accelerating oz with v8 | slides
video: structural and sampling profiling in google chrome | slides
v8 profiler
stackoverflow: how to debug nodejs applications
