Discuss usage and support of eBPF #386

No9 · 2020-05-04T13:43:39Z

Hi All
I've been doing some work on running node in kubernetes environments and looking at aspects of observability.
An avenue of investigation has been around using eBPF to provide diagnostics that utilise the node USDT hooks available on Linux.
http://www.brendangregg.com/blog/2016-10-12/linux-bcc-nodejs-usdt.html

@mmarchini has also been doing work in this area, and I recently noticed the PR nodejs/unofficial-builds#19 into the unofficial builds,
which would suggest there is activity here.

Yet @jasnell opened an issue about maintaining DTrace/ETW probe support a year ago, and it has had zero responses: nodejs/node#26571

I think it would be fair to say there is some ambiguity on the need, usage and support for probes in node.js.

Would the diagnostics WG be open to a discussion at the next meeting on any or all of the following items:

- Get everyone on the same page with regard to OS-level tracing, particularly eBPF
- Discuss whether it makes sense to head towards enabling USDT on Linux in official builds
- Understand how eBPF maps to the user journeys that the diagnostics team is working through

Hope some of you find this a useful topic to dive into and that we can discuss it further.

jasnell · 2020-05-04T13:44:55Z

I'm not sure about the whole diagnostic wg but I certainly am :)

gireeshpunathil · 2020-05-04T13:50:52Z

@No9 - thanks, I have added this to the agenda in the upcoming meeting.

mmarchini · 2020-05-04T17:19:59Z

Also related: nodejs/TSC#853

Qard · 2020-05-04T20:07:18Z

I'm also very interested. I have been meaning to dig into eBPF stuff to learn what we could do with it, but haven't got to it yet. The concept sounds very promising to me though so I'd love to learn more. :)

mmarchini · 2020-05-05T01:15:28Z

Overall, BPF is a powerful tool that fulfills a different set of use cases than the builtin trace events API (one could say "trace" means something different from a BPF perspective than it does in the high-level trace events API). The most exciting of these is the ability to correlate events from the application with kernel events. I don't expect it to be used by a majority of Node.js users. Just as with core dumps, it's a tool that will be a better fit in some situations, for folks with some familiarity with the internals.

Tracing with BPF works by attaching small programs to probes ("probe" is a generic term for any event that can trigger a BPF tracing program). The common probes used to trace Node.js (as well as other runtimes) are uprobes, usdt, and watchpoints.

uprobes attach to native functions (or to memory addresses) and trigger when that function or memory address is executed. There's also the analogous uretprobe, which is executed when a function returns, but I advise against using it because there's a bug in V8 which can cause the application to crash. uprobes are available today as long as the binary is not stripped (well, you could still attach uprobes to a stripped binary if you know the exact address of a function, which is unlikely).
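To make the uprobe idea concrete, here is a minimal sketch using the bcc Python bindings. It counts how often a native function in a binary is entered, keyed by process ID. The helper name `count_calls`, the handler name `on_enter`, and the map name `calls` are made up for illustration, and actually attaching assumes Linux, root privileges, bcc installed, and an unstripped binary.

```python
# Sketch only: BPF program text plus a helper that attaches it as a uprobe.
# The bcc import is done lazily so the module loads on machines without bcc.
import time

BPF_TEXT = r"""
#include <uapi/linux/ptrace.h>

BPF_HASH(calls, u32, u64);

// Runs every time the probed function is entered.
int on_enter(struct pt_regs *ctx) {
    u32 pid = bpf_get_current_pid_tgid() >> 32;
    calls.increment(pid);
    return 0;
}
"""


def count_calls(binary_path, symbol, seconds=5):
    """Attach a uprobe to `symbol` in `binary_path` and count entries per PID.

    `symbol` must be a symbol name present in the binary (for C++ this is
    the mangled name). Deliberately uses a uprobe, not a uretprobe.
    """
    from bcc import BPF  # requires the bcc Python bindings and root

    bpf = BPF(text=BPF_TEXT)
    bpf.attach_uprobe(name=binary_path, sym=symbol, fn_name="on_enter")
    time.sleep(seconds)
    return {key.value: value.value for key, value in bpf["calls"].items()}
```

A call would look like `count_calls("/usr/bin/node", some_mangled_symbol)`, where the path and the symbol are placeholders to be looked up for the binary at hand, for example with `nm`.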

usdt probes are tracepoints exposed by the application. They are usually strategically placed on relevant code paths for common use cases (for example, the HTTP path and GC). They can be understood as a "public API tracers can attach to", and on some runtimes they are considered a stable API (that's not the case on Node.js today, although I don't think we ever broke them in a semver manner). As @No9 mentioned, we don't expose those tracepoints in our default build today. Our implementation is also tightly coupled with dtrace-style files and systemtap (which might be a good thing or not). A while back I tried to unify how we define TRACE_EVENTS and usdt probes, but hit a wall because TRACE_EVENTS allows tracepoints to be defined at runtime, whereas USDT probes need to be defined at build time (sort of, see next paragraph).
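For comparison with uprobes, attaching to a usdt probe with bcc might look roughly like the sketch below, which follows the approach in the blog post linked at the top of this issue. It assumes a node binary built with USDT probes (not the default build), Linux, root, and bcc installed. The helper name `trace_http_requests` and the handler name `on_http_request` are made up for illustration, and reading the URL path from argument 6 of `http__server__request` is an assumption taken from that post, not something verified against the current probe definitions.

```python
# Sketch only: BPF program text plus a helper that enables a usdt probe.
# The bcc import is done lazily so the module loads on machines without bcc.
USDT_BPF_TEXT = r"""
#include <uapi/linux/ptrace.h>

int on_http_request(struct pt_regs *ctx) {
    uint64_t addr = 0;
    char path[128] = {0};
    // Assumption: argument 6 of http__server__request is the URL path.
    bpf_usdt_readarg(6, ctx, &addr);
    bpf_probe_read(&path, sizeof(path), (void *)addr);
    bpf_trace_printk("path:%s\n", path);
    return 0;
}
"""


def trace_http_requests(pid):
    """Enable the http__server__request probe in process `pid` and print hits.

    Blocks until interrupted, printing one line per traced request.
    """
    from bcc import BPF, USDT  # requires the bcc Python bindings and root

    usdt = USDT(pid=pid)
    usdt.enable_probe(probe="http__server__request", fn_name="on_http_request")
    bpf = BPF(text=USDT_BPF_TEXT, usdt_contexts=[usdt])
    bpf.trace_print()
```

Unlike the uprobe case, nothing here depends on symbol names in the binary: the tracer only needs the probe name, which is what makes usdt probes behave like a public API.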

Related to usdt we have "dynamic" usdt, which are tracepoints defined at runtime. The cool thing about "dynamic" usdt is that it allows creating tracepoints in JavaScript. The downside is that each dynamic tracepoint will result in a couple of C++ calls, which can increase overhead if the tracepoint is in a hot path. dtrace has good, builtin support for dynamic usdt, but we never got something official on Linux. A few years ago I came up with a hack-y solution (https://github.com/sthima/libstapsdt), which works "well enough". I think some companies are using it, but I'm not entirely sure.

The last one is watchpoints, which attach to memory events (read, write or execute on a memory address). uprobes are limited to memory mmap-ed from files, whereas watchpoints are also available for heap memory, which means a well-crafted watchpoint probe can trace a JavaScript function. It's a hassle though: you need to generate a Linux perf map file, get the address of the function (which needs to be compiled), and hope the GC or a deoptimization won't move that function while you're tracing it.
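The "get the address of the function" step can be sketched in a few lines of Python. A perf map file (for example the `/tmp/perf-<pid>.map` file node writes when started with `--perf-basic-prof`) has one `START SIZE name` entry per line, with start and size in hex. The helper names and the sample entries below are made up for illustration, and the address found is only valid until the function is moved or recompiled.

```python
from typing import List, NamedTuple


class Symbol(NamedTuple):
    start: int  # start address of the compiled code
    size: int   # size of the code in bytes
    name: str   # symbol name as written by the runtime


def parse_perf_map(text: str) -> List[Symbol]:
    """Parse perf map content: one 'START SIZE name' entry per line (hex)."""
    symbols = []
    for line in text.splitlines():
        parts = line.strip().split(" ", 2)
        if len(parts) != 3:
            continue  # skip blank or malformed lines
        try:
            start, size = int(parts[0], 16), int(parts[1], 16)
        except ValueError:
            continue  # skip lines whose address or size is not hex
        symbols.append(Symbol(start, size, parts[2]))
    symbols.sort(key=lambda s: s.start)
    return symbols


def find_by_name(symbols: List[Symbol], fragment: str) -> List[Symbol]:
    """Return every symbol whose name contains `fragment`."""
    return [s for s in symbols if fragment in s.name]


# Hypothetical sample content, not taken from a real process.
SAMPLE = """\
3ef414c0 398 RegExp:[{(]
3ef418a0 3e0 LazyCompile:*handler /app/server.js:10
"""

symbols = parse_perf_map(SAMPLE)
matches = find_by_name(symbols, "handler")
```

The `start` of a match is the address a watchpoint would be set on. The same function can appear several times in a real map file as it gets recompiled, so the last entry for a name is usually the one to look at.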

I believe there's room for improvements here, and I'm happy to see this topic coming up again :D

Qard · 2020-05-05T01:49:02Z

Yeah, I read a bit about it. The ability to define tracepoints at runtime made me curious about the possible use for APM products. It'd be super neat if an APM vendor could toggle low-level tracepoints on and off as-needed. It'd be super valuable to be able to provide deeper insights when something seriously anomalous is detected. 🤔

mmarchini · 2020-08-05T16:52:05Z

Removing from the normal agenda; we can discuss this in a deep dive.

github-actions · 2020-11-04T00:42:36Z

This issue is stale because it has been open many days with no activity. It will be closed soon unless the stale label is removed or a comment is made.

mhdawson mentioned this issue May 4, 2020

Node.js Diagnostics WorkGroup Meeting 2020-05-06 #385

Closed

gireeshpunathil added the diag-agenda label May 4, 2020

mhdawson mentioned this issue May 18, 2020

Node.js Diagnostics WorkGroup Meeting 2020-05-20 #390

Closed

mhdawson mentioned this issue Jun 1, 2020

Node.js Diagnostics WorkGroup Meeting 2020-06-03 #393

Closed

mhdawson mentioned this issue Jun 15, 2020

Node.js Diagnostics WorkGroup Meeting 2020-06-17 #397

Closed

mmarchini added the diag-deepdive-agenda label Jun 17, 2020

This was referenced Jul 2, 2020

Node.js Diagnostics WorkGroup Meeting 2020-07-08 #405

Closed

Node.js Diagnostics WorkGroup Meeting 2020-07-08 #406

Closed

mhdawson mentioned this issue Jul 20, 2020

Node.js Diagnostics WorkGroup Meeting 2020-07-22 #412

Closed

This was referenced Jul 27, 2020

Node.js Diagnostics Deep Dive Meeting 2020-07-29 #419

Closed

Node.js Diagnostics WorkGroup Meeting 2020-08-05 #422

Closed

mmarchini removed the diag-agenda label Aug 5, 2020

mhdawson mentioned this issue Aug 10, 2020

Node.js Diagnostics Deep Dive Meeting 2020-08-12 #425

Closed

github-actions bot added the stale label Nov 4, 2020

github-actions bot closed this as completed Nov 18, 2020

RafaelGSS mentioned this issue Sep 7, 2022

Bring DTrace/eBPF support back nodejs/node#44550

Closed
