Discuss usage and support of eBPF #386

Closed
No9 opened this issue May 4, 2020 · 8 comments
Labels
diag-deepdive-agenda Used for agenda items related to diagnostic deep dive sessions stale

Comments

@No9
Member

No9 commented May 4, 2020

Hi All
I've been doing some work on running node in kubernetes environments and looking at aspects of observability.
An avenue of investigation has been around using eBPF to provide diagnostics that utilise the node USDT hooks available on Linux.
http://www.brendangregg.com/blog/2016-10-12/linux-bcc-nodejs-usdt.html

@mmarchini has also been doing work in this area, and I recently noticed this PR into the unofficial builds, nodejs/unofficial-builds#19,
which would suggest there is activity here.

Yet @jasnell opened an issue about probe support a year ago, and it drew zero responses on maintaining the DTrace/ETW capabilities: nodejs/node#26571

I think it would be fair to say there is some ambiguity on the need, usage and support for probes in node.js.

Would the diagnostics WG be open to a discussion at the next meeting on any or all of the following items:

  • Get everyone on the same page with regard to OS level tracing, particularly eBPF
  • Discuss if it makes sense to head towards enabling USDT on Linux in official builds
  • Understanding how eBPF maps to user journeys that the diagnostics team are working through

Hope some find this a useful topic to dive into and we can discuss further.

@jasnell
Member

jasnell commented May 4, 2020

I'm not sure about the whole diagnostic wg but I certainly am :)

@gireeshpunathil
Member

@No9 - thanks, I have added this to the agenda in the upcoming meeting.

@mmarchini
Contributor

Also related: nodejs/TSC#853

@Qard
Member

Qard commented May 4, 2020

I'm also very interested. I have been meaning to dig into eBPF stuff to learn what we could do with it, but haven't got to it yet. The concept sounds very promising to me though so I'd love to learn more. :)

@mmarchini
Contributor

Overall, BPF is a powerful tool that fulfills a different set of use cases from the built-in trace events API (one could say "trace" means something different from a BPF perspective than it does in the high-level trace events API). The most exciting is the ability to correlate events from the application with kernel events. I don't expect it to be used by a majority of Node.js users. As with core dumps, it's a tool that will be a better fit in some situations, for folks with some familiarity with the internals.

Tracing with BPF works by attaching small programs to probes ("probe" is a generic term for any event which can trigger a BPF tracing program). The common probes used to trace Node.js (as well as other runtimes) are uprobes, usdt, and watchpoints.

uprobes attach to native functions (or to memory addresses) and trigger when that function or memory address is executed. There's also the analogous uretprobe, which triggers when a function returns, but I advise against using it because there's a bug in V8 which can cause the application to crash. uprobes are available today as long as the binary is not stripped (well, you could still attach uprobes to a stripped binary if you know the exact address of a function, which is unlikely).
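To make the attach point concrete, here is a minimal sketch that only builds the text of a bpftrace program for a single uprobe. It assumes bpftrace is the tracer in use; the helper name, the binary path, and the symbol are all hypothetical placeholders, not names from the Node.js source.

```python
def uprobe_program(binary: str, symbol: str) -> str:
    """Return the text of a bpftrace program that counts calls to one native function."""
    # "uprobe:<binary>:<symbol>" is bpftrace's attach-point syntax for
    # user-space functions. Only an entry probe is generated here; no
    # uretprobe is emitted, given the crash caveat mentioned above.
    return f"uprobe:{binary}:{symbol} {{ @calls[probe] = count(); }}"


if __name__ == "__main__":
    # Both arguments are placeholders (assumptions): substitute the path of an
    # unstripped node binary and a symbol that actually exists in it.
    print(uprobe_program("/usr/local/bin/node", "hypothetical_native_function"))
```

The printed text could then be passed to `bpftrace -e` as root. For a C++ binary such as node, the symbol would normally be the mangled name.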

usdt probes are tracepoints exposed by the application. They are usually strategically placed on relevant code paths for common operations (for example, the HTTP path and GC). They can be understood as a "public API tracers can attach to", and on some runtimes they are considered a stable API (that's not the case on Node.js today, although I don't think we ever broke them in a semver manner). As @No9 mentioned, we don't expose those tracepoints in our default build today. Our implementation is also tightly coupled with dtrace-style files and systemtap (which might be a good thing or not). A while back I tried to unify how we define TRACE_EVENTS and usdt probes, but hit a wall because TRACE_EVENTS allows tracepoints to be defined at runtime, whereas USDT probes need to be defined at build time (sort of, see next paragraph).
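One way to check whether a given build exposes these tracepoints is to look at the ELF notes that USDT probes leave in the binary. The sketch below parses text in the style printed by `readelf -n` and lists provider/probe pairs. The function name is hypothetical and the sample addresses are made up for illustration; the probe names follow the Brendan Gregg post linked earlier in the thread.

```python
from typing import List, Tuple


def list_usdt_probes(readelf_notes: str) -> List[Tuple[str, str]]:
    """Extract (provider, probe name) pairs from `readelf -n` style output."""
    probes: List[Tuple[str, str]] = []
    provider = None
    for raw in readelf_notes.splitlines():
        line = raw.strip()
        if line.startswith("Provider:"):
            provider = line.split(":", 1)[1].strip()
        elif line.startswith("Name:") and provider is not None:
            probes.append((provider, line.split(":", 1)[1].strip()))
            provider = None  # each stapsdt note describes exactly one probe
    return probes


# Illustrative sample only; the addresses and argument specs are invented.
SAMPLE = """\
Displaying notes found in: .note.stapsdt
  Owner                Data size        Description
  stapsdt              0x0000003c       NT_STAPSDT (SystemTap probe descriptors)
    Provider: node
    Name: gc__start
    Location: 0x0000000000bf44b4, Base: 0x0000000000f22464, Semaphore: 0x0000000001243028
    Arguments: 4@%esi 4@%edx 8@%rdi
  stapsdt              0x00000082       NT_STAPSDT (SystemTap probe descriptors)
    Provider: node
    Name: http__server__request
    Location: 0x0000000000bf48a2, Base: 0x0000000000f22464, Semaphore: 0x0000000001243024
    Arguments: 8@%rax 8@%rdx 8@-136(%rbp)
"""

if __name__ == "__main__":
    for provider, name in list_usdt_probes(SAMPLE):
        print(f"{provider}:{name}")
```

An empty result for a real binary would suggest it was built without USDT support, which is the situation described above for the default build.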

Related to usdt we have "dynamic" usdt, which are tracepoints defined at runtime. The cool thing about "dynamic" usdt is that it allows creating tracepoints in JavaScript. The downside is that each dynamic tracepoint results in a couple of C++ calls, which can increase overhead if the tracepoint is in a hot path. dtrace has good built-in support for dynamic usdt, but we never got something official on Linux. A few years ago I came up with a hack-y solution (https://github.com/sthima/libstapsdt), which works "well enough". I think some companies are using it, but I'm not entirely sure.

The last one is watchpoints, which attach to memory events (read, write or execute of a memory address). uprobes are limited to memory mmap-ed from files, whereas watchpoints are also available for heap memory, which means a well-crafted watchpoint probe can trace a JavaScript function. It's a hassle though: you need to generate a Linux perf map file, get the address of the function (which needs to have been compiled), and hope the GC or a deoptimization won't move that function while you're tracing it.
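The address lookup step can be sketched as follows, assuming the usual perf map layout of one `START SIZE NAME` entry per line with hexadecimal start and size (the kind of file Node.js writes to `/tmp/perf-<pid>.map` when run with `--perf-basic-prof`). The helper names and the sample entries are hypothetical.

```python
import bisect
from typing import List, Optional, Tuple

Entry = Tuple[int, int, str]  # (start address, size in bytes, symbol name)


def parse_perf_map(text: str) -> List[Entry]:
    """Parse perf map text into a list of entries sorted by start address."""
    entries: List[Entry] = []
    for line in text.splitlines():
        parts = line.split(None, 2)  # the name itself may contain spaces
        if len(parts) != 3:
            continue  # skip blank or malformed lines
        entries.append((int(parts[0], 16), int(parts[1], 16), parts[2]))
    entries.sort()
    return entries


def resolve(entries: List[Entry], addr: int) -> Optional[str]:
    """Return the name of the entry whose address range contains addr, if any."""
    starts = [start for start, _, _ in entries]
    i = bisect.bisect_right(starts, addr) - 1
    if i < 0:
        return None
    start, size, name = entries[i]
    return name if addr < start + size else None


# Made-up sample entries for illustration.
SAMPLE = """\
3ef414c0 398 LazyCompile:*fib /app/index.js:1
3ef41860 80 Stub:CEntryStub
"""

if __name__ == "__main__":
    table = parse_perf_map(SAMPLE)
    print(resolve(table, 0x3EF414D0))
```

The map is only a snapshot: as noted above, the GC or a deoptimization can move or discard the code, so an address taken from it may stop being valid while tracing.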

I believe there's room for improvements here, and I'm happy to see this topic coming up again :D

@Qard
Member

Qard commented May 5, 2020

Yeah, I read a bit about it. The ability to define tracepoints at runtime made me curious about the possible use for APM products. It'd be super neat if an APM vendor could toggle low-level tracepoints on and off as-needed. It'd be super valuable to be able to provide deeper insights when something seriously anomalous is detected. 🤔

@mmarchini
Contributor

Removing from the normal agenda; we can discuss this in the deep dive.

@github-actions

github-actions bot commented Nov 4, 2020

This issue is stale because it has been open many days with no activity. It will be closed soon unless the stale label is removed or a comment is made.
