[USDT] Unified Tracing API #10

Qard · 2015-03-20T23:40:58Z

I'm moving the discussion of unifying the native tracing systems to one API from nodejs/node#729 to here. @thekemkid and I are interested in seeing this go somewhere.

There seems to be rough consensus on a macro-based solution something like:

REPORT(int subsystem, int eventType, args...) to report events in, when enabled
INSTRUMENT(callback(int subsystem, int eventType, args...)) to listen for these events and process them in some way.

What remains unknown is:

Is there any prototyping effort underway?
What event types are people interested in?
What data do people hope to get in args?

The text was updated successfully, but these errors were encountered:

trevnorris · 2015-03-23T18:11:15Z

Does this have to reported immediately, or can the results be reported asynchronously?

Qard · 2015-03-23T20:21:01Z

I think it should probably get to the INSTRUMENT(...) callback immediately, but what happens from there depends on what the module wants to do with that data.

Why? Do you have some particular ideas/concerns in mind?

trevnorris · 2015-03-23T22:53:00Z

@Qard Simply that the impl going into chromium stores data in a ring buffer and is read out from another thread periodically. so you can get statistics, but it's not immediate. I think we should properly define if the user should expect to receive results at the moment of execution, or at some point in the future.

BTW: the later is much better for performance.

sam-github · 2015-03-24T16:44:57Z

On Mon, Mar 23, 2015 at 3:53 PM, Trevor Norris notifications@github.com
wrote:

@Qard https://github.com/Qard Simply that the impl going into chromium
stores data in a ring buffer and is read out from another thread
periodically. so you can get statistics, but it's not immediate. I think we
should properly define if the user should expect to receive results at the
moment of execution, or at some point in the future.

The chrome devs were pretty definite in their feedback about this: they'd
tried immediate availability, and found it worked badly.

Qard · 2015-03-24T21:12:23Z

For my uses, as long as the data has reasonably accurate timestamps, async should be fine. I'm not too familiar with dtrace/etw/lttng to know if those require sync events.

It might make sense for INSTRUMENT(...) to instead take a class instance that receives sync events and does some minor operations on them, but then defers the actual reporting part to a worker thread.

trevnorris · 2015-03-24T21:17:19Z

@Qard The timestamps are all recorded at the time they're recorded. It's that the data isn't aggregated/processed until some time in the future.

billti · 2015-09-01T23:56:51Z

Is there any progress happening on this effort? Some good tracing would be really helpful both to debugging Node itself, as well as potential for instrumenting user-side code.

I'm working on the Node Tools for Visual Studio and just hacked in some tracing to help troubleshoot a hairy debugger/networking race issue, and it was invaluable. The current tracing is quite limited and non-trivial to understand. I'm a big believer in good diagnostics though so would be happy to help make this something well supported in the product.

I'm quite familiar with ETW on Windows, but not so much on DTrace or the Linux options. I'd be happy to help out on a consistent cross-platform design and any Windows-specific implementation challenges.

I've seen a couple different threads on different repos for this. Let me know where this is active and if I can help. Thanks!

Qard · 2015-09-02T00:05:01Z

Unfortunately, no. Not a lot has happened on this. I would absolutely appreciate help getting this moving though. This is still a thing I care deeply about, but I only have so much time and ability to work on this alone. Perhaps we need to have another meeting to introduce the new folks such as yourself to the group and to facilitate some forward movement on these ideas?

pmuellr · 2015-09-02T02:43:47Z

I'm a newcomer here, lurking because I'm interested. Would be interested in another meeting.

Bill, wonder if you would be willing to do a little write-up on your hacked-in tracing for node/win. Assuming you use ETW, I'm curious to see what you did, how you consumed the trace data, what the trace providers/data/etc were, ...

trevnorris · 2015-09-02T04:22:05Z

IIRC @ofrobots said we could pull in the tracing API for experimentation. Don't remember the specifics though of how to start that process.

billti · 2015-09-02T16:06:25Z

@pmuellr Were you interested in just the implementation? Or an actual long form write-up of how to use ETW (perhaps specifically in the context of Node)?

For my diagnostics/experimentation, 2 commits may be of interest.

billti/node@3823c75 adds a way to call the generic EventWriteString method, exposed internally as DTrace_Msg. This wouldn't be how you'd add events for Node itself, but I could imagine exposing something like this for apps to use (e.g. process.trace or some such, wired up as the existing trace event calls are in InitDTrace). The events emitted will have the same ID, but the severity level and keyword (a bitmask) can be set when logging the event, and filtered on in the tooling.
billti/node@69d5522 adds a specific event to the manifest (logging when the debug agent is started and on what port), along with a description, data structure, keyword, level, etc... This is how you'd typically add "well known" events to Node itself. By adding them to the manifest you can efficiently pack data for events in binary form, and anyone can query on what events, keywords, etc... are available, and tools can query and format the log in a rich manner. (See Writing a manifest if interested in what is available here).

From what little I know about DTrace etc..., I believe similar capabilities exist for specifying levels, keywords/categories, etc... Within Node, the platform differences are abstracted away via header files or macros conditionally included at the top of node_dtrace.cc.

For my specific scenario (race condition between launching node from VS and attaching to the debug port) I just quickly used the generic tracing above to log the debug agent activities, added similar tracing on the Node Tools for VS side, and then captured a trace from both these providers plus the network driver using PerfView. As timing to the sub-millisecond may have been involved, the equivalent of console.log/printf across processes isn't granular or accurate enough to correlate. With this combined trace I could see exactly when the agent & libuv loop were started, when sockets were being opened/bound to, the ACK/RST packets at the TCP level, the debugger payload going back and forth over the connection, etc...

Sorry for running a little long on this thread. I've conflated both potential questions above (the implementation details in Node, and the tools used for analysis). Both are tricky to understand, but super powerful when used properly, and probably worth a blog post on their own.

pmuellr · 2015-09-02T16:38:29Z

@billti thanks for the info - no problem going long - nice to see a practical example

ofrobots · 2015-09-02T19:36:08Z

I posted some details about the trace-event API used by blink on this issue: #15

joshgav · 2017-02-06T18:28:55Z

related: #83

gireeshpunathil · 2019-10-25T06:48:24Z

should this issue remain open? [ I am trying to chase dormant issues to closure ] @Qard

github-actions · 2020-07-19T00:39:06Z

This issue is stale because it has been open many days with no activity. It will be closed soon unless the stale label is removed or a comment is made.

Qard · 2020-07-19T17:10:16Z

I think it's still relevant, though the code has changed a bunch since the discussion. Might be worth a bit of house cleaning discussion at the next diag meeting to identify which issues are still relevant and if there's any interest in working on them to justify leaving the issues open. 🤔

mmarchini · 2020-07-21T00:33:56Z

I'm triaging all stale issues today, removing the label from those I think are still relevant, closing those I'm mostly sure aren't, and leaving the label on those which I'm not sure.

This one IMO has been incorporated in the Unified Tracing TSC Strategic Initiative, so it can be closed.

pmuellr mentioned this issue Sep 23, 2015

Tracing WG Meeting 2015-09-25 #21

Closed

AndreasMadsen mentioned this issue Oct 4, 2016

Outreachy Node.js Core project ideas for mentees nodejs/CTC#17

Closed

AndreasMadsen changed the title ~~Unified Tracing API~~ [USDT] Unified Tracing API Sep 22, 2017

github-actions bot added the stale label Jul 19, 2020

mmarchini mentioned this issue Jul 21, 2020

Supporting multiple trace providers: Trace Events, DTrace, Lttng, ETW #163

Closed

github-actions bot closed this as completed Aug 4, 2020

kvakil mentioned this issue Jul 14, 2022

eBPF event loop blockage finder #569

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[USDT] Unified Tracing API #10

[USDT] Unified Tracing API #10

Qard commented Mar 20, 2015

trevnorris commented Mar 23, 2015

Qard commented Mar 23, 2015

trevnorris commented Mar 23, 2015

sam-github commented Mar 24, 2015

Qard commented Mar 24, 2015

trevnorris commented Mar 24, 2015

billti commented Sep 1, 2015

Qard commented Sep 2, 2015

pmuellr commented Sep 2, 2015

trevnorris commented Sep 2, 2015

billti commented Sep 2, 2015

pmuellr commented Sep 2, 2015

ofrobots commented Sep 2, 2015

joshgav commented Feb 6, 2017

gireeshpunathil commented Oct 25, 2019

github-actions bot commented Jul 19, 2020

Qard commented Jul 19, 2020

mmarchini commented Jul 21, 2020

[USDT] Unified Tracing API #10

[USDT] Unified Tracing API #10

Comments

Qard commented Mar 20, 2015

trevnorris commented Mar 23, 2015

Qard commented Mar 23, 2015

trevnorris commented Mar 23, 2015

sam-github commented Mar 24, 2015

Qard commented Mar 24, 2015

trevnorris commented Mar 24, 2015

billti commented Sep 1, 2015

Qard commented Sep 2, 2015

pmuellr commented Sep 2, 2015

trevnorris commented Sep 2, 2015

billti commented Sep 2, 2015

pmuellr commented Sep 2, 2015

ofrobots commented Sep 2, 2015

joshgav commented Feb 6, 2017

gireeshpunathil commented Oct 25, 2019

github-actions bot commented Jul 19, 2020

Qard commented Jul 19, 2020

mmarchini commented Jul 21, 2020