-
Notifications
You must be signed in to change notification settings - Fork 7.3k
WIP: add new continuation-local-storage module #5990
Conversation
A simple, annotated example: | ||
|
||
```javascript | ||
var cls = require('contination-local-storage'); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
continuation-local-storage is a horrible name for a module, it's so very easy to make a typo...
(I'm only half tongue-in-cheek, it's way too long for a core module. child_process is already 5 characters or so longer than I like.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's a typo besides. Module names need to be snake_case and not hyphenated, or else gyp
explodes horribly. Will fix.
I agree that the name is a lot to type, but the single largest barrier to the adoption of domains has been that nobody seems to get what they are or why they might want to use them from the name, and once they do understand them, the first thing they say is that the name is terrible. "Continuation-local storage" is a lot to type, but is at least somewhat explicit and descriptive.
This is a total bikeshedding pitfall, but I'm happy to change the name if core can come to a consensus about what that new name might be. I suggested cls
, and indeed that's what I name the variable I import the module into almost everywhere, but as @isaacs pointed out, that's our old buddy CLear Screen from Windows. The original stab I took at a name was local_context
, which is an even more vague and abstract name than domains
, and also clashes horribly with what "local" and "context" both mean inside V8.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've even read the following piece of doc/api/continuation_local_storage.md
:
Continuation-local storage provides a mechanism similar to thread-local storage in threaded programming, with closures wrapped around portions of a continuation chain taking the place of mutable cells bound to thread data structures.
I still don't seem to get what it is. Seriously, “portions of a continuation chain taking the place of mutable cells” sounds like “religious philosophy meets biotech” to me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fair enough, I'll see if I can cut the technobabble. This is a pretty abstract thing, though, and it's tough to explain concisely. Perhaps a better example?
On Aug 5, 2013, at 9:29, Mithgol notifications@github.com wrote:
In doc/api/continuation_local_storage.md:
+Continuation-local storage provides a mechanism similar to thread-local storage
+in threaded programming, with closures wrapped around portions of a
+continuation chain taking the place of mutable cells bound to thread data
+structures. Contexts are created on namespaces and can be be nested.
+
+Every namespace is created with a default context. The currently active
+context on a namespace is available vianamespace.active
.
+
+A simple rule of thumb is anywhere where you might have set a property on the
+request
orresponse
objects in an HTTP handler, you can (and should) now
+use continuation-local storage.
+
+A simple, annotated example:
+
+```javascript
+var cls = require('contination-local-storage');
I've read the following:Continuation-local storage provides a mechanism similar to thread-local storage in threaded programming, with closures wrapped around portions of a continuation chain taking the place of mutable cells bound to thread data structures.
I still don't seem to get what it is. Seriously, �gportions of a continuation chain taking the place of mutable cells�h seems like �greligious philosophy meets biotech�h to me.
�
Reply to this email directly or view it on GitHub.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps a few simpler concepts?
It's a storage. All right. Everyone knows what a storage is. That's a place where you can store data. And read it later, if and when it's still available.
It's a local storage. All right. That means that the storage is not universally (globally) available. It is bound to something local.
Now what is that something?
Local storage like thread-local in threaded programming except there's no threaded programming. (“A samurai without a sword is like a samurai with a sword, but only without a sword”.)
Сlosures are wrapped around portions of a continuation chain taking the place of mutable cells bound to thread data structures. (Really needs explaining. What is that “continuation chain”, what are “mutable cells”, what are “thread data structures” in Node's JavaScript? Could it be that this analogy is meant to be understood only by people with some experience in threaded programming?)
Don't know how I feel about the functionality this module provides but I commented on some of the technical aspects. |
Thanks for the comments! I'll clean up the typos where appropriate and add more test cases to make some of the design decisions clearer. For me, at least, this module fills a significant hole. New Relic agents have a "transaction tracer" that helps developers drill down into their code and figure out which third-party tools (or components of frameworks) they're using are consuming the most time during slow transactions. We need some mechanism for tracing individual requests, and the problem with doing this entirely in userspace is that you have to do a lot of monkeypatching, which still won't catch any third-party modules using |
Copying this from the comment that got hidden by a typo fix, because the name is definitely most people's biggest beef with the module so far:
|
I haven't been able to review this from my computer yet, but I'm not @bnoodhuis was telling me he's received complaints because once the domain I'm investigating a solution to the above issue, but trying to navigate I'll spend some time later taking a deeper look at this PR, and please help |
Thanks @trevnorris and @bnoordhuis for doing some preliminary review of this. Just to add some context to this: @othiym23 works for New Relic, on their Node tracing client. He's previously investigated using domains to do some of this, but they're a poor fit for tracing through a continuation, since they permanently alter the error handling semantics. (Ie, you can't have a domain that doesn't catch errors, and they shadow one another, so if the user is using domains in their app, it blocks the trace from working.) Having some way to track information throughout a request, local to the flow of the callbacks, is a pretty common request. If you own the entire app, you CAN do this using domains, but it a) is kind of clumsy, and b) doesn't allow a nice separation for modules that want to trace the flow and do other interesting things with that information. (For example, you could add some tracing and logging to be able to answer questions like, "How many of my incoming requests require a database lookup" rather easily.) In discussing this with him, I explicitly asked that he send a pull req with something as soon as possible, even if it's nowhere near finished, so that we can help guide him in the right direction. (Hence the "WIP" in the title.) If we can get something in by 0.12 that we can all live with, then great. If not, maybe by 1.0. No promises there, and I'm not willing to hold up the 0.12 release for this feature. They're also working on a hideously ugly but functional polyfill for the New Relic client (which will be necessary for them to support 0.8 and above anyway), so the specific version we land something in is not super important. But, I agree with @othiym23 that, at least in principle, this is something we ought to have in core, because as bad as the deopts will be, they'll probably be much much worse if they're done in userland – not to mention absurdly hacky, and touching tons of internal moving parts. If we can figure out what the eventual core API would look like, then New Relic can aim for that API in their polyfill. |
@trevnorris As a preface, it is fully my goal to end up with one mechanism that touches core that includes both continuation-local storage and domains, and for that mechanism to incur no overhead cost to developers who aren't using either domains or CLS. The current WIP code is there to allow me to stage the work (and also to get more eyes on the API I'm proposing). I'm using the current code for domains as the model for how to bring CLS into core, which is part of the reason for reworking domains to run on top of it – the rest is that they are conceptually very similar, with domains being focused on error-handling. The background information is contained in #3733 and #5243, and I'll try to concretely and explicitly describe my use case here:
I have tried as much as possible to keep the API surface small and general (and @creationix actually simplified the API considerably – thanks, Tim!), and my goal is to have this be as unobtrusive a mechanism as possible. Because it deals with multiple namespaces instead of just the single one used by domains, it unavoidably incurs a slightly larger amount of overhead than domains do in 0.10, but as long as it's not affecting performance when it's not in use, that seems like it shouldn't be too objectionable. I do think it's useful for more than just my use case. Long-term, it allows us to lock down more of the core objects in Node, like HTTPRequest and HTTPResponse (and process!), because users have an alternative storage mechanism they can use to communicate across call-stack boundaries. The use cases that other developers have described to me involve their own needs for a lightweight tracing or logging system in their code. |
@isaacs I almost feel like there's some better type of hook system that we |
@trevnorris it's entirely possible, but do keep in mind that asynchronous error-handling does bring a lot of special cases with it. Async error-handling is probably something better handled at the language level (and sort of is potentially handled by using the "real" shallow continuations passed around by generators via If you can come up with an efficient mechanism to do this at a low level, I'm more than happy to take a stab at rewriting domains to use it. I care about the result, not the implementation details. |
@othiym23 i'm sure there is. but you mind explaining what you're looking for w/o all the CS degree terminology. :) |
@trevnorris Sure. Think about what you're trying to do with a long stacktrace module – you want to know what happened in the past that led to you having to deal with an error right now. Most long stacktrace modules generate a new stacktrace at each turning of the event loop so that in case of an error, they can stitch them all together and present them to a human for deciphering. At the same time, the mechanism needs to trace specific chains of function calls when Node is handling many requests simultaneously. What you want is a mechanism that accumulates that information about how execution got to this point. In my case, I'm saying, "I want to pass along metadata about web requests, trace segments, and function calls across this set of function calls, regardless of any intervening asynchronous execution boundaries." CLS solves this problem by wrapping callbacks on the asynchronous call chain up in closures that carry along references to a context, which is a dictionary that uses the JavaScript prototype chain to capture the notion of hierarchical property lookup – the idea that you want to keep searching up the chain of contexts until you find a value. (If you've ever played with Clojure or another language that supports persistent data structures, they do something similar, only every operation that changes a value in the structure produces a new layer of wrapping, rather than the direct mutation CLS uses.) As an aside, I apologize to everyone for lapsing into what must seem like nonsense periodically -- this is an abstract feature, and I'm finding it a constant struggle to describe concretely, even though my use case is super concrete. Thanks for bearing with me! |
Add support for timers and process.nextTick, lock down attributes added to `process`, bootstrap tests and documentation.
I did some substantial reworking of the documentation, and I think the new version of the docs does a much better job of describing what this thing is and why a normal person might want to use it. Further review appreciated. |
Reserve slots on the process object at process startup to prevent later deoptimization by the JIT.
Continuation-local storage works like thread-local storage in threaded | ||
programming, but is based on chains of Node-style callbacks instead of threads. | ||
The standard Node convention of functions calling functions is very similar to | ||
something called ["continuation-passing style"][cps] in functional programming, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: CPS is not restricted to functional languages (and I don't think it's widely used in FP outside of compilers.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Until Node, I don't think that CPS was widely used anywhere besides as a mechanical transformation applied to FP source to turn it into imperative code. I can clarify in the doc, though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The documentation looks better, thanks. I agree with @trevnorris that if the ultimate goal is to make application tracing easier, then there are better and more comprehensive solutions possible. As a new per-context dictionary feature I'm +0 on this PR. I don't really object but it's somewhat debatable if it should be a new core module. The guideline is "if it doesn't have to be in core, it shouldn't". I don't think a core module is the only way to accomplish what continuation_local_storage.js does but if you can convince me of that, then fine. That said, that +0 will turn into -1 if it turns out there are performance implications. Trevor already mentioned it but I get a lot of complaints from people about how when you load one module that uses domains, your whole application pays the performance penalty. We're stuck with domains but if I can prevent the introduction of another module with the same issues, I will. |
As I said on #libuv, I care about the result, not the means of getting it, so if you can describe to me an OS-neutral, easily deployable solution that people can run in production for extended periods of time, I promise I will consider it long and hard. That is what New Relic offers for its other language / runtime / platform agents, and that's what we intend to offer on Node. I've been working on this problem for over a year now, and this is the first approach that has yielded usable results. I've already committed to delivering performance on par with domains. That said, at least among New Relic's beta testers, there have been one or two complaints about memory overhead (which are almost certainly due to bugs), and none about CPU overhead, even without any extensive profiling or performance tuning on the agent itself. People apply different criteria when they're concerned about stability and insight into how their apps are performing. As @tjfontaine memorably put it yesterday, there's Performance and "performance." |
👍 👍 👍 👍 This would be a great solution to passing around magic state objects or initializing modules in the scope of a callback chain. |
At an abstract level, all cls, domains, long-stack-traces, and all other related things really need from core is a place to hook into event source creation. eventSourceHook = function (callback) {
// A function was just called (or is being called) that will trigger possible events
// in the future. Examples include setTimeout, fs.open, server.listen, etc..
// The passed in callback will be called for this new event source every time an
// event happens. You can return a new wrapped callback if you wish to
// intercept before or after. This hook function is called in the stack of the
// parent continuation (the code that created the new event source)
} If a stack of hooks was allowed then several independent things like new relic traces and node domains could exist as two different levels. process.addEventSourceHook(domainHook);
process.addEventSourceHook(tracingHook);
// Ignore ugly API names, bikeshed later if concept is good The problem with this historically is it's very general and leaves little room for optimizing since each independent thing needs it's own closure. If anyone knows a way to make this fast, I would highly recommend it as the best thing for core to expose. But after thinking about it, I think this wouldn't be any slower than what domains do or what the proposed patch does. With this API in core, CLS could be easily implemented in userspace as something like: var namespaces = [];
exports.createNamespace = function () {
namespaces.push(new Namespace());
...
}
...
process.addEventSourceHook(function (original) {
// Get the currently active contexts in all the namespaces.
var length = namespaces.length;
var contexts = new Array(length);
for (var i = 0; i < length; i++) {
contexts[i] = namespaces[i].active;
}
// Return a callback that enters all the saved namespaces when called.
return function () {
for (var i = 0; i < length; i++) {
namespaces[i].enter(contexts[i]);
}
try {
return original.apply(this, arguments);
}
finally {
for (i = 0; i < length; i++) {
namespaces[i].exit(contexts[i]);
}
}
};
}); |
Also now that nextTick semantics make it always run all nextTicks at the end of the current tick, nextTick can also be implemented as an event source hook and not as a real event. This should speed up things considerably for nextTick. The current PR that wraps all of nextTick's callbacks is rather expensive relatively speaking. Here is a sample nextTick implementation as a simple user-space library. var queue = [];
module.exports = function (callback) {
queue.push(callback);
};
require('event-hook')(function (original) {
return function () {
try {
return original.apply(this, arguments);
}
finally {
var callback;
while (callback = queue.shift()) {
callback();
}
}
};
}); |
originally deleted since I hadn't reviewed the code yet, but readded since @creationix replied @creationix With some fixes to domain consistency that my trycatch library provides, we have been using domain.active at LinkedIn as a local storage mechanism for a while now, and we've been playing with the idea of namespaces as you propose here. I like where you're going with this, but I'm curious what the API would be for accessing these namespaces and if they provide anything more than automatic cleanup once a namespace's scope is left (which in itself is very useful)? In other words, would this idea of namespaces provide anything (beside the likely very useful cleanup, which would avoid memory leaks if you ever put a namespace on a long lifetime domain) that you couldn't do manually? // Manual namespaces |
@CrabDude, yes hookit is very close to the API I want in core except I would leave EventEmitter alone and only touch real event sources. For things that are normally exposed to users as events (like net "connection", "connect", "close" events and stream "data", "end" events), There is a lower level callback that gets called from C++ which then dispatches the data to all the emitter listeners. That single callback internally would be the one that gets wrapped by the event source hook. As far as cleanup hooks, that could easily be done by the code that wraps the callbacks in userspace. |
@othiym23 This is great that you've formalized this functionality. I would definitely use this. Having nestable namespaced storage with automatic dereferencing would be very nice. There has been a consistent need for this at LinkedIn, again predominately for logging, though we might consider beyond that if it were robust and reliable. WRT hooking mechanisms in core, I'd like to add my 2¢. I see there being a need for 2 levels of hooking,
On a separate point, functionality like this can and seemingly will always be undermineable by third-party modules. A major offender of this is the mongoDB driver, which will batch requests, resulting in many callbacks being called from a single event source. In cases like these, both forms of wrapping are insufficient, which is arguably why something like this must exist at the control-flow layer because it would be utilized across your own application layer, which is ultimately where access to cls matters. This is exacerbated of course if you do not use your control-flow library consistently across your application code, but if cls were tied to control-flow, it would make sense that you would only be able to access your namespace when access to your control-flow library is available. Not to ramble, but maybe this is precisely why a mechanism for context needs to exist in core, so that there can be an expectation that third-party modules respect (set appropriately) context on application code reentry (callbacks) similar to the existing (or soon to be) expectation that they properly reset the domain, or maybe application code should be manually maintaining context across callbacks similar to how domains occasionally require the same. |
@trevnorris has started sketching out an async hook that looks a lot like what @creationix described earlier in #6011. In principle, New Relic could use that "event source listener" as the glue logic it needs to leave the rest of CLS in userland (assuming the things I mention on that PR eventually get added). |
What's the status of this? Would it be possible to instead use this as an opportunity to expose better hooks from core so this can be done in userland without too much additional cost? It'd be a shame to have this fixed in stone for 1.0. |
ah, that's a bit nicer. |
This is superseded by #6011, with a user land version of CLS that uses either the asyncListener API or a polyfill that @creationix and I put together. |
So this can be closed right? |
Yup. thanks. |
This module adds continuation-local storage that works analogously to the thread-local variables used in multithreaded programming. It allows you to set variables specific to a particular request or handler execution, and can be used to create things like pure-JS execution tracers and long stacktrace generators, and to replace the grotesque hack of stashing per-request state on
req
andres
. It is also a more generalized version ofdomains
, and is intended, eventually, to be used as the mechanism by which domains are implemented.This is the first part of an implementation of the API described on #5243 & #3733. There is also a userland implementation of the API and another module that monkeypatches the API onto 0.10. @creationix and I have been hacking on this for a few months now and feel pretty good about the core API.
So far, this PR includes the core API (with its documentation) as well as a sort of hacky implementation that applies over
process.nextTick
and the timer functions (setImmediate
,setTimeout
, andsetInterval
). The next piece of work @creationix and I intend to tackle is dealing with Make(Domain)Callback and net.Socket. The final step is to rework domains to run on top of this and to take advantage, as much as possible, of the optimization work that @trevnorris has already put into domains.All of the tests pass and there is a first pass of the documentations, but I haven't done any benchmarking yet. Style and performance review on what's there so far would be great. I know this thing is wacky and may be coming out of nowhere for some of you, but @isaacs and I have been talking about this for a while and I think we're agreed that it's useful (and low-level) enough to belong in core.