Errors being swallowed? #307
This is node
Even though it's not fully baked, I would recommend using domains for error handling in general. The problem with throwing exceptions out of the complete callback is that we've gone back and forth on this a number of times with very odd edge cases on both sides of the coin. See #176 for some examples. Because of the way the SDK is structured with retries, throwing an error in the event loop bubbles errors back up causing weird behaviors. As of 2.x the SDK tries to intelligently detect ReferenceErrors and SyntaxErrors and throw those out. Throwing any exception was problematic because it's not possible to tell if the error originated from the callback or if it was just passed through from a prior step. That said, if you think you can implement a patch that gives us the best of both worlds by allowing errors to be thrown out without causing the SDK to attempt retrying of these errors / other weird behaviors, I would consider merging that. Handling these exceptions has not been an easy problem to solve, so the more help we can get on this, the better.
It's just very unusual in Node.js land for a client library to completely take over error handling like this. I certainly had no idea what was going on in my scripts for hours - they were just silently hanging and I had no idea why. I'm not sure why you're trying to do anything with thrown exceptions - shouldn't they just be allowed to bubble up like normal (unless the consumer has explicitly wrapped their code in a domain)? I'm guessing this has been discussed elsewhere perhaps?
Personally I think the best behaviour would be to not do anything special with exceptions at all - as is pretty standard Node.js practice. There's a good overview of best practices here: http://www.joyent.com/developers/node/design/errors

Unless I'm missing something, the only exceptions you should be seeing thrown (apart from the obvious

If you can point me to the relevant sections of the

I can see on #176 you've said "Bubbling the exception up will actually cause it to get bubbled up send lifecycle event handler, which makes the request get retried" - so what happens if you just don't have this behaviour in the "send lifecycle event handler"? Just retry on callback errors, not thrown exceptions...? It seems very dangerous to be retrying on exceptions at all (ie, there could be resource leakage, etc).

Wadya reckon?
Argh, was just bitten by this again when I did

Anyway, I'm obviously flogging a dead horse here – I know you're not 100% happy with the current situation either, but just want to reiterate my strong
@mhart I definitely agree that this is not ideal behavior; unfortunately, attempting to provide "ideal behavior" in the past has caused regressions and was extremely hard to maintain, since there were two competing issues:

1. The problem here is it's difficult to differentiate an SDK operational error from a regular operational error. Not all of our operational errors come from a service, some come from a client, and some, due to our pluggable event system, could come from user provided operational errors. In 2.x we made a compromise to throw only known programmer errors out of the events (ReferenceError, TypeError, etc.). We were previously trying to be intelligent about where events can propagate out, but that's where most of the regressions had been coming from. That said, I agree with your pain -- the fact that certain errors can get trapped makes for a bad debugging story, so I'm going to spend some more time to see if I can re-introduce some of the error propagation behavior in terminal events (error, complete, success).

2. Automatic retry with exponential backoff is one of the main features of the SDK. We definitely want to be retrying certain errors, especially those from services. That said, the SDK attempts to retry on retryable errors only. Those are clearly marked with the .retryable property on the error object -- but in order to support a robust set of third-party plugins, the SDK allows errors to bubble up through callback or thrown errors, because it may be that plugin X throws a retryable error rather than calling the callback (that plugin may be synchronous).

Just for reference, the portions of the SDK that handle event logic are in lib/request.js and lib/sequential_executor.js.
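The `.retryable` flag described above can also be checked in application code. A minimal sketch follows; the `isRetryable` and `withRetries` helpers are illustrative, not part of the SDK's API, and a real implementation would add exponential backoff between attempts.

```javascript
// Decide whether an error should be retried. `err.retryable` is the
// flag the SDK sets on errors it considers safe to retry
// (e.g. throttling, transient network failures).
function isRetryable(err) {
  return Boolean(err && err.retryable);
}

// Illustrative usage: retry a callback-style operation a few times.
function withRetries(operation, maxAttempts, done) {
  var attempt = 0;
  function run() {
    attempt++;
    operation(function (err, data) {
      if (err && isRetryable(err) && attempt < maxAttempts) {
        return run(); // a real implementation would back off exponentially
      }
      done(err, data);
    });
  }
  run();
}
```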
I'm still unclear though... Why are operational errors being thrown? As opposed to just returned in callbacks?

"it may be that plugin X throws a retryable error"... Well that's just bad practice. Plugin X shouldn't be doing that, and shouldn't be allowed to do that. It should just be throwing for programmer errors. Surely in these cases

I'm not saying anything new here btw - this has been standard Node.js advice for quite a while now: https://groups.google.com/d/msg/nodejs/1ESsssIxrUU/5abyX25Dv2sJ
According to the "Error handling in Node.js" document you linked, functions can throw for operational errors:

In the case of a synchronous plugin, that plugin is allowed to throw its errors. Specifically, something like:

The SDK supports both synchronous and asynchronous plugins -- therefore, to follow the correct practices you linked above, we should be handling the programming model that both synchronous and asynchronous plugins might use. For example, a synchronous extractData plugin might be as simple as:

```js
request.on('extractData', function(resp) {
  resp.data = JSON.parse(resp.httpResponse.body.toString());
});
```

The above plugin deserializes all data as JSON from the wire. Of course, JSON.parse could throw, in which case the SDK must be able to handle and propagate that error. More importantly, the SDK has to now detect that an error was raised during the request and not emit the 'success' event, but rather propagate to the 'error' event instead ('complete' always executes, of course). This means at some level the SDK has to be able to catch synchronous errors and propagate them.

If the SDK did not propagate this error, we would either have to ignore it or do a hard stop on the request, which would effectively crash the entire program for an operational error -- that IMHO is a much worse programmer experience, especially since that synchronous exception would not be catchable by an application, and something we've gotten reports about before due to similar regressions in this logic. If you have other ideas on how this should behave, feel free to explain, but I have not seen any significant contradictions between the SDK's behavior and the best practices document you linked.
But

```js
request.on('extractData', function(resp) {
  try {
    resp.data = JSON.parse(resp.httpResponse.body.toString());
  } catch (e) {
    request.emit('error', e) // Or give it more info
  }
});
```
The example above is very much a simplified example. What happens when the case is less well known?

```js
request.on('extractData', function(resp) {
  resp.data = require('foolibrary')(resp.httpResponse.body.toString());
});
```

Can foolibrary throw? Maybe. Effectively you are asking all plugin authors to guard all code from any potential operational exceptions -- a list of exceptions which is not always knowable by the author. Note that this could even happen with async examples too. Practically speaking, that means all callbacks would have to be wrapped in a try/catch block just in case something bad happens. It also means that a single misbehaving plugin could break the entire request lifecycle by not properly guarding for synchronous errors. That's the experience we are trying to avoid. This is why the SDK wraps these callbacks in try/catch blocks on the user's behalf and correctly handles propagation in all cases so that extractData never bubbles up a

Sidenote: the actual example would be:

```js
request.on('extractData', function(resp) {
  try {
    resp.data = JSON.parse(resp.httpResponse.body.toString());
  } catch (e) {
    resp.error = e;
  }
});
```

Plugins should never manually emit events. Doing so could lead to the 'error' event being emitted N>1 times for a single response. It also doesn't play nice with the request lifecycle architecture.
(modified this issue title to remove the domains reference)
I think it just comes down to me not understanding why

The

I'm just not sure what case you're trying to solve for here - have you got a more realistic example of the issues you've encountered?
It is different in that events emitted by the SDK support asynchrony, something EventEmitter does not support. This is necessary to support a whole slew of use cases where plugins might want to modify a request prior to sending. We emit build, sign, and send events, but order matters, and so a plugin that modifies build with an asynchronous call needs a way to ensure that send will wait on this event to finish. The sign event is a great example of this, where we modify the request with an Authorization header that is signed with credentials, but those credentials might need to be sourced from an asynchronous source -- local HTTP for EC2 metadata, or even remote HTTPS for STS credentials.

Fundamentally, the request lifecycle in the SDK is architected as more of a state machine than a set of events. Each "state" (build, sign, send, http_, extract_, and terminal states) emits a similarly named event, but the event is only emitted to listeners during that state. Errors that occur in state X need to move forward to the terminal states or they would get lost / thrown synchronously, neither of which is a good experience.

The behavior here is done to provide robust support for all of our use cases -- supporting retry logic, asynchronous signing, plugins that modify the way a request can be built / deserialized. These are all behaviors that are exposed in an extensible way for plugin authors to take advantage of. A library like

I'm not entirely sure why it matters what the library is? Is there a different expectation for "well known" modules? I would imagine that

Hopefully the first paragraph above sets the stage a bit more. More realistic examples of the issues can be found right in the SDK itself. The SDK is just a series of plugins, many of which are wired up from event_listeners.js. For example, the actual serializer/deserializer protocols (like XML/JSON support) are just plugins that listen to 'build'/'extractData' events. Those plugins might depend on modules (like xml2js) which could run synchronously or asynchronously, and may or may not throw exceptions. Having to know the entire closure of dependencies that each plugin uses in order to detect whether we need to wrap that plugin in a try/catch block would be an extremely high maintenance overhead, so in order to simplify the experience, the event manager itself handles catching of those uncaught exceptions and moving the state machine to the error state.

Hope that explains things a little bit more. I'd be happy to talk more about how (and why) our architecture works the way it does. I touched on it at our re:Invent talk last year, but there's certainly more to talk about. Let me know if you're curious.
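To make the asynchrony point concrete, here is a hypothetical sketch of an ordered executor that waits on async listeners before moving to the next lifecycle step. The arity-based sync/async dispatch is an illustrative convention for this sketch, not the SDK's actual mechanism:

```javascript
// Run listeners strictly in order; a listener declaring a second
// `done` argument is treated as asynchronous, and the next listener
// does not start until it calls `done`. A plain EventEmitter cannot
// provide this ordering guarantee across async listeners.
function runListenersInOrder(listeners, state, finished) {
  var i = 0;
  function next(err) {
    if (err || i >= listeners.length) return finished(err, state);
    var listener = listeners[i++];
    if (listener.length >= 2) {
      listener(state, next);  // async listener: wait for `done`
    } else {
      listener(state);        // sync listener: runs to completion
      next();
    }
  }
  next();
}
```

With this shape, a 'sign' listener that fetches credentials over the network can delay 'send' simply by deferring its `done` call.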
Yeah, so

I'm just having a hard time understanding this. Let me try another tack: if you removed the current exception handling behaviour altogether in
Finally, by the same argument made in point 2, you could say that since synchronous plugins in the SDK are functions that are known to throw, therefore the SDK itself should be wrapping those calls in a try/catch block. By that interpretation, we are behaving consistently with your advice on "known to throw" functions.
Because of the following best practice:
Since the SDK supports synchronous plugins, we support functions that follow best practices for said synchronous plugins, namely,

It seems like if synchronous functions are often known to throw, and the SDK executes arbitrary synchronous functions, then it should be try/catching the exceptions thrown from these functions. Otherwise it may be that a user registered a listener, but the registration of that synchronous listener caused the SDK to fail its contractual obligation of emitting an error event -- is that the plugin's fault, or the SDK's fault? From a "good user experience" point of view, I'd think it's both, which is why we're trying to resolve the issue from our end.
Perhaps this extra (short) response will help clarify: Because plugins run in the context of the SDK, we consider the SDK to be responsible for the execution of its plugins. Therefore, an unknown operational failure of a plugin should never cause the SDK to break its contract with users, namely (assuming no programming error) always emitting a valid terminal event for every request.
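A minimal sketch of that contract (illustrative only, not the SDK's actual code): the executor traps a synchronous throw from any plugin and converts it into the 'error' terminal state, rather than letting the exception crash the process or skip the terminal event:

```javascript
// Run each listener for a state; a throwing synchronous plugin sets
// resp.error and short-circuits, so the request always reaches a
// valid terminal state ('error' or 'success').
function emitState(listeners, resp) {
  for (var i = 0; i < listeners.length; i++) {
    try {
      listeners[i](resp);
    } catch (e) {
      resp.error = e;   // record the failure...
      break;            // ...and stop running further listeners
    }
  }
  return resp.error ? 'error' : 'success';
}
```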
Well... I find that very unfortunate. I honestly just think you're making your lives a lot more complicated at the end of the day by trying to do things differently to everyone else. Having

You cut off my context with the fact that

Again, I really think we've got to move away from theoreticals and hypotheticals, it doesn't help much. So - has there actually been a case where a user plugin has caused issues here, or has the architecture been developed just in case that ever happens? If there were some concrete issues that have occurred, I really think it would make it easier to understand rather than arguments that boil down to "we want to save programmers from doing stupid things... at the expense of others".
Anyway anyway anyway, I've been writing too much on this too, clearly. I think we just disagree with the level of intervention that
If we want to get away from the broad issue of how to handle errors, and at least just focus on the specific issue I posted about - it boils down to not messing with callbacks that users have passed in. If a user does this:

```js
ec2.describeInstances(function() {
  throw new Error('wtf')
})
```

What good reason is there for

I think the answer to that (if we keep it specifically about this case) will reveal whether there's a good case to consider this behaviour a bug or not.
@mhart I don't think the SDK should be trapping that exception. The issue here, and the related bug, is that:

```js
ec2.describeInstances(function() {
  throw new Error('wtf')
})
```

is short-hand for:

```js
var req = ec2.describeInstances();
req.on('complete', function() {
  throw new Error('wtf')
});
req.send();
```

The problem is that we have general logic to pass these errors through -- ultimately that error actually gets passed through to the 'uncaughtException' state of the request (an internal state) which we then hand off to domains if they are hooked up. We used to attempt to detect errors raised from terminal events (complete, error, success) and raise those, but that's where we ran into edge cases. It's possible the detection could be improved. I generally agree that we should be doing a better job here, I just want to make sure that we can figure out a stable way to do this that won't cause other regressions.
Perhaps it's just a white-list vs black-list case. Maybe assume all errors are terminal unless they match a certain condition? (eg, it has a
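A sketch of that white-list idea: treat every error as terminal (rethrow it) unless it is explicitly marked as one of the SDK's own operational errors. The `retryable`/`code` check here is an assumption for illustration, based on the `.retryable` flag discussed earlier:

```javascript
// White-list approach: only errors explicitly marked as SDK
// operational errors are handled internally; everything else
// (ReferenceError, TypeError, plain Error from user code) is
// rethrown so it fails loudly instead of being swallowed.
function shouldRethrow(err) {
  var isMarkedOperational = Boolean(err && (err.retryable || err.code));
  return !isMarkedOperational;
}
```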
I'd also argue that once you're in the

@mhart I agree with that, which is why we previously threw errors that came from the complete state. Thanks for the feedback. I will be working on this to see if there's a way to fix the issue and will reference this in any commits.

Great, thanks @lsegal!

@mhart try the above commit and see how you like it.

Cool! So far so good! Everything seems to be working as expected for me... Just hope it's still working for everyone else :-)
I mean, the only thing that might make people confused is that the top of the stack is not actually where the user threw the error (because it's been caught and rethrown):
But apart from that minor point, all good 👍
@mhart I can probably fix the stack trace obfuscation, I just wanted to have them both flowing through the same sync/async code paths for maintenance reasons, but I can likely change that. The try/catch inside of the transition function is mostly a fail safe, so it shouldn't affect any event code. It may not even be needed.
Actually on second thought, that may not work -- we are always catching that exception by virtue of executing the terminal events through the same SequentialExecutor code, which is where the (initial) catch occurs. Removing that would require running terminal events through a completely different executor, or running the executor in a different context, which we could do, but I'm not sure it's worth the effort. I wonder if it's possible to unroll the stack trace for easier debugging? If so, that would be the easier fix.
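For what it's worth, a plain catch-and-rethrow by itself does not rewrite `err.stack` in V8 -- the trace still records the original throw site, so any obfuscation must come from where the error is re-captured or reprinted. A small demonstration:

```javascript
// The stack is captured when the Error object is constructed, so the
// original throw site ("throwDeep") stays in the trace even though
// the throw escapes through an intermediate catch block.
function throwDeep() {
  throw new Error('wtf');
}

function catchAndRethrow() {
  try {
    throwDeep();
  } catch (e) {
    throw e; // rethrowing does not modify e.stack
  }
}
```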
I think you can do it with

Actually I think I can get this working by moving the logic into the SequentialExecutor. Patch to follow shortly. This will also cover a potential edge case (where a second 'complete' event might wipe out the error from a previous event).
I backed out some of the code I was working on because the change was far too intrusive and broke the encapsulation of the SeqExecutor. Turns out that Node.js's

Update: the printing of the

If there are no other issues with the above commits, this code should go out with our next release.

OK, no probs - it shouldn't be a big deal in any case I don't think - more cosmetic than anything
References: #307, #311, amazon-archives/aws-sdk-js-apis#4
This is in the v2.0.5 release, so I will mark this as closed.
This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs and link to relevant comments in this thread. |
The following code hangs for me (after outputting `done`):

This was working fine (as in, the error would be thrown and the process would exit) last time I checked, which I think was 2.0.0-rc13. This causes lots of problems for any scripts that are expecting `throw` to exhibit the default Node.js behaviour.