[apollo-engine-reporting] : willResolveField called after stopTiming! #4472
Facing the same error. Attaching complete stack trace:
This caused an alert for us this morning as every node spewed stacktraces into STDERR. This follows apollo-engine 502s doing the same thing yesterday. Is this a client-facing error or just with engine reporting? Can we customise how these errors are logged by apollo-server?
This error started showing up for me after upgrading from apollo-server 3.4.1 to 3.6.1.
I wonder if this is related to #5372.
My team also started seeing these unhandled promise rejections in our Fastify + Apollo Server service.
For now, we've suppressed the unhandled rejections by building a wrapper plugin around the reporting plugin (a sketch of the idea is shown below).
We haven't seen an unhandled rejection since deploying this wrapper plugin to our production environment. However, this isn't an ideal solution. I've put together a sample repo that shows how to reproduce the condition described above: https://github.com/chrskrchr/apollo-server-race-condition. Although I've been able to reliably reproduce the condition, the repro isn't fully deterministic.
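For reference, here is a minimal sketch of what such a wrapper might look like. This is an illustration against the Apollo Server 3 plugin API, not the poster's actual code, and it assumes the plugin being wrapped is ApolloServerPluginUsageReporting:

```typescript
import { ApolloServerPluginUsageReporting } from 'apollo-server-core';
import type { ApolloServerPlugin } from 'apollo-server-plugin-base';

// Hypothetical wrapper: delegate everything to the real usage reporting
// plugin, but catch anything thrown from willResolveField so it can't
// escape as an unhandled rejection.
export function guardedUsageReportingPlugin(): ApolloServerPlugin {
  const inner = ApolloServerPluginUsageReporting();
  return {
    async requestDidStart(requestContext) {
      const listener = await inner.requestDidStart?.(requestContext);
      if (!listener) return;
      return {
        ...listener,
        async executionDidStart(executionContext) {
          const execListener = await listener.executionDidStart?.(executionContext);
          if (!execListener) return;
          return {
            ...execListener,
            willResolveField(fieldContext) {
              try {
                return execListener.willResolveField?.(fieldContext);
              } catch (error) {
                // e.g. "willResolveField called after stopTiming!"
                console.warn('Suppressed usage reporting error:', error);
                return undefined;
              }
            },
          };
        },
      };
    },
  };
}
```

If a wrapper like this is installed via the server's plugins array, you would likely also want ApolloServerPluginUsageReportingDisabled() so the default usage reporting plugin isn't installed alongside it.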
@chrskrchr Thanks for the reproduction! I'm looking into this now. FYI, your package-lock file tries to install all packages from some sort of private "artifactory". I'm just deleting the package-lock and reinstalling from scratch, so hopefully I'll get close enough versions to reproduce. In the future when making otherwise excellent reproductions, watch out for that :)
Here's a PR against your repo that makes the reproduction more reproducible: chrskrchr/apollo-server-race-condition#1. I think this is related to how graphql-js execution handles the bubbling-up of errors/nulls when there is other stuff running in parallel below it. I want to investigate the graphql-js execution code a bit more to see if what's going on is obvious. This does raise some questions about the "right" way to handle this case.
OK, it looks pretty clear that graphql-js doesn't proactively short-circuit execution that is guaranteed to end up ignored by null-bubbling, as long as that execution actually started (because Promises are involved). I believe what's going on here is roughly the shape sketched below: a non-nullable field errors and nulls out its parent, the response and trace are finalized, but a sibling async field (and its sub-selections) is still resolving in the background.
This is a bit of a funky situation. I'm not sure there's an easy way, without hacking into the graphql-js executor code, to actually wait until execution even of deep sub-tree stuff is done before finalizing the trace. But it also feels like traces/field execution counts/etc. should in fact care about fields that are executed even if their output ends up not making it into the result! So I'm at least slightly hesitant about entirely ignoring reporting of these fields. But the current logged error is not helpful, for sure.
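To make the scenario concrete, here's a minimal self-contained sketch of that shape, using plain graphql-js. It is an illustration, not code from the thread or the linked repro:

```typescript
import { graphql, buildSchema } from 'graphql';

// A non-nullable field that rejects nulls out `parent` and lets the overall
// result be produced while a slow sibling field is still resolving.
const schema = buildSchema(`
  type Query { parent: Parent }
  type Parent {
    fails: String!
    slow: Child
  }
  type Child { leaf: String }
`);

const rootValue = {
  parent: () => ({
    fails: async () => {
      throw new Error('boom'); // non-nullable, so `parent` bubbles up to null
    },
    slow: async () => {
      // Still in flight when `fails` rejects; graphql-js does not cancel it.
      await new Promise((resolve) => setTimeout(resolve, 100));
      return {
        leaf: () => {
          // Fires even though `parent` is already null in the response.
          // This is the point at which a tracing plugin would see
          // willResolveField after it has finalized the trace.
          console.log('leaf resolved after the result was finalized');
          return 'ignored';
        },
      };
    },
  }),
};

graphql({ schema, rootValue, source: '{ parent { fails slow { leaf } } }' }).then(
  (result) => console.log(JSON.stringify(result)), // { data: { parent: null }, errors: [...] }
);
```

Running this prints the completed result (with parent: null) before the leaf resolver logs, which is exactly the window in which the trace has been finalized but fields are still being resolved.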
Doh! The classic "works on my machine". Thanks for fixing that and making the repro deterministic. The previous non-determinism was left over from when I was trying to fuzz test my way into the unhandled promise rejection scenario. It's easy enough to reproduce the error itself, and it sounds like you have a firm grasp on what's triggering the internal error.
Hmm, I don't think I saw the server restarts in my reproduction. Let me go back to your original reproduction (modulo the package-lock) and see if I can reproduce that.
Or hmm, I guess I wouldn't see the server restarts in your reproduction because there's no usage reporting plugin. I'd assume the equivalent of the server restarts would be triggered if you add one.
I just realized I was seeing the usage reporting plugin error thrown when reproducing the issue locally, but only because I have an
I also just realized the repro repo was missing another key component - a usage reporting plugin. Even still, we haven't been able to reliably reproduce the server restarts; we're seeing the error itself but not the restarts. I'm going to spend some time today re-visiting my fuzz testing approach to see if we can accidentally back our way into the condition that's resulting in an unhandled promise rejection.
@loftykhanna @codehimanshu @grrowl @awinograd - I'm curious, were any of you by chance using that plugin? (I'm guessing the answer is "no" given that this issue appears to pre-date the creation of that plugin, but thought it was worth asking.)
@chrskrchr Hmm, seems like this might be responsible for an issue like this turning into an unhandledRejection: graphql/graphql-js#3529
@glasser - great find! That's exactly the type of issue I went hunting for when we first started seeing this back in March (thinking that surely other folks would have run into this), but I wasn't able to find anything nor could I back my way into the correct order of events to reproduce it. But there it is, test case and all! I'll definitely keep an eye on that PR and take it for a spin when it's merged.
I can live with these other things as long as our process no longer exits unexpectedly. 😄
I tried to write a test to validate that my fix (removing the internal error) works. I couldn't actually manage to reproduce the original error being visible — I think it ends up being (appropriately) swallowed since it's treated like an error thrown by a field which has already been ruled to not be part of the returned data. The reproduction I'd done before (based on yours) just involved explicitly logging when the issue occurred. And presumably the reason that you were able to observe it was graphql/graphql-js#3529. But I'll fix it anyway.
The comment explains this in detail. Basically, this "shouldn't happen" error actually could happen. In theory, the times it could happen are the exact times that the error itself would be swallowed rather than becoming visible... but a graphql-js bug meant that sometimes they would become visible anyway. Fixes #4472.
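For readers skimming the thread, here is a conceptual sketch of what removing the internal error amounts to. This is an illustration only, not the actual apollo-server source:

```typescript
// Conceptual sketch (not apollo-server's real code): once the trace has been
// finalized, a late willResolveField call records nothing and returns a no-op
// end hook instead of throwing
// "willResolveField called after stopTiming!".
type FieldEndHook = (error?: Error) => void;

class TraceBuilderSketch {
  private stopped = false;

  stopTiming(): void {
    this.stopped = true;
  }

  willResolveField(): FieldEndHook {
    if (this.stopped) {
      // Previously this threw; the field belongs to an abandoned subtree,
      // so it is simply not recorded.
      return () => {};
    }
    // ...normally: start timing the field and close it out in the end hook.
    return () => {};
  }
}
```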
@glasser - I was able to use the information provided in graphql/graphql-js#3529's unit test to successfully reproduce the case where a thrown error results in an unhandled promise rejection, and I've updated the repro repo accordingly: https://github.com/chrskrchr/apollo-server-race-condition
NOTE: there are some additional fields in the schema that aren't required to reproduce the error, but I was testing a bunch of different error scenarios and decided to leave all those fields in there for now. Check out the request in the repro repo. I need to focus on some other work items today, but as soon as possible I'll test the proposed fix against this scenario.
FWIW - I think your changes in #6398 are also worthwhile, but I'll sleep much better knowing that the root cause for our process exits has been addressed and isn't just waiting to bite us again. 😄
Good news! I patched a local copy of graphql-js with the fix from that PR, and it resolved the unhandled rejection for us.
Great to hear. I hope that the CLA issue is resolved soon; in the meantime perhaps https://www.npmjs.com/package/patch-package is your friend.
Fixed in v3.8.2.
We pushed this change out to production earlier today and saw the number of tracing plugin errors drop to zero. Thanks for the fix, @glasser! (And FWIW, we're still going to pursue that graphql-js fix as well.)
We are receiving this error:
Error: [internal apollo-server error] willResolveField called after stopTiming!
This is leading to unhandled rejections and frequent restarts of our node server. Any insight into how to debug this, or into the situations in which it usually happens?
"apollo-server": "^2.14.3",
"apollo-engine-reporting": "2.1.0"
How do we pinpoint which resolver is causing this issue, given that the entire stack trace is from graphql and apollo-server?
Thanks
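One generic way to keep the process alive long enough to see what is rejecting (not an apollo-server API, and assuming a Node version where unhandled rejections terminate the process by default) is a temporary unhandledRejection handler:

```typescript
// Temporary debugging aid: log unhandled rejections instead of letting the
// process exit while investigating which request or resolver triggers them.
// Remove once the underlying cause is fixed.
process.on('unhandledRejection', (reason) => {
  console.error('Unhandled rejection:', reason);
});
```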