-
-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
add check before accessing activeQuery #961
Conversation
@spollack I'm so sorry you were having crashes in production after upgrading. 😦 never ever ever fun to deal with. I want to merge this as soon as I possibly can & get this pushed as a new patch version. Before I do that could I trouble you for a bit more information on how this happens? Do you have any steps to reproduce it? I can't fathom a time the activeQuery would go null before the query response comes in - and all the unit tests are still passing so...it's strange. Ultimately I'd like to get this merged with a failing test case that passes after the code changes, because my guess is if this happens and the null check bypasses any message handling in the client without an active query your client |
@brianc yes i totally agree, i'd like to understand this better too. i've tried a bit today to repro this without luck so far. Let me do some more digging. |
I reviewed the logs from all 12 cases in production where this crashed on us (pretty much all while i was away on vacation last week, doah!). In every case, it was correlated with a query being cancelled on the backend due to a statement timeout. however, the inverse was not always true -- many other statements that were cancelled due to statement timeout did not result in a crash. So it is a more complex interaction. I also tried writing a simple integration test that sets the statement timeout, then does pg_sleep to hit it, and (not surprisingly) this didn't repro it by itself. So, no smoking gun yet, but @brianc i'm wondering if that triggers any ideas for you on how we could get into this state. |
starting at the pg code, i see something suspicious. look here https://github.com/brianc/node-postgres/blob/master/lib/client.js#L170 and you'll see that we do:
i.e. we null out activeQuery before doing handleError. inside handleError, we potentially do more work on the connection here https://github.com/brianc/node-postgres/blob/master/lib/query.js#L97
and perhaps this would cause the command complete handler to then fire, which is where we see the crash here https://github.com/brianc/node-postgres/blob/master/lib/client.js#L123
@brianc thoughts? i don't know the insides of pg nearly as well as i would like to. Thanks! |
another possibility here is that it is a race coming from the postgres side -- that we could sometimes be receiving a commandComplete message on the connection after receiving an error message. @brianc thoughts? |
@spollack sorry for taking a while to get back to you - been really busy w/ life stuff.
I think it might make sense to null out the active query after calling var result = activeQuery.handleError(error, con);
self.activeQuery = null;
return result; Because you're right...there might be more things going on in there? (unlikely, but worth checking) - have you tried just this modification to the code & see if it fixes your issue? |
Thanks Brian. we have not tried just that modification. I wish we had a clear repro case! We have been running with this PR in production for several weeks now, and have not had this particular issue anymore. |
@jshepard dug into this more, and it appears that this problem was likely caused by an issue in another library we were using on top of pg. given that, closing this. |
addresses #949