add check before accessing activeQuery #961

spollack · 2016-03-07T21:47:01Z

addresses #949

brianc · 2016-03-07T21:52:58Z

@spollack I'm so sorry you were having crashes in production after upgrading. 😦 Never ever ever fun to deal with.

I want to merge this as soon as I possibly can and push it out as a new patch version. Before I do, could I trouble you for a bit more information on how this happens? Do you have any steps to reproduce it? I can't think of a case where activeQuery would go null before the query response comes in, and all the unit tests still pass, so... it's strange. Ultimately I'd like to merge this together with a test case that fails before the code change and passes after it. My guess is that if this happens and the null check simply skips message handling whenever there is no active query, your query end callbacks are never going to fire, and your app would eventually hang waiting for callbacks that never complete.

spollack · 2016-03-07T23:42:25Z

@brianc Yes, I totally agree; I'd like to understand this better too. I've tried a bit today to repro this, without luck so far. Let me do some more digging.

spollack · 2016-03-08T01:02:28Z

I reviewed the logs from all 12 cases in production where this crashed on us (pretty much all while I was away on vacation last week, d'oh!). In every case, the crash was correlated with a query being cancelled on the backend due to a statement timeout. However, the inverse was not always true: many other statements cancelled due to a statement timeout did not result in a crash, so it is a more complex interaction. I also tried writing a simple integration test that sets the statement timeout and then calls pg_sleep to hit it, and (not surprisingly) that didn't repro it by itself. So, no smoking gun yet, but @brianc, I'm wondering if that triggers any ideas for you on how we could get into this state.

spollack · 2016-03-08T01:22:09Z

Staring at the pg code, I see something suspicious. Look here https://github.com/brianc/node-postgres/blob/master/lib/client.js#L170 and you'll see that we do:

self.activeQuery = null;
return activeQuery.handleError(error, con);

That is, we null out activeQuery before calling handleError. Inside handleError, we potentially do more work on the connection here https://github.com/brianc/node-postgres/blob/master/lib/query.js#L97

connection.sync();

And perhaps this would cause the command complete handler to fire next, which is where we see the crash here https://github.com/brianc/node-postgres/blob/master/lib/client.js#L123

self.activeQuery.handleCommandComplete(msg, con);

@brianc thoughts? I don't know the internals of pg nearly as well as I would like to. Thanks!
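To make the suspected sequence concrete, here is a minimal, self-contained model of it in plain Node.js. `FakeConnection`, `FakeQuery` and `wireClient` are hypothetical stand-ins, not pg's real classes, and the event names `errorMessage` and `commandComplete` are assumed to mirror the handlers referenced in client.js. The key assumption is that `sync()` emits a stray `commandComplete` synchronously. A real socket-backed connection only writes a Sync message and receives replies later, so this model shows what would happen only if a handler could re-enter before `handleError` returns.

```javascript
const { EventEmitter } = require('events');

// Hypothetical stand-in for pg's Connection. ASSUMPTION: sync() emits a stray
// commandComplete synchronously; a real socket would deliver replies later.
class FakeConnection extends EventEmitter {
  sync() {
    this.emit('commandComplete', { text: 'SELECT 0' });
  }
}

// Hypothetical stand-in for pg's Query that records what it receives.
class FakeQuery {
  constructor() {
    this.errors = [];
    this.completed = [];
  }
  handleError(err, connection) {
    this.errors.push(err);
    connection.sync(); // mirrors the connection.sync() call in query.js
  }
  handleCommandComplete(msg) {
    this.completed.push(msg);
  }
}

// Client wiring that clears activeQuery BEFORE calling handleError,
// following the client.js snippet quoted above.
function wireClient(con) {
  const self = { activeQuery: null };
  con.on('commandComplete', (msg) => {
    // No null check: throws a TypeError if activeQuery is null.
    self.activeQuery.handleCommandComplete(msg, con);
  });
  con.on('errorMessage', (error) => {
    const activeQuery = self.activeQuery;
    self.activeQuery = null;
    return activeQuery.handleError(error, con);
  });
  return self;
}

const con = new FakeConnection();
const client = wireClient(con);
const query = new FakeQuery();
client.activeQuery = query;

let crash = null;
try {
  con.emit('errorMessage', new Error('canceling statement due to statement timeout'));
} catch (e) {
  crash = e; // the TypeError thrown from the commandComplete handler
}
console.log(crash instanceof TypeError); // true
```

Because EventEmitter invokes listeners synchronously, the TypeError propagates straight out of `handleError`. If the real Connection never delivers messages re-entrantly like this, the ordering alone cannot explain the crash.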

spollack · 2016-03-11T19:18:59Z

Another possibility is a race coming from the Postgres side: we could sometimes be receiving a commandComplete message on the connection after receiving an error message. @brianc thoughts?
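A guard of the kind this PR's title describes can be sketched against that ordering (an error message followed by a commandComplete that no query owns). This is not the PR's actual diff, which isn't shown here. `wireGuardedClient` and its `strayMessages` array are hypothetical names, and the event names are assumed to match the handlers in client.js.

```javascript
const { EventEmitter } = require('events');

// Minimal sketch of a null guard on the commandComplete path. The
// client/connection shapes are hypothetical simplifications of pg.
function wireGuardedClient(con) {
  const self = { activeQuery: null, strayMessages: [] };
  con.on('commandComplete', (msg) => {
    if (!self.activeQuery) {
      // No query is waiting: record the stray message instead of throwing.
      self.strayMessages.push(msg);
      return;
    }
    self.activeQuery.handleCommandComplete(msg, con);
  });
  con.on('errorMessage', (error) => {
    const activeQuery = self.activeQuery;
    self.activeQuery = null;
    if (activeQuery) activeQuery.handleError(error, con);
  });
  return self;
}

// Simulate the suspected ordering: an error for the active query,
// followed by a commandComplete that arrives after it has been cleared.
const con = new EventEmitter();
const client = wireGuardedClient(con);
const seen = [];
client.activeQuery = {
  handleError: (err) => seen.push('error: ' + err.message),
  handleCommandComplete: (msg) => seen.push('complete: ' + msg.text),
};

con.emit('errorMessage', new Error('canceling statement due to statement timeout'));
con.emit('commandComplete', { text: 'SELECT 1' });

console.log(seen);                 // [ 'error: canceling statement due to statement timeout' ]
console.log(client.strayMessages); // [ { text: 'SELECT 1' } ]
```

The guard stops the TypeError, but as brianc points out, it only hides the symptom: a message it drops is never delivered to any query, so if that message actually belonged to a live query, its callback would never fire.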


brianc · 2016-03-30T17:21:16Z

@spollack Sorry for taking a while to get back to you; I've been really busy with life stuff.

I think it might make sense to null out the active query after calling activeQuery.handleError, like so:

var result = activeQuery.handleError(error, con);
self.activeQuery = null;
return result;
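Under the same kind of assumption (a hypothetical `FakeConnection` whose `sync()` emits a stray `commandComplete` synchronously), the reordered handler can be checked in isolation. This is an illustrative model, not pg's code; `wireClient` and the query object are placeholders.

```javascript
const { EventEmitter } = require('events');

// Hypothetical model: sync() synchronously emits a stray commandComplete,
// which a real socket-backed connection would not do.
class FakeConnection extends EventEmitter {
  sync() {
    this.emit('commandComplete', { text: 'SELECT 0' });
  }
}

function wireClient(con) {
  const self = { activeQuery: null };
  con.on('commandComplete', (msg) => {
    self.activeQuery.handleCommandComplete(msg, con);
  });
  con.on('errorMessage', (error) => {
    const activeQuery = self.activeQuery;
    // Reordered as suggested: handle the error first, then clear activeQuery.
    const result = activeQuery.handleError(error, con);
    self.activeQuery = null;
    return result;
  });
  return self;
}

const events = [];
const query = {
  handleError(err, connection) {
    events.push('error');
    connection.sync(); // re-enters the commandComplete handler in this model
  },
  handleCommandComplete() {
    events.push('complete');
  },
};

const con = new FakeConnection();
const client = wireClient(con);
client.activeQuery = query;
con.emit('errorMessage', new Error('canceling statement due to statement timeout'));

console.log(events);             // [ 'error', 'complete' ]
console.log(client.activeQuery); // null
```

In this model the reorder prevents the crash, but the errored query now also receives a commandComplete, so it has to tolerate completion after an error. It also does nothing for a commandComplete that arrives later, after activeQuery has already been cleared.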

Because you're right, there might be more going on in there (unlikely, but worth checking). Have you tried just this modification to see if it fixes your issue?

spollack · 2016-03-30T20:24:51Z

Thanks, Brian. We have not tried just that modification. I wish we had a clear repro case! We have been running with this PR in production for several weeks now and have not hit this particular issue again.

spollack · 2016-06-06T21:59:46Z

@jshepard dug into this more, and it appears this problem was likely caused by an issue in another library we were using on top of pg. Given that, I'm closing this.

add check before accessing activeQuery

d14700e

jshepard and others added 2 commits March 28, 2016 12:29

add check before accessing activeQuery

e8f4afb

Merge remote-tracking branch 'origin/activeQuery-guard' into activeQuery-guard

1286ff6

spollack mentioned this pull request Apr 4, 2016

rowCount not always equal to rows.length #979

Open

spollack closed this Jun 6, 2016

vedant15 mentioned this pull request Aug 30, 2017

TypeError: Cannot read property 'name' of null #1105

Closed

Developerarif2 mentioned this pull request Aug 13, 2023

rowCount not always equal to rows.length johnfrench3/node-postgres-repos#31

Closed
