Need to guard creation of new TCP connections in pool.go/acquire() #1165

Closed
slackpad opened this issue Aug 10, 2015 · 3 comments
Comments

@slackpad (Contributor)

While debugging #1154 we noticed that under heavy client request loads there can be hundreds of concurrent calls into pool.go/acquire(). The underlying getNewConn() function adds only one of the resulting connections to the pool in a safe way, causing the others to get canceled, but this is hugely wasteful. In the case of #1154, close to 700 useless TCP connections were spun up and killed in one instance.
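
To make the shape of the race concrete, here is a minimal Go sketch of the unguarded pattern, assuming simplified stand-in names (ConnPool, Conn, getNewConn, NewConnPool) and folding the lose-the-race cleanup into acquire() for brevity; it is not the actual pool.go code:

```go
package pool

import (
	"net"
	"sync"
)

// Conn and ConnPool are simplified stand-ins for the structures in pool.go,
// used only to illustrate the race; this is not the actual Consul code.
type Conn struct {
	conn net.Conn
}

func (c *Conn) Close() error { return c.conn.Close() }

type ConnPool struct {
	sync.Mutex
	pool map[string]*Conn
}

func NewConnPool() *ConnPool {
	return &ConnPool{pool: make(map[string]*Conn)}
}

// getNewConn dials a brand-new TCP connection to addr.
func (p *ConnPool) getNewConn(addr string) (*Conn, error) {
	nc, err := net.Dial("tcp", addr)
	if err != nil {
		return nil, err
	}
	return &Conn{conn: nc}, nil
}

// acquire returns a pooled connection for addr, dialing a new one on a miss.
// Nothing serializes the dial itself, so under heavy load many goroutines can
// pass the pool check for the same address and each spin up a TCP connection;
// only one is kept and the rest are closed.
func (p *ConnPool) acquire(addr string) (*Conn, error) {
	p.Lock()
	if c, ok := p.pool[addr]; ok {
		p.Unlock()
		return c, nil
	}
	p.Unlock()

	c, err := p.getNewConn(addr) // hundreds of concurrent dials can start here
	if err != nil {
		return nil, err
	}

	p.Lock()
	defer p.Unlock()
	if existing, ok := p.pool[addr]; ok {
		c.Close() // lost the race: this connection is immediately thrown away
		return existing, nil
	}
	p.pool[addr] = c
	return c, nil
}
```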

@darron (Contributor) commented Aug 10, 2015

Could this also be why we sometimes see nodes that respond to a watch for some KV data (a prefix watch), download some of the content, and then are unable to query the KV store, ending up "locked out"?

They don't get all of the data, and subsequent KV queries on those nodes go unanswered. Restarting the Consul agent on those same nodes usually allows the KV queries to succeed again.

NOTE: there are about 100 JSON fragments underneath the prefix (not even 1 MB of data in total), and usually only one fragment changes at a time.

@slackpad (Contributor, Author)

Only one of these connections "wins" and is actually used to send queries, so nobody ends up with a dead connection or one that gets immediately closed. I suspect, though, that the huge amount of network activity this generates could cause packet loss and/or leader slowness, and then the subsequent client-side lockup we were able to reproduce in #1154 once too many yamux streams are outstanding and it starts to block.

@slackpad (Contributor, Author)

Talked about this with @armon - we probably want to lock per-address rather than use one big lock for the whole pool. Otherwise, a slow connect to a remote DC could hold up acquiring a connection to a local server.
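
A minimal sketch of that per-address guard, in the same simplified terms as the sketch above (the limit map and the lead/waiter channel here are illustrative assumptions, not the exact mechanism in the eventual fix): the first goroutine to miss the pool for an address becomes the lead and dials, later arrivals for that same address wait on a channel and then retry, and different addresses never block each other.

```go
package pool

import (
	"net"
	"sync"
)

type Conn struct{ conn net.Conn }

type ConnPool struct {
	sync.Mutex
	pool  map[string]*Conn
	limit map[string]chan struct{} // at most one outstanding connect per address
}

func NewConnPool() *ConnPool {
	return &ConnPool{
		pool:  make(map[string]*Conn),
		limit: make(map[string]chan struct{}),
	}
}

func (p *ConnPool) getNewConn(addr string) (*Conn, error) {
	nc, err := net.Dial("tcp", addr)
	if err != nil {
		return nil, err
	}
	return &Conn{conn: nc}, nil
}

// acquire lets the first goroutine that misses the pool for an address become
// the "lead" and dial; everyone else arriving for that address waits until the
// lead finishes instead of dialing its own TCP connection. Because the guard
// is keyed by address, a slow connect to a remote DC does not hold up
// acquiring a connection to a local server.
func (p *ConnPool) acquire(addr string) (*Conn, error) {
	for {
		p.Lock()
		if c := p.pool[addr]; c != nil {
			p.Unlock()
			return c, nil
		}

		wait, inflight := p.limit[addr]
		if !inflight {
			// No outstanding connect for this address: become the lead.
			done := make(chan struct{})
			p.limit[addr] = done
			p.Unlock()

			c, err := p.getNewConn(addr)

			p.Lock()
			delete(p.limit, addr)
			if err == nil {
				p.pool[addr] = c
			}
			p.Unlock()
			close(done) // wake up any goroutines waiting on this address
			return c, err
		}

		// Someone else is already dialing this address: wait for it to finish,
		// then retry the pool lookup (if the lead's dial failed, one of the
		// waiters becomes the new lead on the next pass).
		p.Unlock()
		<-wait
	}
}
```

This is the same "threads wait for any outstanding connect to finish" behavior described by the commit referenced below, just sketched with made-up field names.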

slackpad added a commit that referenced this issue Aug 13, 2015
Fixes #1165 by having threads wait for any outstanding connect to finish.