Fixes #1165 by having threads wait for any outstanding connect to finish. #1170
Conversation
Also noticed that we didn't bump the ref count in the race condition case, so those might have behaved strangely in some rare cases. Pushed a fix for that.
Force-pushed from 49a3022 to 5524b95.
Force-pushed from 5524b95 to 8bca3eb.
That last change makes it so only one thread attempts the connect and everybody else goes based on that. I'll let that bake for a little while (it's more tricky, imho, but not too bad).
}()

c, err := p.getNewConn(dc, addr, version)
if err != nil {
Super minor, but can we move the code in the defer to after getNewConn, and delete the limiter entry and add the pool entry inside the same critical section? Just to save a defer and another round of locking.
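A minimal sketch of what that restructuring might look like (assuming the defer's cleanup was deleting the limiter entry and signalling the wait channel; p.pool, the embedded lock on p, and close(wait) are assumptions here, not the actual patch):

```go
// Sketch only: drop the defer and do the limiter cleanup plus the pool
// insert in a single critical section after getNewConn returns.
c, err := p.getNewConn(dc, addr, version)

p.Lock()
delete(p.limiter, addr.String()) // cleanup assumed to have lived in the defer
close(wait)                      // wake the goroutines waiting on this address
if err != nil {
	p.Unlock()
	return nil, err
}
p.pool[addr.String()] = c // publish the new connection in the same section
p.Unlock()
return c, nil
```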
var wait chan struct{}
var ok bool
if wait, ok = p.limiter[addr.String()]; !ok {
	wait = make(chan struct{}, 1)
We probably don't need to buffer it, since we just close it.
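For reference, a small standalone illustration (not part of the patch) of why the buffer isn't needed when the channel is only ever closed: close wakes every blocked receiver, so make(chan struct{}) behaves the same as make(chan struct{}, 1) here.

```go
package main

import (
	"fmt"
	"sync"
)

func main() {
	// Unbuffered is enough when the channel is only ever closed: close()
	// releases every goroutine blocked on a receive at once.
	wait := make(chan struct{})

	var wg sync.WaitGroup
	for i := 0; i < 3; i++ {
		wg.Add(1)
		go func(id int) {
			defer wg.Done()
			<-wait // blocks until close(wait)
			fmt.Printf("waiter %d released\n", id)
		}(i)
	}

	close(wait) // broadcast to all waiters
	wg.Wait()
}
```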
LGTM!
Fixes #1165 by having threads wait for any outstanding connect to finish.
This creates a per-address map of channels that throttles connection requests so that only one connect attempt is in flight per address at a time. The other threads then quickly grab the newly-created connection one at a time and use it.
If the connection they were waiting for fails, they will all try to make a new connection, and the race condition logic for that case was preserved. This seems like the best thing to do in order to move forward, since they all could have been hanging around waiting for that one connection to go. They essentially revert to the behavior we have today, but only in the case where there's network trouble.
I was trying to find a simpler way to handle the case where the connection they are waiting on fails - we could fail all the others immediately but I think the code would be a little more complicated. Thundering a herd at a dead guy seems pretty low-impact.
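To make the approach concrete, here is a self-contained toy version of the pattern described above; the Pool and dial names and the string "connections" are stand-ins for illustration, not the actual pool code in this PR.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// Pool is a simplified stand-in for the real connection pool. It keeps one
// channel per address in limiter so only one connect attempt runs at a time
// per address; everyone else waits on that channel.
type Pool struct {
	sync.Mutex
	pool    map[string]string        // addr -> established "connection"
	limiter map[string]chan struct{} // addr -> signal for the in-flight connect
}

func (p *Pool) get(addr string) (string, error) {
	p.Lock()
	if c, ok := p.pool[addr]; ok {
		p.Unlock()
		return c, nil
	}
	wait, inFlight := p.limiter[addr]
	if !inFlight {
		wait = make(chan struct{})
		p.limiter[addr] = wait
	}
	p.Unlock()

	if !inFlight {
		// Lead thread: do the slow connect, publish the result, and wake
		// everyone queued up behind us, all in one critical section.
		c, err := dial(addr)
		p.Lock()
		delete(p.limiter, addr)
		if err == nil {
			p.pool[addr] = c
		}
		close(wait)
		p.Unlock()
		return c, err
	}

	// Everyone else waits for the lead thread, then re-checks the pool. If
	// the connect failed, fall back to dialing directly (the "thundering
	// herd at a dead host" case from the description).
	<-wait
	p.Lock()
	c, ok := p.pool[addr]
	p.Unlock()
	if ok {
		return c, nil
	}
	return dial(addr)
}

func dial(addr string) (string, error) {
	time.Sleep(50 * time.Millisecond) // simulate a slow connect
	return "conn-to-" + addr, nil
}

func main() {
	p := &Pool{pool: map[string]string{}, limiter: map[string]chan struct{}{}}
	var wg sync.WaitGroup
	for i := 0; i < 5; i++ {
		wg.Add(1)
		go func(id int) {
			defer wg.Done()
			c, _ := p.get("10.0.0.1:8300")
			fmt.Printf("goroutine %d got %s\n", id, c)
		}(i)
	}
	wg.Wait()
}
```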