Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix AddRef() usage for CNode #8372

Closed
wants to merge 1 commit into from
Closed

Fix AddRef() usage for CNode #8372

wants to merge 1 commit into from

Conversation

UdjinM6
Copy link
Contributor

@UdjinM6 UdjinM6 commented Jul 19, 2016

Using AddRef() in ConnectNode() for existing connections doesn't feel right imo considering how refs are released in ThreadSocketHandler() https://github.com/bitcoin/bitcoin/blob/master/src/net.cpp#L1113. I guess this could be the reason that sometimes refs stay >0 no matter what and nodes stuck in vNodesDisconnected forever which means that node never get deleted and FinalizeNode signal is never fired which in its turn means that for example mapBlocksInFlight can't be cleaned properly and #7596 happens.

This commit should solve the issue by:

  1. removing AddRef() for existing connections
  2. adding AddRef() in CNode's constructor using the same conditions as in ThreadSocketHandler()

@sdaftuar
Copy link
Member

sdaftuar commented Aug 5, 2016

Interesting observation! Thanks for working on this -- I've wondered if we had this type of bug on a few occasions but was never able to pin it down. And I do think you're on to something, as your PR has made clear at least one situation where I can see that we might end up with extra references to the same node: I believe it can be reproduced by eg using -connect=<ip> and -connect=<hostname> on the command line, where hostname resolves to the same ip address (I added a LogPrintf() to FinalizeNode and observed that it doesn't seem to get called after disconnecting, eg via setban).

Have you worked out how this extra reference might occur in typical node usage (or better yet, are you able to reproduce this in testing)? So far the only smoking gun I see is where we're connecting to a peer by address name rather than IP, as I described above, but I don't know if that would be something that would commonly affect a node.

The only other thing I can think of is if there might be a race condition that could cause this -- eg in OpenNetworkConnections, is it possible that we might call ConnectNode() at:

CNode* pnode = ConnectNode(addrConnect, pszDest, fCountFailure);
,
after failing to find the ip address we're about to connect to in the call to FindNode at
FindNode((CNetAddr)addrConnect) || CNode::IsBanned(addrConnect) ||
, but somehow in between that peer has connected to us, and thus it's found when we call FindNode at line
CNode* pnode = FindNode((CService)addrConnect);
?

I don't know if this is possible or if there's something in place already to prevent that... If it is, then perhaps in addition to cleaning up the reference counting, we should also be re-thinking locking here.

Also ping @cfields in case this code is already set for demolition as part of the net refactor.

@theuni
Copy link
Member

theuni commented Aug 5, 2016

@sdaftuar I believe #8085 contains some of those locking fixes, though more are needed. I'll be rebasing that and begging for review after #8128 goes in. As a follow-up (or maybe it's in that branch?) CNodes move to shared_ptrs (which never leave CConnman iirc), so that we don't have to worry about manual refcounting.

Not opposed to fixing this up as-is, but yes, this logic is set for removal.

@UdjinM6 UdjinM6 closed this Aug 28, 2016
@UdjinM6 UdjinM6 deleted the fixNodeRefCount branch August 28, 2016 12:22
@bitcoin bitcoin locked as resolved and limited conversation to collaborators Sep 8, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants