Refactor/Prepare CConnman for upcoming epoll support #3432
Conversation
Concept ACK, obviously backports scare me, but it doesn't look too bad, like you've said.
Regarding the network side, I don't know too much, so the best I'm able to say is this diff probably doesn't send my private keys to a random server 😂
I will test this later today and report back how it goes
Oh, also, a nit and a thought
    nBytes = recv(pnode->hSocket, pchBuf, sizeof(pchBuf), MSG_DONTWAIT);
}
if (nBytes > 0)
{
nit: since we moved this code, we should technically fix the formatting.
I got two test failures locally, not sure if it's just me: https://gist.github.com/PastaPastaPasta/517ebecd2566fe98e77832bed3cd8c28
Rerunning sendheaders.py and llmq-chainlocks.py a bunch of times, I'm unable to reproduce the error.
Also see https://gitlab.com/PastaPastaPasta/dash/pipelines/137287226 where I am getting (different) test failures on both 32-bit and 64-bit.
Force-pushed from e97495e to a0832e3
Rebased on develop to fix conflicts. @PastaPastaPasta The sendheaders.py failures happen sporadically on develop as well, so these are not related to this PR. Never seen the llmq-chainlocks.py failure however, so unsure if it's related. If you see it again, can you also upload all logs? One of the failures (p2p-fullblocktest.py) in https://gitlab.com/PastaPastaPasta/dash/pipelines/137287226 is due to running out of memory. I can see that you're running it on free GitLab runners; these are not powerful enough to run so many tests in parallel. The other failure seems to be something different... investigating.
llmq-chainlocks.py failure: https://gist.github.com/PastaPastaPasta/763e401c68b56e3a96ecee1b378880fb
@PastaPastaPasta I need all debug.log files from all nodes to debug the llmq-chainlocks.py failure. Meanwhile I found the reason for the failure in llmq-simplepose.py and added a commit to fix it.
Force-pushed from 43c67fe to e944387
@PastaPastaPasta The llmq-chainlocks.py failures were unrelated and should be fixed via #3435
Needs rebase
Force-pushed from e944387 to a742ce5
utACK
Actually, wait. I have random test failures locally. Will test a bit.
I got a p2p-instantsend.py failure (and an abandonconflict.py failure, likely fixed by #3436), see https://gist.github.com/PastaPastaPasta/cb30bb821d36432b455242545e443279. Can provide full datadir if needed. Unable to reproduce this one; it might have been caused by my computer timing out or something.
This allows us to check for pending messages without locking cs_vSend
Instead of selecting every socket in every SocketHandler iteration, we now track which nodes are known to have pending receivable data and/or empty send buffers. Each time recv fails to fill a whole receive buffer, fHasRecvData is set to false so that the socket is added to the receive select set in the next iteration. When that socket is signalled through select/poll, fHasRecvData is set to true again and remains true until a future recv again fails to fill the buffer. Each time send fails to send a full message, fCanSendData is set to false so that the socket is added to the send select set in the next iteration. At the same time, nodes which have pending messages to send are tracked in mapNodesWithDataToSend, so that SocketHandler knows for which nodes SocketSendData must be invoked.
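To make the flag handling concrete, here is a minimal sketch. It is not the actual patch: the helper name ReceiveFromNode and the surrounding buffer handling are made up for illustration, while fHasRecvData, hSocket and MSG_DONTWAIT follow the description above, and CNode is the existing connection class.

```cpp
#include <sys/socket.h>
#include <cerrno>

// Illustrative only: clear fHasRecvData whenever a recv() call does not fill
// the whole buffer, so select/poll watches this socket again next iteration.
void ReceiveFromNode(CNode* pnode)
{
    char pchBuf[0x10000];
    ssize_t nBytes = recv(pnode->hSocket, pchBuf, sizeof(pchBuf), MSG_DONTWAIT);
    if (nBytes > 0) {
        // ... append pchBuf to the node's receive buffer and process messages ...
        if (nBytes < (ssize_t)sizeof(pchBuf)) {
            // The kernel buffer is most likely drained now, so re-add this
            // socket to the receive select set in the next SocketHandler pass.
            pnode->fHasRecvData = false;
        }
    } else if (nBytes < 0 && (errno == EAGAIN || errno == EWOULDBLOCK)) {
        pnode->fHasRecvData = false; // nothing left to read right now
    }
}
```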
Force-pushed from a742ce5 to 47af42a
@UdjinM6 do you have logs/datadirs of these failures? Would be interesting to know why a timeout of 1000mn was an issue.
Not much to see in logs, just a bunch of …
It probably has something to do with …
This is a set of refactorings that prepare us for the upcoming epoll support. I plan to use epoll in "edge triggered" (vs. level triggered) mode to reduce the number of events as much as possible.
Even though select/poll don't know about the concept of edge/level-triggered modes, their behavior can be compared to level-triggered mode. This means that if a socket with pending read data is passed in, select/poll will always return immediately, even if the data was not fully read from the socket in the previous iteration. The same applies when waiting for sockets to become sendable: select/poll will keep returning immediately until the socket is no longer sendable. The old CConnman code relied on this behavior and thus always passed all sockets into select/poll, even if it did not drain the receive buffers.
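For illustration, a minimal standalone sketch of that level-triggered behavior (assumed example, not CConnman code): as long as unread data remains on the socket, every select() call returns immediately with the descriptor still marked readable.

```cpp
#include <sys/select.h>
#include <sys/time.h>

// Returns true while unread data is pending on sockFd. Level-triggered
// semantics: this keeps returning true on every call until the receive
// buffer has been fully drained by recv().
bool SocketIsReadable(int sockFd)
{
    fd_set readSet;
    FD_ZERO(&readSet);
    FD_SET(sockFd, &readSet);

    timeval timeout{0, 50 * 1000}; // 50 ms
    int n = select(sockFd + 1, &readSet, nullptr, nullptr, &timeout);
    return n > 0 && FD_ISSET(sockFd, &readSet);
}
```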
Edge-triggered mode (only epoll supports it) works differently. When it signals that a socket is readable, it won't signal again for that same socket until the receive buffer has been drained; the same applies to sendable sockets. This behavior would immediately lead to epoll blocking forever whenever one iteration of SocketHandler() doesn't drain the read buffer.

The refactorings in this PR change SocketHandler() and friends to no longer assume that select/poll will return immediately if the last recv did not drain the buffers. This is done by keeping track of which peers are receivable/sendable and then only selecting/polling for peers which are currently not receivable/sendable. The receivable/sendable state is reset accordingly when read buffers get drained or send buffers get full (recv()/send() return less than requested). With these refactorings, implementing epoll support with edge-triggered mode becomes a no-brainer.
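For contrast, a rough sketch of an edge-triggered epoll read path (assumed usage, not the planned implementation). The important detail is that the handler must keep calling recv() until it reports EAGAIN/EWOULDBLOCK, because epoll will not re-signal for data that is already pending.

```cpp
#include <sys/epoll.h>
#include <sys/socket.h>
#include <cerrno>

// Register sockFd for edge-triggered read events and drain it once signalled.
void EdgeTriggeredReadLoop(int epollFd, int sockFd)
{
    epoll_event ev{};
    ev.events = EPOLLIN | EPOLLET; // edge-triggered read events
    ev.data.fd = sockFd;
    epoll_ctl(epollFd, EPOLL_CTL_ADD, sockFd, &ev);

    epoll_event events[16];
    int n = epoll_wait(epollFd, events, 16, 1000 /* ms */);
    for (int i = 0; i < n; ++i) {
        char buf[4096];
        for (;;) {
            ssize_t r = recv(events[i].data.fd, buf, sizeof(buf), MSG_DONTWAIT);
            if (r > 0) {
                // ... hand the received bytes to message processing ...
                continue;
            }
            // 0 means the peer closed the connection; EAGAIN/EWOULDBLOCK means
            // the buffer is drained and epoll will only signal again when new
            // data arrives.
            break;
        }
    }
}
```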