
Refactor/Prepare CConnman for upcoming epoll support #3432

Merged (8 commits) on Apr 19, 2020

Conversation

@codablock commented Apr 17, 2020

This is a set of refactorings that prepare us for the upcoming epoll support. I plan to use epoll in "edge triggered" (vs. level triggered) mode to reduce the number of events as much as possible.

Even though select/poll don't have the concept of edge/level triggered modes, their behavior can be compared to level triggered mode. This means that if a socket with pending read data is passed in, select/poll will always return immediately, even if the data is not fully read from the socket. The same applies when waiting for sockets to become sendable: select/poll will keep returning immediately until the socket is no longer sendable. The old CConnman code relied on this behavior and thus always passed all sockets into select/poll, even if it did not drain the receive buffers.

Edge triggered mode (only epoll supports it) works differently. When it signals that a socket is readable, it won't ever re-signal for that same socket until the receive buffer is drained. Same with sendable sockets. This behavior would immediately lead to epoll blocking forever when one iteration of SocketHandler() doesn't drain the read buffer.
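
For context, here is a minimal, hypothetical illustration of what edge-triggered registration looks like and what it implies for the handler. This is not code from this PR; the function name is made up:

```cpp
#include <sys/epoll.h>

// With EPOLLET, epoll_wait() reports a socket only when its state changes
// (new data arrives / send buffer frees up). A socket whose receive buffer
// was left partially unread is NOT reported again, so the handler must
// either drain it until EAGAIN or remember that data is still pending --
// the latter is the approach this PR prepares CConnman for.
void RegisterEdgeTriggered(int epoll_fd, int socket_fd)
{
    epoll_event ev{};
    ev.events = EPOLLIN | EPOLLOUT | EPOLLET; // edge-triggered read/write readiness
    ev.data.fd = socket_fd;
    epoll_ctl(epoll_fd, EPOLL_CTL_ADD, socket_fd, &ev);
}
```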

The refactorings in this PR change SocketHandler() and friends to not assume that select/poll will return immediately if the last recv did not drain the buffers. This is done by keeping track of which peers are receivable/sendable and then only selecting/polling for peers that are not already known to be receivable/sendable. The receivable/sendable state is then reset accordingly when read buffers get drained or send buffers get full (recv()/send() return less than requested).

With these refactorings in place, implementing epoll support with edge triggered mode becomes a no-brainer.
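
Roughly, the bookkeeping looks like the following sketch (illustrative only, with simplified stand-in types; the real implementation lives in CConnman::SocketHandler() with proper locking and error handling). Only sockets whose readiness is unknown are handed to select, and a flag is set once select reports them ready:

```cpp
#include <sys/select.h>
#include <sys/socket.h>
#include <algorithm>
#include <vector>

struct NodeState {
    int fd{-1};
    bool fHasRecvData{false};  // select/poll said "readable" and we haven't drained it yet
    bool fCanSendData{false};  // select/poll said "writable" and we haven't filled it yet
};

void SocketHandlerIteration(std::vector<NodeState>& nodes)
{
    fd_set recv_set, send_set;
    FD_ZERO(&recv_set);
    FD_ZERO(&send_set);
    int max_fd = 0;
    for (const NodeState& node : nodes) {
        // Only select for states we don't already know about.
        if (!node.fHasRecvData) FD_SET(node.fd, &recv_set);
        if (!node.fCanSendData) FD_SET(node.fd, &send_set);
        max_fd = std::max(max_fd, node.fd);
    }

    timeval timeout{0, 50 * 1000}; // short timeout so known-ready sockets are still serviced
    if (select(max_fd + 1, &recv_set, &send_set, nullptr, &timeout) > 0) {
        for (NodeState& node : nodes) {
            if (FD_ISSET(node.fd, &recv_set)) node.fHasRecvData = true;
            if (FD_ISSET(node.fd, &send_set)) node.fCanSendData = true;
        }
    }

    for (NodeState& node : nodes) {
        if (!node.fHasRecvData) continue;
        char buf[0x10000];
        ssize_t n = recv(node.fd, buf, sizeof(buf), MSG_DONTWAIT);
        // ... hand any received bytes to the message handler ...
        if (n < (ssize_t)sizeof(buf)) {
            node.fHasRecvData = false; // buffer drained (or error): select for it again next time
        }
        // Sending works analogously: fCanSendData is cleared when send()
        // accepts fewer bytes than offered.
    }
}
```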

@codablock added this to the 16 milestone Apr 17, 2020
@PastaPastaPasta (Member) left a comment

Concept ACK, obviously backports scare me, but it doesn't look too bad, like you've said.

Regarding the network side, I don't know too much, so the best I'm able to say is this diff probably doesn't send my private keys to a random server 😂

I will test this later today and report back how it goes

Oh, also, a nit and a thought

                nBytes = recv(pnode->hSocket, pchBuf, sizeof(pchBuf), MSG_DONTWAIT);
            }
            if (nBytes > 0)
            {

nit: since we moved this code, we technically should fix the formatting.

src/net.h (review comment resolved)
@PastaPastaPasta (Member)

I got two test failures locally, not sure if it's just me. https://gist.github.com/PastaPastaPasta/517ebecd2566fe98e77832bed3cd8c28

@PastaPastaPasta (Member)

rerunning sendheaders.py and llmq-chainlocks.py a bunch of times, I'm unable to reproduce the error.

@PastaPastaPasta (Member)

Also see https://gitlab.com/PastaPastaPasta/dash/pipelines/137287226 where I am getting (different) test failures on both 32-bit and 64-bit.

@codablock force-pushed the pr_refactor_sockethandler branch from e97495e to a0832e3 on April 17, 2020 20:14
@codablock (Author)

Rebased on develop to fix conflicts.

@PastaPastaPasta The sendheaders.py failures happen sporadically on develop as well, so these are not related to this PR. I've never seen the llmq-chainlocks.py failure however, so I'm unsure if it's related. If you see it again, can you also upload all logs?

One of the failures (p2p-fullblocktest.py) in https://gitlab.com/PastaPastaPasta/dash/pipelines/137287226 is due to running out of memory. I can see that you're running it on free GitLab runners; these are not powerful enough to run so many tests in parallel. The other failure seems to be something different... investigating.

@codablock (Author)

@PastaPastaPasta I need all debug.log files from all nodes to debug the llmq-chainlocks.py failure.

Meanwhile I found the reason for the failure in llmq-simplepose.py and added a commit to fix it.

@codablock force-pushed the pr_refactor_sockethandler branch from 43c67fe to e944387 on April 17, 2020 21:20
@codablock (Author)

@PastaPastaPasta The llmq-chainlocks.py failures were unrelated and should be fixed via #3435

@UdjinM6 commented Apr 18, 2020

Needs rebase

@codablock force-pushed the pr_refactor_sockethandler branch from e944387 to a742ce5 on April 18, 2020 10:41
@UdjinM6 previously approved these changes Apr 18, 2020
@UdjinM6 left a comment

utACK

@UdjinM6 dismissed their stale review April 18, 2020 12:43

Actually, wait. I have random test failures locally. Will test a bit.

@PastaPastaPasta (Member) commented Apr 18, 2020

I got a p2p-instantsend.py failure (and an abandonconflict.py failure likely fixed by #3436), see https://gist.github.com/PastaPastaPasta/cb30bb821d36432b455242545e443279. I can provide the full datadir if needed. Unable to reproduce this one; it might have been caused by my computer timing out or something.

This allows us to check for pending messages without locking cs_vSend
Instead of selecting every socket in every SocketHandler iteration, we will
now track which nodes are known to have pending receivable data and/or
have empty send buffers.

Each time recv fails to fill a whole receive buffer, fHasRecvData is
set to false so that the socket is added to the receive select set
in the next iteration. When that socket is signalled through select/poll,
fHasRecvData is set to true again and remains true until a future recv
again fails to fill the buffer.

Each time send fails to send a full message, fCanSendData is set to false
so that the socket is added to the send select set in the next iteration.

At the same time, nodes which have pending messages to send are tracked
in mapNodesWithDataToSend, so that SocketHandler knows for which nodes
SocketSendData must be invoked.
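
A condensed, self-contained sketch of the flag handling described in this commit message. The struct and helper below are hypothetical simplifications; fHasRecvData, fCanSendData and mapNodesWithDataToSend are the members named above, everything else is illustrative:

```cpp
#include <sys/socket.h>
#include <deque>
#include <map>
#include <vector>

struct Node {
    int hSocket{-1};
    bool fHasRecvData{false};
    bool fCanSendData{false};
    std::deque<std::vector<unsigned char>> vSendMsg; // queued outbound messages
};

std::map<int /*node id*/, Node*> mapNodesWithDataToSend;

void HandleNodeIO(int node_id, Node* pnode)
{
    // Receive side: fHasRecvData stays true until recv() returns less than requested.
    char pchBuf[0x10000];
    ssize_t nBytes = recv(pnode->hSocket, pchBuf, sizeof(pchBuf), MSG_DONTWAIT);
    if (nBytes < (ssize_t)sizeof(pchBuf)) {
        pnode->fHasRecvData = false; // drained: put the socket back into the receive select set
    }

    // Send side: fCanSendData stays true until send() accepts fewer bytes than offered.
    if (pnode->fCanSendData && !pnode->vSendMsg.empty()) {
        const std::vector<unsigned char>& msg = pnode->vSendMsg.front();
        ssize_t nSent = send(pnode->hSocket, msg.data(), msg.size(), MSG_DONTWAIT);
        if (nSent < (ssize_t)msg.size()) {
            pnode->fCanSendData = false; // kernel send buffer full: select for writability again
        }
        // (removal of fully sent messages from vSendMsg omitted for brevity)
    }

    // Remember nodes that still have queued messages so the socket handler
    // knows it must attempt SocketSendData-style flushing for them.
    if (!pnode->vSendMsg.empty()) {
        mapNodesWithDataToSend.emplace(node_id, pnode);
    } else {
        mapNodesWithDataToSend.erase(node_id);
    }
}
```
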
@codablock force-pushed the pr_refactor_sockethandler branch from a742ce5 to 47af42a on April 18, 2020 19:05
@UdjinM6 left a comment

Looks like it was actually #3433 that caused issues for me and not this PR itself. #3444 should fix that.

ACK

@codablock (Author)

@UdjinM6 Do you have logs/datadirs of these failures? Would be interesting to know why a timeout of 1000ms was an issue.

@UdjinM6 commented Apr 19, 2020

Not much to see in the logs, just a bunch of "socket select error Invalid argument (22)" messages and no connections.

@UdjinM6 commented Apr 19, 2020

It probably has something to do with "... kernel scheduling delays mean that the blocking interval may overrun by a small amount" (http://man7.org/linux/man-pages/man2/select.2.html) and https://github.com/dashpay/dash/blob/develop/src/net.cpp#L93-L94.
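
One plausible reading of that theory (purely hypothetical, not taken from net.cpp): if the select timeout is derived from a deadline and the previous wait overran that deadline due to scheduling delays, the remaining interval goes negative, and select() rejects a negative timeval with EINVAL, which would match the "Invalid argument (22)" messages. Clamping at zero avoids it:

```cpp
#include <sys/select.h>
#include <algorithm>
#include <chrono>

// Hypothetical helper: compute the timeval for select() from a deadline,
// clamped at zero so that scheduling-delay overruns can never produce a
// negative (and therefore invalid) timeout.
timeval RemainingTimeout(std::chrono::steady_clock::time_point deadline)
{
    const auto now = std::chrono::steady_clock::now();
    auto remaining = std::chrono::duration_cast<std::chrono::microseconds>(deadline - now);
    remaining = std::max(remaining, std::chrono::microseconds{0});

    timeval tv;
    tv.tv_sec = remaining.count() / 1000000;
    tv.tv_usec = remaining.count() % 1000000;
    return tv;
}
```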

@codablock merged commit f5f4ccb into dashpay:develop Apr 19, 2020
@codablock deleted the pr_refactor_sockethandler branch April 19, 2020 12:28