Case1:
If the sink node is overloaded and unresponsive, a heartbeat timeout occurs on the source node. After the timeout, the source node terminates the connection, reconnects to the same sink node, and sends the messages again. But in most cases the sink node is still overloaded under the high workload. So I think the source node should either wait for the high workload on the sink node to subside, or reconnect to another sink node.
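The two options suggested for Case 1 (wait before retrying, or move to another sink) can be combined into one retry schedule. The following is only a sketch in Python, not Riak code: `reconnect_plan`, its parameters, and the address strings are hypothetical names chosen for illustration. It assumes the caller passes the candidate sinks ordered so that the one that just timed out comes last.

```python
def reconnect_plan(sinks, base_delay=1.0, max_delay=30.0, attempts=6):
    """Yield (sink, delay_seconds) pairs for successive reconnect attempts.

    Hypothetical helper: rotates through the candidate sinks and doubles
    the wait after each full pass over the list, capped at max_delay.
    """
    if not sinks:
        raise ValueError("at least one sink address is required")
    for attempt in range(attempts):
        sink = sinks[attempt % len(sinks)]   # round-robin over the sinks
        passes = attempt // len(sinks)       # completed passes over the list
        delay = min(max_delay, base_delay * (2 ** passes))
        yield sink, delay


# Example: two candidate sinks; the overloaded one is listed last.
for sink, delay in reconnect_plan(["sink-b:9080", "sink-a:9080"], attempts=4):
    print(f"wait {delay}s, then try {sink}")
```

With a schedule like this, an overloaded sink is not hit again immediately, and the pause grows each time every sink has been tried once.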
Case2:
If the sink node fails silently, the source node detects the problem through the heartbeat timeout. But the source node then reconnects to the same sink node and waits forever for the connection to be established. Providing a connection timeout looks like it could help with this problem.
It's easy to reproduce the problem by running kill -SIGSTOP <sink node's pid>.
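To illustrate the connection timeout suggested for Case 2, here is a minimal sketch in Python rather than in Riak's own code. `connect_with_timeout` and its default of 5 seconds are assumptions made for the example. It relies on the standard `socket.create_connection`, whose `timeout` argument bounds the connect call and stays set on the returned socket, so later reads time out as well.

```python
import socket


def connect_with_timeout(host, port, timeout=5.0):
    """Return a connected socket, or None if the sink does not answer in time.

    Hypothetical helper: the caller can treat None as "try another sink".
    """
    try:
        # The timeout bounds the connect and remains set on the socket,
        # so a later recv() on it raises socket.timeout instead of blocking.
        return socket.create_connection((host, port), timeout=timeout)
    except OSError:
        # Covers timeouts, refused connections, and unreachable hosts.
        return None
```

One caveat: when a process is stopped with SIGSTOP, the kernel may still complete the TCP handshake on its listening socket, so a connect timeout alone might not catch that case. A deadline on the first protocol reply would probably be needed too.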
Basho-JIRA changed the title from "Improve to handle the situation when the sink node is unresponsive on realtime repl." to "Improve to handle the situation when the sink node is unresponsive on realtime repl. [JIRA: RIAK-2476]" on Mar 28, 2016
It would be great if some code paths were provided here rather than a broad description; even a list of module names would help. I'm investigating a similar issue here.