Improve to handle the situation when the sink node is unresponsive on realtime repl. [JIRA: RIAK-2476] #735

ksauzz · 2016-03-28T02:53:00Z

Case 1:

If the sink node is overloaded and unresponsive, a heartbeat timeout occurs on the source node. After the timeout, the source node terminates the connection, reconnects to the same sink node, and sends the messages again. But in most cases the sink node is still overloaded under the high workload. I think the source node should either wait until the workload on the sink node has subsided, or reconnect to another sink node.
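A minimal sketch of the reconnect policy suggested for Case 1. It assumes the source knows a list of candidate sink addresses. On each heartbeat timeout it moves to a different sink and waits longer before reconnecting, growing the delay exponentially up to a cap. `SinkSelector`, `on_heartbeat_timeout`, and `on_connected` are hypothetical names for illustration, not riak_repl code, and the sketch only computes the decision; it does not sleep or open connections itself.

```python
from dataclasses import dataclass


@dataclass
class SinkSelector:
    """Hypothetical reconnect policy: rotate sinks and back off exponentially."""
    sinks: list[str]
    base_delay: float = 1.0   # seconds before the first reconnect (assumed value)
    max_delay: float = 60.0   # upper bound on the backoff (assumed value)
    _index: int = 0
    _failures: int = 0

    def on_heartbeat_timeout(self) -> tuple[str, float]:
        """Record a timeout on the current sink; return (next_sink, delay_before_connecting)."""
        self._failures += 1
        # Prefer a different sink instead of hammering the overloaded one.
        self._index = (self._index + 1) % len(self.sinks)
        # Consecutive failures double the wait, capped at max_delay.
        delay = min(self.max_delay, self.base_delay * 2 ** (self._failures - 1))
        return self.sinks[self._index], delay

    def on_connected(self) -> None:
        """A healthy connection resets the backoff."""
        self._failures = 0
```

For example, with sinks `["a", "b", "c"]`, three consecutive timeouts yield `("b", 1.0)`, `("c", 2.0)`, `("a", 4.0)`; after `on_connected()` the delay starts again at 1 second.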

Case 2:

If the sink node fails silently, the source node detects the problem through the heartbeat timeout. However, the source node retries the connection to the same sink node and waits forever for the connection to be established. Adding a connection timeout looks like it could help with this problem.
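A sketch of the timeout idea for Case 2, using Python's `socket` module. One caveat: with a frozen sink, the kernel typically still completes the TCP handshake for its listening socket (up to the listen backlog), so a connect timeout alone may never fire. The sketch therefore also bounds the application-level greeting. `connect_with_deadlines`, the `hello` payload, and the default timeouts are hypothetical, not Riak's replication protocol.

```python
import socket
from typing import Optional


def connect_with_deadlines(host: str, port: int, hello: bytes,
                           connect_timeout: float = 5.0,
                           handshake_timeout: float = 5.0) -> Optional[socket.socket]:
    """Return a socket once the sink answers `hello`, or None if a deadline passes."""
    try:
        # Bounds the TCP connect (refused, unreachable, or slow SYN/ACK).
        sock = socket.create_connection((host, port), timeout=connect_timeout)
    except OSError:
        return None
    try:
        # Bounds the greeting: a stopped sink accepts the TCP connection
        # at the kernel level but never replies to the application message.
        sock.settimeout(handshake_timeout)
        sock.sendall(hello)
        reply = sock.recv(1024)
    except OSError:  # socket.timeout is a subclass of OSError
        sock.close()
        return None
    if not reply:  # peer closed the connection without answering
        sock.close()
        return None
    sock.settimeout(None)  # back to blocking mode for the data phase
    return sock
```

On `None`, the caller would pick another sink or back off instead of waiting indefinitely.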

The problem is easy to reproduce with kill -SIGSTOP <sink node's pid>.
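The same reproduction can be scripted against a toy stand-in for the sink. The sketch below is POSIX-only (it uses `os.fork` and `SIGSTOP`), and the forked echo server is a hypothetical placeholder for a real sink node. It shows that a frozen process still leaves its socket connected while never answering, which is exactly what the source node sees.

```python
import os
import signal
import socket


def reproduce_unresponsive_sink(timeout: float = 0.5) -> tuple[bool, bool]:
    """Return (answered_before_stop, timed_out_while_stopped)."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))
    listener.listen(1)
    port = listener.getsockname()[1]

    pid = os.fork()
    if pid == 0:
        # Child: a toy "sink" that echoes whatever it receives.
        try:
            conn, _ = listener.accept()
            while True:
                data = conn.recv(64)
                if not data:
                    break
                conn.sendall(data)
        finally:
            os._exit(0)

    listener.close()
    client = socket.create_connection(("127.0.0.1", port), timeout=timeout)
    try:
        client.sendall(b"ping")
        answered = client.recv(64) == b"ping"

        # Freeze the sink, as `kill -SIGSTOP <pid>` would.
        os.kill(pid, signal.SIGSTOP)
        client.sendall(b"ping")
        try:
            client.recv(64)
            timed_out = False
        except socket.timeout:
            timed_out = True
    finally:
        client.close()
        os.kill(pid, signal.SIGKILL)  # SIGKILL also ends a stopped process
        os.waitpid(pid, 0)
    return answered, timed_out
```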


binarytemple-external · 2017-04-19T14:16:57Z

It would be great if a code path were given here rather than a broad description; even a list of module names would help. I'm investigating a similar issue.

Basho-JIRA changed the title ~~Improve to handle the situation when the sink node is unresponsive on realtime repl.~~ Improve to handle the situation when the sink node is unresponsive on realtime repl. [JIRA: RIAK-2476] Mar 28, 2016

Basho-JIRA added the JIRA: To Do label Mar 28, 2016
