Improve to handle the situation when the sink node is unresponsive on realtime repl. [JIRA: RIAK-2476] #735

Open
ksauzz opened this issue Mar 28, 2016 · 1 comment

Comments

ksauzz commented Mar 28, 2016

Case1:

If the sink node is overloaded and unresponsive, a heartbeat timeout occurs on the source node. After the timeout, the source node terminates the connection, reconnects to the same sink node, and sends the messages again. In most cases, though, the sink node is still overloaded by the high workload. So I think the source node should either wait for the load on the sink node to subside or reconnect to a different sink node.
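To illustrate the two options suggested above (wait, or try another sink), here is a rough sketch in Python. It is not Riak code: `reconnect_plan`, its parameters, and the sink names are all hypothetical. It puts the sink that just timed out at the end of the candidate list and doubles the delay after each full pass through that list.

```python
def reconnect_plan(sinks, failed_sink, base_delay=1.0, max_delay=60.0, attempts=5):
    """Yield (sink, delay_seconds) pairs for successive reconnect attempts.

    Hypothetical helper: the sink that just timed out is tried last in each
    round, and the delay doubles after every full round through the list.
    """
    # Try the other sinks first; keep the failed one as a last resort.
    ordered = [s for s in sinks if s != failed_sink]
    if failed_sink in sinks:
        ordered.append(failed_sink)
    if not ordered:
        return
    for attempt in range(attempts):
        round_no = attempt // len(ordered)
        # Exponential backoff per round, capped at max_delay.
        delay = min(base_delay * (2 ** round_no), max_delay)
        yield ordered[attempt % len(ordered)], delay


# Example: sink "a" just hit a heartbeat timeout.
for sink, delay in reconnect_plan(["a", "b", "c"], failed_sink="a"):
    print(f"wait {delay}s, then connect to {sink}")
```

With only one sink available, the same plan degrades to plain exponential backoff against that sink, which covers the "wait for the workload to resolve" option.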

Case2:

If the sink node fails silently, the source node detects the problem through a heartbeat timeout. But the source node then retries the connection to the same sink node and waits forever for it to be established. A connection timeout would probably help with this problem.
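A minimal sketch of what such a connection timeout could look like, again in Python rather than in Riak's own code. The function name `connect_and_handshake` and the greeting bytes are made up for illustration. One assumption worth stating: when a process is stopped, the kernel may still complete the TCP handshake on its behalf, so the sketch bounds the wait for the first application-level reply as well as the connect itself.

```python
import socket


def connect_and_handshake(host, port, timeout=5.0, hello=b"hello\n"):
    """Connect, send a greeting, and return (sock, reply).

    Returns None if the connect or the first reply does not complete within
    `timeout` seconds. The greeting is a placeholder, not a real protocol.
    """
    try:
        # Bounds the TCP connect itself.
        sock = socket.create_connection((host, port), timeout=timeout)
    except OSError:
        return None
    try:
        # Bounds the wait for the peer's first reply; a stopped process
        # never answers even though its listen socket still accepts.
        sock.settimeout(timeout)
        sock.sendall(hello)
        reply = sock.recv(1024)
    except OSError:
        sock.close()
        return None
    if not reply:
        # Peer closed the connection without answering.
        sock.close()
        return None
    return sock, reply
```

When this returns None, the caller can give up on that sink and move on instead of blocking indefinitely.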

It's easy to reproduce the problem with kill -SIGSTOP <sink node's pid>.

@Basho-JIRA Basho-JIRA changed the title Improve to handle the situation when the sink node is unresponsive on realtime repl. Improve to handle the situation when the sink node is unresponsive on realtime repl. [JIRA: RIAK-2476] Mar 28, 2016
@binarytemple-external

It would be great if some code paths were provided here rather than a broad description; even a list of module names would help. I'm investigating a similar issue here.
