Improve cluster cant failover log conditions #780

Merged
11 changes: 0 additions & 11 deletions src/cluster_legacy.c
@@ -4191,9 +4191,6 @@ int clusterGetReplicaRank(void) {
  * 2) Also, the log is emitted again if the primary is still down and
  *    the reason for not failing over is still the same, but more than
  *    CLUSTER_CANT_FAILOVER_RELOG_PERIOD seconds elapsed.
- * 3) Finally, the function only logs if the replica is down for more than
- *    five seconds + NODE_TIMEOUT. This way nothing is logged when a
- *    failover starts in a reasonable time.
  *
  * The function is called with the reason why the replica can't failover
  * which is one of the integer macros CLUSTER_CANT_FAILOVER_*.
@@ -4202,7 +4199,6 @@ int clusterGetReplicaRank(void) {
 void clusterLogCantFailover(int reason) {
     char *msg;
     static time_t lastlog_time = 0;
-    mstime_t nolog_fail_time = server.cluster_node_timeout + 5000;
 
     /* Don't log if we have the same reason for some time. */
     if (reason == server.cluster->cant_failover_reason &&
@@ -4211,13 +4207,6 @@ void clusterLogCantFailover(int reason) {
 
     server.cluster->cant_failover_reason = reason;
 
-    /* We also don't emit any log if the primary failed no long ago, the
-     * goal of this function is to log replicas in a stalled condition for
-     * a long time. */
-    if (myself->replicaof && nodeFailed(myself->replicaof) &&
-        (mstime() - myself->replicaof->fail_time) < nolog_fail_time)
-        return;
-
     switch (reason) {
     case CLUSTER_CANT_FAILOVER_DATA_AGE:
         msg = "Disconnected from primary for longer than allowed. "
2 changes: 1 addition & 1 deletion src/cluster_legacy.h
@@ -17,7 +17,7 @@
 #define CLUSTER_CANT_FAILOVER_WAITING_DELAY 2
 #define CLUSTER_CANT_FAILOVER_EXPIRED 3
 #define CLUSTER_CANT_FAILOVER_WAITING_VOTES 4
-#define CLUSTER_CANT_FAILOVER_RELOG_PERIOD (10) /* seconds. */
+#define CLUSTER_CANT_FAILOVER_RELOG_PERIOD 1 /* seconds. */
 
 /* clusterState todo_before_sleep flags. */
 #define CLUSTER_TODO_HANDLE_FAILOVER (1 << 0)
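
Taken together, the two files leave a single condition guarding the log: a message is suppressed only when the reason is unchanged and less than CLUSTER_CANT_FAILOVER_RELOG_PERIOD seconds (now 1) have passed since the last one. The following is a minimal standalone sketch of that decision, not the code from this PR. The names `relog_state`, `should_log_cant_failover`, and `RELOG_PERIOD_SECONDS` are hypothetical stand-ins, and the current time is passed in as a parameter so the behaviour can be checked without a running server.

```c
#include <assert.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical stand-in for CLUSTER_CANT_FAILOVER_RELOG_PERIOD; the value
 * mirrors the new one in this diff. */
#define RELOG_PERIOD_SECONDS 1

/* Hypothetical state holder. In the diff above this state lives in
 * server.cluster->cant_failover_reason and the static lastlog_time. */
typedef struct {
    int last_reason;      /* reason recorded by the previous call */
    time_t last_log_time; /* time of the last emitted message */
} relog_state;

/* Returns true when a "can't failover" message should be emitted at `now`.
 * Assumption: with the primary-fail-time gate removed, the only remaining
 * suppression is "same reason, logged less than the relog period ago". */
static bool should_log_cant_failover(relog_state *st, int reason, time_t now) {
    assert(st != NULL);

    /* Same reason and logged recently: stay quiet. */
    if (reason == st->last_reason &&
        now - st->last_log_time < RELOG_PERIOD_SECONDS)
        return false;

    /* New reason, or the relog period has elapsed: record and log. */
    st->last_reason = reason;
    st->last_log_time = now;
    return true;
}
```

Starting from a zero-initialized `relog_state`, a first call with any non-zero reason returns true, an immediate repeat with the same reason returns false, and a call with a different reason returns true again regardless of the elapsed time.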