We saw this situation on a producer cluster during scale-out, while testing 2DC. It's unclear whether this can happen on any cluster or whether it's specific to a 2DC-enabled cluster.
From master logs:
I0918 18:44:27.375653 24181 async_rpc_tasks.cc:873] Prep Leader step down 1, leader_uuid=7fbbb21dc0a7410e8f1c4fd27ca06556, change_ts_uuid=7fbbb21dc0a7410e8f1c4fd27ca06556
I0918 18:44:27.375659 24181 async_rpc_tasks.cc:902] Stepping down leader 7fbbb21dc0a7410e8f1c4fd27ca06556 for tablet 9db51856aaa740b3a7a15f081e293148
I0918 18:44:27.376495 17304 async_rpc_tasks.cc:922] Leader step down done attempt=1, leader_uuid=7fbbb21dc0a7410e8f1c4fd27ca06556, change_uuid=7fbbb21dc0a7410e8f1c4fd27ca06556, error=code: NOT_THE_LEADER status { code: ILLEGAL_STATE message: "Not currently leader" source_file: "../../src/yb/consensus/raft_consensus.cc" source_line: 634 errors: "\000" }, failed=1, should_remove=0 for tablet 9db51856aaa740b3a7a15f081e293148.
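In other words, the step-down request the master issues (as part of moving the leader during load balancing) fails with NOT_THE_LEADER because the target replica 7fbbb21dc0a7410e8f1c4fd27ca06556 no longer believes it is the leader of that tablet.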
On a tablet server:
I0918 18:45:33.423285 24662 raft_consensus.cc:1980] T b22abd1644034a6ebfef0b9099026743 P 1371e236277b4f43a7d1891a3f834b00 [term 6686 FOLLOWER]: Pre-election. Granting vote for candidate a0909fefd2c4480d850441eb521cbca5 in term 6687
I0918 18:45:36.310081 17503 raft_consensus.cc:813] T b22abd1644034a6ebfef0b9099026743 P 1371e236277b4f43a7d1891a3f834b00 [term 6686 FOLLOWER]: ReportFailDetected: Starting NORMAL_ELECTION...
I0918 18:45:36.310144 17503 raft_consensus.cc:492] T b22abd1644034a6ebfef0b9099026743 P 1371e236277b4f43a7d1891a3f834b00 [term 6686 FOLLOWER]: Fail of leader 7fbbb21dc0a7410e8f1c4fd27ca06556 detected. Triggering leader pre-election, mode=NORMAL_ELECTION
I0918 18:45:36.310163 17503 raft_consensus.cc:2856] T b22abd1644034a6ebfef0b9099026743 P 1371e236277b4f43a7d1891a3f834b00 [term 6686 FOLLOWER]: Snoozing failure detection for 3.109s
I0918 18:45:36.310214 17503 raft_consensus.cc:535] T b22abd1644034a6ebfef0b9099026743 P 1371e236277b4f43a7d1891a3f834b00 [term 6686 FOLLOWER]: Starting pre-election with config: opid_index: -1 peers { permanent_uuid: "1371e236277b4f43a7d1891a3f834b00" member_type: VOTER last_known_private_addr { host: "172.152.53.203" port: 9100 } cloud_info { placement_cloud: "aws" placement_region: "us-east-1" placement_zone: "us-east-1c" } } peers { permanent_uuid: "a0909fefd2c4480d850441eb521cbca5" member_type: VOTER last_known_private_addr { host: "172.152.39.249" port: 9100 } cloud_info { placement_cloud: "aws" placement_region: "us-east-1" placement_zone: "us-east-1b" } } peers { permanent_uuid: "7fbbb21dc0a7410e8f1c4fd27ca06556" member_type: VOTER last_known_private_addr { host: "172.152.21.168" port: 9100 } cloud_info { placement_cloud: "aws" placement_region: "us-east-1" placement_zone: "us-east-1a" } }
I0918 18:45:36.310267 17503 leader_election.cc:215] T b22abd1644034a6ebfef0b9099026743 P 1371e236277b4f43a7d1891a3f834b00 [CANDIDATE]: Term 6687 pre-election: Requesting vote from peer a0909fefd2c4480d850441eb521cbca5
I0918 18:45:36.310305 17503 leader_election.cc:215] T b22abd1644034a6ebfef0b9099026743 P 1371e236277b4f43a7d1891a3f834b00 [CANDIDATE]: Term 6687 pre-election: Requesting vote from peer 7fbbb21dc0a7410e8f1c4fd27ca06556
I0918 18:45:36.524571 27706 raft_consensus.cc:1980] T b22abd1644034a6ebfef0b9099026743 P 1371e236277b4f43a7d1891a3f834b00 [term 6686 FOLLOWER]: Pre-election. Granting vote for candidate a0909fefd2c4480d850441eb521cbca5 in term 6687
I0918 18:45:36.699296 25468 raft_consensus.cc:2446] T 9db51856aaa740b3a7a15f081e293148 P 1371e236277b4f43a7d1891a3f834b00 [term 13185 LEADER]: Leader pre-election vote request: Denying vote to candidate a0909fefd2c4480d850441eb521cbca5 for term 13186 because replica is either leader or believes a valid leader to be alive. Time left: 9223198496.233s
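The "Time left: 9223198496.233s" in that denial looks suspicious: it is within roughly two days of 2^63 nanoseconds (about 9223372036s), which suggests the vote-withhold deadline on that replica is effectively an infinite-future sentinel rather than a real leader-lease expiration. Below is a minimal, hypothetical sketch (not the actual raft_consensus.cc logic; the name withhold_votes_until and the assumption that "Time left" is reported as deadline minus current monotonic time are mine) showing how such a sentinel would produce a number of exactly this magnitude:

```cpp
#include <chrono>
#include <cstdint>
#include <iostream>
#include <limits>

int main() {
  using namespace std::chrono;

  // Hypothetical "withhold votes until" deadline left at an infinite-future
  // sentinel: the maximum signed 64-bit nanosecond tick (~2^63 ns).
  const nanoseconds withhold_votes_until{std::numeric_limits<int64_t>::max()};

  // Current monotonic time (time since boot on typical Linux systems).
  const nanoseconds now =
      duration_cast<nanoseconds>(steady_clock::now().time_since_epoch());

  if (now < withhold_votes_until) {
    const double time_left_sec =
        duration<double>(withhold_votes_until - now).count();
    // With roughly two days of uptime this prints about 9223198496s,
    // matching the "Time left" in the vote-denial log line above.
    std::cout << "Denying vote; time left: " << time_left_sec << "s\n";
  }
  return 0;
}
```

If that reading is correct, this replica would keep denying pre-election votes indefinitely, which would be consistent with the repeated pre-elections on the other tservers below and with the load balancer never completing the leader move.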
And on another tserver:
I0918 18:47:16.137734 25080 raft_consensus.cc:535] T b22abd1644034a6ebfef0b9099026743 P a0909fefd2c4480d850441eb521cbca5 [term 6688 FOLLOWER]: Starting pre-election with config: opid_index: -1 peers { permanent_uuid: "1371e236277b4f43a7d1891a3f834b00" member_type: VOTER last_known_private_addr { host: "172.152.53.203" port: 9100 } cloud_info { placement_cloud: "aws" placement_region: "us-east-1" placement_zone: "us-east-1c" } } peers { permanent_uuid: "a0909fefd2c4480d850441eb521cbca5" member_type: VOTER last_known_private_addr { host: "172.152.39.249" port: 9100 } cloud_info { placement_cloud: "aws" placement_region: "us-east-1" placement_zone: "us-east-1b" } } peers { permanent_uuid: "7fbbb21dc0a7410e8f1c4fd27ca06556" member_type: VOTER last_known_private_addr { host: "172.152.21.168" port: 9100 } cloud_info { placement_cloud: "aws" placement_region: "us-east-1" placement_zone: "us-east-1a" } }
I0918 18:47:16.137768 25080 leader_election.cc:215] T b22abd1644034a6ebfef0b9099026743 P a0909fefd2c4480d850441eb521cbca5 [CANDIDATE]: Term 6689 pre-election: Requesting vote from peer 1371e236277b4f43a7d1891a3f834b00
I0918 18:47:16.137881 25080 leader_election.cc:215] T b22abd1644034a6ebfef0b9099026743 P a0909fefd2c4480d850441eb521cbca5 [CANDIDATE]: Term 6689 pre-election: Requesting vote from peer 7fbbb21dc0a7410e8f1c4fd27ca06556
I0918 18:47:16.482267 25080 raft_consensus.cc:813] T 9db51856aaa740b3a7a15f081e293148 P a0909fefd2c4480d850441eb521cbca5 [term 13193 FOLLOWER]: ReportFailDetected: Starting NORMAL_ELECTION...
I0918 18:47:16.482322 25080 raft_consensus.cc:492] T 9db51856aaa740b3a7a15f081e293148 P a0909fefd2c4480d850441eb521cbca5 [term 13193 FOLLOWER]: Fail of leader 7fbbb21dc0a7410e8f1c4fd27ca06556 detected. Triggering leader pre-election, mode=NORMAL_ELECTION
I0918 18:47:16.482337 25080 raft_consensus.cc:2856] T 9db51856aaa740b3a7a15f081e293148 P a0909fefd2c4480d850441eb521cbca5 [term 13193 FOLLOWER]: Snoozing failure detection for 3.178s
Attached is a screenshot showing the stuck load-balancing state.