Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ZOOKEEPER-3240: Close socket on Learner shutdown to avoid dangling so… #996

Closed
wants to merge 3 commits into from
Closed
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Original file line number Diff line number Diff line change
Expand Up @@ -115,11 +115,7 @@ void followLeader() throws InterruptedException {
}
} catch (Exception e) {
LOG.warn("Exception when following the leader", e);
try {
sock.close();
} catch (IOException e1) {
e1.printStackTrace();
}
closeSocket();

// clear pending revalidations
pendingRevalidations.clear();
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -673,6 +673,7 @@ public void shutdown() {
self.setZooKeeperServer(null);
self.closeAllConnections();
self.adminServer.setZooKeeperServer(null);
closeSocket();
// shutdown previous zookeeper
if (zk != null) {
zk.shutdown();
Expand All @@ -682,4 +683,14 @@ public void shutdown() {
boolean isRunning() {
return self.isRunning() && zk.isRunning();
}

void closeSocket() {
try {
if (sock != null) {
sock.close();
}
} catch (IOException e) {
LOG.warn("Ignoring error closing connection to leader", e);
}
}
}
Original file line number Diff line number Diff line change
Expand Up @@ -122,11 +122,7 @@ void observeLeader() throws Exception {
}
} catch (Exception e) {
LOG.warn("Exception when observing the leader", e);
try {
sock.close();
} catch (IOException e1) {
e1.printStackTrace();
}
closeSocket();

// clear pending revalidations
pendingRevalidations.clear();
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -827,7 +827,10 @@ public void testFailedTxnAsPartOfQuorumLoss() throws Exception {
int leader = servers.findLeader();
Map<Long, Proposal> outstanding = servers.mt[leader].main.quorumPeer.leader.outstandingProposals;
// increase the tick time to delay the leader going to looking
int previousTick = servers.mt[leader].main.quorumPeer.tickTime;
servers.mt[leader].main.quorumPeer.tickTime = LEADER_TIMEOUT_MS;
// let the previous tick on the leader exhaust itself so the new tick time takes effect
Thread.sleep(previousTick);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How is this change related to this patch?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It comes down to timing. This test needs the leader to lag behind the actions taking place on the other servers and this change gives the test the full advantage of the leader's changed tick time. It seems that closing the socket alters the timing such that the stale Leader gets an earlier awareness of its stale status without the change.

LOG.warn("LEADER {}", leader);

for (int i = 0; i < SERVER_COUNT; i++) {
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -159,49 +159,6 @@ public void testReconfigVersionConditionFails() throws Exception {
ReconfigTest.closeAllHandles(zkArr, zkAdminArr);
}

/*
* Tests that if a quorum of a new config is synced with the leader and a reconfig
* is allowed to start but then the new quorum is lost, the leader will time out and
* we go to leader election.
*/
@Test
public void testLeaderTimesoutOnNewQuorum() throws Exception {
qu = new QuorumUtil(1); // create 3 servers
qu.disableJMXTest = true;
qu.startAll();
ZooKeeper[] zkArr = ReconfigTest.createHandles(qu);
ZooKeeperAdmin[] zkAdminArr = ReconfigTest.createAdminHandles(qu);

List<String> leavingServers = new ArrayList<String>();
leavingServers.add("3");
qu.shutdown(2);
try {
// Since we just shut down server 2, its still considered "synced"
// by the leader, which allows us to start the reconfig
// (PrepRequestProcessor checks that a quorum of the new
// config is synced before starting a reconfig).
// We try to remove server 3, which requires a quorum of {1,2,3}
// (we have that) and of {1,2}, but 2 is down so we won't get a
// quorum of new config ACKs.
zkAdminArr[1].reconfigure(null, leavingServers, null, -1, null);
Assert.fail("Reconfig should have failed since we don't have quorum of new config");
} catch (KeeperException.ConnectionLossException e) {
// We expect leader to lose quorum of proposed config and time out
} catch (Exception e) {
Assert.fail("Should have been ConnectionLossException!");
}

// The leader should time out and remaining servers should go into
// LOOKING state. A new leader won't be established since that
// would require completing the reconfig, which is not possible while
// 2 is down.
Assert.assertEquals(QuorumStats.Provider.LOOKING_STATE,
qu.getPeer(1).peer.getServerState());
Assert.assertEquals(QuorumStats.Provider.LOOKING_STATE,
qu.getPeer(3).peer.getServerState());
ReconfigTest.closeAllHandles(zkArr, zkAdminArr);
}

/*
* Converting an observer into a participant may sometimes fail with a
* NewConfigNoQuorum exception. This test-case demonstrates the scenario.
Expand Down