Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ZOOKEEPER-261 #120

Closed
wants to merge 4 commits into from
Closed

ZOOKEEPER-261 #120

wants to merge 4 commits into from

Conversation

enixon
Copy link

@enixon enixon commented Dec 6, 2016

No description provided.

@enixon
Copy link
Author

enixon commented Dec 6, 2016

This pull request is built on top of @breed 's changes for #117

@hanm
Copy link
Contributor

hanm commented Dec 6, 2016

Nit: the PR title should contain the string "ZOOKEEPER-261" for the merge script to work - otherwise comments made here will not be bridged to JIRA https://issues.apache.org/jira/browse/ZOOKEEPER-261

@enixon enixon changed the title Zookeeper 261 ZOOKEEPER-261 Dec 6, 2016
@enixon
Copy link
Author

enixon commented Dec 6, 2016

Thanks, @hanm , let's see if editing the correct string into the title suffices or if I need to open up a new PR.

File initFile = new File(dataDir.getParent(), "initialize");
if (initFile.exists()) {
if (!initFile.delete()) {
LOG.warn("Unable to delete initialization file " + initFile.toString());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It sounds pretty serious issue if the initialize file can't be cleaned up upon startup of a new ensemble for whatever reasons as the presence of this file is the key promise made here - not able to clean it up will possibly lead to inconsistent quorum state again that this PR is trying to fix. So, maybe throw an IOException here to abort server start process and let admin intervene instead?

@@ -132,6 +137,9 @@ public FileTxnSnapLog(File dataDir, File snapDir) throws IOException {

txnLog = new FileTxnLog(this.dataDir);
snapLog = new FileSnap(this.snapDir);

autoCreateDB = Boolean.parseBoolean(System.getProperty(ZOOKEEPER_DB_AUTOCREATE,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems that this variable autoCreateDB (and the property ZOOKEEPER_DB_AUTOCREATE_DEFAULT and ZOOKEEPER_DB_AUTOCREATE) is used solely for testing purpose (to change control flow and get code coverage). IIUC, maybe add some comments about these test only variables?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I included ZOOKEEPER_DB_AUTOCREATE to allow users to opt out of the feature until they're ready to update their ensemble management tooling to support creating the new file. Is that in accord with Zookeeper style?

On the question of style, ZOOKEEPER_DB_AUTOCREATE_DEFAULT exists purely because ZOOKEEPER_DATADIR_AUTOCREATE_DEFAULT exists above it in the file. If including the defaults as static constants isn't Zookeeper style then I'm happy to replace it with a string literal in the constructor.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is that in accord with Zookeeper style?

I see - I thought the new property was not end user facing since there is no associated documents added here. Since the property "zookeeper.db.autocreate" is exposed to user some doc could be added to ZooKeeperAdmin.html (similarly like how the existing "zookeeper.datadir.autocreate" is documented there) to describe the motivation / usage of the property.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 with @hanm

@enixon
Copy link
Author

enixon commented Dec 8, 2016

  • add documentation on 'zookeeper.db.autocreate' to zookeeperAdmin.xml
  • extend bin/zkServer-initialize.sh to create the initialize file
  • treat failure to delete initialization file as fatal, throw IOException instead of logging a warning

@hanm
Copy link
Contributor

hanm commented Dec 9, 2016

lgtm +1

@enixon
Copy link
Author

enixon commented Jan 11, 2017

Rebased on to latest master to avoid any potential conflicts with @breed 's changes for 2325.

Copy link
Contributor

@eribeiro eribeiro left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @enixon,

I think your approach is very cool, for real. I only had time to give a first pass on your patch now (hope to look closer soon, esp. the tests), but I would like to ask a dumb question.

What if we change the approach and, instead of the initialize file being used for normal execution, we use a recover (or rejoin) file whose presence denote an exceptional restart of a ZK node? That way, if and only if, this file is present we delete it and return -1L so that it cannot take part in the elections until it catches up with the ensemble, etc.

If this file is not present then we proceed as usual (i.e. returns 0L). This way, we are dealing with the exceptional case by using the initialize/recover. For example: node C (from a 3 node ensemble) crashes due to disk full exceptions. Then the operator delete the data/ directory and put the recovering file there.

In my humble (and naive) option, it would avoid some headaches for ops people who would forget to include the initialize file in a node or two, during rolling upgrades or other cases I can't think of right now. The presence of this file for normal execution changes the ordinal operation of a ZK node. So, we don't have to deal with changing the standard way of starting a ZK node. The recover file is for exceptional cases, where we want to make sure the restarting node cannot take part in an election.

PS: I didn't get the autocreateDB stuff also. But it's late at night here. 😄

Wdyt?

/cc @hanm @breed @fpj

@@ -113,6 +113,8 @@ initialize() {
else
echo "No myid provided, be sure to specify it in $ZOO_DATADIR/myid if using non-standalone"
fi

date > "$ZOO_DATADIR/initialize"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: If the sole purpose of this file is to act as a marker, in spite of its content, then a

touch $ZOO_DATADIR/initialize

would be enough, wouldn't it?

Of course, date is fine as well, no problem.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

True enough, touch is sufficient. Using date is an optimization I've included in other scripts in the past as a way of sneaking a bit more information into an otherwise meaningless file but in this context it's probably just confusing.

@@ -167,6 +175,16 @@ public long restore(DataTree dt, Map<Long, Integer> sessions,
PlayBackListener listener) throws IOException {
long deserializeResult = snapLog.deserialize(dt, sessions);
FileTxnLog txnLog = new FileTxnLog(dataDir);
boolean suspectEmptyDB;
File initFile = new File(dataDir.getParent(), "initialize");
if (initFile.exists()) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As Java 7 is the default we could use the code below? The benefits are that it automatically throws the IOException if an I/O error happens or return false if the file doesn't exists.

if (Files.deleteIfExists(initFile.toPath()) {
    suspectEmptyDB = false;
} else {
(...)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Disclaimer: I am not used to Files class so you may have to make sure it doesn't alter the current behaviour if you decide to use it.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice optimization, I like it!

@@ -132,6 +137,9 @@ public FileTxnSnapLog(File dataDir, File snapDir) throws IOException {

txnLog = new FileTxnLog(this.dataDir);
snapLog = new FileSnap(this.snapDir);

autoCreateDB = Boolean.parseBoolean(System.getProperty(ZOOKEEPER_DB_AUTOCREATE,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 with @hanm

@@ -167,6 +175,16 @@ public long restore(DataTree dt, Map<Long, Integer> sessions,
PlayBackListener listener) throws IOException {
long deserializeResult = snapLog.deserialize(dt, sessions);
FileTxnLog txnLog = new FileTxnLog(dataDir);
boolean suspectEmptyDB;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we rename this to recoveringDB or recoveringNode?

My rationale is: suspectEmptyDB looks vague to me, plus if I understood it right a node could have been shutdown and restarted after some time. So, not necessarily its DB will be empty, but it is in a recovering process so we want to avoid that it becoming the leader and messing up with transactions performed while it was offline, right?
Could we rename this to recoveringDB or recoveringNode?

My rationale is: suspectEmptyDB looks vague to me, plus because if I understood it right a node could have been shutdown and restarted after some time. So, not necessarily its DB will be empty, but it is in a recovering process so we want to avoid that it becoming the leader and messing up with transactions performed while it was offline, right?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If suspectEmptyDB looks vague to you, that's enough reason to change it. The intent is to capture "if you find can't find a data base then treat that condition with paranoia and don't vote until someone provides you with one". The recovery aspect could be emphasized with forceRecoverEmptyDB. Or we could invert the condition and call it trustEmptyDB.

}
suspectEmptyDB = false;
} else {
suspectEmptyDB = !autoCreateDB;
Copy link
Contributor

@eribeiro eribeiro Jan 12, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO, it would be nice to put a debug (warn?) log message here. Something along the lines of "Initialize file doesn't found! Using autoCreateDB attribute." with the value of suspectEmptyDB.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tempted to do put the log line on the other side of the conditional since this side is the expected case. We should only delete an initialize file once in the lifecycle of a given server while the check against autoCreateDB will happen every other time the server is restarted.


if (suspectEmptyDB) {
/* return a zxid of -1, since we are possibly missing data */
LOG.warn("Unexpected empty data tree, setting zxid to -1");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are we 100% sure the data tree is empty? Couldn't it be only partially complete? I mean the machine recorded up to transaction n, but lost transactions n+1, n+2, n+3, etc?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We only reach this point if our SnapShot interface was not able to deserialize anything into the DataTree. In the usual case of FileSnap, this happens only when there aren't any non-corrupted snapshots to be loaded.

@enixon
Copy link
Author

enixon commented Jan 12, 2017

@eribeiro Thanks for checking the diff!

The initialize file will not be required for standard restarts. The expected pattern is to create one on each host just before ensemble creation (when the file system is trusted to be the most stable) and ZooKeeper will clean it up once the server has inspected the file system and found no data. Restarting servers should see the data already present and not require the file (or not see the data and require not seeing the file). New servers expanding an existing ensemble probably shouldn't be granted voting privileges until they get a copy of the data. Admins should have scripts available to handle ensemble creation and this represents a minimal extension of that.

I would be hesitant to rely on the existence of any file to provide guarantees during regular execution and that precludes the use of recover file pattern. The category of events that motivate this pull request necessarily include those that would remove any pre-staged recover file. I'm imagining a scenario where your server dies and file system gets wiped then automated restart scripts spring into action to bring the server back up.

@breed
Copy link
Contributor

breed commented Jan 13, 2017

@eribeiro i think this is ready to go. do you feel that your question has been answer sufficiently?

@eribeiro
Copy link
Contributor

eribeiro commented Jan 13, 2017

@enixon Thank you very much for the explanation (and your patience with me, haha)! Makes sense, sorry for my misunderstanding. 😊

@breed Yep, agree. Let's push it forward. 👍

@breed
Copy link
Contributor

breed commented Jan 13, 2017

commited. thanx everyone for reviewing and brian for your contribution.

@breed
Copy link
Contributor

breed commented Jan 13, 2017

hmm this is committed. anyone understands why it doesn't autoclose?

@hanm
Copy link
Contributor

hanm commented Jan 13, 2017

I don't see this is committed in master. No commit log from git, no notification emails. @breed Are you sure this is committed (and pushed to apache git)?

@hanm
Copy link
Contributor

hanm commented Jan 13, 2017

Also https://issues.apache.org/jira/browse/ZOOKEEPER-261 does not have Fixed Version set, which implies that the JIRA was resolved manually (my guess) instead of automatically as part of commit flow through https://github.com/apache/zookeeper/blob/master/zk-merge-pr.py - usually if we use the merge script it will take care everything including close the PR and resolve the JIRA (provided right credential has been set up.).

@breed
Copy link
Contributor

breed commented Jan 13, 2017

hmm, perhaps you are right. i don't think the script is working for me properly... checking.

@breed
Copy link
Contributor

breed commented Jan 13, 2017

@hanm
Copy link
Contributor

hanm commented Jan 13, 2017

Right, it is committed, confirmed. I was only checking the apache mirror on git (https://github.com/apache/zookeeper), instead of the apache git directly.

I suspect there is some infra issues on Apache (the JIRA was done yesterday) which contributes to the bridging between various systems not working as expected, leading to this PR not closed automatically.

@asfgit asfgit closed this in 8773ecb Jan 14, 2017
@hanm
Copy link
Contributor

hanm commented Jan 14, 2017

As a side effect of a new commit (42c75b5) that triggers git mirror sync, this PR is closed :)

@enixon enixon deleted the ZOOKEEPER-261 branch March 21, 2017 17:22
lvfangmin pushed a commit to lvfangmin/zookeeper that referenced this pull request Jun 17, 2018
… election

Author: Brian Nixon <nixon@fb.com>

Reviewers: Michael Han <hanm@apache.org>, Edward Ribeiro <edward.ribeiro@gmail.com>, benjamin reed <breed@apache.org>

Closes apache#120 from enixon/ZOOKEEPER-261
RokLenarcic pushed a commit to RokLenarcic/zookeeper that referenced this pull request Sep 3, 2022
… election

Author: Brian Nixon <nixon@fb.com>

Reviewers: Michael Han <hanm@apache.org>, Edward Ribeiro <edward.ribeiro@gmail.com>, benjamin reed <breed@apache.org>

Closes apache#120 from enixon/ZOOKEEPER-261
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants