-
Notifications
You must be signed in to change notification settings - Fork 7.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ZOOKEEPER-261 #120
ZOOKEEPER-261 #120
Conversation
Nit: the PR title should contain the string "ZOOKEEPER-261" for the merge script to work - otherwise comments made here will not be bridged to JIRA https://issues.apache.org/jira/browse/ZOOKEEPER-261 |
Thanks, @hanm , let's see if editing the correct string into the title suffices or if I need to open up a new PR. |
File initFile = new File(dataDir.getParent(), "initialize"); | ||
if (initFile.exists()) { | ||
if (!initFile.delete()) { | ||
LOG.warn("Unable to delete initialization file " + initFile.toString()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It sounds pretty serious issue if the initialize file can't be cleaned up upon startup of a new ensemble for whatever reasons as the presence of this file is the key promise made here - not able to clean it up will possibly lead to inconsistent quorum state again that this PR is trying to fix. So, maybe throw an IOException here to abort server start process and let admin intervene instead?
@@ -132,6 +137,9 @@ public FileTxnSnapLog(File dataDir, File snapDir) throws IOException { | |||
|
|||
txnLog = new FileTxnLog(this.dataDir); | |||
snapLog = new FileSnap(this.snapDir); | |||
|
|||
autoCreateDB = Boolean.parseBoolean(System.getProperty(ZOOKEEPER_DB_AUTOCREATE, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems that this variable autoCreateDB
(and the property ZOOKEEPER_DB_AUTOCREATE_DEFAULT
and ZOOKEEPER_DB_AUTOCREATE
) is used solely for testing purpose (to change control flow and get code coverage). IIUC, maybe add some comments about these test only variables?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I included ZOOKEEPER_DB_AUTOCREATE
to allow users to opt out of the feature until they're ready to update their ensemble management tooling to support creating the new file. Is that in accord with Zookeeper style?
On the question of style, ZOOKEEPER_DB_AUTOCREATE_DEFAULT
exists purely because ZOOKEEPER_DATADIR_AUTOCREATE_DEFAULT
exists above it in the file. If including the defaults as static constants isn't Zookeeper style then I'm happy to replace it with a string literal in the constructor.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is that in accord with Zookeeper style?
I see - I thought the new property was not end user facing since there is no associated documents added here. Since the property "zookeeper.db.autocreate" is exposed to user some doc could be added to ZooKeeperAdmin.html (similarly like how the existing "zookeeper.datadir.autocreate" is documented there) to describe the motivation / usage of the property.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 with @hanm
|
lgtm +1 |
Rebased on to latest master to avoid any potential conflicts with @breed 's changes for 2325. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @enixon,
I think your approach is very cool, for real. I only had time to give a first pass on your patch now (hope to look closer soon, esp. the tests), but I would like to ask a dumb question.
What if we change the approach and, instead of the initialize
file being used for normal execution, we use a recover
(or rejoin
) file whose presence denote an exceptional restart of a ZK node? That way, if and only if, this file is present we delete it and return -1L so that it cannot take part in the elections until it catches up with the ensemble, etc.
If this file is not present then we proceed as usual (i.e. returns 0L). This way, we are dealing with the exceptional case by using the initialize
/recover
. For example: node C (from a 3 node ensemble) crashes due to disk full exceptions. Then the operator delete the data/ directory and put the recovering
file there.
In my humble (and naive) option, it would avoid some headaches for ops people who would forget to include the initialize
file in a node or two, during rolling upgrades or other cases I can't think of right now. The presence of this file for normal execution changes the ordinal operation of a ZK node. So, we don't have to deal with changing the standard way of starting a ZK node. The recover
file is for exceptional cases, where we want to make sure the restarting node cannot take part in an election.
PS: I didn't get the autocreateDB
stuff also. But it's late at night here. 😄
Wdyt?
@@ -113,6 +113,8 @@ initialize() { | |||
else | |||
echo "No myid provided, be sure to specify it in $ZOO_DATADIR/myid if using non-standalone" | |||
fi | |||
|
|||
date > "$ZOO_DATADIR/initialize" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: If the sole purpose of this file is to act as a marker, in spite of its content, then a
touch $ZOO_DATADIR/initialize
would be enough, wouldn't it?
Of course, date
is fine as well, no problem.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
True enough, touch
is sufficient. Using date
is an optimization I've included in other scripts in the past as a way of sneaking a bit more information into an otherwise meaningless file but in this context it's probably just confusing.
@@ -167,6 +175,16 @@ public long restore(DataTree dt, Map<Long, Integer> sessions, | |||
PlayBackListener listener) throws IOException { | |||
long deserializeResult = snapLog.deserialize(dt, sessions); | |||
FileTxnLog txnLog = new FileTxnLog(dataDir); | |||
boolean suspectEmptyDB; | |||
File initFile = new File(dataDir.getParent(), "initialize"); | |||
if (initFile.exists()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As Java 7 is the default we could use the code below? The benefits are that it automatically throws the IOException
if an I/O error happens or return false
if the file doesn't exists.
if (Files.deleteIfExists(initFile.toPath()) {
suspectEmptyDB = false;
} else {
(...)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Disclaimer: I am not used to Files
class so you may have to make sure it doesn't alter the current behaviour if you decide to use it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice optimization, I like it!
@@ -132,6 +137,9 @@ public FileTxnSnapLog(File dataDir, File snapDir) throws IOException { | |||
|
|||
txnLog = new FileTxnLog(this.dataDir); | |||
snapLog = new FileSnap(this.snapDir); | |||
|
|||
autoCreateDB = Boolean.parseBoolean(System.getProperty(ZOOKEEPER_DB_AUTOCREATE, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 with @hanm
@@ -167,6 +175,16 @@ public long restore(DataTree dt, Map<Long, Integer> sessions, | |||
PlayBackListener listener) throws IOException { | |||
long deserializeResult = snapLog.deserialize(dt, sessions); | |||
FileTxnLog txnLog = new FileTxnLog(dataDir); | |||
boolean suspectEmptyDB; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we rename this to recoveringDB
or recoveringNode
?
My rationale is: suspectEmptyDB
looks vague to me, plus if I understood it right a node could have been shutdown and restarted after some time. So, not necessarily its DB will be empty, but it is in a recovering process so we want to avoid that it becoming the leader and messing up with transactions performed while it was offline, right?
Could we rename this to recoveringDB
or recoveringNode
?
My rationale is: suspectEmptyDB
looks vague to me, plus because if I understood it right a node could have been shutdown and restarted after some time. So, not necessarily its DB will be empty, but it is in a recovering process so we want to avoid that it becoming the leader and messing up with transactions performed while it was offline, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If suspectEmptyDB
looks vague to you, that's enough reason to change it. The intent is to capture "if you find can't find a data base then treat that condition with paranoia and don't vote until someone provides you with one". The recovery aspect could be emphasized with forceRecoverEmptyDB
. Or we could invert the condition and call it trustEmptyDB
.
} | ||
suspectEmptyDB = false; | ||
} else { | ||
suspectEmptyDB = !autoCreateDB; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IMO, it would be nice to put a debug
(warn?) log message here. Something along the lines of "Initialize file doesn't found! Using autoCreateDB attribute." with the value of suspectEmptyDB
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tempted to do put the log line on the other side of the conditional since this side is the expected case. We should only delete an initialize file once in the lifecycle of a given server while the check against autoCreateDB
will happen every other time the server is restarted.
|
||
if (suspectEmptyDB) { | ||
/* return a zxid of -1, since we are possibly missing data */ | ||
LOG.warn("Unexpected empty data tree, setting zxid to -1"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are we 100% sure the data tree is empty? Couldn't it be only partially complete? I mean the machine recorded up to transaction n, but lost transactions n+1, n+2, n+3, etc?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We only reach this point if our SnapShot
interface was not able to deserialize anything into the DataTree
. In the usual case of FileSnap
, this happens only when there aren't any non-corrupted snapshots to be loaded.
@eribeiro Thanks for checking the diff! The I would be hesitant to rely on the existence of any file to provide guarantees during regular execution and that precludes the use of |
@eribeiro i think this is ready to go. do you feel that your question has been answer sufficiently? |
commited. thanx everyone for reviewing and brian for your contribution. |
hmm this is committed. anyone understands why it doesn't autoclose? |
I don't see this is committed in master. No commit log from git, no notification emails. @breed Are you sure this is committed (and pushed to apache git)? |
Also https://issues.apache.org/jira/browse/ZOOKEEPER-261 does not have Fixed Version set, which implies that the JIRA was resolved manually (my guess) instead of automatically as part of commit flow through https://github.com/apache/zookeeper/blob/master/zk-merge-pr.py - usually if we use the merge script it will take care everything including close the PR and resolve the JIRA (provided right credential has been set up.). |
hmm, perhaps you are right. i don't think the script is working for me properly... checking. |
it shows up in https://git-wip-us.apache.org/repos/asf?p=zookeeper.git |
Right, it is committed, confirmed. I was only checking the apache mirror on git (https://github.com/apache/zookeeper), instead of the apache git directly. I suspect there is some infra issues on Apache (the JIRA was done yesterday) which contributes to the bridging between various systems not working as expected, leading to this PR not closed automatically. |
As a side effect of a new commit (42c75b5) that triggers git mirror sync, this PR is closed :) |
… election Author: Brian Nixon <nixon@fb.com> Reviewers: Michael Han <hanm@apache.org>, Edward Ribeiro <edward.ribeiro@gmail.com>, benjamin reed <breed@apache.org> Closes apache#120 from enixon/ZOOKEEPER-261
… election Author: Brian Nixon <nixon@fb.com> Reviewers: Michael Han <hanm@apache.org>, Edward Ribeiro <edward.ribeiro@gmail.com>, benjamin reed <breed@apache.org> Closes apache#120 from enixon/ZOOKEEPER-261
No description provided.