Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ZOOKEEPER-4706: Support 64-bit sequential nodes using zxid #2059

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

kezhuw
Copy link
Member

@kezhuw kezhuw commented Sep 4, 2023

It is somewhat well-known that ZooKeeper's sequential node number will overflow finally. It is ok for most usages, but it could also be annoying to handle this overflow in client side in certain cases. It would be really nice for ZooKeeper to support 64-bit sequential number to avoid overflow for long running and high frequency usages.

This pr introduces new CreateModes to not break existing usages and uses zxid as the sequence number to not change on-disk wire format.

The consequence is that:

  • Not consecutive. There could be hole in two adjacent sequential number.
  • Fail to create through multi with same prefix path. But multi does not work well with same path modifications anyway. See ZOOKEEPER-4695.

ZooKeeper docs about overflow

When creating a znode you can also request that ZooKeeper append a monotonically increasing counter to the end of path. This counter is unique to the parent znode. The counter has a format of %010d -- that is 10 digits with 0 (zero) padding (the counter is formatted in this way to simplify sorting), i.e. "0000000001". See Queue Recipe for an example use of this feature. Note: the counter used to store the next sequence number is a signed int (4bytes) maintained by the parent node, the counter will overflow when incremented beyond 2147483647 (resulting in a name "-2147483648").

BookKeeper "depends on" this overflow

BookKeeper's LongZkLedgerIdGenerator will fallback to (or rollover) its long version in case of ephemeral sequence overflow.

Blockers

As I documented above, this pr does not work well with multi. I created ZOOKEEPER-4695 months ago in discussion of ZOOKEEPER-4655(#1950). If it is good for us to forbid multiple mutations of one key in multi, I think this pr is possibly promising as it breaks no implicit dependencies and requires no on-disk wire format change.

It is somewhat well-known that ZooKeeper's sequential node number will
overflow finally. It is ok for most usages, but it could also be
annoying to handle this overflow in client side in certain cases. It
would be really nice for ZooKeeper to support 64-bit sequential number
to avoid overflow for long running and high frequency usages.

This pr introduces new `CreateMode`s to not break existing usages and
uses `zxid` as the sequence number to not change on-disk wire format.

The consequence is that:
* Not consecutive. There could be hole in two adjacent sequential number.
* Fail to create through `multi` with same prefix path. But `multi` does
  not work well with same path modifications anyway. See ZOOKEEPER-4695.
Copy link
Member

@tisonkun tisonkun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IIRC zxid only use the least significant 32 bit as sequential id, so it's still overflow with Integer.MAX_VALUE bound.

Why do we need this manner since it doesn't release the upper bound.

@kezhuw
Copy link
Member Author

kezhuw commented Sep 23, 2023

IIRC zxid only use the least significant 32 bit as sequential id, so it's still overflow with Integer.MAX_VALUE bound.

No, overflow of the lower 32-bits means re-election.

/**
* Address the rollover issue. All lower 32bits set indicate a new leader
* election. Force a re-election instead. See ZOOKEEPER-1277
*/
if ((request.zxid & 0xffffffffL) == 0xffffffffL) {
String msg = "zxid lower 32 bits have rolled over, forcing re-election, and therefore new epoch start";
shutdown(msg);
throw new XidRolloverException(msg);
}

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants