Store metadata in ZK with binary protobuf format #281

Closed
merlimat opened this issue Mar 7, 2017 · 3 comments

merlimat commented Mar 7, 2017

In Pulsar we are storing a lot of metadata in ZooKeeper using different formats:

  • BookKeeper ledgers: Protobuf Text
  • Managed Ledgers and cursors: Protobuf Text
  • Broker and namespace bundles load reports: JSON

Using text formats has been good for quick debugging sessions without special tools but has drawbacks:

  • Size: the data stored in ZK can be significant when many topics (>1M) are active in a cluster. Protobuf text format, like JSON, repeats every field name in each record.
  • Speed of serializing/deserializing (binary protobuf is generally faster to parse than text formats)
  • Garbage generated (with the binary format we could switch to the custom protobuf code generator to generate reusable objects)
  • Backward compatibility. Text protobuf is not backward compatible (unlike the binary parser): it fails to parse unknown fields (and there's no way to change that). This makes it very difficult to change the format (typically we would do one release that can understand the new format but still writes the old one, then have the next release write the new format). Backward compatibility is key to ensuring we can roll back a release if an issue is detected during deployment.
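
To make the size argument concrete, here is a rough sketch in Python that hand-encodes a few integer fields in the protobuf wire format (a varint tag followed by a varint value) and compares the result with a text rendering of the same fields. The record and its field names (`ledgerId`, `entries`, `size`) are made up for illustration and are not taken from Pulsar's actual schema; a real measurement should use the generated protobuf classes.

```python
def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a protobuf base-128 varint."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def encode_binary(fields: list[tuple[int, str, int]]) -> bytes:
    """Wire-format encoding for varint fields: only field numbers are stored."""
    out = bytearray()
    for number, _name, value in fields:
        out += encode_varint((number << 3) | 0)  # tag = field number + wire type 0
        out += encode_varint(value)
    return bytes(out)


def encode_text(fields: list[tuple[int, str, int]]) -> bytes:
    """Text rendering: every field name is spelled out for every value."""
    return "".join(f"{name}: {value}\n" for _number, name, value in fields).encode("utf-8")


# Hypothetical record: (field number, field name, value)
LEDGER = [(1, "ledgerId", 123456), (2, "entries", 50000), (3, "size", 52428800)]

binary = encode_binary(LEDGER)
text = encode_text(LEDGER)
print(f"binary: {len(binary)} bytes, text: {len(text)} bytes")
```

For this three-field record the binary form takes 13 bytes against 47 bytes of text. The ratio on real metadata will differ, so it should be measured rather than assumed.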

Of the 3 categories listed above, I don't think we should bother about load reports, because they're not where the bulk of metadata is.

My proposal would be:

  • 1.17 release:

    1. Add the code to read both formats
    2. Add a config switch to enable writing the binary format for ML and cursor data in ZK, defaulting to the text format.
    3. Add tools to dump the content of a ML for human consumption
  • 1.18 release:

    1. Make binary default
    2. Remove config switch for text/binary
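
The two-release plan above could look roughly like the following Python sketch: the reader tries the binary parser first and falls back to text, while a flag decides what the writer emits. Everything here is an assumption made for illustration: the field names, the `use_binary` flag, and the toy codecs (which handle only integer fields) stand in for the real protobuf classes and broker configuration. Detecting the format by parse failure is a heuristic, not a guarantee.

```python
FIELD_NAMES = {1: "ledgerId", 2: "entries", 3: "size"}  # hypothetical schema
FIELD_NUMBERS = {name: number for number, name in FIELD_NAMES.items()}


def _write_varint(value: int) -> bytes:
    out = bytearray()
    while value > 0x7F:
        out.append((value & 0x7F) | 0x80)
        value >>= 7
    out.append(value)
    return bytes(out)


def _read_varint(data: bytes, pos: int) -> tuple[int, int]:
    result = shift = 0
    while True:
        if pos >= len(data):
            raise ValueError("truncated varint")
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def to_binary(record: dict[str, int]) -> bytes:
    out = bytearray()
    for name, value in record.items():
        out += _write_varint(FIELD_NUMBERS[name] << 3)  # wire type 0 (varint)
        out += _write_varint(value)
    return bytes(out)


def from_binary(data: bytes) -> dict[str, int]:
    record, pos = {}, 0
    while pos < len(data):
        tag, pos = _read_varint(data, pos)
        if tag & 0x7 != 0:
            raise ValueError("unexpected wire type; probably not binary data")
        value, pos = _read_varint(data, pos)
        name = FIELD_NAMES.get(tag >> 3)
        if name is not None:  # unknown field numbers are skipped, not rejected
            record[name] = value
    return record


def to_text(record: dict[str, int]) -> bytes:
    return "".join(f"{name}: {value}\n" for name, value in record.items()).encode("utf-8")


def from_text(data: bytes) -> dict[str, int]:
    record = {}
    for line in data.decode("utf-8").splitlines():
        name, _, value = line.partition(":")
        if name.strip() not in FIELD_NUMBERS:
            raise ValueError(f"unknown field {name!r}")  # text parsing is strict
        record[name.strip()] = int(value)
    return record


def read_record(data: bytes) -> dict[str, int]:
    """Step 1 of the plan: accept both formats, binary first."""
    try:
        return from_binary(data)
    except ValueError:
        return from_text(data)


def write_record(record: dict[str, int], use_binary: bool = False) -> bytes:
    """Step 2 of the plan: the switch defaults to text in the first release."""
    return to_binary(record) if use_binary else to_text(record)


record = {"ledgerId": 7, "entries": 100, "size": 4096}
assert read_record(write_record(record)) == record
assert read_record(write_record(record, use_binary=True)) == record
```

Because the reader accepts both formats before any writer produces binary data, a rollback after flipping the switch still leaves every z-node readable.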

Once the change has been implemented, it would be easy to verify the size difference up front and eventually consider storing BK ledgers in binary format as well.

cc: @saandrews @rdhabalia @msb-at-yahoo @sschepens

merlimat added the type/enhancement label Mar 7, 2017
merlimat added this to the 1.17 milestone Mar 7, 2017
merlimat self-assigned this Mar 7, 2017

msb-at-yahoo commented

👍 We'll need a CLI that knows how to decode into protobuf text so that we can still examine the system: maybe a wrapper around zoosh.

How do you propose to rewrite the data in ZK into the new format? We obviously won't want to do it too quickly.

merlimat commented Mar 7, 2017

For the rewriting, I was thinking of just converting each z-node individually, the first time it is written.

For z-nodes that are never rewritten, we can keep the code that falls back to deserializing the text format indefinitely.
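
A minimal sketch of that lazy migration, in Python, with a plain dict standing in for ZooKeeper. The z-node paths, the single `position` field, and the fixed 8-byte "binary" encoding are all hypothetical stand-ins; the length check plays the role of "try the binary parser first".

```python
import struct


def encode_text(position: int) -> bytes:
    return f"position: {position}\n".encode("utf-8")


def encode_binary(position: int) -> bytes:
    return struct.pack(">q", position)  # stand-in binary form: 8-byte big-endian int


def decode(data: bytes) -> tuple[int, str]:
    """Return (value, detected format); text is the permanent fallback."""
    if len(data) == 8:
        return struct.unpack(">q", data)[0], "binary"
    name, _, value = data.decode("utf-8").partition(":")
    if name != "position":
        raise ValueError("unrecognized payload")
    return int(value), "text"


class MetadataStore:
    """Reads accept either format; every write stores the binary format."""

    def __init__(self, znodes: dict[str, bytes]):
        self.znodes = znodes

    def read(self, path: str) -> int:
        return decode(self.znodes[path])[0]

    def write(self, path: str, position: int) -> None:
        # The z-node is converted as a side effect of its first normal write.
        self.znodes[path] = encode_binary(position)

    def formats(self) -> dict[str, str]:
        return {path: decode(data)[1] for path, data in self.znodes.items()}


store = MetadataStore({
    "/cursors/a": encode_text(10),
    "/cursors/b": encode_text(20),
})
store.write("/cursors/a", 11)  # only this z-node is rewritten
print(store.formats())
```

After the single write, `/cursors/a` is stored in the binary form while `/cursors/b` stays in text and remains readable, so no bulk rewrite of ZK is needed.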

merlimat commented Mar 8, 2017

Added a PR with the first part of the needed changes. About the CLI tool, I'm leaning towards having a REST API that reads from ZK and returns JSON. The topic wouldn't need to be loaded, so any broker could answer the request.
