
PIP-75: Perform serialization/deserialization with LightProto #9046

Merged: 20 commits merged into apache:master on Jan 6, 2021


@merlimat (Contributor)

Motivation

As explained in https://github.com/apache/pulsar/wiki/PIP-75%3A-Replace-protobuf-code-generator, this PR replaces the patched Google Protobuf serialization with LightProto.

  1. Removed all generated Java files (sources are now generated by a Maven plugin at build time)
  2. Removed everything related to the patched protobuf

@merlimat merlimat added the type/enhancement The enhancements for the existing features or docs. e.g. reduce memory usage of the delayed messages label Dec 24, 2020
@merlimat merlimat added this to the 2.8.0 milestone Dec 24, 2020
@merlimat merlimat self-assigned this Dec 24, 2020

@eolivelli (Contributor) left a comment

Overall LGTM

I find it very nice that we are creating fewer builders, and I like the overall simplification of the codebase.

Thank you for providing this improvement.

Most of the CI jobs failed; please take a look.
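
For readers unfamiliar with the "fewer builders" point: in the classic protobuf-java API, each message is produced via `newBuilder()...build()`, which allocates a builder plus an immutable message. The sketch below is a hand-written, hypothetical class (`SendCommandSketch` is not LightProto's generated code) that only illustrates the alternative style of a mutable message that is cleared and reused, assuming the generated types work roughly this way.

```java
// Hypothetical, hand-written stand-in for a builder-free message type.
// It is NOT LightProto's generated code; it only illustrates mutate-and-reuse.
final class SendCommandSketch {
    private long producerId;
    private long sequenceId;
    private boolean hasProducerId;
    private boolean hasSequenceId;

    // Setters mutate the instance in place and return it for chaining.
    SendCommandSketch setProducerId(long producerId) {
        this.producerId = producerId;
        this.hasProducerId = true;
        return this;
    }

    SendCommandSketch setSequenceId(long sequenceId) {
        this.sequenceId = sequenceId;
        this.hasSequenceId = true;
        return this;
    }

    boolean hasProducerId() { return hasProducerId; }
    boolean hasSequenceId() { return hasSequenceId; }
    long getProducerId() { return producerId; }
    long getSequenceId() { return sequenceId; }

    // Resets every field so the same object can be reused for the next message.
    SendCommandSketch clear() {
        producerId = 0;
        sequenceId = 0;
        hasProducerId = false;
        hasSequenceId = false;
        return this;
    }
}

final class ReuseDemo {
    // A single instance is reused on every iteration instead of allocating
    // a new builder and a new message per send.
    static long sumSequenceIds(long producerId, int count) {
        SendCommandSketch cmd = new SendCommandSketch();
        long sum = 0;
        for (int i = 0; i < count; i++) {
            cmd.clear().setProducerId(producerId).setSequenceId(i);
            sum += cmd.getSequenceId();
        }
        return sum;
    }
}
```

The trade-off of this style is that a reused instance must not be shared across threads or held after `clear()`, which immutable builder-built messages never have to worry about.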

@rdhabalia (Contributor) left a comment

nice work. I think such changes should be merged after extensive testing. Is it already running in any of your test env?

@sijie (Member) left a comment

It is a great contribution! Thanks for working on this! But before merging this pull request, can you share some benchmark results between before this change and after this change? This matters because it changes the entire serialization and deserialization framework.

@merlimat (Contributor, Author)

> nice work. I think such changes should be merged after extensive testing. Is it already running in any of your test env?

@rdhabalia Not yet. This is a big change against a target that is moving very fast (master). There is no way it can go to production before being merged to master first, though it will certainly be tested extensively before the 2.8 release.

Also, we need to distinguish two aspects:

  1. The code generator already has an extensive set of tests to ensure the generated code behaves exactly the same way as Protobuf.
  2. The integration into the Pulsar repo is subject to merge conflicts: even if a version is tested today, a new change in master is likely to introduce bugs in the integration, so it would need to be re-tested after the merge in any case.
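
To make "behaves in the same exact way as protobuf" concrete: wire compatibility ultimately means producing and accepting byte-identical encodings. A minimal sketch of the Protobuf base-128 varint primitive, checked against the well-known example from the Protobuf encoding documentation (150 encodes to `0x96 0x01`). The `Varint` class is hypothetical and illustrative; it is not LightProto's implementation or its test suite.

```java
import java.io.ByteArrayOutputStream;

// Hypothetical helper illustrating the Protobuf varint wire format.
final class Varint {
    // Encodes a value as an unsigned base-128 varint: 7 payload bits per byte,
    // least-significant group first, high bit set on every byte except the last.
    static byte[] encode(long value) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        while ((value & ~0x7FL) != 0) {
            out.write((int) ((value & 0x7F) | 0x80));
            value >>>= 7; // unsigned shift so negative longs terminate after 10 bytes
        }
        out.write((int) value);
        return out.toByteArray();
    }

    // Decodes a varint produced by encode(); rejects input missing its final byte.
    static long decode(byte[] bytes) {
        long result = 0;
        int shift = 0;
        for (byte b : bytes) {
            result |= (long) (b & 0x7F) << shift;
            if ((b & 0x80) == 0) {
                return result;
            }
            shift += 7;
        }
        throw new IllegalArgumentException("truncated varint");
    }
}
```

A compatibility test in this spirit feeds the same inputs to both encoders and compares the bytes, which is what makes a drop-in replacement safe at the protocol level.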

> can you share some benchmark results between before this change and after this change?

@sijie Good point. The micro-benchmark in https://github.com/splunk/lightproto compares against the default Protobuf 3.13.

I've added a new benchmark targeting Pulsar-specific serialization at https://github.com/merlimat/LightProtoPulsarBenchmark

Benchmark                                                   Mode  Cnt   Score    Error   Units
SimpleBenchmark.deserialize_metadata_lightproto            thrpt    3  18.891 ±  1.675  ops/us
SimpleBenchmark.deserialize_metadata_protobuf_241_patched  thrpt    3   6.187 ±  3.399  ops/us
SimpleBenchmark.deserialize_send_lightproto                thrpt    3  23.830 ±  8.318  ops/us
SimpleBenchmark.deserialize_send_protobuf_241_patched      thrpt    3   7.926 ± 14.269  ops/us
SimpleBenchmark.serialize_metadata_lightproto              thrpt    3   6.699 ± 12.016  ops/us
SimpleBenchmark.serialize_metadata_protobuf_241_patched    thrpt    3   2.943 ±  0.662  ops/us
SimpleBenchmark.serialize_send_lightproto                  thrpt    3  22.905 ± 18.748  ops/us
SimpleBenchmark.serialize_send_protobuf_241_patched        thrpt    3   4.614 ±  1.750  ops/us
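
To read the JMH table as speedups, the ratio of mean throughputs (LightProto score divided by patched Protobuf 2.4.1 score) can be computed directly. This is a small arithmetic sketch using only the numbers shown above; the class name is illustrative.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class BenchmarkSpeedup {
    // Mean scores (ops/us) from the JMH table:
    // index 0 = LightProto, index 1 = patched Protobuf 2.4.1.
    static final Map<String, double[]> SCORES = new LinkedHashMap<>();

    static {
        SCORES.put("deserialize_metadata", new double[] {18.891, 6.187});
        SCORES.put("deserialize_send", new double[] {23.830, 7.926});
        SCORES.put("serialize_metadata", new double[] {6.699, 2.943});
        SCORES.put("serialize_send", new double[] {22.905, 4.614});
    }

    // Throughput ratio: how many times more operations per microsecond LightProto completed.
    static double speedup(String benchmark) {
        double[] s = SCORES.get(benchmark);
        if (s == null) {
            throw new IllegalArgumentException("unknown benchmark: " + benchmark);
        }
        return s[0] / s[1];
    }

    public static void main(String[] args) {
        for (String name : SCORES.keySet()) {
            System.out.printf("%-22s %.2fx%n", name, speedup(name));
        }
    }
}
```

This gives roughly 3.05x, 3.01x, 2.28x and 4.96x. With only 3 iterations (`Cnt`), several error margins are large relative to the score (for example ±14.269 on 7.926), so these ratios are indicative rather than precise.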

@merlimat (Contributor, Author) commented Jan 4, 2021

@sijie PTAL, since every day there are new conflicts to fix.

@sijie (Member) commented Jan 4, 2021

@merlimat Sorry, I am reviewing it today.

@eolivelli (Contributor) left a comment

+1

@merlimat merlimat merged commit c12765a into apache:master Jan 6, 2021
@merlimat merlimat deleted the lightproto branch January 6, 2021 00:42
zymap added a commit to zymap/pulsar that referenced this pull request Jan 13, 2021
---

*Motivation*

PR apache#9046 introduced a new way to handle the proto files and removed
'protobuf-shaded/pom.xml'.
We need to remove the step that sets its version from the scripts.
sijie pushed a commit that referenced this pull request Jan 14, 2021
sijie pushed a commit that referenced this pull request Apr 9, 2021
Fixes: #10097 

### Motivation

See #10097 for the issue. It seems that the code broke when the switch was made to LightProto in #9046.

### Modifications

Check `msg.getMessageBuilder().hasReplicatedFrom()` first, and call `msg.getMessageBuilder().getReplicatedFrom()` only when `hasReplicatedFrom()` returns true.
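
The guarded-access pattern can be sketched as below. The source does not say exactly how the unguarded call misbehaved; this sketch assumes the getter rejects an unset optional field (protobuf-java's proto2 getters instead return the field's default, an empty string, which older code may have relied on). `MetadataSketch` and `ReplicationCheck` are hypothetical stand-ins, not Pulsar's classes.

```java
import java.util.Optional;

// Hypothetical stand-in for a generated metadata class with an optional string field.
final class MetadataSketch {
    private String replicatedFrom;
    private boolean hasReplicatedFrom;

    MetadataSketch setReplicatedFrom(String value) {
        this.replicatedFrom = value;
        this.hasReplicatedFrom = true;
        return this;
    }

    boolean hasReplicatedFrom() {
        return hasReplicatedFrom;
    }

    // Assumption for this sketch: reading an unset optional field is an error,
    // rather than silently returning a default value.
    String getReplicatedFrom() {
        if (!hasReplicatedFrom) {
            throw new IllegalStateException("Field 'replicated_from' is not set");
        }
        return replicatedFrom;
    }
}

final class ReplicationCheck {
    // Guarded read: the getter is only called after has...() confirms presence.
    static Optional<String> replicatedFrom(MetadataSketch metadata) {
        return metadata.hasReplicatedFrom()
                ? Optional.of(metadata.getReplicatedFrom())
                : Optional.empty();
    }
}
```

Returning `Optional` keeps the "field absent" case explicit for callers instead of encoding it as an empty string.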