Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[fix][client] Set fields earlier for correct ClientCnx initialization #19327

Merged
merged 3 commits into from
Jan 30, 2023

Conversation

michaeljmarshall
Copy link
Member

Fixes #13923

Motivation

When the Java client's ClientCnx is initialized, there is a race to set the target broker and the remote hostname. The target brorker is relevant when connecting through the proxy and the remote hostname is relevant for certain kinds of authentication, like SASL. In most cases, this race results in the preferred order where these values are set before the ClientCnx#channelActive method is called. However, when that method is called before they are set, problems occur.

This PR ensures that the fields are set before we call channel.connect(...).

Additional motivation for understanding the correctness of this solution can be found in the relevant netty code:

https://github.com/netty/netty/blob/cbd324c178135a82f23749bc218c2c6ee3a9b140/transport-classes-epoll/src/main/java/io/netty/channel/epoll/AbstractEpollChannel.java#L648-L659

In that block, notice that the isActive gets set to true, the promise gets completed (which likely triggers callbacks), and then the channel active event gets triggered. Interestingly, the easiest way to reproduce the problematic behavior in the proxy is with the following steps:

  1. Update createConnection(InetSocketAddress logicalAddress, InetSocketAddress physicalAddress, int connectionKey) so that the callback for createConnection is thenAcceptAsync instead of thenAccept.
  2. Run ProxyTest#testProducer().
  3. Observe failure in logs.

Modifications

  • Update ConnectionPool to pass the logical broker address through to the initialization methods.
  • Update logic to compare the logical and physical addresses because one address is resolved.

Verifying this change

It is challenging to create a pure reproducer for this case. I was easily able to reproduce it by making the callback complete in another thread. Here is the thenAccept that I changed to thenAcceptAsync:

createConnection(logicalAddress, physicalAddress).thenAccept(channel -> {

At the very least, the current test coverage will ensure this does not introduce a regression.

Does this pull request potentially affect one of the following parts:

This is not a breaking change.

Documentation

  • doc-not-needed

Matching PR in forked repository

PR in forked repository: michaeljmarshall#20

Copy link
Member

@lhotari lhotari left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch!

Awesome job on the investigation and the fix!

@lhotari
Copy link
Member

lhotari commented Jan 26, 2023

/pulsarbot rerun-failure-checks

@codecov-commenter
Copy link

Codecov Report

Merging #19327 (684a135) into master (6c014ee) will increase coverage by 31.78%.
The diff coverage is 66.66%.

Impacted file tree graph

@@              Coverage Diff              @@
##             master   #19327       +/-   ##
=============================================
+ Coverage     32.31%   64.10%   +31.78%     
- Complexity     6350    25912    +19562     
=============================================
  Files          1644     1818      +174     
  Lines        123711   133090     +9379     
  Branches      13487    14640     +1153     
=============================================
+ Hits          39983    85311    +45328     
+ Misses        77828    39932    -37896     
- Partials       5900     7847     +1947     
Flag Coverage Δ
inttests 24.99% <51.51%> (+0.03%) ⬆️
systests 25.56% <51.51%> (+0.02%) ⬆️
unittests 61.32% <66.66%> (+43.77%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
...org/apache/pulsar/common/naming/NamespaceName.java 89.04% <ø> (+19.17%) ⬆️
...ava/org/apache/pulsar/common/naming/TopicName.java 95.41% <ø> (+16.51%) ⬆️
...va/org/apache/pulsar/broker/service/ServerCnx.java 53.34% <33.33%> (+15.22%) ⬆️
...e/pulsar/client/impl/PulsarChannelInitializer.java 91.26% <77.77%> (+47.64%) ⬆️
.../org/apache/pulsar/client/impl/ConnectionPool.java 75.12% <80.00%> (+11.53%) ⬆️
.../apache/pulsar/common/naming/SystemTopicNames.java 55.55% <0.00%> (-5.56%) ⬇️
...in/java/org/apache/pulsar/common/api/AuthData.java 71.42% <0.00%> (ø)
...ava/org/apache/pulsar/client/api/schema/Field.java 80.00% <0.00%> (ø)
.../apache/pulsar/broker/namespace/LookupOptions.java 87.50% <0.00%> (ø)
...org/apache/pulsar/common/io/BatchSourceConfig.java 50.00% <0.00%> (ø)
... and 1227 more

Copy link
Contributor

@nicoloboschi nicoloboschi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Contributor

@BewareMyPower BewareMyPower left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great find!

@BewareMyPower BewareMyPower merged commit 3d8b52a into apache:master Jan 30, 2023
michaeljmarshall added a commit that referenced this pull request Jan 31, 2023
michaeljmarshall added a commit that referenced this pull request Jan 31, 2023
michaeljmarshall added a commit that referenced this pull request Jan 31, 2023
@michaeljmarshall michaeljmarshall added the cherry-picked/branch-2.9 Archived: 2.9 is end of life label Jan 31, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Proxy] Race condition in Pulsar Proxy that causes UnsupportedOperationExceptions in Proxy logs
5 participants