Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[fix] [client] fix huge permits if acked a half batched message #22091

Merged
merged 4 commits into from
Feb 26, 2024

Conversation

poorbarcode
Copy link
Contributor

@poorbarcode poorbarcode commented Feb 21, 2024

Motivation

Context:

  • The broker decreases the permits after delivering messages to the client.
    • It only decreases the un-acked part for permits when delivering batched messages to the client.
  • The client acquires more permits after consuming messages.
    • (Highlight) It acquires the whole batch for permits even if it only received half of the batched message.

Reproduce step:

  • The permits were initialized as 1000
  • Send a batch message containing 100 single messages.
  • Ack a half of batched messages(2 single messages).
  • Redeliver the batch message.
  • Receive the 98 messages left on the topic.
  • (Highlight) permits is 1002.

One our ENV, the permits went to 100,000+(it was initialized as 50), and leading the set pendingAcks went to a huge value. This may causes a client side OOM.

Screenshot 2024-02-21 at 22 55 42 Screenshot 2024-02-21 at 23 39 11

Modifications

  • The client acquires more permits after consuming messages.
    • It acquires the half batch for permits if it only received half of the batched message.

Documentation

  • doc
  • doc-required
  • doc-not-needed
  • doc-complete

Matching PR in forked repository

PR in forked repository: x

@poorbarcode poorbarcode added type/bug The PR fixed a bug or issue reported a bug category/reliability The function does not work properly in certain specific environments or failures. e.g. data lost release/3.0.3 release/2.11.4 release/3.1.3 release/3.2.1 labels Feb 21, 2024
@poorbarcode poorbarcode added this to the 3.3.0 milestone Feb 21, 2024
@poorbarcode poorbarcode self-assigned this Feb 21, 2024
@github-actions github-actions bot added the doc-not-needed Your PR changes do not impact docs label Feb 21, 2024
@Technoboy- Technoboy- closed this Feb 21, 2024
@Technoboy- Technoboy- reopened this Feb 21, 2024
@Technoboy-
Copy link
Contributor

Some tests are failed due to this fix

Copy link
Contributor

@codelipenghui codelipenghui left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice catch!

Actually, we have another option to fix this issue.

Now, the broker side uses the ackedCount - totalMessages to decrease the permits.

MESSAGE_PERMITS_UPDATER.addAndGet(this, ackedCount - totalMessages);

If we change to - totalMessages, we don't need to catch up the fix for other clients. But it's not good for long term because it will lead to more flow permits requests.

Copy link
Contributor

@codelipenghui codelipenghui left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if (ackBitSet != null && !ackBitSet.get(index)) {

You can also change here to use the newly added method isSingleMessageAcked

@poorbarcode
Copy link
Contributor Author

You can also change here to use the newly added method isSingleMessageAcked

Sory, I forgot that place, improved

@poorbarcode poorbarcode merged commit 0c49cac into apache:master Feb 26, 2024
50 checks passed
heesung-sn pushed a commit that referenced this pull request Feb 27, 2024
poorbarcode added a commit that referenced this pull request Feb 28, 2024
(cherry picked from commit 0c49cac)
(cherry picked from commit 071e26d)
mukesh-ctds pushed a commit to datastax/pulsar that referenced this pull request Mar 1, 2024
mukesh-ctds pushed a commit to datastax/pulsar that referenced this pull request Mar 6, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
category/reliability The function does not work properly in certain specific environments or failures. e.g. data lost cherry-picked/branch-2.11 cherry-picked/branch-3.0 cherry-picked/branch-3.1 cherry-picked/branch-3.2 doc-not-needed Your PR changes do not impact docs ready-to-test release/2.11.4 release/3.0.3 release/3.1.3 release/3.2.1 type/bug The PR fixed a bug or issue reported a bug
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants