Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[fix][client] Fix cannot retry chunk messages and send to DLQ #21048

Merged
merged 4 commits into from
Aug 30, 2023

Conversation

RobertIndie
Copy link
Member

@RobertIndie RobertIndie commented Aug 22, 2023

Fixes #20934

Motivation

If the consumer receives a chunked message, it currently couldn't retry it or send it to the DLQ. It will throw an exception like this:

java.util.concurrent.CompletionException: org.apache.pulsar.client.api.PulsarClientException$InvalidMessageException: The producer xxx of the topic persistent://a/b/c sends a message with 8085120 bytes that exceeds 5242880 bytes

This is because we don't enable the chunking for the retry producer and the DLQ producer in the consumer.

This PR enables the chunking feature for retry and DLQ producers. However, we cannot enable both chunking and batching simultaneously. Therefore, the trade-off of this change is that we have to turn off the batching for the DLQ producer, while the retry producer has already done so. This is acceptable because the DLQ producer usually has low traffic. It does not have a significant impact on the performance.

This PR also addresses the following two issues:

Issue A: Consumers cannot redeliver chunked messages

When the consumer attempts to redeliver a chunked message, the message ID is reconstructed in discardBatch, which causes this message ID to lose the chunk information. The comparison in the unAckedChunkedMessageIdSequenceMap will fail. The consumer will never be able to redeliver chunked messages.

We should not reconstruct the chunked message ID in discardBatch. The chunk message ID does not contain any batch information.

Issue B: The chunked message cannot be resent

When the producer resends the chunked message, the chunk id in the message metadata of all subsequent chunks will be set to the last chunk id.

The root cause is that the producer shares the same MessageMetadata for all chunks in a chunked message.

final MessageMetadata finalMsgMetadata = msgMetadata;
op.rePopulate = () -> {
op.cmd = sendMessage(producerId, sequenceId, numMessages, messageId, finalMsgMetadata,
encryptedPayload);
};

Therefore, when op.rePopulate is called, the chunk id in finalMsgMetadata is pointed to the last chunk. Suppose we have a large message with 3 chunks. The producer may end up sending all 3 messages with the same chunk id 3. The consumer will treat these messages as corrupted and ignore them, preventing it from receiving any of these messages.

Modification

  • Enable chunking and disable batching for retry and DLQ producers.
  • Don't discard the batch for the chunked message ID to fix Issue A.
  • Don't redeliver the chunk message ID itself when redelivering unacked messages. Just need to redeliver the message ID for all single chunks.
  • Reset the chunkID when resending the chunk message to fix Issue B.
  • Add chunk messages test case for testDeadLetterTopic.

Verifying this change

This change added tests.

Does this pull request potentially affect one of the following parts:

If the box was checked, please highlight the changes

  • Dependencies (add or upgrade a dependency)
  • The public API
  • The schema
  • The default values of configurations
  • The threading model
  • The binary protocol
  • The REST endpoints
  • The admin CLI options
  • The metrics
  • Anything that affects deployment

Documentation

  • doc
  • doc-required
  • doc-not-needed
  • doc-complete

Matching PR in forked repository

PR in forked repository:

@RobertIndie RobertIndie self-assigned this Aug 22, 2023
@github-actions github-actions bot added the doc-not-needed Your PR changes do not impact docs label Aug 22, 2023
@RobertIndie RobertIndie added this to the 3.2.0 milestone Aug 22, 2023
@RobertIndie
Copy link
Member Author

Hi, @poorbarcode , I have added the timeout of receive opeartion for the test. PTAL again. Thanks.

@codelipenghui codelipenghui merged commit 99e3fea into apache:master Aug 30, 2023
45 checks passed
@RobertIndie RobertIndie deleted the dlq-chunk branch August 31, 2023 09:57
RobertIndie added a commit that referenced this pull request Aug 31, 2023
RobertIndie added a commit that referenced this pull request Aug 31, 2023
RobertIndie added a commit to RobertIndie/pulsar that referenced this pull request Sep 5, 2023
RobertIndie added a commit to RobertIndie/pulsar that referenced this pull request Sep 5, 2023
RobertIndie added a commit that referenced this pull request Sep 5, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Bug] Dead Letter Policy in Consumer configuration doesn't account for chunking or batching configuration.
4 participants