Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[fix][broker]Fix chunked messages will be filtered by duplicating #20948

Merged
merged 25 commits into from
Aug 31, 2023
Merged

[fix][broker]Fix chunked messages will be filtered by duplicating #20948

merged 25 commits into from
Aug 31, 2023

Conversation

liangyepianzhou
Copy link
Contributor

@liangyepianzhou liangyepianzhou commented Aug 7, 2023

Motivation

Make the chunk message function work properly when deduplication is enabled.

Modification

Only check and store the sequence ID of the last chunk in a chunk message.

For example:

    Chunk-1 sequence ID: 0, chunk ID: 0, total chunk: 2
    Chunk-2 sequence ID: 0, chunk ID: 1
    Chunk-3 sequence ID: 1, chunk ID: 0 total chunk: 3
    Chunk-4 sequence ID: 1, chunk ID: 1
    Chunk-5 sequence ID: 1, chunk ID: 1
    Chunk-6 sequence ID: 1, chunk ID: 2

Only store check and store the sequence ID of Chunk-2 and Chunk-6.
Add a property in the publishContext to determine whether this chunk is the last chunk when persistent completely.

publishContext.setProperty(IS_LAST_CHUNK, Boolean.FALSE);

Filter and ack duplicated chunks in a chunk message instead of discarding ctx.

For example:

    Chunk-1 sequence ID: 0, chunk ID: 0, msgID: 1:1
    Chunk-2 sequence ID: 0, chunk ID: 1, msgID: 1:2
    Chunk-3 sequence ID: 0, chunk ID: 2, msgID: 1:3
    Chunk-4 sequence ID: 0, chunk ID: 1, msgID: 1:4
    Chunk-5 sequence ID: 0, chunk ID: 2, msgID: 1:5
    Chunk-6 sequence ID: 0, chunk ID: 3, msgID: 1:6

We should filter and ack chunk-4 and chunk-5.

Documentation

  • doc
  • doc-required
  • doc-not-needed
  • doc-complete

### Motivation
Chunked messages use the same metadata, so all the chunked messages in a single message use the same sequence Id. And it will be recorded as duplicated messages.
```
    private long updateMessageMetadataSequenceId(final MessageMetadata msgMetadata) {
        final long sequenceId;
        if (!msgMetadata.hasSequenceId()) {
            sequenceId = msgIdGenerator++;
            msgMetadata.setSequenceId(sequenceId);
        } else {
            sequenceId = msgMetadata.getSequenceId();
        }
        return sequenceId;
    }
```
### Modification
Use different sequence id for chunk message.
@github-actions github-actions bot added the doc-not-needed Your PR changes do not impact docs label Aug 7, 2023
@heesung-sn
Copy link
Contributor

@rdhabalia Please review this change. I think we have missed discussing the deduplication feature compatibility when introducing the chunking feature in the PIP(https://github.com/apache/pulsar/wiki/PIP-37:-Large-message-size-handling-in-Pulsar).

@codelipenghui codelipenghui added type/bug The PR fixed a bug or issue reported a bug category/functionality Some functions are not working such as get errors labels Aug 21, 2023
@codelipenghui codelipenghui added this to the 3.2.0 milestone Aug 21, 2023
@heesung-sn
Copy link
Contributor

heesung-sn commented Aug 22, 2023

Does this pr require review? Have you raised a PIP for another solution?

Copy link
Contributor

@BewareMyPower BewareMyPower left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please create another test class like MessageChunkingDeduplicationTest for it for the following reasons:

  1. MessageChunkingSharedTest was added to test chunking with Shared subscriptions. It's bad to add unrelated tests into this class.
  2. You won't need to set conf.setBrokerDeduplicationEnabled(true) and restart the broker for each test method.

@heesung-sn heesung-sn self-requested a review August 30, 2023 23:43
@liangyepianzhou liangyepianzhou merged commit b0b13bc into apache:master Aug 31, 2023
45 checks passed
@liangyepianzhou liangyepianzhou deleted the deplicate_chunk branch August 31, 2023 01:58
liangyepianzhou added a commit that referenced this pull request Sep 4, 2023
…0948)

Make the chunk message function work properly when deduplication is enabled.
 For example:
 ```markdown
     Chunk-1 sequence ID: 0, chunk ID: 0, total chunk: 2
     Chunk-2 sequence ID: 0, chunk ID: 1
     Chunk-3 sequence ID: 1, chunk ID: 0 total chunk: 3
     Chunk-4 sequence ID: 1, chunk ID: 1
     Chunk-5 sequence ID: 1, chunk ID: 1
     Chunk-6 sequence ID: 1, chunk ID: 2
```
Only store check and store the sequence ID of Chunk-2 and Chunk-6.
**Add a property in the publishContext to determine whether this chunk is the last chunk when persistent completely.**
```java
publishContext.setProperty(IS_LAST_CHUNK, Boolean.FALSE);
```
 For example:
 ```markdown
     Chunk-1 sequence ID: 0, chunk ID: 0, msgID: 1:1
     Chunk-2 sequence ID: 0, chunk ID: 1, msgID: 1:2
     Chunk-3 sequence ID: 0, chunk ID: 2, msgID: 1:3
     Chunk-4 sequence ID: 0, chunk ID: 1, msgID: 1:4
     Chunk-5 sequence ID: 0, chunk ID: 2, msgID: 1:5
     Chunk-6 sequence ID: 0, chunk ID: 3, msgID: 1:6
```
We should filter and ack chunk-4 and chunk-5.

(cherry picked from commit b0b13bc)
liangyepianzhou added a commit that referenced this pull request Sep 4, 2023
…0948)

Make the chunk message function work properly when deduplication is enabled.
 For example:
 ```markdown
     Chunk-1 sequence ID: 0, chunk ID: 0, total chunk: 2
     Chunk-2 sequence ID: 0, chunk ID: 1
     Chunk-3 sequence ID: 1, chunk ID: 0 total chunk: 3
     Chunk-4 sequence ID: 1, chunk ID: 1
     Chunk-5 sequence ID: 1, chunk ID: 1
     Chunk-6 sequence ID: 1, chunk ID: 2
```
Only store check and store the sequence ID of Chunk-2 and Chunk-6.
**Add a property in the publishContext to determine whether this chunk is the last chunk when persistent completely.**
```java
publishContext.setProperty(IS_LAST_CHUNK, Boolean.FALSE);
```
 For example:
 ```markdown
     Chunk-1 sequence ID: 0, chunk ID: 0, msgID: 1:1
     Chunk-2 sequence ID: 0, chunk ID: 1, msgID: 1:2
     Chunk-3 sequence ID: 0, chunk ID: 2, msgID: 1:3
     Chunk-4 sequence ID: 0, chunk ID: 1, msgID: 1:4
     Chunk-5 sequence ID: 0, chunk ID: 2, msgID: 1:5
     Chunk-6 sequence ID: 0, chunk ID: 3, msgID: 1:6
```
We should filter and ack chunk-4 and chunk-5.

(cherry picked from commit b0b13bc)
Technoboy- pushed a commit that referenced this pull request Sep 5, 2023
…0948)

## Motivation
Make the chunk message function work properly when deduplication is enabled.
## Modification
### Only check and store the sequence ID of the last chunk in a chunk message.
 For example:
 ```markdown
     Chunk-1 sequence ID: 0, chunk ID: 0, total chunk: 2
     Chunk-2 sequence ID: 0, chunk ID: 1
     Chunk-3 sequence ID: 1, chunk ID: 0 total chunk: 3
     Chunk-4 sequence ID: 1, chunk ID: 1
     Chunk-5 sequence ID: 1, chunk ID: 1
     Chunk-6 sequence ID: 1, chunk ID: 2
```   
Only store check and store the sequence ID of Chunk-2 and Chunk-6.
**Add a property in the publishContext to determine whether this chunk is the last chunk when persistent completely.**
```java
publishContext.setProperty(IS_LAST_CHUNK, Boolean.FALSE);
```
### Filter and ack duplicated chunks in a chunk message instead of discarding ctx.
 For example:
 ```markdown
     Chunk-1 sequence ID: 0, chunk ID: 0, msgID: 1:1
     Chunk-2 sequence ID: 0, chunk ID: 1, msgID: 1:2
     Chunk-3 sequence ID: 0, chunk ID: 2, msgID: 1:3
     Chunk-4 sequence ID: 0, chunk ID: 1, msgID: 1:4
     Chunk-5 sequence ID: 0, chunk ID: 2, msgID: 1:5
     Chunk-6 sequence ID: 0, chunk ID: 3, msgID: 1:6
```   
We should filter and ack chunk-4 and chunk-5.
Technoboy- pushed a commit that referenced this pull request Sep 5, 2023
…0948)

## Motivation
Make the chunk message function work properly when deduplication is enabled.
## Modification
### Only check and store the sequence ID of the last chunk in a chunk message.
 For example:
 ```markdown
     Chunk-1 sequence ID: 0, chunk ID: 0, total chunk: 2
     Chunk-2 sequence ID: 0, chunk ID: 1
     Chunk-3 sequence ID: 1, chunk ID: 0 total chunk: 3
     Chunk-4 sequence ID: 1, chunk ID: 1
     Chunk-5 sequence ID: 1, chunk ID: 1
     Chunk-6 sequence ID: 1, chunk ID: 2
```   
Only store check and store the sequence ID of Chunk-2 and Chunk-6.
**Add a property in the publishContext to determine whether this chunk is the last chunk when persistent completely.**
```java
publishContext.setProperty(IS_LAST_CHUNK, Boolean.FALSE);
```
### Filter and ack duplicated chunks in a chunk message instead of discarding ctx.
 For example:
 ```markdown
     Chunk-1 sequence ID: 0, chunk ID: 0, msgID: 1:1
     Chunk-2 sequence ID: 0, chunk ID: 1, msgID: 1:2
     Chunk-3 sequence ID: 0, chunk ID: 2, msgID: 1:3
     Chunk-4 sequence ID: 0, chunk ID: 1, msgID: 1:4
     Chunk-5 sequence ID: 0, chunk ID: 2, msgID: 1:5
     Chunk-6 sequence ID: 0, chunk ID: 3, msgID: 1:6
```   
We should filter and ack chunk-4 and chunk-5.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants