PIP-22: Dead Letter Topic #2400

Closed
wants to merge 46 commits into from

Conversation

Contributor
@codelipenghui codelipenghui commented Aug 20, 2018

Motivation

Fixes #189

When a consumer receives messages from Pulsar, it is difficult to guarantee that every message is consumed successfully. Pulsar supports message redelivery by setting an acknowledgement timeout when creating a new consumer. This feature guarantees that the consumer will not lose messages.

However, some messages may be redelivered many times, possibly without ever stopping.

So Pulsar needs a feature to control this. Users can enable and customize it to control the message redelivery behavior. The feature is named Dead Letter Topic.

Modifications

A consumer can set the maximum number of redeliveries through the Java client.
A consumer can optionally set the name of the Dead Letter Topic through the Java client.
Messages exceeding the maximum number of redeliveries are sent to the Dead Letter Topic and acknowledged automatically.

Result

If the consumer enables the dead letter topic feature, a message that exceeds the maximum number of redeliveries is sent to the Dead Letter Topic and acknowledged automatically.
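
To make the intended usage concrete, here is a minimal sketch of how a consumer could enable the feature from the Java client, assuming builder methods named maxRedeliveryCount() and deadLetterTopic() (illustrative names based on the description above; the exact API in this PR may differ):

import java.util.concurrent.TimeUnit;

import org.apache.pulsar.client.api.Consumer;
import org.apache.pulsar.client.api.PulsarClient;
import org.apache.pulsar.client.api.SubscriptionType;

public class DeadLetterTopicExample {
    public static void main(String[] args) throws Exception {
        PulsarClient client = PulsarClient.builder()
                .serviceUrl("pulsar://localhost:6650")
                .build();

        // Assumed configuration: redeliver a message at most 3 times, then route it
        // to the given dead letter topic and acknowledge it automatically.
        Consumer<byte[]> consumer = client.newConsumer()
                .topic("persistent://public/default/my-topic")
                .subscriptionName("my-subscription")
                .subscriptionType(SubscriptionType.Shared)
                .ackTimeout(10, TimeUnit.SECONDS)
                .maxRedeliveryCount(3)          // assumed builder method
                .deadLetterTopic("persistent://public/default/my-topic-my-subscription-DLQ") // optional
                .subscribe();
    }
}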

codelipenghui added 30 commits August 13, 2018 18:25
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Consumer;

public class Quorum {
Member

It may be better to add some comments, such as how to use it and the meaning of each public method.

Contributor Author

Ok.

.create();

for (int i = 0; i < sendMessages; i++) {
producer.send("Hello Pulsar!".getBytes());
Member

How about combining "i" with the message content? It may be a little useful for debugging.
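
For example (purely illustrative):

producer.send(String.format("Hello Pulsar! %d", i).getBytes());

This would make it obvious which messages ended up in the dead letter topic.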

consumer.acknowledge(message);
totalInDeadLetter++;
} while (totalInDeadLetter < sendMessages);

Member

Could we assert something to do a verification, such as that all messages are received and acked?

Member
@jiazhai jiazhai left a comment

Overall LGTM. Added some nit comments.

@sijie
Member

sijie commented Aug 22, 2018

overall looks good to me.

@merlimat can you take a look at this?

@sijie
Member

sijie commented Aug 22, 2018

run java8 tests

this.cursor = cursor;
this.name = topic.getName() + " / " + Codec.decode(cursor.getName());
this.topic = topic;
this.messagesToReplay = new ConcurrentLongPairSet(512, 2);
this.messagesToDeadLetter = new HashSet<>(8);
Contributor

Umm, this can cause high GC pressure on the broker. Essentially we would like to avoid storing objects on the heap with a relatively long life-cycle; in a past release we had an effort to clean up stored PositionImpl objects in the code base. So this is a concern for a broker serving high throughput with low-latency requirements that might want to use this feature.

Member

+1, @codelipenghui, we may need to follow the approach of messagesToReplay, which stores ledgerId and entryId as primitive values and then composes a Position from (ledgerId + entryId) when using it.
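
A rough sketch of that approach (illustrative only; the tracker class and method names here are hypothetical, and the ConcurrentLongPairSet/PositionImpl signatures should be double-checked against the code base):

import org.apache.bookkeeper.mledger.impl.PositionImpl;
import org.apache.pulsar.common.util.collections.ConcurrentLongPairSet;

class DeadLetterCandidates {
    // Store candidates as primitive (ledgerId, entryId) pairs instead of
    // heap-allocated PositionImpl objects with a long life-cycle.
    private final ConcurrentLongPairSet messagesToDeadLetter = new ConcurrentLongPairSet(512, 2);

    void markForDeadLetter(long ledgerId, long entryId) {
        messagesToDeadLetter.add(ledgerId, entryId);
    }

    void processCandidates() {
        // Only materialize a PositionImpl at the moment it is needed.
        messagesToDeadLetter.forEach((ledgerId, entryId) -> {
            PositionImpl position = PositionImpl.get(ledgerId, entryId);
            // ... read the entry at `position` and publish it to the DLQ ...
        });
    }
}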

Contributor Author

Ok!

if (maxRedeliveryCount > 0 && deadLetterTopicProducer == null) {
try {
if (maxRedeliveryCount > 0 && StringUtils.isBlank(deadLetterTopic)) {
deadLetterTopic = String.format("%s-%s-DLQ", topic.getName(), Codec.decode(cursor.getName()));
Contributor

What if the tenant already has a topic with the -DLQ suffix?

Contributor Author

Any better suggestion?

Member
@jiazhai jiazhai Aug 24, 2018

It seems this is up to the application to decide. With this change, the application can choose its own name, and "-DLQ" is only used when the application doesn't specify one.

…alue, and then compose a Position from(ledgerId + entryId) when using it.
Contributor
@merlimat merlimat left a comment

In general, I would prefer to handle most of the logic for the DLQ in
the client library rather than the broker, just leaving the
minimum support required on the server side.

This, in my view has several advantages:

  1. No need to worry about the use of resources in the broker (which
    could be a problem when many topics want to use the DLQ
    with, potentially, arbitrary settings).

  2. Clear isolation from different consumers

Another point that I think should be tied to the DLQ is the
concept of "negative acks". Currently we only have positive acks
for messages. An application might decide not to ack a message (and
possibly set a timeout for redelivery), but there's no way to
immediately tell the broker that a message cannot be processed.

I think a negative ack in the consumer interface would make it
clearer, from a broker perspective, that the "processing" of the
message has indeed failed, not just the delivery to the consumer.

The case of ack timeout could then be folded in as a type of
negative ack.

My suggestion would be :

  1. Have the broker keep track of negative acks received by
    consumers for a particular message id. This would involve
    augmenting the pendingAcks map (which has a "free" long spot in
    the value field) to also hold the count.

  2. Broker will include the "deliveryCount" as part of
    CommandMessage when pushing messages to a consumer

  3. The count of redeliveries in the broker is kept as
    a "best effort". If a broker crashes, some messages might end up
    getting redelivered a few more times than configured. I think
    this is a reasonable tradeoff for this functionality.

  4. Consumer object in client library will have the DLQ configuration

  5. When a consumer receives a message whose "deliveryCount" exceeds the
    consumer's own max, the client library will re-publish it on the DLQ.

Finally, one more item to consider is that, by default, if there
is no subscription on the DLQ topic, the messages will be dropped
immediately.

For messages to be retained, we would either:

  1. Make sure at least a subscription exist in the DLQ topic

  2. Force to use a default subscription name (eg: DLQ)

  3. Allow to configure the subscription name as part of the DLQ
    policies

  4. Enforce to have data retention on the DLQ topic. This might
    be complicated since the retention is generally applied at
    the namespace level.

/**
* Return entry at the position.
*/
Entry readEntry(PositionImpl position) throws InterruptedException, ExecutionException;
Contributor

This can already be done through replayEntries() and asyncReplayEntries() methods

Contributor Author

When I use asyncReplayEntries(), I have to convert the ConcurrentLongPairSet to a Set before calling asyncReplayEntries(), and the asyncReplayEntries() implementation calls ledger.asyncReadEntry() for each entry. So I think I need a readEntry() method to read an entry without having to convert the ConcurrentLongPairSet to a Set.

* Set max un-acked messages per consumer.
* This config should be less than the broker config; otherwise it will not take effect. 0 means no limit (default: 0).
*/
ConsumerBuilder<T> maxUnackedMessagesPerConsumer(int maxUnackedMessagesPerConsumer);
Contributor

From an API perspective it might be good to encapsulate all the DLQ-related settings into a single DeadLetterQueuePolicy.
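
For illustration, such an encapsulation could look roughly like the following (DeadLetterQueuePolicy and the deadLetterQueuePolicy() builder method are hypothetical names, and `client` is a PulsarClient as in the earlier example):

// Hypothetical: group all DLQ-related settings into one policy object.
DeadLetterQueuePolicy dlqPolicy = DeadLetterQueuePolicy.builder()
        .maxRedeliveryCount(3)
        .deadLetterTopic("persistent://public/default/my-topic-my-subscription-DLQ")
        .build();

Consumer<byte[]> consumer = client.newConsumer()
        .topic("persistent://public/default/my-topic")
        .subscriptionName("my-subscription")
        .deadLetterQueuePolicy(dlqPolicy)   // hypothetical builder method
        .subscribe();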

optional int32 maxRedeliveryCount = 14;
// Name of Dead Letter Topic.
// If not set, the Pulsar broker will generate one from the topic name and subscription name, suffixed with -DLQ
optional string deadLetterTopic = 15;
Contributor

I think it's a bit dangerous to leave the option to have a large number of possible topics for which we need to create producer objects inside the broker.

@sijie
Member

sijie commented Aug 24, 2018

@merlimat :

I would prefer to handle most of the logic for the DLQ in
the client library rather than the broker, just leaving the
minimum support required on the server side.

I think this PR has already made things pretty simple: it keeps the redelivery count in memory and doesn't persist it. The major difference I can see between the approach here and your approach is "who" publishes messages to the DLQ: the approach here favors the broker doing the publishes, while your approach prefers the client publishing to the DLQ.

I am not very convinced about doing the DLQ logic in the client, for multiple reasons:

a) Doing DLQ in the client actually makes things a bit more complicated, because the redelivery count is tracked at the broker but the publishing happens at the client. Things like configurations/policies are separated into different places, which can result in inconsistent behaviors and makes debugging much harder when an issue occurs.
b) Doing DLQ in the client means the same logic has to be implemented in each language client. The implementations can quickly become inconsistent between language clients.
c) Regarding the resource concerns, I don't think the resource usage is different if the broker is still tracking the redelivery count, unless there is something I am missing.
d) Doing DLQ in the broker allows us to do optimizations in the future, like persisting the delivery count to cursors or a retry queue, without changing client behaviors, especially if we want to support consistent behavior across different language clients.

regarding "negative acks"

"negative acks" is a good thing to have. However I don't think DLQ has to depend on "negative acks". because the current "negative acks" pulsar has is ack timeout. for DLQ, it doesn't really matter it is a ack timeout or explicit ack timeout. We don't really need to tie things together. If DLQ goes first, we still can introduce "negative acks" later, and it will not affect DLQ logic.

From this perspective, it makes more sense to do DLQ on the broker side, because if DLQ happens on the broker side, then when we introduce explicit "negative acks" later there will be very minimal code changes on both the client and broker sides.

regarding message retention in DLQ topic

I think the simplest thing is to just let the user configure message retention on the DLQ topic. That is what cloud providers typically offer. I'm not sure we need to get into the business of managing subscriptions for the DLQ.

@merlimat
Contributor

@codelipenghui @sijie

I'm still convinced it's much easier to approach this problem on the
client side 😄

Other than the scalability concerns and the complexity in changing the
dispatcher code, I think we should also consider:

  • If the broker is re-publishing, we would need to have some complex
    way to do authorization. The consumer has its own credentials, but
    the broker will have to create a producer using the broker credentials,
    after having verified that the consumer also has permission to publish
    on the DLQ. Worse, if we revoke the permission on the DLQ topic, we
    would also have to figure that out in the broker. Moving the publishing
    to the consumer will make that problem go away: the consumer will use the
    same credentials when re-publishing and it will work the same as a
    regular producer.

  • If the consumer has a schema, it might be desirable to keep the
    schema when republishing. Right now, the producer app needs to have
    the Java POJO to create a schema-enabled producer, but the broker won't
    have the POJO definition.

  • The current implementation is not handling the fact that
    different consumers on the same subscription could have different
    DLQ settings. E.g.: consumer-a: 10 retries and publish on dlq-a,
    and consumer-b: 5 retries with republish on dlq-b. If done on the
    broker side, the broker would have to somehow deal with these
    inconsistencies. If the implementation is done in the client library,
    these inconsistencies will be fine: the broker just needs to keep
    (and communicate) the current count of re-deliveries. Each consumer
    will then take action based on its local configuration.

a) Doing DLQ in the client actually makes things a bit more
complicated, because the redelivery count is tracked at the
broker but the publishing happens at the client. Things like
configurations/policies are separated into different places,
which can result in inconsistent behaviors and makes
debugging much harder when an issue occurs.

The current PR is anyway configuring the DLQ in client API. If we
were to configure the DLQ as a server side policy we would probably
need to have fine-grained policies per-topic (while currently these
are only at the namespace level, except for authorization which is
both at namespace and topic level). For DLQ it would even be at
the subscription level.

b) Doing DLQ in the client means the same logic has to be implemented
in each language client. The implementations can quickly become
inconsistent between language clients.

This I agree with. It would require implementations in both the Java and C++
clients (plus the wrappers). My hope is that the client implementation
would not be very complicated.

c) Regarding the resource concerns, I don't think the resource usage
is different if the broker is still tracking the redelivery
count, unless there is something I am missing.

Well, the difference here would be that the broker just needs to keep
a counter, inside a map that already exists.

If the broker is doing the publishes, then it needs to create and cache
producers for all the DLQ topics. If each consumer has its own producers,
this could greatly increase the memory consumption in broker and we would
have to carefully tune the producer queues to make sure we don't get out
of direct memory in the worst case scenarios.

d) Doing DLQ in the broker allows us to do optimizations in the future,
like persisting the delivery count to cursors or a retry queue, without
changing client behaviors, especially if we want to support
consistent behavior across different language clients.

Optimizations like storing the counter would be possible in either case.

"negative acks" is a good thing to have. However I don't think DLQ
has to depend on "negative acks". because the current "negative
acks" pulsar has is ack timeout. for DLQ, it doesn't really matter
it is a ack timeout or explicit ack timeout. We don't really need to
tie things together. If DLQ goes first, we still can introduce
"negative acks" later, and it will not affect DLQ logic.

Sure, we can defer "negative acks" changes and just focus on current
re-delivery functionality.

I think the simplest thing is to just let the user configure message
retention on the DLQ topic. That is what cloud providers typically
offer. I'm not sure we need to get into the business of managing
subscriptions for the DLQ.

Again, the problem lies with policies which are currently namespace
wide.

I believe the best compromise solution we can adopt initially is to
leave the freedom to the user. The documentation will clearly specify
that you need to make sure the data is retained in the DLQ and present
a few alternatives, e.g.:

pulsar-admin topics create-subscription my-topic-dlq --subscription dlq

With this command, one can make sure that every message pushed to
my-topic-dlq will be retained (indefinitely) and will be
accessible through the subscription named dlq.

@rdhabalia
Contributor

I agree with @merlimat. I can see two major drawbacks with a server-side implementation, which may prevent us from enabling this feature on a high-traffic Pulsar system:
a. the memory footprint required by producer queues
b. authorization handling, because the publish will happen on behalf of the consumer by the broker.

DLQ mainly requires counting message redeliveries and then publishing a max-redelivered message to the DLQ. The same message can be redelivered to different consumers, so the client can't track the redelivery count of a specific message. The client can rely on the broker for that count, publish the message to the DLQ, and ack the message on the original topic. I think a client-side implementation can make this feature simpler to use.

@sijie
Member

sijie commented Aug 25, 2018

Frankly speaking, I am not sure about the producer concerns: producer memory, authorization, and schema. If you model the DLQ topic as replicating messages whose delivery count is larger than max-delivery-count, it should be no different from cross-data-center replication.

But anyway, since there are two strong opinions on doing this at the client side, I am fine with doing the publish at the client side. Just keep one thing in mind: this feature might end up never happening for some of the languages (for example, WebSocket) unless effort is spent on it.


Let me summarize the discussion in this PR to make sure everyone is on the same page about its scope. The suggestion for the implementation is:

  • keep the delivery count tracking at the broker
  • move the publish-to-DLQ logic to the consumer side

The other comments from Matteo will not be included in this PR but will be done via future improvements:

  • negative acks will be an independent feature in the future; this PR doesn't have to deal with that.
  • message retention will not be enforced in this PR. Applications will be responsible for setting correct message retention. Documentation will be added to make things clear. We can enforce DLQ topic retention when we allow overriding policies at the topic/subscription level.

@merlimat @rdhabalia can you confirm the summary is correct?

@codelipenghui does this approach work for your application?

@codelipenghui
Contributor Author

@sijie Yes, this approach works for me. Should we send a request to the broker to get the redelivery count per message before the consumer sends a redeliverUnackMessages request?

@sijie
Member

sijie commented Aug 25, 2018

@merlimat @rdhabalia :

Should we send a request to the broker to get the redelivery count per message before the consumer sends a redeliverUnackMessages request?

can you answer @codelipenghui 's question?

@merlimat
Contributor

@merlimat @rdhabalia can you confirm the summary is correct?

👍

Frankly speaking, I am not sure about the producer concerns: producer memory, authorization, and schema. If you model the DLQ topic as replicating messages whose delivery count is larger than max-delivery-count, it should be no different from cross-data-center replication.

I'm not saying it's not possible to make it efficient and safe, but that it would require more work to cover all conditions.

@sijie Yes, this approach works for me. Should we send a request to the broker to get the redelivery count per message before the consumer sends a redeliverUnackMessages request?

The dispatcher could include the redelivery count when pushing messages to the consumer (in CommandMessage).
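
As a rough sketch of the client-side flow this implies (all names below are hypothetical; the only assumption is that the broker exposes a per-message redelivery count to the consumer):

// Inside the client library, when a message arrives from the broker:
void onMessageReceived(Message<byte[]> msg, int redeliveryCount) throws PulsarClientException {
    if (redeliveryCount > maxRedeliveryCount) {
        // Re-publish to the DLQ with the consumer's own credentials, then ack the
        // original message so it is not redelivered again.
        deadLetterProducer.newMessage()
                .key(msg.getKey())
                .properties(msg.getProperties())
                .value(msg.getValue())
                .send();
        acknowledge(msg);
    } else {
        // Hand the message to the application as usual.
        deliverToApplication(msg);
    }
}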

@codelipenghui
Contributor Author

I have an idea for getting the delivery count from the broker. When the consumer sends a redeliveryUnackMessages request and the broker processes it, the broker can return the messageIds (or, if the consumer can hold them, the message payloads) of the messages that should be sent to the DLQ.

For this solution, the broker must keep the max redelivery count per subscription. A message may be sent to different consumers by the dispatcher, so if the max redelivery count is kept per consumer, it's difficult to confirm how many times a message has been redelivered. I think the max redelivery count should use the config of the last consumer in a subscription.

I think this solution has lower overhead.

@codelipenghui
Contributor Author

This PR can be closed now. I have a new PR, #2508, for PIP-22 Dead Letter Topic.

@sijie
Member

sijie commented Sep 4, 2018

Closed this PR; #2508 will be used for it.

Labels: area/client, type/feature
5 participants