Skip to content
This repository has been archived by the owner on Apr 1, 2024. It is now read-only.

ISSUE-10097: Replicated Subscription fail with IllegalStateException #2351

Closed
sijie opened this issue Mar 31, 2021 · 0 comments
Closed

ISSUE-10097: Replicated Subscription fail with IllegalStateException #2351

sijie opened this issue Mar 31, 2021 · 0 comments
Labels

Comments

@sijie
Copy link
Member

sijie commented Mar 31, 2021

Original Issue: apache#10097


Problem

This exception is repeated on the log when using replicated subscriptions:

07:17:59.770 [bookkeeper-ml-workers-OrderedExecutor-4-0] ERROR org.apache.pulsar.broker.service.persistent.PersistentReplicator - [persistent://georep/default/t1][cluster-a -> cluster-b] Unexpected exception: Field 'replicated_from' is not set
java.lang.IllegalStateException: Field 'replicated_from' is not set
at org.apache.pulsar.common.api.proto.MessageMetadata.getReplicatedFrom(MessageMetadata.java:151) ~[org.apache.pulsar-pulsar-common-2.8.0-SNAPSHOT.jar:2.8.0-SNAPSHOT]
at org.apache.pulsar.broker.service.persistent.PersistentReplicator.checkReplicatedSubscriptionMarker(PersistentReplicator.java:763) ~[org.apache.pulsar-pulsar-broker-2.8.0-SNAPSHOT.jar:2.8.0-SNAPSHOT]
at org.apache.pulsar.broker.service.persistent.PersistentReplicator.readEntriesComplete(PersistentReplicator.java:366) ~[org.apache.pulsar-pulsar-broker-2.8.0-SNAPSHOT.jar:2.8.0-SNAPSHOT]
at org.apache.bookkeeper.mledger.impl.OpReadEntry.lambda$checkReadCompletion$2(OpReadEntry.java:156) ~[org.apache.pulsar-managed-ledger-2.8.0-SNAPSHOT.jar:2.8.0-SNAPSHOT]
at org.apache.bookkeeper.mledger.util.SafeRun$1.safeRun(SafeRun.java:32) [org.apache.pulsar-managed-ledger-2.8.0-SNAPSHOT.jar:2.8.0-SNAPSHOT]
at org.apache.bookkeeper.common.util.SafeRunnable.run(SafeRunnable.java:36) [org.apache.bookkeeper-bookkeeper-common-4.13.0.jar:4.13.0]
at org.apache.bookkeeper.common.util.OrderedExecutor$TimedRunnable.run(OrderedExecutor.java:203) [org.apache.bookkeeper-bookkeeper-common-4.13.0.jar:4.13.0]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_282]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_282]
at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) [io.netty-netty-common-4.1.60.Final.jar:4.1.60.Final]
at java.lang.Thread.run(Thread.java:748) [?:1.8.0_282]

To reproduce

  1. Create 2-cluster k8s deployment (sample script/helm to do so)
  2. Create consumer with replicated subscription for topic georep/default/t1 in cluster-a and close it
  3. Create consumer with replicated subscription for topic georep/default/t1 in cluster-b and close it
  4. Create producer for topic georep/default/t1 in cluster-a and publish messages to the topic
  5. Create consumer with replicated subscription for topic georep/default/t1 in cluster-a, consume 1 message and close it
  6. Create consumer with replicated subscription for topic georep/default/t1 in cluster-b, consume 1 message and close it

It might be possible to reproduce the issue with fewer steps.

Observations

It seems that the code location broker after the switch to LightProto (apache#9046).
The fix is easy for the exception above. The concern is the lack of test coverage for replicated subscriptions.

@sijie sijie added the type/bug label Mar 31, 2021
@sijie sijie closed this as completed Apr 9, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Projects
None yet
Development

No branches or pull requests

1 participant