Safely stop (close replication-producer) and remove replicator #152

Merged — 1 commit merged into apache:master on Jan 10, 2017

Conversation

rdhabalia (Contributor)

Motivation

  • Safely close the replication producer while disconnecting the replicator.
  • When a replication cluster is removed from a topic, the broker sometimes fails to delete the replicator cursor and then tries to restart the replicator even though the cursor has been closed. This causes the broker to keep retrying cursor recovery for a replication cluster that has already been removed.

Modifications

  • On replicator disconnect: close the producer even if it has not finished connecting to the remote cluster yet.
  • Do not restart the replicator when deleting its cursor fails (see the sketch after the Result list).

Result

  • The producer can no longer reconnect after the replicator has been disconnected.
  • The broker no longer keeps retrying cursor recovery for a replication cluster that has already been deleted.
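
A rough, self-contained sketch of those two behaviors (the `Producer`, `Replicator`, and `ReplicatedTopic` types below are simplified stand-ins, not the actual Pulsar classes):

```java
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical, simplified types; not the actual Pulsar classes.
interface Producer {
    CompletableFuture<Void> closeAsync();
}

class Replicator {
    // The producer may still be in the middle of connecting to the remote cluster.
    final CompletableFuture<Producer> producerFuture = new CompletableFuture<>();

    CompletableFuture<Void> disconnect() {
        // Close the producer even if the connection has not completed yet,
        // so that a late connection cannot bring the replicator back to life.
        if (!producerFuture.isDone()) {
            producerFuture.cancel(false);
            return CompletableFuture.completedFuture(null);
        }
        return producerFuture.thenCompose(Producer::closeAsync)
                // The connection attempt failed earlier: nothing to close.
                .exceptionally(ex -> null);
    }
}

class ReplicatedTopic {
    final Map<String, Replicator> replicators = new ConcurrentHashMap<>();

    // Called when deleting the cursor of a removed replication cluster fails.
    void deleteCursorFailed(String remoteCluster) {
        // Before the fix the broker restarted the producer here, which kept
        // retrying cursor recovery for a cluster that was already removed;
        // now the replicator is simply dropped.
        replicators.remove(remoteCluster);
    }
}
```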

@rdhabalia rdhabalia added the type/bug label Dec 23, 2016
@rdhabalia rdhabalia added this to the 1.16 milestone Dec 23, 2016
@rdhabalia rdhabalia self-assigned this Dec 23, 2016
@merlimat merlimat left a comment

👍 Nice catch on this one

@@ -666,8 +666,7 @@ public void deleteCursorComplete(Object ctx) {
     @Override
     public void deleteCursorFailed(ManagedLedgerException exception, Object ctx) {
         log.error("[{}] Failed to delete cursor {}", topic, name);
-        // Connect the producers back
-        replicators.get(remoteCluster).startProducer();
+        replicators.remove(remoteCluster);

Just one thing here. If the delete cursor fails during the topic load, then the topic load fails and that is fine since it will be re-tried anyway.

If it fails after a policies change though, it will not be retried.
I think the whole PersistentTopic.checkReplication() should be rescheduled after a while to make sure we clean up the cursor properly.

Just retrying the cursor delete could be dangerous, since the remote cluster could be added again before we retry, but the checkReplication() is idempotent and uses the latest configuration.

rdhabalia (Contributor, Author)

Yes, made the change to schedule checkReplication() when deleteCursorFailed is hit during onPoliciesUpdate().

@merlimat merlimat left a comment

Looks good, just a minor thing in the log messages.

-        return checkReplication();
+        CompletableFuture<Void> result = new CompletableFuture<Void>();
+        checkReplication().thenAccept(res -> {
+            log.info("Policies updated successfully {}", data);

We should include the topic name here. Also, I don't think that `data` will print the JSON content, and in any case it would be printing too much.
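
For example, the log line could carry the topic name and drop the policy payload (a suggested form only, not necessarily the exact line that was merged):

```java
log.info("[{}] Policies updated successfully", topic);
```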

log.info("Policies updated successfully {}", data);
result.complete(null);
}).exceptionally(th -> {
log.error("Policies update failed {} {}, scheduled retry in {} seconds", data, th.getMessage(),

Add topic name here as well

+        }).exceptionally(th -> {
+            log.error("Policies update failed {} {}, scheduled retry in {} seconds", data, th.getMessage(),
+                    POLICY_UPDATE_FAILURE_RETRY_TIME_SECONDS, th);
+            brokerService.executor().schedule(this::checkReplication, POLICY_UPDATE_FAILURE_RETRY_TIME_SECONDS,

Actually, this will only retry one time and then give up. Should we keep retrying instead? There might be some intermittent error (e.g. failures to write to ZK) that will get fixed after a while.

@rdhabalia rdhabalia Jan 10, 2017

Yes, updated the change to retry on failure.
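
For illustration, a self-rescheduling retry can look roughly like the sketch below; the `ReplicationChecker` class and `checkReplicationWithRetry()` method are invented for this sketch, while the merged change wires the equivalent logic into PersistentTopic's policies-update path around `checkReplication()`:

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

class ReplicationChecker {
    static final long RETRY_SECONDS = 60; // stand-in for POLICY_UPDATE_FAILURE_RETRY_TIME_SECONDS

    final ScheduledExecutorService executor = Executors.newSingleThreadScheduledExecutor();

    // Stand-in for PersistentTopic.checkReplication(): idempotent and always
    // reads the latest replication configuration.
    CompletableFuture<Void> checkReplication() {
        return CompletableFuture.completedFuture(null);
    }

    void checkReplicationWithRetry() {
        checkReplication().exceptionally(th -> {
            // Each failed attempt schedules the next one, so a transient error
            // (e.g. a ZooKeeper write failure) is retried until it succeeds
            // rather than being given up on after a single attempt.
            executor.schedule(this::checkReplicationWithRetry, RETRY_SECONDS, TimeUnit.SECONDS);
            return null;
        });
    }
}
```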

@rdhabalia rdhabalia force-pushed the del_cursor branch 3 times, most recently from cc7c9c0 to 4b73e9a on January 10, 2017 at 01:34
@yahoocla: CLA is valid!

@merlimat merlimat merged commit a6186b2 into apache:master Jan 10, 2017
@rdhabalia rdhabalia deleted the del_cursor branch January 23, 2017 22:10
sijie pushed a commit to sijie/pulsar that referenced this pull request Mar 4, 2018
* changing publish function to accept string for classname

* cleaning up

* making change for python api
dlg99 added a commit to dlg99/pulsar that referenced this pull request Nov 8, 2022
…tion (apache#17915) (apache#152)

(cherry picked from commit 0854032)

Fixes apache#9962 

### Motivation

Offloaded ledgers can be orphaned on topic deletion. 

This is a redo of apache#15914, which conflicted with the concurrently merged apache#17736, resulting in apache#17889.

apache#17736 decided not to allow managed ledger trimming for fenced managed ledgers, because in many cases fencing indicates a problem that should stop all operations on the managed ledger. At the same time, fencing is also applied before deletion starts, so the trimming that is part of the deletion process could not proceed.
After discussion with @eolivelli I introduced a new state, FencedForDeletion, which acts like the Fenced state except for trimming/deletion purposes.
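
As a rough illustration of that state (the enum values and checks below are simplified, not the actual ManagedLedgerImpl code):

```java
// Simplified sketch of the managed-ledger states relevant to this change.
enum State {
    OPEN, FENCED, FENCED_FOR_DELETION, CLOSED
}

class ManagedLedgerSketch {
    volatile State state = State.OPEN;

    boolean isFenced() {
        // Both fenced states block normal reads and writes.
        return state == State.FENCED || state == State.FENCED_FOR_DELETION;
    }

    boolean isTrimmingAllowed() {
        // Plain FENCED usually signals a problem that should stop all work on
        // the ledger, but FENCED_FOR_DELETION is set by the deletion path
        // itself, so trimming (and offloaded-ledger cleanup) may proceed.
        return state == State.OPEN || state == State.FENCED_FOR_DELETION;
    }
}
```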

### Modifications

The topic is truncated before deletion so that offloaded ledgers are deleted properly; the deletion fails if truncation fails.

### Verifying this change

local fork tests: #1

- [ ] Make sure that the change passes the CI checks.

This change added integration tests

### Does this pull request potentially affect one of the following parts:

Nothing changed in the options, but the admin CLI will implicitly run truncate before topic delete.

### Documentation

- [x] `doc-not-needed`