Autorecovery: configure Pulsar's rack awareness integration #214

michaeljmarshall · 2022-05-17T19:04:04Z

Motivation

By default, Pulsar's rack awareness solution relies on state stored in ZooKeeper. When autorecovery runs, its BookKeeper client needs this metadata in order to follow the placement policy.

This change could technically break deployments that expect the default DNS Resolver: ScriptBasedMapping.

Note: one benefit of this PR is that we'll get rid of this exception, which is currently seen on bookkeeper and autorecovery startup.

17:26:33.864 [main] ERROR org.apache.bookkeeper.client.RackawareEnsemblePlacementPolicyImpl - Failed to initialize DNS Resolver org.apache.bookkeeper.net.ScriptBasedMapping, used default subnet resolver
java.lang.RuntimeException: No network topology script is found when using script based DNS resolver.
	at org.apache.bookkeeper.net.ScriptBasedMapping$RawScriptBasedMapping.validateConf(ScriptBasedMapping.java:163) ~[com.datastax.oss-bookkeeper-server-4.14.4.1.0.0.jar:4.14.4.1.0.0]
	at org.apache.bookkeeper.net.AbstractDNSToSwitchMapping.setConf(AbstractDNSToSwitchMapping.java:81) ~[com.datastax.oss-bookkeeper-server-4.14.4.1.0.0.jar:4.14.4.1.0.0]
	at org.apache.bookkeeper.net.ScriptBasedMapping.setConf(ScriptBasedMapping.java:123) ~[com.datastax.oss-bookkeeper-server-4.14.4.1.0.0.jar:4.14.4.1.0.0]
	at org.apache.bookkeeper.client.RackawareEnsemblePlacementPolicyImpl.initialize(RackawareEnsemblePlacementPolicyImpl.java:265) [com.datastax.oss-bookkeeper-server-4.14.4.1.0.0.jar:4.14.4.1.0.0]
	at org.apache.bookkeeper.client.RackawareEnsemblePlacementPolicyImpl.initialize(RackawareEnsemblePlacementPolicyImpl.java:80) [com.datastax.oss-bookkeeper-server-4.14.4.1.0.0.jar:4.14.4.1.0.0]
	at org.apache.bookkeeper.client.BookKeeper.initializeEnsemblePlacementPolicy(BookKeeper.java:581) [com.datastax.oss-bookkeeper-server-4.14.4.1.0.0.jar:4.14.4.1.0.0]
	at org.apache.bookkeeper.client.BookKeeper.<init>(BookKeeper.java:505) [com.datastax.oss-bookkeeper-server-4.14.4.1.0.0.jar:4.14.4.1.0.0]
	at org.apache.bookkeeper.client.BookKeeper$Builder.build(BookKeeper.java:306) [com.datastax.oss-bookkeeper-server-4.14.4.1.0.0.jar:4.14.4.1.0.0]
	at org.apache.bookkeeper.replication.Auditor.createBookKeeperClient(Auditor.java:280) [com.datastax.oss-bookkeeper-server-4.14.4.1.0.0.jar:4.14.4.1.0.0]
	at org.apache.bookkeeper.replication.AutoRecoveryMain.<init>(AutoRecoveryMain.java:95) [com.datastax.oss-bookkeeper-server-4.14.4.1.0.0.jar:4.14.4.1.0.0]
	at org.apache.bookkeeper.server.service.AutoRecoveryService.<init>(AutoRecoveryService.java:41) [com.datastax.oss-bookkeeper-server-4.14.4.1.0.0.jar:4.14.4.1.0.0]
	at org.apache.bookkeeper.replication.AutoRecoveryMain.buildAutoRecoveryServer(AutoRecoveryMain.java:358) [com.datastax.oss-bookkeeper-server-4.14.4.1.0.0.jar:4.14.4.1.0.0]
	at org.apache.bookkeeper.replication.AutoRecoveryMain.doMain(AutoRecoveryMain.java:326) [com.datastax.oss-bookkeeper-server-4.14.4.1.0.0.jar:4.14.4.1.0.0]
	at org.apache.bookkeeper.replication.AutoRecoveryMain.main(AutoRecoveryMain.java:308) [com.datastax.oss-bookkeeper-server-4.14.4.1.0.0.jar:4.14.4.1.0.0]
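The stack trace above comes from the client falling back to `ScriptBasedMapping` when no resolver is configured. As a rough illustration only, the following Python sketch reads properties-style config text and reports which `reppDnsResolverClass` would be selected. The helper name `effective_dns_resolver` and the sample config contents are hypothetical, and the sketch is not how BookKeeper itself loads its configuration.

```python
# Hypothetical helper: not part of Pulsar, BookKeeper, or the helm chart.
SCRIPT_BASED = "org.apache.bookkeeper.net.ScriptBasedMapping"
ZK_RACK_AFFINITY = "org.apache.pulsar.zookeeper.ZkBookieRackAffinityMapping"


def effective_dns_resolver(conf_text: str) -> str:
    """Return the resolver class that a properties-style config would select."""
    resolver = SCRIPT_BASED  # the default named in the error message above
    for raw_line in conf_text.splitlines():
        line = raw_line.strip()
        # Skip blank lines, comments, and lines without a key=value pair.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        if key.strip() == "reppDnsResolverClass" and value.strip():
            resolver = value.strip()
    return resolver


# Sample config text; the zkServers value is made up for illustration.
before_change = "zkServers=zookeeper:2181\n"
after_change = before_change + "reppDnsResolverClass=" + ZK_RACK_AFFINITY + "\n"

print(effective_dns_resolver(before_change))
print(effective_dns_resolver(after_change))
```

With the setting absent, the sketch reports `ScriptBasedMapping`, which is the case that produces the startup error; with the setting present, it reports `ZkBookieRackAffinityMapping`.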

eolivelli · 2022-05-17T21:42:36Z

Did you do some manual testing?
The patch looks good, but I am not sure that class is meant to run inside the autorecovery daemon. My concern is more about how it plugs into the rest of the Pulsar runtime.

We should do some manual testing and ensure that it is working properly. It won't be easy.

Maybe there is already some unit test or integration test in the Pulsar repo.

michaeljmarshall · 2022-05-17T21:53:18Z

@eolivelli - I manually verified that the autorecovery pod and the bookkeeper pod start. I didn't test the autorecovery code path yet, but I can. Without this configuration change, the auto recovery process won't have the rack information that is configured in zookeeper.

michaeljmarshall · 2022-05-18T02:24:10Z

@eolivelli - I did some additional validation tonight, and everything appears to work correctly. However, I am not an expert on autorecovery, so please let me know if I've missed an important case. In the test, I set up 3 racks, 4 bookies, and a topic with E=2, Qw=2, and Qa=2. The test shows that the autorecovery pod correctly discovers racks and then identifies when an ensemble is not following the rack placement policy after two bookies are removed. Here are the racks:

pulsar@pulsar-broker-74959d97cd-q7f8j:/pulsar$ bin/pulsar-admin bookies racks-placement
"default    {pulsar-bookkeeper-2.pulsar-bookkeeper.default.svc.cluster.local:3181=BookieInfoImpl(rack=rack2, hostname=null), pulsar-bookkeeper-0.pulsar-bookkeeper.default.svc.cluster.local:3181=BookieInfoImpl(rack=rack0, hostname=null), pulsar-bookkeeper-3.pulsar-bookkeeper.default.svc.cluster.local:3181=BookieInfoImpl(rack=rack3, hostname=null), pulsar-bookkeeper-1.pulsar-bookkeeper.default.svc.cluster.local:3181=BookieInfoImpl(rack=rack0, hostname=null)}"
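To make "an ensemble is not following the rack placement policy" concrete, here is a minimal Python sketch of such a check. It is a simplification, not BookKeeper's actual `RackawareEnsemblePlacementPolicyImpl` logic: the function names, the `min_racks` parameter, and the shortened bookie names are all hypothetical, and the rack mapping mirrors the `racks-placement` output above.

```python
# Simplified illustration; names and the min_racks parameter are hypothetical.
DEFAULT_RACK = "/default-rack"  # rack name seen in the logs for unmapped bookies

# Shortened stand-ins for the four bookies listed in the output above.
rack_of = {
    "bookie-0": "rack0",
    "bookie-1": "rack0",
    "bookie-2": "rack2",
    "bookie-3": "rack3",
}


def racks_covered(ensemble, rack_of):
    """Return the set of racks that the ensemble's bookies live in."""
    return {rack_of.get(bookie, DEFAULT_RACK) for bookie in ensemble}


def follows_rack_policy(ensemble, rack_of, min_racks=2):
    """True if the ensemble spans at least min_racks racks (capped by its size)."""
    required = min(min_racks, len(ensemble))
    return len(racks_covered(ensemble, rack_of)) >= required


print(follows_rack_policy(["bookie-0", "bookie-3"], rack_of))  # True: rack0 and rack3
print(follows_rack_policy(["bookie-0", "bookie-1"], rack_of))  # False: both in rack0
```

With E=2, an ensemble of two bookies from the same rack fails this check, which is the kind of condition the test was designed to surface.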

The autorecovery pod logged the following after I completed configuring the racks:

01:56:25.498 [main-EventThread] INFO  org.apache.pulsar.zookeeper.ZooKeeperDataCache - [State:CONNECTED Timeout:30000 sessionid:0x1000030d905001f local:/172.17.0.5:43494 remoteserver:pulsar-zookeeper-ca.default.svc.cluster.local/10.97.165.234:2181 lastZxid:242 xid:4 sent:17 recv:20 queuedpkts:0 pendingresp:0 queuedevents:0] Received ZooKeeper watch event: WatchedEvent state:SyncConnected type:NodeDataChanged path:/bookies
01:56:25.500 [AuditorBookie-172.17.0.5:3181-EventThread] INFO  org.apache.pulsar.zookeeper.ZooKeeperDataCache - [State:CONNECTED Timeout:30000 sessionid:0x1000030d9050021 local:/172.17.0.5:43504 remoteserver:pulsar-zookeeper-ca.default.svc.cluster.local/10.97.165.234:2181 lastZxid:242 xid:4 sent:17 recv:20 queuedpkts:0 pendingresp:0 queuedevents:0] Received ZooKeeper watch event: WatchedEvent state:SyncConnected type:NodeDataChanged path:/bookies
01:56:25.511 [AuditorBookie-172.17.0.5:3181-EventThread] INFO  org.apache.pulsar.zookeeper.ZkBookieRackAffinityMapping - Reloading the bookie rack affinity mapping cache.
01:56:25.512 [main-EventThread] INFO  org.apache.pulsar.zookeeper.ZkBookieRackAffinityMapping - Reloading the bookie rack affinity mapping cache.
01:56:25.516 [ForkJoinPool.commonPool-worker-5] INFO  org.apache.pulsar.zookeeper.ZkBookieRackAffinityMapping - Bookie rack info updated to {default={pulsar-bookkeeper-1.pulsar-bookkeeper.default.svc.cluster.local:3181=BookieInfoImpl(rack=rack1, hostname=null), pulsar-bookkeeper-2.pulsar-bookkeeper.default.svc.cluster.local:3181=BookieInfoImpl(rack=rack2, hostname=null), pulsar-bookkeeper-0.pulsar-bookkeeper.default.svc.cluster.local:3181=BookieInfoImpl(rack=rack0, hostname=null)}}. Notifying rackaware policy.
01:56:25.518 [ForkJoinPool.commonPool-worker-3] INFO  org.apache.pulsar.zookeeper.ZkBookieRackAffinityMapping - Bookie rack info updated to {default={pulsar-bookkeeper-1.pulsar-bookkeeper.default.svc.cluster.local:3181=BookieInfoImpl(rack=rack1, hostname=null), pulsar-bookkeeper-2.pulsar-bookkeeper.default.svc.cluster.local:3181=BookieInfoImpl(rack=rack2, hostname=null), pulsar-bookkeeper-0.pulsar-bookkeeper.default.svc.cluster.local:3181=BookieInfoImpl(rack=rack0, hostname=null)}}. Notifying rackaware policy.
01:56:25.528 [ForkJoinPool.commonPool-worker-5] INFO  org.apache.bookkeeper.net.NetworkTopologyImpl - Removing a node: /rack1/pulsar-bookkeeper-1.pulsar-bookkeeper.default.svc.cluster.local:3181
01:56:25.530 [ForkJoinPool.commonPool-worker-3] INFO  org.apache.bookkeeper.net.NetworkTopologyImpl - Removing a node: /rack1/pulsar-bookkeeper-1.pulsar-bookkeeper.default.svc.cluster.local:3181
01:56:25.531 [ForkJoinPool.commonPool-worker-3] INFO  org.apache.bookkeeper.net.NetworkTopologyImpl - Adding a new node: /rack1/pulsar-bookkeeper-1.pulsar-bookkeeper.default.svc.cluster.local:3181
01:56:25.530 [ForkJoinPool.commonPool-worker-5] INFO  org.apache.bookkeeper.net.NetworkTopologyImpl - Adding a new node: /rack1/pulsar-bookkeeper-1.pulsar-bookkeeper.default.svc.cluster.local:3181
01:56:25.539 [ForkJoinPool.commonPool-worker-3] INFO  org.apache.bookkeeper.net.NetworkTopologyImpl - Removing a node: /rack2/pulsar-bookkeeper-2.pulsar-bookkeeper.default.svc.cluster.local:3181
01:56:25.539 [ForkJoinPool.commonPool-worker-3] INFO  org.apache.bookkeeper.net.NetworkTopologyImpl - Adding a new node: /rack2/pulsar-bookkeeper-2.pulsar-bookkeeper.default.svc.cluster.local:3181
01:56:25.539 [ForkJoinPool.commonPool-worker-5] INFO  org.apache.bookkeeper.net.NetworkTopologyImpl - Removing a node: /rack2/pulsar-bookkeeper-2.pulsar-bookkeeper.default.svc.cluster.local:3181
01:56:25.540 [ForkJoinPool.commonPool-worker-5] INFO  org.apache.bookkeeper.net.NetworkTopologyImpl - Adding a new node: /rack2/pulsar-bookkeeper-2.pulsar-bookkeeper.default.svc.cluster.local:3181
01:56:25.541 [ForkJoinPool.commonPool-worker-5] INFO  org.apache.bookkeeper.net.NetworkTopologyImpl - Removing a node: /default-rack/pulsar-bookkeeper-0.pulsar-bookkeeper.default.svc.cluster.local:3181
01:56:25.542 [ForkJoinPool.commonPool-worker-5] INFO  org.apache.bookkeeper.net.NetworkTopologyImpl - Adding a new node: /rack0/pulsar-bookkeeper-0.pulsar-bookkeeper.default.svc.cluster.local:3181
01:56:25.542 [ForkJoinPool.commonPool-worker-3] INFO  org.apache.bookkeeper.net.NetworkTopologyImpl - Removing a node: /default-rack/pulsar-bookkeeper-0.pulsar-bookkeeper.default.svc.cluster.local:3181
01:56:25.543 [ForkJoinPool.commonPool-worker-3] INFO  org.apache.bookkeeper.net.NetworkTopologyImpl - Adding a new node: /rack0/pulsar-bookkeeper-0.pulsar-bookkeeper.default.svc.cluster.local:3181
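When validating a deployment, it can help to pull the bookie-to-rack mapping out of the `Bookie rack info updated to` lines shown above. The following Python sketch does that with a regular expression. The function name `parse_rack_info` is hypothetical, the sample line uses shortened host names, and the pattern only assumes the `BookieInfoImpl(rack=..., hostname=...)` shape visible in these logs.

```python
import re

# Matches "<host>:<port>=BookieInfoImpl(rack=<rack>, hostname=<anything>)".
BOOKIE_INFO = re.compile(
    r"([\w.\-]+:\d+)=BookieInfoImpl\(rack=([^,)]+), hostname=[^)]*\)"
)


def parse_rack_info(log_line: str) -> dict:
    """Extract a {bookie address: rack} mapping from one log line."""
    return {bookie: rack for bookie, rack in BOOKIE_INFO.findall(log_line)}


# Shortened stand-in for one of the log lines above (host names abbreviated).
sample = (
    "Bookie rack info updated to {default={"
    "bookie-1.example:3181=BookieInfoImpl(rack=rack1, hostname=null), "
    "bookie-0.example:3181=BookieInfoImpl(rack=rack0, hostname=null)}}. "
    "Notifying rackaware policy."
)

for bookie, rack in sorted(parse_rack_info(sample).items()):
    print(bookie, rack)
```

Comparing the parsed mapping against the output of `bin/pulsar-admin bookies racks-placement` is one way to confirm that autorecovery sees the same topology as the broker.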

Here are the relevant logs from autorecovery when I removed bookies 2 and 3:

02:09:24.067 [ReplicationWorker] WARN  org.apache.bookkeeper.client.RackawareEnsemblePlacementPolicyImpl - Failed to find 1 bookies : excludeBookies [<Bookie:pulsar-bookkeeper-0.pulsar-bookkeeper.default.svc.cluster.local:3181>, <Bookie:pulsar-bookkeeper-3.pulsar-bookkeeper.default.svc.cluster.local:3181>, <Bookie:pulsar-bookkeeper-1.pulsar-bookkeeper.default.svc.cluster.local:3181>], allBookies [<Bookie:pulsar-bookkeeper-1.pulsar-bookkeeper.default.svc.cluster.local:3181>, <Bookie:pulsar-bookkeeper-0.pulsar-bookkeeper.default.svc.cluster.local:3181>].
02:09:24.068 [ReplicationWorker] WARN  org.apache.bookkeeper.client.RackawareEnsemblePlacementPolicyImpl - Failed to choose a bookie: excluded [<Bookie:pulsar-bookkeeper-0.pulsar-bookkeeper.default.svc.cluster.local:3181>, <Bookie:pulsar-bookkeeper-3.pulsar-bookkeeper.default.svc.cluster.local:3181>], fallback to choose bookie randomly from the cluster.
02:09:24.068 [ReplicationWorker] INFO  org.apache.bookkeeper.client.LedgerFragmentReplicator - Replicating fragment Fragment(LedgerID: 28, FirstEntryID: 0[0], LastKnownEntryID: 1[1], Host: [pulsar-bookkeeper-3.pulsar-bookkeeper.default.svc.cluster.local:3181], Closed: true) in 1 sub fragments.
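The warnings above show a rack-aware choice failing and a fallback to any remaining bookie. A rough Python sketch of that two-step behavior follows. It is an illustration under stated assumptions, not BookKeeper's algorithm: `choose_replacement` is a hypothetical name, the ensemble `[bookie-0, bookie-3]` is inferred from the excluded list in the log, and the sketch picks the first candidate in sorted order for determinism, whereas the log says BookKeeper falls back to a random choice.

```python
# Simplified illustration; function name and data layout are hypothetical.
rack_of = {
    "bookie-0": "rack0",
    "bookie-1": "rack0",
    "bookie-2": "rack2",
    "bookie-3": "rack3",
}


def choose_replacement(available, ensemble, failed, rack_of):
    """Pick a replacement for `failed`; return (bookie, rack_aware_flag)."""
    survivors = [b for b in ensemble if b != failed]
    used_racks = {rack_of.get(b) for b in survivors}
    # Candidates are live bookies that are not already part of the ensemble.
    candidates = sorted(b for b in available if b not in ensemble)
    # First try: only bookies in a rack the survivors do not already occupy.
    rack_aware = [b for b in candidates if rack_of.get(b) not in used_racks]
    if rack_aware:
        return rack_aware[0], True
    # Fallback: any remaining candidate, ignoring rack placement.
    if candidates:
        return candidates[0], False
    raise LookupError("no bookie available for replacement")


# Bookies 2 and 3 were removed, so only bookies 0 and 1 remain.
available = {"bookie-0", "bookie-1"}
print(choose_replacement(available, ["bookie-0", "bookie-3"], "bookie-3", rack_of))
# ('bookie-1', False): bookie-1 shares rack0 with bookie-0, so the fallback is used
```

In this scenario the only remaining candidate shares a rack with the surviving ensemble member, so the rack-aware attempt finds nothing and the fallback path is taken, which lines up with the two warnings in the log.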

eolivelli

LGTM

thanks for testing.

so that the probe doesn't continue running indefinitely - resolves the issue with Kubernetes <1.20 "Before Kubernetes 1.20, the field timeoutSeconds was not respected for exec probes: probes continued running indefinitely, even past their configured deadline, until a result was returned." in https://kubernetes.io/docs/tasks/configure-pod-container/configure-liveness-readiness-startup-probes/#configure-probes - datastax#179 already fixed the issue for Kubernetes 1.20+

…ackAffinityMapping (#15640)

* [Autorecovery] Default reppDnsResolverClass to ZkBookieRackAffinityMapping
* Improve documentation

Fixes: #18012

### Motivation

The current Bookkeeper configuration defaults to using `org.apache.bookkeeper.net.ScriptBasedMapping` for the `DNSToSwitchMapping` implementation. However, this default configuration does not align with the Broker's default configuration, which is `org.apache.pulsar.zookeeper.ZkBookieRackAffinityMapping`. As such, the default configuration for a Pulsar cluster does not lead to ideal rack awareness when ledgers need to be recovered. The result is that a user can configure a cluster for rack awareness and the brokers will honor that configuration, but the autorecovery process will not because it does not have the correct bookkeeper cluster topology view.

I propose we configure bookkeeper to use the broker's `ZkBookieRackAffinityMapping` class. That way, autorecovery will honor the operator's configured rack awareness policies out of the box.

### Modifications

* Add default value for `reppDnsResolverClass` to the `conf/bookkeeper.conf` configuration. This change effectively switches the default from `org.apache.bookkeeper.net.ScriptBasedMapping` to `org.apache.pulsar.zookeeper.ZkBookieRackAffinityMapping`.

### Verifying this change

I manually verified that the `ZkBookieRackAffinityMapping` works by running some tests in a minikube cluster deployed with the DataStax helm chart. I set up 3 racks, 4 bookies, and a topic with E=2, Qw=2, and Qa=2. I then verified that the autorecovery pod correctly discovered the racks and then identified when an ensemble was not following the rack placement policy after two bookies were removed. I documented my testing a bit more here: datastax/pulsar-helm-chart#214.

### Does this pull request potentially affect one of the following parts:

It changes a default value.

The tradeoff is that a user relying on the `ScriptBasedMapping` default might accidentally get switched to using the `ZkBookieRackAffinityMapping` implementation. Given that `ScriptBasedMapping` doesn't work out of the box, and that the broker defaults to `ZkBookieRackAffinityMapping`, I think this is an acceptable tradeoff.

- [x] `doc`

The same commit message appears on two further commits that mention this pull request: one ending with `(cherry picked from commit 9812297)`, and one (referencing apache#15640 and apache#18012) ending with `(cherry picked from commit 9812297) (cherry picked from commit fc692c3)`.

michaeljmarshall requested review from cdbartholomew and lhotari May 17, 2022 19:04

Autorecovery: configure Pulsar's rack awareness integration

eb32207

michaeljmarshall force-pushed the fix-rackawareness-for-autorecovery branch from 01b83f0 to eb32207 Compare May 17, 2022 19:46

michaeljmarshall mentioned this pull request May 18, 2022

[fix][storage] Autorecovery default reppDnsResolverClass to ZkBookieRackAffinityMapping apache/pulsar#15640

Merged


eolivelli approved these changes May 18, 2022


lhotari approved these changes Jun 13, 2022


lhotari merged commit 7473a0f into master Jun 13, 2022

michaeljmarshall deleted the fix-rackawareness-for-autorecovery branch June 13, 2022 22:17

michaeljmarshall mentioned this pull request Oct 11, 2022

PIP-212: Default reppDnsResolverClass to ZkBookieRackAffinityMapping apache/pulsar#18012

Closed

sijie mentioned this pull request Oct 11, 2022

ISSUE-18012: PIP-212: Default reppDnsResolverClass to ZkBookieRackAffinityMapping streamnative/pulsar-archived#4985

Open
