Migrating from AdminUtils with AdminClient #848

Merged
merged 1 commit into from
Sep 21, 2021

Conversation

Collaborator

@surajkn surajkn commented Aug 10, 2021

Replace usage of kafka.admin.AdminUtils with org.apache.kafka.clients.admin.AdminClient, because AdminUtils is deprecated.
This change is also a prerequisite for a follow-up effort that replaces the 101tec ZkClient with the Helix ZkClient.

Important: DO NOT REPORT SECURITY ISSUES DIRECTLY ON GITHUB.
For reporting security issues and contributing security fixes,
please, email security@linkedin.com instead, as described in
the contribution guidelines.

Please, take a minute to review the contribution guidelines at:
https://github.com/linkedin/Brooklin/blob/master/CONTRIBUTING.md

@@ -21,4 +21,5 @@ ext {
testngVersion = "7.1.0"
zkclientVersion = "0.11"
Collaborator Author

I should probably delete this as well

Collaborator

If there are no usages, you should.

Collaborator Author

This is still required. I will remove it in the following PR, where I fully replace the 101tec ZkClient with the Helix ZkClient.

Collaborator

@jzakaryan jzakaryan left a comment

@surajkn did a walkthrough of the changes and explained the differences in APIs. Waiting for him to confirm config changes for Kafka client before approving the PR.

@surajkn surajkn force-pushed the 101tec_zkclient_to_helix_zkclient branch from 90983f2 to 4348b9d Compare August 20, 2021 17:57
* @param topic the topic to wait for broker assignment
* @param brokerList the brokers in the Kafka cluster
* @throws IllegalStateException if the topic is not ready before the timeout ({@value #DEFAULT_TIMEOUT_MS} ms)
*/
public static void waitForTopicCreation(ZkUtils zkUtils, String topic, String brokerList) throws IllegalStateException {
Validate.notNull(zkUtils);
public static void waitForTopicCreation(AdminClient adminClient, String topic, String brokerList) throws IllegalStateException {
Collaborator Author

I am not sure we need this anymore. In this patch I replace calls to AdminUtils.createTopic with calls to AdminClient.createTopics, which essentially returns a KafkaFuture on which I currently do a blocking wait (to wait for the topic to be created). So we probably don't need this method anymore? I have removed all calls to it in the test cases, and it's working fine.
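
The blocking wait described in this comment can be sketched with JDK types alone. This is a minimal sketch, not the code in this PR: `CompletableFuture` stands in for the `KafkaFuture` returned by `createTopics(...).all()`, and the `BoundedWait` class, the `awaitQuietly` helper, and the timeout values are hypothetical. It illustrates that `get(timeout, unit)` bounds the wait, whereas a bare `get()` blocks until the future completes.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

final class BoundedWait {
  private BoundedWait() { }

  /**
   * Blocks until the future completes or the timeout elapses.
   * Returns true on normal completion, false on timeout, failure, or interrupt.
   */
  static boolean awaitQuietly(Future<?> future, long timeoutMs) {
    try {
      future.get(timeoutMs, TimeUnit.MILLISECONDS);
      return true;
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt(); // preserve the interrupt flag for callers
      return false;
    } catch (ExecutionException | TimeoutException e) {
      return false;
    }
  }

  public static void main(String[] args) {
    // Stand-in for a topic-creation future that has already succeeded.
    CompletableFuture<Void> done = CompletableFuture.completedFuture(null);
    // Stand-in for a request that never completes.
    CompletableFuture<Void> stuck = new CompletableFuture<>();

    System.out.println(awaitQuietly(done, 5_000)); // prints "true"
    System.out.println(awaitQuietly(stuck, 100));  // prints "false" after about 100 ms
  }
}
```

Whether a bounded wait is preferable to an unbounded one here depends on how the callers handle a timeout, which this sketch does not try to answer.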

* @param adminClient AdminClient instance to check if topics exists
* @param topic Topic name
*/
public static boolean topicExists(AdminClient adminClient, String topic) {
Collaborator Author

I tried to import the topicExists method from the BaseKafkaZkTest class here, but I kept getting a "package com.linkedin.datastream.testutil does not exist" error, so I copied the method here. It's probably a classpath issue, but I am not sure how to resolve it in this repo.

Collaborator

Is this because of the circular dependency problem we discussed offline?

Collaborator Author

Yes, correct.

Collaborator

If it's a circular dependency issue, could the BaseKafkaZkTest class call this function instead, to avoid duplicating the code?

Collaborator Author

I can check but I doubt that will work. I guess trying to do it either way will still result in circular dependency.

// Removing from topic config since its passed as a direct argument
topicProperties.remove("replicationFactor");
// TopicExistsException is thrown if topic exists. Its a no-op.
adminClient.createTopics(Collections.singletonList(newTopic)).all().get();
Collaborator Author

By calling ".all().get()" we block until the topic is created. I am not sure exactly what the behavior was with AdminUtils, or whether a blocking wait is the correct thing to do here. I am guessing this is the correct way?

That said, on repeated runs of all the unit tests, I noticed that the test "testCreateDatastreamHappyPathDefaultRetention" is a little flaky: it fails at random every now and then (very rarely, and it is hard to reproduce). This test calls "getRetention", which queries the topic config and returns the retention time if it exists in the policy, and null otherwise. When the test fails, it is because getRetention returned null instead of the expected value. I am not sure whether this test was flaky before this change, and I cannot explain the behavior.

Collaborator

Summarizing the offline discussion here:

There are no callbacks or returned futures with AdminUtils.createTopic, and no hints that it's an asynchronous operation. I'm not a Scala expert, but from the looks of it, it's a synchronous call.

As for the failing/flaky test:

  1. Was this test flaky before, or did it become flaky as a result of the API changes? We need to rule out changes in the behavior of the system after switching from AdminUtils to AdminClient.
  2. Does adding a thread sleep after topic creation (and before querying the retention of the new topic) make the test pass?

For (1), we'll reach out to Kafka to confirm that there are no changes in behavior in the createTopic APIs. For (2), we can try fixing the flaky test in a separate PR.
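
For question (2), one alternative to a fixed sleep is a bounded poll, which tends to be less sensitive to timing. The following is a minimal sketch and a swapped-in technique, not something this PR does: the `Poll` class and its `pollUntil` helper are hypothetical names written only for illustration. In the flaky test, the supplier would wrap the `getRetention` call and the predicate would check for a non-null value.

```java
import java.util.Optional;
import java.util.function.Predicate;
import java.util.function.Supplier;

final class Poll {
  private Poll() { }

  /**
   * Calls the supplier every periodMs until the predicate accepts the value
   * or timeoutMs elapses. Returns the accepted value, or empty on timeout
   * or interrupt. The predicate is expected to reject null values.
   */
  static <T> Optional<T> pollUntil(Supplier<T> supplier, Predicate<T> accept,
      long periodMs, long timeoutMs) {
    long deadline = System.nanoTime() + timeoutMs * 1_000_000L;
    while (true) {
      T value = supplier.get();
      if (accept.test(value)) {
        return Optional.ofNullable(value);
      }
      // Subtract before comparing so the check stays correct if nanoTime wraps.
      if (System.nanoTime() - deadline >= 0) {
        return Optional.empty();
      }
      try {
        Thread.sleep(periodMs);
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt(); // preserve the interrupt flag
        return Optional.empty();
      }
    }
  }
}
```

A call such as `Poll.pollUntil(() -> readRetention(), v -> v != null, 100, 5_000)` would retry for up to five seconds, where `readRetention` is a placeholder for whatever the test uses to query the topic config.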

}
} catch (Throwable e) {
LOG.error("Creating topic {} failed with exception {}", topicName, e);
throw e;
}
}

private AdminClient getAdminClient(Datastream datastream) {
Collaborator Author

Do we need to cache AdminClient instances based on the destinationBrokers? It depends on how frequently this method is called and how expensive it is to re-instantiate an AdminClient object, and I am not sure of either.

Collaborator Author

In an offline discussion with Jhora we decided to not do it for now and do it later if the need arises.
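
Should caching become necessary later, one possible shape is a map keyed by the broker list. This is only a sketch under assumptions: `ClientCache` is a hypothetical class, the type parameter `C` stands in for `AdminClient`, and the factory is whatever function builds a client from a broker-list string. It does not handle closing clients or evicting entries, which a real implementation would likely need.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

final class ClientCache<C> {
  private final Map<String, C> clients = new ConcurrentHashMap<>();
  private final Function<String, C> factory;

  ClientCache(Function<String, C> factory) {
    this.factory = factory;
  }

  /** Returns the cached client for this broker list, creating it on first use. */
  C get(String destinationBrokers) {
    // ConcurrentHashMap.computeIfAbsent applies the factory at most once per key.
    return clients.computeIfAbsent(destinationBrokers, factory);
  }

  /** Number of distinct broker lists that currently have a client. */
  int size() {
    return clients.size();
  }
}
```

Keying on the raw string means that two broker lists naming the same hosts in a different order would produce two clients; normalizing the key is left out of this sketch.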


KafkaTestUtils.waitForTopicCreation(_zkUtils, topicName, _kafkaCluster.getBrokers());
//KafkaTestUtils.waitForTopicCreation(_adminClient, topicName, _kafkaCluster.getBrokers());
Collaborator Author

Will remove this in a following update to this patch. Same for the other instance below.

Collaborator Author

Done

// TopicExistsException is thrown if topic exists. Its a no-op.
adminClient.createTopics(Collections.singletonList(newTopic)).all().get();
} catch (InterruptedException | ExecutionException e) {
if (e instanceof TopicExistsException || e.getCause() instanceof TopicExistsException) {
Collaborator

I may be missing something here, but org.apache.kafka.common.errors.TopicExistsException is neither an InterruptedException nor an ExecutionException according to https://kafka.apache.org/28/javadoc//org/apache/kafka/common/errors/TopicExistsException.html.
How can the first condition of this if statement ever be true? (and I know this code was there even before)

Collaborator Author

Actually, I found this reference implementation in LiKafkaTransporterProviderAdmin here.
But let me double-check this anyway.

Collaborator Author

So it looks like TopicExistsException has an inheritance chain that goes back to java.lang.Exception, and both InterruptedException and ExecutionException also inherit from the same java.lang.Exception.

Collaborator

I understand that, my point was that TopicExistsException is neither a descendant of InterruptedException nor is it a descendant of ExecutionException. That first condition is always false.

Collaborator

Yeah, so we may need recursive calls to get the cause of the exception until we hit TopicExistsException, right?

Collaborator Author

So I suspect TopicExistsException is raised as "ExecutionException(new TopicExistsException())" and that is why it needs to be handled this way. If I split it and add a separate catch block for TopicExistsException then it fails to catch that exception and that exception goes uncaught. I tried it and saw a few unit tests fail because of it.
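
The behavior discussed in this thread can be reproduced with JDK types alone. This is a sketch under stated assumptions: `FakeTopicExistsException` is a hypothetical stand-in for Kafka's `TopicExistsException`, `CompletableFuture` stands in for the future returned by `createTopics(...).all()`, and `CauseChain.hasCause` is a hypothetical helper. A failed future reports its failure from `get()` as an `ExecutionException` whose cause is the original exception, so the caught exception itself is never the topic-exists type and only a `getCause()` check can match. `hasCause` walks the whole chain in case the exception is wrapped more than once.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;

final class CauseChain {
  private CauseChain() { }

  /** Hypothetical stand-in for org.apache.kafka.common.errors.TopicExistsException. */
  static final class FakeTopicExistsException extends RuntimeException {
    FakeTopicExistsException(String message) {
      super(message);
    }
  }

  /** Returns true if t, or any exception in its cause chain, is an instance of type. */
  static boolean hasCause(Throwable t, Class<? extends Throwable> type) {
    // Bound the walk so a cyclic cause chain cannot loop forever.
    for (int depth = 0; t != null && depth < 32; depth++) {
      if (type.isInstance(t)) {
        return true;
      }
      t = t.getCause();
    }
    return false;
  }

  public static void main(String[] args) throws InterruptedException {
    CompletableFuture<Void> failed = new CompletableFuture<>();
    failed.completeExceptionally(new FakeTopicExistsException("topic exists"));
    try {
      failed.get();
    } catch (ExecutionException e) {
      // e is the wrapper; the original exception is available as its cause.
      System.out.println(e.getCause() instanceof FakeTopicExistsException); // prints "true"
      System.out.println(hasCause(e, FakeTopicExistsException.class));      // prints "true"
    }
  }
}
```

This is consistent with the observation above that a separate catch block for the topic-exists exception never fires: the exception reaches the caller only as the cause of the wrapper.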

Collaborator

@vmaheshw vmaheshw left a comment

Can you please split this PR and separate out the admin client change? AdminClient change can be merged as is.

For the zk client migration part, we can see if we have a jfrog branching. Otherwise, we can duplicate some of the code and have different classes called using config knob and then later get rid of the knob, if things are working fine. It may duplicate some code, but will definitely unblock you and remove your dependency from jfrog branching.

@surajkn surajkn force-pushed the 101tec_zkclient_to_helix_zkclient branch from 4348b9d to 7eb32d2 Compare August 31, 2021 05:34
@surajkn surajkn changed the title Migrating from 101tec ZkClient to Helix ZkClient Migrating from AdminUtils with AdminClient Aug 31, 2021
@surajkn surajkn dismissed vmaheshw’s stale review August 31, 2021 05:37

Undid the ZkClient changes to revert back to extending 101tec ZkClient

@surajkn surajkn requested a review from somandal August 31, 2021 16:43
@surajkn surajkn force-pushed the 101tec_zkclient_to_helix_zkclient branch from 7eb32d2 to 9a12243 Compare September 8, 2021 23:32
@jzakaryan jzakaryan self-requested a review September 9, 2021 23:16
jzakaryan previously approved these changes Sep 9, 2021
Collaborator

@jzakaryan jzakaryan left a comment

LGTM

@surajkn surajkn merged commit 5ccd661 into linkedin:master Sep 21, 2021
@surajkn surajkn deleted the 101tec_zkclient_to_helix_zkclient branch September 21, 2021 16:06
vmaheshw pushed a commit to vmaheshw/brooklin that referenced this pull request Mar 1, 2022