[Feature][Kudu] Refactor Kudu functionality and Sink support CDC data. #5437

Merged: 14 commits into apache:dev on Oct 26, 2023

Conversation

@Carl-Zhou-CN Carl-Zhou-CN commented Sep 6, 2023

#5240

1. Refactored the Kudu tablet splitting process.
2. Adjusted the mapping of error types.
3. Added CDC (Change Data Capture) data support to the Sink.
4. Added Kerberos support.
5. Added e2e tests.

Purpose of this pull request

Check list

@Carl-Zhou-CN Carl-Zhou-CN changed the title from "[Feature][Kudu] Refactor Kudu functionality and support CDC." to "[Feature][Kudu] Refactor Kudu functionality and Sink support CDC data." on Oct 19, 2023

## Key features

- [x] [batch](../../concept/connector-v2-features.md)

It seems you have added the Source features to the Sink documentation.

| TIMESTAMP | UNIXTIME_MICROS |
| BYTES | BINARY |

## Sink Options

The formatting here has some errors.

| UNIXTIME_MICROS | TIMESTAMP |
| BINARY | BYTES |

## Source Options

The formatting here has some errors.

kudu_masters = "kudu-master-cdc:7051"
table_name = "kudu_sink_table"
}
}
```

Please add an example with Kerberos enabled.

source_table_name = "kudu"
kudu_masters = "kudu-master:7051"
table_name = "kudu_sink_table"
}
```

Please add an example with Kerberos enabled.

parameters[i] = new Long[] {start, end};
start = end + 1;
private void addPendingSplit(Collection<KuduSourceSplit> splits) {
int readerCount = enumeratorContext.currentParallelism();

This needs synchronized (stateLock) too.

@Override
public void registerReader(int subtaskId) {
log.debug("Register reader {} to KuduSourceSplitEnumerator.", subtaskId);
if (!pendingSplits.isEmpty()) {

This needs synchronized (stateLock) too.
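
To illustrate the locking requested in the two comments above, here is a minimal sketch, not the connector's actual code. `SplitEnumeratorSketch`, the string split IDs, the hash-based owner assignment, and `pendingCount` are all hypothetical stand-ins; the point is only that every method touching `pendingSplits` takes the same `stateLock`.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified stand-in for the enumerator; splits are plain strings here.
class SplitEnumeratorSketch {
    private final Object stateLock = new Object();
    // Reader subtask ID -> splits waiting to be assigned to that reader.
    private final Map<Integer, List<String>> pendingSplits = new HashMap<>();
    private final int readerCount;

    SplitEnumeratorSketch(int readerCount) {
        this.readerCount = readerCount;
    }

    // Writers take the lock before mutating the shared map.
    void addPendingSplit(List<String> splits) {
        synchronized (stateLock) {
            for (String split : splits) {
                // Assumed assignment rule: pick an owner from the split's hash.
                int owner = Math.floorMod(split.hashCode(), readerCount);
                pendingSplits.computeIfAbsent(owner, k -> new ArrayList<>()).add(split);
            }
        }
    }

    // Called when a reader registers: hand over whatever is pending for it.
    List<String> registerReader(int subtaskId) {
        synchronized (stateLock) {
            List<String> assigned = pendingSplits.remove(subtaskId);
            return assigned == null ? new ArrayList<>() : assigned;
        }
    }

    // Readers of the state take the same lock, so they never see a half-updated map.
    int pendingCount() {
        synchronized (stateLock) {
            int total = 0;
            for (List<String> splits : pendingSplits.values()) {
                total += splits.size();
            }
            return total;
        }
    }
}
```

Using one private lock object for all three methods keeps split assignment and checkpointing from interleaving with each other.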

}

@Override
public KuduSourceState snapshotState(long checkpointId) throws Exception {
return null;
synchronized (stateLock) {
return new KuduSourceState(pendingSplits);

Suggested change
return new KuduSourceState(pendingSplits);
return new KuduSourceState(new HashMap<>(pendingSplits));
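
A small sketch of why the copy matters, assuming `pendingSplits` is a `Map` (as the suggested `new HashMap<>(pendingSplits)` implies). `SnapshotCopySketch` and its methods are hypothetical names used only for this illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in that contrasts a shared snapshot with a copied one.
class SnapshotCopySketch {
    private final Object stateLock = new Object();
    private final Map<Integer, List<String>> pendingSplits = new HashMap<>();

    void add(int reader, String split) {
        synchronized (stateLock) {
            pendingSplits.computeIfAbsent(reader, k -> new ArrayList<>()).add(split);
        }
    }

    void clear() {
        synchronized (stateLock) {
            pendingSplits.clear();
        }
    }

    // Returns the live map: later mutations leak into the "snapshot".
    Map<Integer, List<String>> snapshotShared() {
        synchronized (stateLock) {
            return pendingSplits;
        }
    }

    // Copies the map while holding the lock, as in the suggested change.
    Map<Integer, List<String>> snapshotCopied() {
        synchronized (stateLock) {
            return new HashMap<>(pendingSplits);
        }
    }
}
```

Note that `new HashMap<>(map)` is a shallow copy: the map entries are independent of the original, but the value lists are still shared. If the lists are mutated in place after the snapshot, they would need copying as well.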

}
Configuration conf = new Configuration();
conf.set(HADOOP_AUTH_KEY, KRB);
UserGroupInformation.setConfiguration(conf);

UserGroupInformation holds static, JVM-wide state. If more than one place in the same JVM calls UserGroupInformation.setConfiguration(conf), the calls overwrite each other's configuration. Creating a separate UserGroupInformation object is the preferred approach.


## Key features

- [x] [batch](../../concept/connector-v2-features.md)
- [ ] [stream](../../concept/connector-v2-features.md)
- [x] [stream](../../concept/connector-v2-features.md)

From the code, it seems stream is not supported.

}

private static UserGroupInformation loginAndReturnUgi(CommonConfig config) throws IOException {
if (StringUtils.isBlank(config.getPrincipal()) || StringUtils.isBlank(config.getKeytab())) {

synchronized(UserGroupInformation.class) {

}

This can fix the UGI overwrite problem.
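
The pattern can be sketched without Hadoop on the classpath. `GlobalAuthSketch` below is a hypothetical stand-in for a class with JVM-wide static configuration (the role UserGroupInformation plays in the reviewed code); none of its methods are Hadoop APIs. The idea is that setting the static configuration and logging in happen as one step under a class-level lock.

```java
// Hypothetical stand-in for a class whose configuration is static and JVM-wide.
class GlobalAuthSketch {
    private static String authMethod = "simple";

    static void setConfiguration(String method) {
        authMethod = method;
    }

    // The login result depends on whatever the static configuration is right now.
    static String login(String principal) {
        return principal + "@" + authMethod;
    }

    // Set the configuration and log in while holding the class-level lock, so
    // another caller of this method cannot swap the configuration in between.
    static String loginAndReturnUser(String method, String principal) {
        synchronized (GlobalAuthSketch.class) {
            setConfiguration(method);
            return login(principal);
        }
    }
}
```

The lock only protects callers that go through the same synchronized path. Code elsewhere in the JVM that sets the static configuration directly, without taking this lock, can still overwrite it.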

Comment on lines 71 to 72
type = {EngineType.FLINK, EngineType.SPARK},
disabledReason = "Currently SPARK and FLINK do not support cdc")

Remove Flink?

Suggested change
type = {EngineType.FLINK, EngineType.SPARK},
disabledReason = "Currently SPARK and FLINK do not support cdc")
type = {EngineType.SPARK},
disabledReason = "Currently SPARK do not support cdc")

.untilAsserted(
() -> {
System.out.println(readData(KUDU_SINK_TABLE).size());
Assertions.assertEquals(readData(KUDU_SINK_TABLE).size(), 2);

Also assert the rows and field data?
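
One way to check more than the row count, as a rough sketch. It assumes each row can be represented as a `List<Object>` of field values and that rows arrive in a stable order (otherwise both sides would need sorting first). `RowAssertSketch` and `assertRowsEqual` are hypothetical names; in the e2e test the same comparison could be done with the existing `Assertions.assertEquals`.

```java
import java.util.List;

// Hypothetical helper that compares rows field by field, not only the row count.
class RowAssertSketch {
    static void assertRowsEqual(List<List<Object>> expected, List<List<Object>> actual) {
        if (expected.size() != actual.size()) {
            throw new AssertionError(
                    "row count: expected " + expected.size() + " but was " + actual.size());
        }
        for (int i = 0; i < expected.size(); i++) {
            // List.equals compares the field values element by element.
            if (!expected.get(i).equals(actual.get(i))) {
                throw new AssertionError(
                        "row " + i + ": expected " + expected.get(i) + " but was " + actual.get(i));
            }
        }
    }
}
```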

hailin0 commented Oct 25, 2023

LGTM

zhouyao and others added 2 commits October 25, 2023 15:17
hailin0
hailin0 previously approved these changes Oct 25, 2023

@hailin0 hailin0 left a comment

LGTM

EricJoy2048
EricJoy2048 previously approved these changes Oct 25, 2023

@EricJoy2048 EricJoy2048 left a comment

LGTM if CI completes.

@EricJoy2048 EricJoy2048 merged commit 22110eb into apache:dev Oct 26, 2023
6 of 7 checks passed
3 participants