-
Notifications
You must be signed in to change notification settings - Fork 4.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add support for tuple ClickHouse #29715
Add support for tuple ClickHouse #29715
Conversation
Assigning reviewers. If you would like to opt out of this review, comment R: @robertwb for label java. Available commands:
The PR bot will only process comments in the main thread (not review comments). |
Cc @Abacn since you've reviewed the last 2 PRs adding ClickHouse data types |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, did a quick pass and let some comments. However I am going to travel for the next weeks. Would be great if could find another reviewer or until late December
CHANGES.md
Outdated
@@ -68,6 +68,7 @@ | |||
* Adding support for LowCardinality DataType in ClickHouse (Java) ([#29533](https://github.com/apache/beam/pull/29533)). | |||
* Added support for handling bad records to KafkaIO (Java) ([#29546](https://github.com/apache/beam/pull/29546)) | |||
* Add support for generating text embeddings in MLTransform for Vertex AI and Hugging Face Hub models.([#29564](https://github.com/apache/beam/pull/29564)) | |||
* Adding support for Tuples DataType in ClickHouse (Java) ([Tuple Support](https://github.com/apache/beam/pull/29715)). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can combine this with LowCardinality DataType above, e.g.
Adding support for LowCardinality and Tuples DataType in ClickHouse (Java) ([#29533](https://github.com/apache/beam/pull/29533), [#29715](https://github.com/apache/beam/pull/29715)).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Release cut is today so this may not get into 2.53.0. Could leave it as is and fix when the PR is finalized
@@ -112,7 +113,8 @@ | |||
* </table> | |||
* | |||
* Nullable row columns are supported through Nullable type in ClickHouse. Low cardinality hint is | |||
* supported through LowCardinality DataType in ClickHouse. | |||
* supported through LowCardinality DataType in ClickHouse supported through Tuple DataType in |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider put the documentation Tuple to the table above. Broken sentence here currently.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please check this sentence
.map(s -> s.trim().replaceAll(" +", "':;")) | ||
.collect(Collectors.toList()); | ||
String content = | ||
String.join(",", l).trim().replaceAll("Tuple\\(", "Tuple('").replaceAll(",", ",'"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The caller has a condition if (type.toLowerCase().trim().startsWith("tuple(")) {
below, so here payload
always (case insensitive) starts with "tuple(", so here the replace of "Tuple\(" won't work
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The condition is on with lower case, but the actual parsing is on the original string
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I mean below when " if (type.toLowerCase().trim().startsWith("tuple(")) {" the this preprocessing will be executed.
So Tuple( (with backslash) is considered here but not the condition in the caller there. Coild this cause issue?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're right. Would you mind adding some comments about the proprocessing rule and example, for example, sth like // Tuple(a String, b Integer) -> ...
@@ -99,6 +103,9 @@ TOKEN : | |||
| < EQ : "=" > | |||
| < BOOL : "BOOL" > | |||
| < LOWCARDINALITY : "LOWCARDINALITY" > | |||
| < TUPLE : "TUPLE" > | |||
| < COLON : ":" > |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is COLON and SEMI_COLON for tuple or another feature?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using that since i have some ambiguity in parsing (looking for other options), but for now decided to leave it that way
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If it's get added to SDK then there is expectation of backward compatibility. So I would prefer to keep consistent with ClickHouse's SQL syntax
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's a temporary solution since there is some ambiguity when using javacc
having difficulty to detecting between tokens and strings
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed COLON
and SEMI_ COLON
Map<String, ColumnType> m2 = new HashMap<>(); | ||
m2.put("a1", ColumnType.STRING); | ||
m2.put("b", ColumnType.BOOL); | ||
ColumnType columnType02 = ColumnType.parse("Tuple('a1':;String,'b':;Bool)"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is :;
a standard syntax for ClickHouse? https://clickhouse.com/docs/en/sql-reference/data-types/tuple seems not mentioning this pattern.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Abacn Removed :;
Reminder, please take a look at this pr: @robertwb @chamikaramj |
Assigning new set of reviewers because Pr has gone too long without review. If you would like to opt out of this review, comment R: @kennknowles for label java. Available commands:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, the syntax now looks good and consistent with clickhouse documentation. Replied to previous comment
CHANGES.md
Outdated
* Added support for handling bad records to KafkaIO (Java) ([#29546](https://github.com/apache/beam/pull/29546)) | ||
* Add support for generating text embeddings in MLTransform for Vertex AI and Hugging Face Hub models.([#29564](https://github.com/apache/beam/pull/29564)) | ||
* NATS IO connector added (Go) ([#29000](https://github.com/apache/beam/issues/29000)). | ||
* Adding support for LowCardinality and Tuples DataType in ClickHouse (Java) ([#29533](https://github.com/apache/beam/pull/29533), [#29715](https://github.com/apache/beam/pull/29715)). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LowCardinality will go to 2.53.0 but Tuples does not, as 2.53.0 is already cut
@@ -112,7 +113,8 @@ | |||
* </table> | |||
* | |||
* Nullable row columns are supported through Nullable type in ClickHouse. Low cardinality hint is | |||
* supported through LowCardinality DataType in ClickHouse. | |||
* supported through LowCardinality DataType in ClickHouse supported through Tuple DataType in |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please check this sentence
.map(s -> s.trim().replaceAll(" +", "':;")) | ||
.collect(Collectors.toList()); | ||
String content = | ||
String.join(",", l).trim().replaceAll("Tuple\\(", "Tuple('").replaceAll(",", ",'"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I mean below when " if (type.toLowerCase().trim().startsWith("tuple(")) {" the this preprocessing will be executed.
So Tuple( (with backslash) is considered here but not the condition in the caller there. Coild this cause issue?
waiting on author |
@Abacn |
You're right, sorry for confusion |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks!
case TUPLE: | ||
List<Schema.Field> fields = | ||
columnType.tupleTypes().entrySet().stream() | ||
.map(x -> Schema.Field.of(x.getKey(), Schema.FieldType.DATETIME)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @mzitnik,
Shouldn't each key inside a tuple be mapped to their respective data type ? Any reason for being DATETIME
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi,
This quick adaptation seems to work:
case TUPLE:
List<TableSchema.Column> columns = columnType.tupleTypes().entrySet().stream().map(
x -> TableSchema.Column.of(x.getKey(), x.getValue())
).collect(Collectors.toList());
TableSchema tupleSchema = TableSchema.of(columns.toArray(new TableSchema.Column[0]));
Schema tupleAsRowSchema = getEquivalentSchema(tupleSchema);
return Schema.FieldType.row(tupleAsRowSchema);
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi,
Also, it seems that Array(Tuple(*))
is not supported when calling ClickHouseIO.getTableSchema
.
This seems to fix the issue :
if (type.toLowerCase().trim().startsWith("tuple(")) { String content = tuplePreprocessing(type); columnType = TableSchema.ColumnType.parse(content); } else if (type.toLowerCase().trim().startsWith("array(tuple(")) { String content = tuplePreprocessing(type); columnType = TableSchema.ColumnType.parse(content); }
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add support for tuple ClickHouse
Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily:
addresses #123
), if applicable. This will automatically add a link to the pull request in the issue. If you would like the issue to automatically close on merging the pull request, commentfixes #<ISSUE NUMBER>
instead.CHANGES.md
with noteworthy changes.See the Contributor Guide for more tips on how to make review process smoother.
To check the build health, please visit https://github.com/apache/beam/blob/master/.test-infra/BUILD_STATUS.md
GitHub Actions Tests Status (on master branch)
See CI.md for more information about GitHub Actions CI or the workflows README to see a list of phrases to trigger workflows.