-
Notifications
You must be signed in to change notification settings - Fork 4.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
🎉 New Destination: ClickHouse #7620
Conversation
This is amazing @burmecia !!! I'll ask the team to review this contribution in mid-time can you run |
Sorry I forgot it, it is done now. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the contribution!!
Few suggestions to keep the code cleaner
@@ -27,6 +27,7 @@ | |||
.put("airbyte/destination-postgres-strict-encrypt", ImmutablePair.of(BASE_NORMALIZATION_IMAGE_NAME, DestinationType.POSTGRES)) | |||
.put("airbyte/destination-redshift", ImmutablePair.of(BASE_NORMALIZATION_IMAGE_NAME, DestinationType.REDSHIFT)) | |||
.put("airbyte/destination-snowflake", ImmutablePair.of(BASE_NORMALIZATION_IMAGE_NAME, DestinationType.SNOWFLAKE)) | |||
.put("airbyte/destination-clickhouse", ImmutablePair.of(BASE_NORMALIZATION_IMAGE_NAME, DestinationType.CLICKHOUSE)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
.put("airbyte/destination-clickhouse", ImmutablePair.of(BASE_NORMALIZATION_IMAGE_NAME, DestinationType.CLICKHOUSE)) | |
.put("airbyte/destination-clickhouse", ImmutablePair.of("airbyte/normalization-clickhouse", DestinationType.CLICKHOUSE)) |
which means this PR is also missing some changes in: https://github.com/airbytehq/airbyte/blob/master/airbyte-integrations/bases/base-normalization/docker-compose.build.yaml to publish the normalization docker image for clickhouse
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks, done.
select *, | ||
{{ case_begin }} {{ airbyte_end_at }} is null {{ cdc_active_row }} {{ case_then }} 1 {{ case_else }} 0 {{ case_end }} as {{ active_row }} | ||
from ( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
instead of doing nested select (Subquery), it would be best to keep using the CTE syntax (already heavily used in this step)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks for the suggestion, I've put it in a new CTE
lag_begin = "lag" | ||
lag_end = "" | ||
if self.destination_type == DestinationType.CLICKHOUSE: | ||
# ClickHouse doesn't support lag() yet, this is a workaround solution | ||
# Ref: https://clickhouse.com/docs/en/sql-reference/window-functions/ | ||
lag_begin = "anyOrNull" | ||
lag_end = "ROWS BETWEEN 1 PRECEDING AND 1 PRECEDING" | ||
|
||
case_begin = "case when" | ||
case_then = "then" | ||
case_else = "else" | ||
case_end = "end" | ||
if self.destination_type == DestinationType.CLICKHOUSE: | ||
# ClickHouse doesn't have CASE WHEN, use multiIf instead | ||
# Ref: https://clickhouse.com/docs/en/sql-reference/functions/conditional-functions/#multiif |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This might be a lot of "hacks" because clickhouse doesn't support some SQL syntax...
Maybe it'd be worth breaking the whole generate_scd_type_2_model
method into a special clickhouse__generate_scd_type_2_model
instead? (with its own jinja template)
It'll make it easier to refactor normalization code when splitting the code base per destination type and break this part of the code into its own class probably where the function is overriden
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
but put it in a separate clickhouse__generate_scd_type_2_model
function will create a large code duplication because most of them are still in common. There are actually only 4 ClickHouse specific parts:
lag()
functioncase when
cast
left outer join null
All the other parts are same as other database. Maybe need some code refactoring on generate_scd_type_2_model
in the future.
db.close(); | ||
} | ||
|
||
// @Test |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do not clear, why do we have this line commented? Whether add a comment or uncomment it, please.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sorry forgot removing it, done.
if (config.has("password")) { | ||
configBuilder.put("password", config.get("password").asText()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like "password" good candidate for constant.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
...ouse/src/main/java/io/airbyte/integrations/destination/clickhouse/ClickhouseDestination.java
Show resolved
Hide resolved
|
||
public class ClickhouseSqlOperations extends JdbcSqlOperations { | ||
|
||
private static final Logger LOGGER = LoggerFactory.getLogger(ClickhouseSqlOperations.class); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like LOGGER is unused in the class. So please remove or add logging.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will use it for logging info.
} finally { | ||
try { | ||
if (tmpFile != null) { | ||
Files.delete(tmpFile.toPath()); | ||
} | ||
} catch (final IOException e) { | ||
throw new RuntimeException(e); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[Optional] Throwing an exception from within a finally block will mask any exception which was previously thrown in the try or catch block, and the masked’s exception message and stack trace will be lost.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks, I copied that code from Postgres connector:
Line 54 in b5af3f1
} catch (final IOException e) { |
If you have better resolution, I'd happy to change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about this: (I'm implementing similar pattern for Destination MariaDB ColumnStore now)
Exception primaryException = null;
try {
// your code
} catch (final Exception e) {
primaryException = e;
throw new RuntimeException(primaryException);
} finally {
try {
// your code
} catch (final IOException e) {
if (primaryException != null) e.addSuppressed(primaryException);
throw new RuntimeException(e);
}
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks @koji-m, that's great.
.../src/test/java/io/airbyte/integrations/destination/clickhouse/ClickhouseDestinationTest.java
Show resolved
Hide resolved
|
||
private static final String DB_NAME = "default"; | ||
|
||
private static final Logger LOGGER = LoggerFactory.getLogger(ClickhouseDestinationAcceptanceTest.class); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we use LOGGER here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
removed.
private static final String DB_NAME = "default"; | ||
|
||
private static final Logger LOGGER = LoggerFactory.getLogger(ClickhouseDestinationAcceptanceTest.class); | ||
private static final JSONFormat JSON_FORMAT = new JSONFormat().recordFormat(RecordFormat.OBJECT); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we use JSON_FORMAT
in the class?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
removed.
private final ExtendedNameTransformer namingResolver = new ExtendedNameTransformer(); | ||
|
||
private ClickHouseContainer db; | ||
private Network network; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like not used field
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
network
removed.
private ProcessFactory processFactory; | ||
private TestDestinationEnv testEnv; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like both of these fields are not used in the class.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
removed.
|
||
@Override | ||
protected JsonNode getFailCheckConfig() { | ||
String ipAddress = db.getContainerInfo().getNetworkSettings().getIpAddress(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What purpose of this variable?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
that's a leftover when I was struggling with container's connectivity doing check test, it can be removed. See testcontainers/testcontainers-java#452.
Hi, @burmecia! I had run the next command:
Despite your screenshot above, I don't have success to passing all Java integration tests: What OS are you using? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for PR left some comments.
...ouse/src/main/java/io/airbyte/integrations/destination/clickhouse/ClickhouseDestination.java
Show resolved
Hide resolved
|
||
public class ClickhouseSqlOperations extends JdbcSqlOperations { | ||
|
||
private static final Logger LOGGER = LoggerFactory.getLogger(ClickhouseSqlOperations.class); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LOGGER is never used
It can be used to log exception on try/catch block or for record size logging
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
final String schemaName, | ||
final String tmpTableName) | ||
throws SQLException { | ||
if (records.isEmpty()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you could log record size here:
LOGGER.info("actual size of batch: {}", records.size());
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
|
||
private static final String DB_NAME = "default"; | ||
|
||
private static final Logger LOGGER = LoggerFactory.getLogger(ClickhouseDestinationAcceptanceTest.class); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LOGGER and JSON_FORMAT are not used, pls remove them
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
private final ExtendedNameTransformer namingResolver = new ExtendedNameTransformer(); | ||
|
||
private ClickHouseContainer db; | ||
private Network network; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
network, processFactory, testEnv are not used, pls remove them
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
...ouse/src/main/java/io/airbyte/integrations/destination/clickhouse/ClickhouseDestination.java
Show resolved
Hide resolved
...se/src/main/java/io/airbyte/integrations/destination/clickhouse/ClickhouseSqlOperations.java
Show resolved
Hide resolved
db.close(); | ||
} | ||
|
||
// @Test |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
pls remove commented // @test and check the test
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
Thanks for PR, added some comments |
I am on Mac, but I use Ubuntu VM to run the test because docker's |
I've disabled dbt support and also fixed the reserved keywords bug for #7786, can you try run it again? |
If it makes it easier, it might be a good idea to separate this PR into two instead?
|
@ChristopheDuong I believe this won't make any difference as the main purpose of PR to verify all tests are passed on CI (as we can't to run /test, /publish on forked repos). For me is more easier to have only one branch as I can to fetch and merge contributors commits to it. @burmecia I see normalization fails here:
Do we need this variable? Lines 2536 to 2538 in 43e39f6
|
Hi, @burmecia any response here? Looks like some conflicts in this PR. |
alright, I added missing type hints, can you try it again? |
@burmecia could you fix conflicts, please? 🙂 After, I will run tests. |
Looks like destination tests are all in green. But normalization fails https://github.com/airbytehq/airbyte/runs/4352870241?check_suite_focus=true#step:8:13303. Looks like this tests not related to @burmecia changes and normalization tests fails in master as well. Take to the account this fact there are not more comments from my side and approve from my side. Thanks for contribution. |
…o marcos/test-pr-7620
Tests should be passing fine on master (maybe there seems to have some errors with credentials when you ran it) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
awesome work @burmecia sorry the long delay to merge your contribution.
* add ClickHouse destination * update docs * format code * code improvement as per code review * add ssh tunneling and ssl/tls support and code enhancement * merge from master * disable testCustomDbtTransformationsFailure test * fix string format bug * fix reserved keywords bug and disable dbt * disable dbt in expect result * add type hints * bump connector version Co-authored-by: Alexander Tsukanov <alexander.tsukanovvv@gmail.com> Co-authored-by: Marcos Marx <marcosmarxm@gmail.com>
dbt_config = { | ||
"type": "clickhouse", | ||
"host": config["host"], | ||
"port": config["port"], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this be changed to hardcode 9000 instead?
see this thread in slack: https://airbytehq.slack.com/archives/C01MFR03D5W/p1641461068011100
What
How
Recommended reading order
Test Runs
Integration test
Note: Some of the normalization and dbt test cases are disabled because,
DestinationAcceptanceTest.runSync()
method. But I think that should be fine because there are more test cases in the following normalization test are covered.Command used:
airbyte$ ./gradlew :airbyte-integrations:connectors:destination-clickhouse:integrationTest
Normalization test
Command used:
airbyte/airbyte-integrations/bases/base-normalization$ NORMALIZATION_TEST_TARGET=clickhouse pytest ./integration_tests/test_normalization.py
Pre-merge Checklist
Expand the relevant checklist and delete the others.
New Connector
Community member or Airbyter
airbyte_secret
./gradlew :airbyte-integrations:connectors:<name>:integrationTest
.README.md
bootstrap.md
. See description and examplesdocs/SUMMARY.md
docs/integrations/<source or destination>/<name>.md
including changelog. See changelog exampledocs/integrations/README.md
airbyte-integrations/builds.md
Airbyter
If this is a community PR, the Airbyte engineer reviewing this PR is responsible for the below items.
/test connector=connectors/<name>
command is passing./publish
command described hereUpdating a connector
Community member or Airbyter
airbyte_secret
./gradlew :airbyte-integrations:connectors:<name>:integrationTest
.README.md
bootstrap.md
. See description and examplesdocs/integrations/<source or destination>/<name>.md
including changelog. See changelog exampleAirbyter
If this is a community PR, the Airbyte engineer reviewing this PR is responsible for the below items.
/test connector=connectors/<name>
command is passing./publish
command described hereConnector Generator
-scaffold
in their name) have been updated with the latest scaffold by running./gradlew :airbyte-integrations:connector-templates:generator:testScaffoldTemplates
then checking in your changes