Feature: Support ClickHouse driver #221

Merged: 9 commits, Jul 7, 2023

@kokokuo kokokuo commented Jul 6, 2023

Description

Adds a ClickHouse driver that connects to a ClickHouse data source through the official ClickHouse Node.js client.

Please read the README to learn how to use it.

Issue ticket number

closes #183

How to test it in labs

Create ClickHouse and prepare data.

  1. Run ClickHouse from the Docker image

Set the database, username, and password through Docker environment variables. See the Docker image documentation for more information.

$ docker run -d -p 18123:8123 -e CLICKHOUSE_DB=db -e CLICKHOUSE_USER=user -e CLICKHOUSE_PASSWORD=123 --name some-clickhouse-server --ulimit nofile=262144:262144 clickhouse/clickhouse-server:23
$ docker ps -a
CONTAINER ID   IMAGE                                                                                 COMMAND                  CREATED         STATUS                     PORTS                                                             NAMES
c36639dcb36e   clickhouse/clickhouse-server:23                                                       "/entrypoint.sh"         5 seconds ago   Up 3 seconds               9000/tcp, 9009/tcp, 0.0.0.0:18123->8123/tcp, :::18123->8123/tcp   some-clickhouse-server

ClickHouse port 8123 is the HTTP API port, used by JDBC, ODBC, and web interfaces, so the Node.js ClickHouse client can connect to it (mapped here to host port 18123). For more information about the other ports, see Network ports.

  1. Use docker exec to enter the container
$ docker exec -it c36639dcb36e /bin/bash
root@c36639dcb36e:/$
  1. Run clickhouse-client inside the container and use SHOW TABLES to confirm it works
root@c36639dcb36e:/$ clickhouse-client
ClickHouse client version 23.6.1.1524 (official build).
Connecting to localhost:9000 as user user.
Connected to ClickHouse server version 23.6.1 revision 54464.

Warnings:
 * Linux is not using a fast clock source. Performance can be degraded. Check /sys/devices/system/clocksource/clocksource0/current_clocksource

c36639dcb36e :) SHOW TABLES
SHOW TABLES

Query id: 6b04211c-c5c8-467d-a826-de0eb644ea7b

Ok.

0 rows in set. Elapsed: 0.003 sec.
  1. Create a table and insert data. This is the sample from the official documentation, but placed under the db schema.
-- Create the table under db schema
CREATE TABLE db.my_first_table
(
    user_id UInt32,
    message String,
    timestamp DateTime,
    metric Float32
)
ENGINE = MergeTree
PRIMARY KEY (user_id, timestamp);

-- Insert data
INSERT INTO db.my_first_table (user_id, message, timestamp, metric) VALUES
    (101, 'Hello, ClickHouse!',                                 now(),       -1.0    ),
    (102, 'Insert a lot of rows per batch',                     yesterday(), 1.41421 ),
    (102, 'Sort your data based on your commonly-used queries', today(),     2.718   ),
    (101, 'Granules are the smallest chunks of data read',      now() + 5,   3.14159 );

Then query all rows to confirm the data was inserted:

[Screenshot 2023-07-07, 2:17:25 PM]

Add sample to /labs

  1. Run make and check that node_modules contains @clickhouse/client, and extension-driver-clickhouse under the @vulcan-sql directory.

[Screenshot 2023-07-07, 2:19:50 PM]

  1. Add extension-driver-clickhouse to vulcan.yaml:
...
extensions:
  ch: '@vulcan-sql/extension-driver-clickhouse' # Add this line
  1. Add the ClickHouse profile with its connection settings to profile.yaml:
...
- name: ch
  type: clickhouse
  connection:
    host: http://localhost:18123 
    username: user
    password: '123'
    database: db
  allow: '*'
  1. Add the SQL file and its API schema YAML, then run VulcanSQL with vulcan start:

Filename: my_first_table.sql

-- Query all rows if the API request does not provide an id
{% set user_id = context.params.id %}

SELECT *
FROM my_first_table
{% if user_id %} WHERE user_id = {{ user_id }} {% endif %}
ORDER BY timestamp

Filename: my_first_table.yaml

urlPath: /clickhouse/my_first_table
request:
  - fieldName: id
    fieldIn: query
    description: user_id
profile: ch

[Screenshot 2023-07-07, 2:45:30 PM]

Then run vulcan start:

[Screenshot 2023-07-07, 2:41:01 PM]

  1. Open the document URL:

[Screenshot 2023-07-07, 2:42:53 PM]

  1. Send the API request to /api/clickhouse/my_first_table to see all data:

[Screenshot 2023-07-07, 2:43:18 PM]

  1. Send the API request to /api/clickhouse/my_first_table?id=101 to see the filtered result:

[Screenshot 2023-07-07, 2:43:28 PM]

Additional Context

Currently, the caching dataset feature is not supported.

The ClickHouse driver does not currently implement the export method for exporting Parquet files. As a result, it does not yet support caching datasets. If you attempt to use the ClickHouse driver with the caching dataset feature, it will fail.

How to handle the case where ClickHouse cannot return column names and types together with JSON rows keyed by field name

The ClickHouse client does not support retrieving column names, types, and data rows together when using the JSON format. However, by using the JSONCompactEachRowWithNamesAndTypes format, we can obtain column names, types, and data rows as an array-like structure, as shown below:
[Screenshot 2023-07-07, 1:56:45 PM]
Therefore, it is necessary to normalize each data row and include the corresponding column name in the JSON object.
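
The normalization step can be sketched roughly as follows. This is a minimal illustration, not the driver's actual code: the helper name normalizeCompactRows and the ColumnInfo type are hypothetical, and the input is assumed to be the already-parsed output of JSONCompactEachRowWithNamesAndTypes (first row column names, second row column types, then the data rows).

```typescript
// Hypothetical helper for illustration; not taken from the driver source.
type ColumnInfo = { name: string; type: string };

// Assumes `rows` is the parsed JSONCompactEachRowWithNamesAndTypes output:
// row 0 holds column names, row 1 holds column types, and every later row
// holds the values in column order.
function normalizeCompactRows(rows: unknown[][]): {
  columns: ColumnInfo[];
  data: Record<string, unknown>[];
} {
  if (rows.length < 2) {
    throw new Error('expected a names row and a types row');
  }
  const [names, types, ...values] = rows as [string[], string[], ...unknown[][]];
  const columns = names.map((name, i) => ({ name, type: String(types[i]) }));
  // This per-row loop is the extra cost the description mentions.
  const data = values.map((row) => {
    const obj: Record<string, unknown> = {};
    names.forEach((name, i) => {
      obj[name] = row[i];
    });
    return obj;
  });
  return { columns, data };
}

const sample = normalizeCompactRows([
  ['user_id', 'message'],
  ['UInt32', 'String'],
  [101, 'Hello, ClickHouse!'],
]);
console.log(sample.columns, sample.data);
```

Every data row is visited once and every cell copied, so the work grows with rows times columns.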

However, this normalization has a cost. As an alternative, we can use the JSONEachRow format, which returns the data rows as an array of JSON objects keyed by column name. To get the column types, we can use the DESCRIBE TABLE (subquery) syntax, which only evaluates the result types and does not execute the query or return any data.

Retrieving the column names and types with DESCRIBE TABLE (subquery):
[Screenshot 2023-07-07, 1:57:05 PM]

Getting the query results with the JSONEachRow format, as an array of JSON objects:

[Screenshot 2023-07-07, 1:56:55 PM]
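
A rough sketch of the DESCRIBE TABLE (subquery) side of this approach is shown below. The helper names buildDescribeQuery and toColumnMeta are hypothetical and the code is not taken from the driver. It assumes the DESCRIBE result has already been fetched as JSON objects that include name and type fields.

```typescript
// Hypothetical helpers for illustration; not taken from the driver source.
type ColumnMeta = { name: string; type: string };

// Wrap a SELECT statement so ClickHouse only reports its result columns.
function buildDescribeQuery(sql: string): string {
  // Drop trailing semicolons and whitespace so the statement nests cleanly.
  const inner = sql.trim().replace(/[;\s]+$/, '');
  return `DESCRIBE TABLE (${inner})`;
}

// Keep only the name and type fields from each DESCRIBE output row.
function toColumnMeta(describeRows: Record<string, unknown>[]): ColumnMeta[] {
  return describeRows.map((row) => ({
    name: String(row['name']),
    type: String(row['type']),
  }));
}

console.log(buildDescribeQuery('SELECT * FROM my_first_table ORDER BY timestamp;'));
```

The data itself would then come from running the original statement with the JSONEachRow format, so no per-row reshaping is needed on the client side.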

How to handle the issue of ClickHouse Driver using specific names and types as placeholders ({name:type})

The ClickHouse driver utilizes specific names and types as placeholders ({name:type}). Please refer to the documentation for more details on handling parameterized queries to prevent SQL injection. However, ClickHouse supports a variety of data types that cannot be directly inferred from JSON types. Therefore, in the mapToClickHouseType method, the driver only handles the following conversions and requires users to use ClickHouse Regular Functions or Type Conversion Functions for other conversions. Please see the Note for more information:

  • boolean to Bool ClickHouse type
  • number to Int or Float ClickHouse type
  • string to String ClickHouse type

PS: In a parameterized query, both Bool and Boolean are accepted as type names, but query results only ever report the column type as Bool, so the driver supports only the Bool type for safety.
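
The conversions listed above could look roughly like this. It is a sketch under stated assumptions, not the driver's mapToClickHouseType: the function names are hypothetical, and the concrete widths Int32 and Float64 are assumptions, since the description only says "Int or Float".

```typescript
// Illustrative only: the driver's real mapToClickHouseType may differ.
// The concrete widths (Int32, Float64) are assumptions made for this sketch.
function mapToClickHouseTypeSketch(value: unknown): string {
  switch (typeof value) {
    case 'boolean':
      return 'Bool';
    case 'number':
      return Number.isInteger(value) ? 'Int32' : 'Float64';
    case 'string':
      return 'String';
    default:
      // Other types must be converted in SQL with ClickHouse functions.
      throw new Error(`unsupported parameter type: ${typeof value}`);
  }
}

// Build a {name:type} placeholder for a parameterized query.
function toPlaceholder(name: string, value: unknown): string {
  return `{${name}:${mapToClickHouseTypeSketch(value)}}`;
}

console.log(toPlaceholder('user_id', 101)); // {user_id:Int32}
```

Anything outside these three JSON types falls through to the error branch, which matches the requirement that users do other conversions with ClickHouse Regular Functions or Type Conversion Functions.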

vercel bot commented Jul 6, 2023

The latest updates on your projects. Learn more about Vercel for Git ↗︎

Name Status Preview Comments Updated (UTC)
vulcan-sql-document ✅ Ready (Inspect) Visit Preview 💬 Add feedback Jul 7, 2023 7:28am

codecov-commenter commented Jul 6, 2023

Codecov Report

Patch coverage has no change and project coverage change: -0.02 ⚠️

Comparison is base (46417b8) 90.52% compared to head (942adb3) 90.50%.

Additional details and impacted files
@@             Coverage Diff             @@
##           develop     #221      +/-   ##
===========================================
- Coverage    90.52%   90.50%   -0.02%     
===========================================
  Files          331      331              
  Lines         5477     5477              
  Branches       732      732              
===========================================
- Hits          4958     4957       -1     
- Misses         374      375       +1     
  Partials       145      145              
Flag Coverage Δ
build 90.55% <ø> (ø)
catalog-server 100.00% <ø> (ø)
cli 75.85% <ø> (ø)
core 94.18% <ø> (ø)
extension-authenticator-canner 80.48% <ø> (ø)
extension-dbt 97.43% <ø> (ø)
extension-debug-tools 98.11% <ø> (ø)
extension-driver-bq 84.72% <ø> (-0.70%) ⬇️
extension-driver-canner 84.65% <ø> (ø)
extension-driver-clickhouse ∅ <ø> (?)
extension-driver-duckdb 96.61% <ø> (ø)
extension-driver-pg 96.11% <ø> (ø)
extension-driver-snowflake 96.26% <ø> (ø)
extension-store-canner 98.30% <ø> (ø)
integration-testing 90.27% <ø> (ø)
serve 87.11% <ø> (ø)

see 1 file with indirect coverage changes


- Create the "extension-driver-clickhouse" package
- Create "ClickHouseDataSource".
- update package.json to install "@clickhouse/client"
…ng method

- add clickhouse server for running test cases by docker.
- refactor clickhouse data source by querying data with column name and evaluating column with type by "describe" method.
- set "testEnvironment" to "node" in jest environment to make clickhouse client could work when running test cases.
- fix typeMapper for converting "Bool" to "boolean"
@kokokuo kokokuo changed the title [WIP] Feature: Support ClickHouse driver Feature: Support ClickHouse driver Jul 7, 2023
…khouse type

- normalize to convert to Bool clickhouse type.
- change to use number to prevent docker name duplicated.
- update README.

@onlyjackfrost onlyjackfrost left a comment

Besides the Makefile part, the rest LGTM and can be merged.

@@ -56,6 +56,15 @@ pkg-extension-authenticator-canner: ../../node_modules
rm -rf ./labs/playground1/node_modules/@vulcan-sql/extension-authenticator-canner; \
cp -R ./dist/packages/extension-authenticator-canner ./labs/playground1/node_modules/@vulcan-sql

pkg-extension-driver-clickhouse: ../../node_modules
Is this necessary? It seems the vulcan.yaml doesn't use the clickhouse extension.

@onlyjackfrost onlyjackfrost merged commit 889deb7 into develop Jul 7, 2023
1 check passed
@hanshino hanshino deleted the feature/ds-clickhouse branch January 31, 2024 07:39