diff --git a/docs/en/connector-v2/sink/Clickhouse.md b/docs/en/connector-v2/sink/Clickhouse.md index 7c4bab991ba..27bf274c77f 100644 --- a/docs/en/connector-v2/sink/Clickhouse.md +++ b/docs/en/connector-v2/sink/Clickhouse.md @@ -2,95 +2,110 @@ > Clickhouse sink connector -## Description +## Support Those Engines -Used to write data to Clickhouse. +> Spark
+> Flink
+> SeaTunnel Zeta
-## Key features +## Key Features - [ ] [exactly-once](../../concept/connector-v2-features.md) - -The Clickhouse sink plug-in can achieve accuracy once by implementing idempotent writing, and needs to cooperate with aggregatingmergetree and other engines that support deduplication. - - [x] [cdc](../../concept/connector-v2-features.md) -## Options - -| name | type | required | default value | -|---------------------------------------|---------|----------|---------------| -| host | string | yes | - | -| database | string | yes | - | -| table | string | yes | - | -| username | string | yes | - | -| password | string | yes | - | -| clickhouse.config | map | no | | -| bulk_size | string | no | 20000 | -| split_mode | string | no | false | -| sharding_key | string | no | - | -| primary_key | string | no | - | -| support_upsert | boolean | no | false | -| allow_experimental_lightweight_delete | boolean | no | false | -| common-options | | no | - | - -### host [string] - -`ClickHouse` cluster address, the format is `host:port` , allowing multiple `hosts` to be specified. Such as `"host1:8123,host2:8123"` . - -### database [string] - -The `ClickHouse` database - -### table [string] - -The table name - -### username [string] - -`ClickHouse` user username - -### password [string] - -`ClickHouse` user password - -### clickhouse.config [map] - -In addition to the above mandatory parameters that must be specified by `clickhouse-jdbc` , users can also specify multiple optional parameters, which cover all the [parameters](https://github.com/ClickHouse/clickhouse-jdbc/tree/master/clickhouse-client#configuration) provided by `clickhouse-jdbc` . - -### bulk_size [number] - -The number of rows written through [Clickhouse-jdbc](https://github.com/ClickHouse/clickhouse-jdbc) each time, the `default is 20000`, if checkpoints are enabled, writing will also occur at the times when the checkpoints are satisfied . 
- -### split_mode [boolean] - -This mode only support clickhouse table which engine is 'Distributed'.And `internal_replication` option -should be `true`. They will split distributed table data in seatunnel and perform write directly on each shard. The shard weight define is clickhouse will be -counted. -### sharding_key [string] +> The Clickhouse sink plug-in can achieve exactly-once semantics through idempotent writing, and needs to be used with engines that support deduplication, such as AggregatingMergeTree. -When use split_mode, which node to send data to is a problem, the default is random selection, but the -'sharding_key' parameter can be used to specify the field for the sharding algorithm. This option only -worked when 'split_mode' is true. - -### primary_key [string] - -Mark the primary key column from clickhouse table, and based on primary key execute INSERT/UPDATE/DELETE to clickhouse table - -### support_upsert [boolean] +## Description -Support upsert row by query primary key +Used to write data to Clickhouse. -### allow_experimental_lightweight_delete [boolean] +## Supported DataSource Info + +In order to use the Clickhouse connector, the following dependencies are required. +They can be downloaded via install-plugin.sh or from the Maven central repository. 
+ +| Datasource | Supported Versions | Dependency | +|------------|--------------------|------------------------------------------------------------------------------------------------------------------| +| Clickhouse | universal | [Download](https://mvnrepository.com/artifact/org.apache.seatunnel/seatunnel-connectors-v2/connector-clickhouse) | + +## Data Type Mapping + +| SeaTunnel Data type | Clickhouse Data type | +|---------------------|-----------------------------------------------------------------------------------------------------------------------------------------------| +| STRING | String / Int128 / UInt128 / Int256 / UInt256 / Point / Ring / Polygon / MultiPolygon | +| INT | Int8 / UInt8 / Int16 / UInt16 / Int32 | +| BIGINT | UInt64 / Int64 / IntervalYear / IntervalQuarter / IntervalMonth / IntervalWeek / IntervalDay / IntervalHour / IntervalMinute / IntervalSecond | +| DOUBLE | Float64 | +| DECIMAL | Decimal | +| FLOAT | Float32 | +| DATE | Date | +| TIME | DateTime | +| ARRAY | Array | +| MAP | Map | + +## Sink Options + +| Name | Type | Required | Default | Description | +|---------------------------------------|---------|----------|---------|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| +| host | String | Yes | - | `ClickHouse` cluster address, the format is `host:port`, allowing multiple `hosts` to be specified. Such as `"host1:8123,host2:8123"`. | +| database | String | Yes | - | The `ClickHouse` database. | +| table | String | Yes | - | The table name. | +| username | String | Yes | - | `ClickHouse` user username. | +| password | String | Yes | - | `ClickHouse` user password. 
| +| clickhouse.config | Map | No | | In addition to the above mandatory parameters, users can also specify multiple optional parameters for `clickhouse-jdbc`, which cover all the [parameters](https://github.com/ClickHouse/clickhouse-jdbc/tree/master/clickhouse-client#configuration) provided by `clickhouse-jdbc`. | +| bulk_size | Int | No | 20000 | The number of rows written through [Clickhouse-jdbc](https://github.com/ClickHouse/clickhouse-jdbc) in each batch; the default is 20000. If checkpoints are enabled, a write is also triggered whenever a checkpoint is reached. | +| split_mode | Boolean | No | false | This mode only supports a ClickHouse table whose engine is 'Distributed', and the `internal_replication` option should be `true`. SeaTunnel will split the distributed table data and write directly to each shard. The shard weight defined in ClickHouse is taken into account. | +| sharding_key | String | No | - | When split_mode is used, the target node is selected at random by default; the 'sharding_key' parameter can be used to specify the field for the sharding algorithm. This option only works when 'split_mode' is true. | +| primary_key | String | No | - | Mark the primary key column of the ClickHouse table; INSERT/UPDATE/DELETE statements are executed against the table based on this primary key. | +| support_upsert | Boolean | No | false | Support upserting rows by querying on the primary key. | +| allow_experimental_lightweight_delete | Boolean | No | false | Allow experimental lightweight delete based on the `*MergeTree` table engine. | +| common-options | | No | - | Sink plugin common parameters, please refer to [Sink Common Options](common-options.md) for details. 
| + +## How to Create a Clickhouse Data Synchronization Job + +The following example demonstrates how to create a data synchronization job that writes randomly generated data to a Clickhouse database: + +```bash +# Set the basic configuration of the task to be performed +env { + execution.parallelism = 1 + job.mode = "BATCH" + checkpoint.interval = 1000 +} -Allow experimental lightweight delete based on `*MergeTree` table engine +source { + FakeSource { + row.num = 2 + bigint.min = 0 + bigint.max = 10000000 + split.num = 1 + split.read-interval = 300 + schema { + fields { + c_bigint = bigint + } + } + } +} -### common options +sink { + Clickhouse { + host = "127.0.0.1:8123" + database = "default" + table = "test" + username = "xxxxx" + password = "xxxxx" + } +} +``` -Sink plugin common parameters, please refer to [Sink Common Options](common-options.md) for details +### Tips -## Examples +> 1. See the [SeaTunnel Deployment Document](../../start-v2/locally/deployment.md) for how to install and deploy SeaTunnel.
+> 2. The table to be written to needs to be created in advance, before synchronization starts.
+> 3. When the sink writes to the ClickHouse table, you don't need to set its schema, because the connector queries ClickHouse for the current table's schema before writing.
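The `bulk_size` and `clickhouse.config` options described in the Sink Options table can be combined in one sink block. A minimal sketch, assuming arbitrary illustrative values rather than recommended settings:

```hocon
sink {
  Clickhouse {
    host = "localhost:8123"
    database = "default"
    table = "fake_all"
    username = "xxxxx"
    password = "xxxxx"
    # Illustrative value: flush a batch every 50000 rows instead of the default 20000
    bulk_size = 50000
    # Extra clickhouse-jdbc parameters are passed through as a map
    clickhouse.config = {
      max_rows_to_read = "100"
      read_overflow_mode = "throw"
    }
  }
}
```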
-Simple +## Clickhouse Sink Config ```hocon sink { @@ -98,9 +113,9 @@ sink { host = "localhost:8123" database = "default" table = "fake_all" - username = "default" - password = "" - clickhouse.confg = { + username = "xxxxx" + password = "xxxxx" + clickhouse.config = { max_rows_to_read = "100" read_overflow_mode = "throw" } @@ -108,7 +123,7 @@ sink { } ``` -Split mode +## Split Mode ```hocon sink { @@ -116,8 +131,8 @@ sink { host = "localhost:8123" database = "default" table = "fake_all" - username = "default" - password = "" + username = "xxxxx" + password = "xxxxx" # split mode options split_mode = true @@ -126,7 +141,7 @@ sink { } ``` -CDC(Change data capture) +## CDC (Change Data Capture) Sink ```hocon sink { @@ -134,8 +149,8 @@ sink { host = "localhost:8123" database = "default" table = "fake_all" - username = "default" - password = "" + username = "xxxxx" + password = "xxxxx" # cdc options primary_key = "id" @@ -144,7 +159,7 @@ sink { } ``` -CDC(Change data capture) for *MergeTree engine +## CDC (Change Data Capture) for *MergeTree Engine ```hocon sink { @@ -152,8 +167,8 @@ sink { host = "localhost:8123" database = "default" table = "fake_all" - username = "default" - password = "" + username = "xxxxx" + password = "xxxxx" # cdc options primary_key = "id" @@ -163,21 +178,3 @@ sink { } ``` -## Changelog - -### 2.2.0-beta 2022-09-26 - -- Add ClickHouse Sink Connector - -### 2.3.0-beta 2022-10-20 - -- [Improve] Clickhouse Support Int128,Int256 Type ([3067](https://github.com/apache/seatunnel/pull/3067)) - -### next version - -- [Improve] Clickhouse Sink support nest type and array type([3047](https://github.com/apache/seatunnel/pull/3047)) - [Improve] Clickhouse Sink support geo type([3141](https://github.com/apache/seatunnel/pull/3141)) - [Feature] Support CDC write DELETE/UPDATE/INSERT events ([3653](https://github.com/apache/seatunnel/pull/3653)) - [Improve] Remove Clickhouse Fields Config ([3826](https://github.com/apache/seatunnel/pull/3826)) - [Improve] 
Change Connector Custom Config Prefix To Map [3719](https://github.com/apache/seatunnel/pull/3719) - diff --git a/docs/en/connector-v2/source/Clickhouse.md b/docs/en/connector-v2/source/Clickhouse.md index 07384875cb0..7596bf72a8f 100644 --- a/docs/en/connector-v2/source/Clickhouse.md +++ b/docs/en/connector-v2/source/Clickhouse.md @@ -2,93 +2,96 @@ > Clickhouse source connector -## Description +## Support Those Engines -Used to read data from Clickhouse. +> Spark
+> Flink
+> SeaTunnel Zeta
-## Key features +## Key Features - [x] [batch](../../concept/connector-v2-features.md) - [ ] [stream](../../concept/connector-v2-features.md) - [ ] [exactly-once](../../concept/connector-v2-features.md) - [x] [column projection](../../concept/connector-v2-features.md) - -supports query SQL and can achieve projection effect. - - [ ] [parallelism](../../concept/connector-v2-features.md) - [ ] [support user-defined split](../../concept/connector-v2-features.md) -## Options - -| name | type | required | default value | -|------------------|--------|----------|------------------------| -| host | string | yes | - | -| database | string | yes | - | -| sql | string | yes | - | -| username | string | yes | - | -| password | string | yes | - | -| server_time_zone | string | no | ZoneId.systemDefault() | -| common-options | | no | - | - -### host [string] - -`ClickHouse` cluster address, the format is `host:port` , allowing multiple `hosts` to be specified. Such as `"host1:8123,host2:8123"` . - -### database [string] - -The `ClickHouse` database - -### sql [string] - -The query sql used to search data though Clickhouse server - -### username [string] - -`ClickHouse` user username - -### password [string] - -`ClickHouse` user password +> Supports query SQL and can achieve the projection effect. -### server_time_zone [string] - -The session time zone in database server. If not set, then ZoneId.systemDefault() is used to determine the server time zone. - -### common options +## Description -Source plugin common parameters, please refer to [Source Common Options](common-options.md) for details +Used to read data from Clickhouse. -## Examples +## Supported DataSource Info + +In order to use the Clickhouse connector, the following dependencies are required. +They can be downloaded via install-plugin.sh or from the Maven central repository. 
+ +| Datasource | Supported Versions | Dependency | +|------------|--------------------|------------------------------------------------------------------------------------------------------------------| +| Clickhouse | universal | [Download](https://mvnrepository.com/artifact/org.apache.seatunnel/seatunnel-connectors-v2/connector-clickhouse) | + +## Data Type Mapping + +| Clickhouse Data type | SeaTunnel Data type | +|-----------------------------------------------------------------------------------------------------------------------------------------------|---------------------| +| String / Int128 / UInt128 / Int256 / UInt256 / Point / Ring / Polygon / MultiPolygon | STRING | +| Int8 / UInt8 / Int16 / UInt16 / Int32 | INT | +| UInt64 / Int64 / IntervalYear / IntervalQuarter / IntervalMonth / IntervalWeek / IntervalDay / IntervalHour / IntervalMinute / IntervalSecond | BIGINT | +| Float64 | DOUBLE | +| Decimal | DECIMAL | +| Float32 | FLOAT | +| Date | DATE | +| DateTime | TIME | +| Array | ARRAY | +| Map | MAP | + +## Source Options + +| Name | Type | Required | Default | Description | +|------------------|--------|----------|------------------------|------------------------------------------------------------------------------------------------------------------------------------------| +| host | String | Yes | - | `ClickHouse` cluster address, the format is `host:port`, allowing multiple `hosts` to be specified. Such as `"host1:8123,host2:8123"`. | +| database | String | Yes | - | The `ClickHouse` database. | +| sql | String | Yes | - | The query sql used to search data through the Clickhouse server. | +| username | String | Yes | - | `ClickHouse` user username. | +| password | String | Yes | - | `ClickHouse` user password. | +| server_time_zone | String | No | ZoneId.systemDefault() | The session time zone in database server. If not set, then ZoneId.systemDefault() is used to determine the server time zone. 
| +| common-options | | No | - | Source plugin common parameters, please refer to [Source Common Options](common-options.md) for details. | + +## How to Create a Clickhouse Data Synchronization Job + +The following example demonstrates how to create a data synchronization job that reads data from Clickhouse and prints it on the local client: + +```bash +# Set the basic configuration of the task to be performed +env { + execution.parallelism = 1 + job.mode = "BATCH" +} -```hocon +# Create a source to connect to Clickhouse source { - Clickhouse { host = "localhost:8123" database = "default" sql = "select * from test where age = 20 limit 100" - username = "default" - password = "" + username = "xxxxx" + password = "xxxxx" server_time_zone = "UTC" result_table_name = "test" } - } -``` - -## Changelog -### 2.2.0-beta 2022-09-26 - -- Add ClickHouse Source Connector - -### 2.3.0-beta 2022-10-20 - -- [Improve] Clickhouse Source random use host when config multi-host ([3108](https://github.com/apache/seatunnel/pull/3108)) - -### next version +# Console printing of the read Clickhouse data +sink { + Console { + parallelism = 1 + } +} +``` -- [Improve] Clickhouse Source support nest type and array type([3047](https://github.com/apache/seatunnel/pull/3047)) +### Tips -- [Improve] Clickhouse Source support geo type([3141](https://github.com/apache/seatunnel/pull/3141)) +> 1. See the [SeaTunnel Deployment Document](../../start-v2/locally/deployment.md) for how to install and deploy SeaTunnel.
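The `host` option in the Source Options table accepts a comma-separated list of nodes. A minimal sketch, assuming placeholder host names:

```hocon
source {
  Clickhouse {
    # Multiple ClickHouse addresses in host:port form, comma-separated
    host = "host1:8123,host2:8123"
    database = "default"
    sql = "select * from test limit 100"
    username = "xxxxx"
    password = "xxxxx"
    result_table_name = "test"
  }
}
```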