Commit

[Improve][Docs][Clickhouse] Reconstruct the clickhouse connector doc (#5085)



---------

Co-authored-by: chenzy15 <chenzy15@ziroom.com>
MonsterChenzhuo and chenzy15 authored Jul 19, 2023
1 parent 3aa4ae5 commit 70ec3a3
Showing 2 changed files with 168 additions and 168 deletions.
207 changes: 102 additions & 105 deletions docs/en/connector-v2/sink/Clickhouse.md
@@ -2,122 +2,137 @@

> Clickhouse sink connector
## Description
## Supported Engines

Used to write data to Clickhouse.
> Spark<br/>
> Flink<br/>
> SeaTunnel Zeta<br/>
## Key features
## Key Features

- [ ] [exactly-once](../../concept/connector-v2-features.md)

The Clickhouse sink plug-in can achieve exactly-once semantics through idempotent writes, in cooperation with `AggregatingMergeTree` and other engines that support deduplication.

- [x] [cdc](../../concept/connector-v2-features.md)

## Options

| name | type | required | default value |
|---------------------------------------|---------|----------|---------------|
| host | string | yes | - |
| database | string | yes | - |
| table | string | yes | - |
| username | string | yes | - |
| password | string | yes | - |
| clickhouse.config | map | no | |
| bulk_size | string | no | 20000 |
| split_mode | string | no | false |
| sharding_key | string | no | - |
| primary_key | string | no | - |
| support_upsert | boolean | no | false |
| allow_experimental_lightweight_delete | boolean | no | false |
| common-options | | no | - |

### host [string]

`ClickHouse` cluster address in `host:port` format. Multiple hosts can be specified, for example `"host1:8123,host2:8123"`.

### database [string]

The `ClickHouse` database

### table [string]

The table name

### username [string]

`ClickHouse` user username

### password [string]

`ClickHouse` user password

### clickhouse.config [map]

In addition to the above mandatory parameters that must be specified by `clickhouse-jdbc` , users can also specify multiple optional parameters, which cover all the [parameters](https://github.com/ClickHouse/clickhouse-jdbc/tree/master/clickhouse-client#configuration) provided by `clickhouse-jdbc` .

### bulk_size [number]

The number of rows written through [Clickhouse-jdbc](https://github.com/ClickHouse/clickhouse-jdbc) in each batch. The default is `20000`. If checkpoints are enabled, data is also written whenever a checkpoint is triggered.

### split_mode [boolean]

This mode only supports ClickHouse tables whose engine is `Distributed`, and the `internal_replication` option
should be `true`. SeaTunnel splits the distributed table data and writes directly to each shard.
The shard weights defined in ClickHouse are taken into account.

### sharding_key [string]
> The Clickhouse sink plug-in can achieve exactly-once semantics through idempotent writes, in cooperation with `AggregatingMergeTree` and other engines that support deduplication.
When `split_mode` is used, SeaTunnel must decide which node each row is sent to. The default is random selection, but the
`sharding_key` parameter can specify the field used by the sharding algorithm. This option only
takes effect when `split_mode` is `true`.

### primary_key [string]

Marks the primary key column of the ClickHouse table. INSERT/UPDATE/DELETE operations are executed against the table based on this key.

### support_upsert [boolean]
## Description

Support upsert row by query primary key
Used to write data to Clickhouse.

### allow_experimental_lightweight_delete [boolean]
## Supported DataSource Info

In order to use the Clickhouse connector, the following dependencies are required.
They can be downloaded via install-plugin.sh or from the Maven central repository.

| Datasource | Supported Versions | Dependency |
|------------|--------------------|------------------------------------------------------------------------------------------------------------------|
| Clickhouse | universal | [Download](https://mvnrepository.com/artifact/org.apache.seatunnel/seatunnel-connectors-v2/connector-clickhouse) |

## Data Type Mapping

| SeaTunnel Data type | Clickhouse Data type |
|---------------------|-----------------------------------------------------------------------------------------------------------------------------------------------|
| STRING | String / Int128 / UInt128 / Int256 / UInt256 / Point / Ring / Polygon / MultiPolygon |
| INT | Int8 / UInt8 / Int16 / UInt16 / Int32 |
| BIGINT | UInt64 / Int64 / IntervalYear / IntervalQuarter / IntervalMonth / IntervalWeek / IntervalDay / IntervalHour / IntervalMinute / IntervalSecond |
| DOUBLE | Float64 |
| DECIMAL | Decimal |
| FLOAT | Float32 |
| DATE | Date |
| TIME | DateTime |
| ARRAY | Array |
| MAP | Map |
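
As a rough illustration of the mapping above, the sketch below declares a FakeSource schema with SeaTunnel type names and notes in comments which ClickHouse column type each field would pair with according to the table. The field names are hypothetical, and the lowercase type spellings other than `bigint` are assumed to follow the same style as the job example later in this document.

```hocon
source {
  FakeSource {
    schema {
      fields {
        # Hypothetical field names; each comment gives a ClickHouse type from the mapping table
        c_string = string   # String
        c_int = int         # Int32
        c_bigint = bigint   # Int64
        c_double = double   # Float64
        c_float = float     # Float32
        c_date = date       # Date
      }
    }
  }
}
```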

## Sink Options

| Name | Type | Required | Default | Description |
|---------------------------------------|---------|----------|---------|-------------|
| host | String | Yes | - | `ClickHouse` cluster address in `host:port` format. Multiple hosts can be specified, for example `"host1:8123,host2:8123"`. |
| database | String | Yes | - | The `ClickHouse` database. |
| table | String | Yes | - | The table name. |
| username | String | Yes | - | `ClickHouse` username. |
| password | String | Yes | - | `ClickHouse` user password. |
| clickhouse.config | Map | No | | Besides the mandatory parameters above, users can specify any of the optional [parameters](https://github.com/ClickHouse/clickhouse-jdbc/tree/master/clickhouse-client#configuration) provided by `clickhouse-jdbc`. |
| bulk_size | Number | No | 20000 | The number of rows written through [Clickhouse-jdbc](https://github.com/ClickHouse/clickhouse-jdbc) in each batch. The default is `20000`. |
| split_mode | Boolean | No | false | This mode only supports ClickHouse tables whose engine is `Distributed`, and the `internal_replication` option should be `true`. SeaTunnel splits the distributed table data and writes directly to each shard. The shard weights defined in ClickHouse are taken into account. |
| sharding_key | String | No | - | When `split_mode` is used, SeaTunnel must decide which node each row is sent to. The default is random selection, but `sharding_key` can specify the field used by the sharding algorithm. This option only takes effect when `split_mode` is `true`. |
| primary_key | String | No | - | Marks the primary key column of the ClickHouse table. INSERT/UPDATE/DELETE operations are executed against the table based on this key. |
| support_upsert | Boolean | No | false | Support upserting rows by querying the primary key. |
| allow_experimental_lightweight_delete | Boolean | No | false | Allow experimental lightweight delete based on the `*MergeTree` table engine. |
| common-options | | No | - | Sink plugin common parameters. See [Sink Common Options](common-options.md) for details. |
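
To show how several of these options combine, here is a minimal sketch of a sink block that lists two hosts, lowers `bulk_size`, and passes extra settings through `clickhouse.config`. The host names, table name, and credentials are placeholders, and the chosen values are illustrative rather than recommended.

```hocon
sink {
  Clickhouse {
    # Placeholder addresses; multiple hosts are comma-separated
    host = "host1:8123,host2:8123"
    database = "default"
    table = "fake_all"
    username = "xxxxx"
    password = "xxxxx"
    # Write in smaller batches than the default of 20000 rows
    bulk_size = 5000
    # Optional clickhouse-jdbc parameters
    clickhouse.config = {
      max_rows_to_read = "100"
      read_overflow_mode = "throw"
    }
  }
}
```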

## How to Create a Clickhouse Data Synchronization Job

The following example demonstrates how to create a data synchronization job that writes randomly generated data to a Clickhouse database:

```hocon
# Set the basic configuration of the task to be performed
env {
execution.parallelism = 1
job.mode = "BATCH"
checkpoint.interval = 1000
}

Allow experimental lightweight delete based on `*MergeTree` table engine
source {
FakeSource {
row.num = 2
bigint.min = 0
bigint.max = 10000000
split.num = 1
split.read-interval = 300
schema {
fields {
c_bigint = bigint
}
}
}
}

### common options
sink {
Clickhouse {
host = "127.0.0.1:8123"
database = "default"
table = "test"
username = "xxxxx"
password = "xxxxx"
}
}
```

Sink plugin common parameters, please refer to [Sink Common Options](common-options.md) for details
### Tips

## Examples
> 1. See the [SeaTunnel Deployment Document](../../start-v2/locally/deployment.md).<br/>
> 2. The target table must be created before synchronization starts.<br/>
> 3. When the sink writes to a ClickHouse table, you don't need to set its schema, because the connector queries ClickHouse for the table's current schema before writing.<br/>
Simple
## Clickhouse Sink Config

```hocon
sink {
Clickhouse {
host = "localhost:8123"
database = "default"
table = "fake_all"
username = "default"
password = ""
clickhouse.confg = {
username = "xxxxx"
password = "xxxxx"
clickhouse.config = {
max_rows_to_read = "100"
read_overflow_mode = "throw"
}
}
}
```

Split mode
## Split Mode

```hocon
sink {
Clickhouse {
host = "localhost:8123"
database = "default"
table = "fake_all"
username = "default"
password = ""
username = "xxxxx"
password = "xxxxx"
# split mode options
split_mode = true
@@ -126,16 +141,16 @@ sink {
}
```
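
Because the hunk above is truncated, a complete split-mode sink might look like the following sketch. It assumes `fake_all` is a `Distributed` table and uses a hypothetical column `user_id` as the sharding key. Neither detail is taken from the commit.

```hocon
sink {
  Clickhouse {
    host = "localhost:8123"
    database = "default"
    table = "fake_all"        # assumed to use the Distributed engine
    username = "xxxxx"
    password = "xxxxx"
    # split mode options
    split_mode = true
    sharding_key = "user_id"  # hypothetical column; omit to pick shards at random
  }
}
```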

CDC(Change data capture)
## CDC (Change Data Capture) Sink

```hocon
sink {
Clickhouse {
host = "localhost:8123"
database = "default"
table = "fake_all"
username = "default"
password = ""
username = "xxxxx"
password = "xxxxx"
# cdc options
primary_key = "id"
@@ -144,16 +159,16 @@ sink {
}
```
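
The CDC options rely on `primary_key`, and `support_upsert` can be enabled alongside it. A minimal sketch of a full CDC sink block, assuming the target table has an `id` column as in the block above:

```hocon
sink {
  Clickhouse {
    host = "localhost:8123"
    database = "default"
    table = "fake_all"
    username = "xxxxx"
    password = "xxxxx"
    # cdc options
    primary_key = "id"      # column used to match rows for UPDATE/DELETE events
    support_upsert = true   # upsert rows by querying the primary key
  }
}
```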

CDC(Change data capture) for *MergeTree engine
## CDC (Change Data Capture) for *MergeTree Engine

```hocon
sink {
Clickhouse {
host = "localhost:8123"
database = "default"
table = "fake_all"
username = "default"
password = ""
username = "xxxxx"
password = "xxxxx"
# cdc options
primary_key = "id"
@@ -163,21 +178,3 @@ sink {
}
```
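
For `*MergeTree` tables, the options table lists `allow_experimental_lightweight_delete` in addition to the CDC options. The sketch below shows how the three might be combined. It assumes `fake_all` is a `*MergeTree` table with an `id` column, and the values are illustrative only.

```hocon
sink {
  Clickhouse {
    host = "localhost:8123"
    database = "default"
    table = "fake_all"      # assumed to use a *MergeTree engine
    username = "xxxxx"
    password = "xxxxx"
    # cdc options
    primary_key = "id"
    support_upsert = true
    allow_experimental_lightweight_delete = true
  }
}
```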

## Changelog

### 2.2.0-beta 2022-09-26

- Add ClickHouse Sink Connector

### 2.3.0-beta 2022-10-20

- [Improve] Clickhouse Support Int128,Int256 Type ([3067](https://github.com/apache/seatunnel/pull/3067))

### next version

- [Improve] Clickhouse Sink support nest type and array type([3047](https://github.com/apache/seatunnel/pull/3047))
- [Improve] Clickhouse Sink support geo type([3141](https://github.com/apache/seatunnel/pull/3141))
- [Feature] Support CDC write DELETE/UPDATE/INSERT events ([3653](https://github.com/apache/seatunnel/pull/3653))
- [Improve] Remove Clickhouse Fields Config ([3826](https://github.com/apache/seatunnel/pull/3826))
- [Improve] Change Connector Custom Config Prefix To Map [3719](https://github.com/apache/seatunnel/pull/3719)
