[Feature-WIP][Connector-V2][Doris] Add Doris ConnectorV2 Source #5086

Closed
wants to merge 20 commits into from
122 changes: 122 additions & 0 deletions docs/en/connector-v2/source/Doris.md
@@ -0,0 +1,122 @@
# Doris

> Doris source connector

## Support Those Engines

> Spark<br/>
> Flink<br/>
> SeaTunnel Zeta<br/>

## Key features

- [x] [batch](../../concept/connector-v2-features.md)
- [ ] [stream](../../concept/connector-v2-features.md)
- [ ] [exactly-once](../../concept/connector-v2-features.md)
- [x] [schema projection](../../concept/connector-v2-features.md)
- [x] [parallelism](../../concept/connector-v2-features.md)
- [x] [support user-defined split](../../concept/connector-v2-features.md)

## Description

Used to read data from Doris.
The Doris source sends a SQL query to the FE (frontend). The FE parses it into an execution plan and sends the plan to the BE (backend),
and the BE returns the data directly.

## Supported DataSource Info

| Datasource | Supported versions | Driver | Url | Maven |
|------------|--------------------------------------|--------|-----|-------|
| Doris | Only Doris2.0 or later is supported. | - | - | - |

## Database Dependency

> Please download the support list corresponding to 'Maven' and copy it to the '$SEATUNNEL_HOME/plugins/jdbc/lib/'
> working directory.<br/>

## Data Type Mapping

| Doris Data type | SeaTunnel Data type |
|--------------------------------------|-----------------------------------------------------------------------------------------------------------------------------------------------------|
| INT | INT |
| TINYINT | TINYINT |
| SMALLINT | SMALLINT |
| BIGINT | BIGINT |
| LARGEINT | STRING |
| BOOLEAN | BOOLEAN |
| DECIMAL | DECIMAL(p + 1, s), where p is the designated column's specified column size<br/>and s is its number of digits to the right of the decimal point |
| FLOAT | FLOAT |
| DOUBLE | DOUBLE |
| CHAR<br/>VARCHAR<br/>STRING<br/>TEXT | STRING |
| DATE | DATE |
| DATETIME<br/>DATETIME(p) | TIMESTAMP |
| ARRAY | ARRAY |
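
As a small illustration of reading the table, assume a hypothetical Doris table with a LARGEINT column `big_id`, a VARCHAR column `name`, and a DATETIME column `created_at` (these names are made up for this sketch). Following the mapping above, the fields would plausibly be declared like this in the source `schema`:

```
schema {
  fields {
    # LARGEINT has no SeaTunnel integer equivalent in the table above, so it is read as STRING
    big_id = "STRING"
    # CHAR, VARCHAR, STRING and TEXT all map to STRING
    name = "STRING"
    # DATETIME and DATETIME(p) map to TIMESTAMP
    created_at = "TIMESTAMP"
  }
}
```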

## Source Options

| Name | Type | Required | Default | Description |
|----------------------------------|--------|----------|------------|-----------------------------------------------------------------------------------------------------|
| fenodes | string | yes | - | FE address, the format is `"fe_host:fe_http_port"` |
| username | string | yes | - | Doris username |
| password | string | yes | - | Doris password |
| table.identifier | string | yes | - | The name of the Doris database and table, in the format `"database.table"` |
| schema | config | yes | - | The schema (field names and SeaTunnel types) of the data to read from Doris |
| doris.filter.query | string | no | - | Filter applied to the data in Doris, in the format `"field = value"`. |
| doris.batch.size | int | no | 1024 | Maximum number of rows read from a Doris BE in one request. |
| doris.request.query.timeout.s | int | no | 3600 | Timeout for a Doris scan query, in seconds. |
| doris.exec.mem.limit | long | no | 2147483648 | Maximum memory, in bytes, that a single BE scan request can use. The default is 2 GB (2147483648). |
| doris.request.retries | int | no | 3 | Number of retries to send requests to Doris FE. |
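| doris.request.read.timeout.ms | int | no | 30000 | Read timeout for requests sent to Doris, in milliseconds. |
| doris.request.connect.timeout.ms | int | no | 30000 | Connection timeout for requests sent to Doris, in milliseconds. |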
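
As a rough sketch of how the optional settings sit next to the required ones, the fragment below adds a row filter and a smaller batch size to a source block. Only the option names come from the table above; the host, database, table, field names, and chosen values are placeholders for illustration, not recommendations.

```
source {
  Doris {
    # Required options (placeholder values)
    fenodes = "fe_host:8030"
    username = root
    password = ""
    table.identifier = "example_db.example_table"
    schema {
      fields {
        id = "BIGINT"
        name = "STRING"
      }
    }
    # Optional: filter rows in Doris, using the "field = value" format
    doris.filter.query = "id = 1"
    # Optional: rows read from a BE per request (default is 1024)
    doris.batch.size = 512
  }
}
```

Options left out of the block keep the defaults listed in the table.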

### Tips

> Modifying the advanced parameters (the `doris.*` options) arbitrarily is not recommended.

## Task Example

> This is an example of reading a Doris table and writing to Console.

```
env {
  execution.parallelism = 2
  job.mode = "BATCH"
}

source {
  Doris {
    fenodes = "doris_e2e:8030"
    username = root
    password = ""
    table.identifier = "e2e_source.doris_e2e_table"
    schema {
      fields {
        F_ID = "BIGINT"
        F_INT = "INT"
        F_BIGINT = "BIGINT"
        F_TINYINT = "TINYINT"
        F_SMALLINT = "SMALLINT"
        F_DECIMAL = "DECIMAL(18,6)"
        F_BOOLEAN = "BOOLEAN"
        F_DOUBLE = "DOUBLE"
        F_FLOAT = "FLOAT"
        F_CHAR = "String"
        F_VARCHAR_11 = "String"
        F_STRING = "String"
        F_DATETIME_P = "Timestamp"
        F_DATETIME = "Timestamp"
        F_DATE = "DATE"
      }
    }
  }
}

transform {
  # If you would like to get more information about how to configure seatunnel and see full list of transform plugins,
  # please go to https://seatunnel.apache.org/docs/transform/sql
}

sink {
  Console {}
}
```

1 change: 1 addition & 0 deletions plugin-mapping.properties
@@ -95,6 +95,7 @@ seatunnel.sink.RabbitMQ = connector-rabbitmq
seatunnel.source.RabbitMQ = connector-rabbitmq
seatunnel.source.OpenMldb = connector-openmldb
seatunnel.source.SqlServer-CDC = connector-cdc-sqlserver
seatunnel.source.Doris = connector-doris
seatunnel.sink.Doris = connector-doris
seatunnel.source.Maxcompute = connector-maxcompute
seatunnel.sink.Maxcompute = connector-maxcompute
13 changes: 13 additions & 0 deletions seatunnel-connectors-v2/connector-doris/pom.xml
@@ -30,6 +30,7 @@
    <name>SeaTunnel : Connectors V2 : Doris</name>

    <properties>
        <connector.name>connector.doris</connector.name>
        <httpclient.version>4.5.13</httpclient.version>
        <httpcore.version>4.4.4</httpcore.version>
    </properties>
@@ -70,5 +71,17 @@
            <artifactId>commons-io</artifactId>
            <version>${commons-io.version}</version>
        </dependency>
        <dependency>
            <groupId>org.apache.seatunnel</groupId>
            <artifactId>seatunnel-arrow-5.0</artifactId>
            <version>${project.version}</version>
            <classifier>optional</classifier>
        </dependency>
        <dependency>
            <groupId>org.apache.seatunnel</groupId>
            <artifactId>seatunnel-thrift-service</artifactId>
            <version>${project.version}</version>
            <classifier>optional</classifier>
        </dependency>
    </dependencies>
</project>