
[API-DRAFT] [MERGE] Merge api-draft to dev branch #2083

Merged: 107 commits from `api-draft` into `dev` on Jul 1, 2022.
107 commits
b7d2d23
[Feature][core] base interface.
ashulin Mar 29, 2022
c7f36e4
[Feature][core] base logic
ashulin Apr 28, 2022
f844d8c
fix build error
ruanwenjun Apr 28, 2022
4639ba1
[api-draft] Add license header (#1777)
ruanwenjun Apr 29, 2022
c340795
Add seatunnel datatype and convert origin value into seatunnel data t…
ruanwenjun May 5, 2022
1935d71
Add flink datatype converter (#1801)
ruanwenjun May 5, 2022
3753ed3
Add complex seatunnel datatype (#1807)
ruanwenjun May 6, 2022
06e74eb
Add flink type converter UT (#1813)
ruanwenjun May 7, 2022
609f587
Rename Row to SeaTunnelRow (#1832)
ruanwenjun May 9, 2022
8f710f7
Add spark basic type converter (#1837)
ruanwenjun May 10, 2022
0db318b
[Api-Draft] Add Flink Sink Converter to support SeaTunnel transfer to…
Hisoka-X May 10, 2022
6b5136b
[Api-Draft] SeaTunnel Source support Flink engine. (#1842)
ashulin May 10, 2022
21e159a
Add generics type about flink sink (#1844)
ruanwenjun May 11, 2022
8ff90fd
Add SeatunnelTaskExecuteCommand (#1847)
ruanwenjun May 11, 2022
b063bd5
Add basic fake source (#1864)
ruanwenjun May 12, 2022
48423a9
[Api-draft] Support Spark Sink convert to SeaTunnel (#1863)
Hisoka-X May 13, 2022
96bd703
Add SeaTunnelAPIExample (#1867)
ruanwenjun May 13, 2022
af7f6e0
[Api-draft] Fix Spark Sink Can't support batch mode. (#1868)
Hisoka-X May 13, 2022
5f79f45
Add plugin discovery module (#1881)
ruanwenjun May 15, 2022
3abf728
Add boundedness to source reader context, so that we can control to s…
ruanwenjun May 16, 2022
661f8d5
Add license header (#1886)
ruanwenjun May 16, 2022
fbeab83
[API-Draft] Add comment on common interface. (#1888)
Hisoka-X May 16, 2022
1844a7a
Add SeaTunnelPluginDiscovery to load seatunnel new api plugin (#1889)
ruanwenjun May 16, 2022
dce9ec9
[Api-Draft] Fix Flink sink type convert error. (#1890)
Hisoka-X May 16, 2022
862c31e
[Api-Draft] SeaTunnel Source support Spark engine. (#1871)
ashulin May 16, 2022
b16f382
fix seatunnel source parallelism (#1892)
ruanwenjun May 17, 2022
b23266e
Add SeaTunnelRowTypeInfo to support source return record type (#1894)
ruanwenjun May 17, 2022
62b304a
Add Transform to new SeaTunnel API (#1900)
ruanwenjun May 17, 2022
273121d
Add plugin processor for SeaTunnel API (#1902)
ruanwenjun May 18, 2022
727c7b2
Add SeaTunnelRowTypeInfo to SeaTunnelSink (#1904)
ruanwenjun May 18, 2022
c085738
Add SeaTunnel plugin lifecycle (#1914)
ruanwenjun May 18, 2022
d26605c
[Api-Draft] Support Spark MicroBatch Reader & Add CheckpointLock (#1907)
ashulin May 18, 2022
b3ca59a
[Api-draft] Add spark example (#1913)
Hisoka-X May 19, 2022
7eebd8d
Support set batch mode in flink (#1916)
ruanwenjun May 19, 2022
5ecdebf
[Api-Draft] Spark InternalRow Serialization (#1918)
ashulin May 19, 2022
38d5c68
fix spark example can't run error. (#1921)
Hisoka-X May 19, 2022
ad58696
Add Transform to SeaTunnel Spark API (#1925)
ruanwenjun May 19, 2022
62b922b
Add new architecture (#1928)
ruanwenjun May 19, 2022
6d5ef6d
fix spark batch can't stop (#1927)
Hisoka-X May 20, 2022
51fd0f7
Add SeaTunnel runtime environment (#1933)
ruanwenjun May 20, 2022
34b37aa
[Api-Draft] Remove useless code. (#1932)
Hisoka-X May 21, 2022
afff619
[API-DRAFT]refactor api base (#1939)
CalvinKirs May 23, 2022
59e3ca5
[Api-draft] Add seatunnel kafka connector (#1940)
Hisoka-X May 23, 2022
a508e29
[API-DRAFT]Rename Flink-stater package name (#1942)
CalvinKirs May 23, 2022
a71e777
Spark micro-batch reader state (#1943)
ashulin May 24, 2022
62d3468
[API-DRAFT]Add spark-core-starter module Code splitting for old and n…
CalvinKirs May 25, 2022
b2cd4c0
Add SeaTunnel kafka sink (#1952)
ruanwenjun May 25, 2022
206f4d0
Fix Kafka Sink on flink cannot serialize element (#1955)
ruanwenjun May 26, 2022
694bbdd
[Api-draft] Add seatunnel kafka source connectors (#1949)
Hisoka-X May 26, 2022
691aabc
[api-draft] Support coordinated source & Fix checkpoint lock implemen…
ashulin May 27, 2022
ffb2d86
[api-draft][flink] non-key operator can't get the keyed state store (…
ashulin May 27, 2022
3601f42
Clean code for new API starter (#1962)
ruanwenjun May 27, 2022
6979cfc
Add new api starter to distribution (#1964)
ruanwenjun May 27, 2022
75c91fc
Add seatunnel new connector example (#1967)
ruanwenjun May 27, 2022
8fc7083
[Api-draft] add README.zh.md for new api (#1966)
Hisoka-X May 30, 2022
9a0ae72
[Api-draft] kafka support commit on checkpoint (#1974)
Hisoka-X May 30, 2022
a9d54fd
Add e2e for new connector (#1973)
ruanwenjun May 30, 2022
ede2937
fix checkpoint and commit error (#1987)
Hisoka-X Jun 7, 2022
dc81c31
[api-draft#1990][Common] The DeployMode code cleanup (#1991)
zhuangchong Jun 7, 2022
0bfd625
[Improve]Use Jackson replace Fastjson (#1971) (#1985)
CalvinKirs Jun 8, 2022
f389cd7
[api-draft][connector] New hive sink connector (#1997)
EricJoy2048 Jun 8, 2022
805f88c
[Api-draft] add jobId and checkpointId (#1998)
Hisoka-X Jun 9, 2022
80794cf
Fix the kafka consumer still getting consumption data when the client…
zhuangchong Jun 9, 2022
9f1b474
[Api-Draft] improve class comment and fix spark row StringType conver…
Hisoka-X Jun 10, 2022
3e7d82b
[api-draft][connector] Fix hive sink dist package in new connector. (…
zhuangchong Jun 10, 2022
b6ed4ce
[api-draft][connector] new socket source (#1999)
zhuangchong Jun 13, 2022
890211c
[api-draft][connector] Add SeaTunnel jdbc sink (#1946) (#2009)
ic4y Jun 13, 2022
62c8a17
[Api-draft] change kafka sink transaction (#2010)
Hisoka-X Jun 14, 2022
ca6c677
[api-draft] Improved conversion of data types (#2003)
ashulin Jun 14, 2022
1673dce
[api-draft][formats] json format (#2014)
ashulin Jun 15, 2022
ebf9489
[Api-draft] Add clickhouse source support (#2013)
Hisoka-X Jun 15, 2022
1113cb9
[api-draft][connector] Add new http source (#2012)
zhuangchong Jun 15, 2022
a8d47c9
[api-draft][connector] apache pulsar source (#1984)
ashulin Jun 17, 2022
5ce65e2
[Feature][connector common] Add Hadoop2 and Hadoop3 shade jar (#2030)
EricJoy2048 Jun 20, 2022
67e9f6f
[hotfix][api-draft] fix flink batch mode (#2038)
ashulin Jun 20, 2022
605a086
[hotfix][api-draft] Coordinated source cannot be stopped in offline j…
ashulin Jun 21, 2022
f19555c
update windows ci timeout to 120min (#2046)
EricJoy2048 Jun 21, 2022
24b4d62
[API-Draft] Fix hadoop shade can't be imported problem (#2045)
Hisoka-X Jun 22, 2022
f35fdc8
add new api doc English version (#2050)
Hisoka-X Jun 23, 2022
500799d
[bugfix][api-draft] Fix KafkaSource parallel mode failure (#2039)
ic4y Jun 24, 2022
20077eb
[API-Draft] Remove hadoop shade module (#2057)
Hisoka-X Jun 24, 2022
a549df6
[api-draft][api] Improve SeaTunnel's data types & Mapping engine data…
ashulin Jun 25, 2022
97f7b43
[api-draft][connector] fix ThreadLocalRandom use (#2059) (#2060)
iture123 Jun 27, 2022
463cafe
[api-draft][connector] Add SeaTunnel jdbc source (#2048)
ic4y Jun 27, 2022
a15ab7f
fix VariablesSubstituteTest create time now twice (#2063)
Hisoka-X Jun 27, 2022
a7357b6
[api-draft][connector] Add simplified connector api (#2041)
ashulin Jun 27, 2022
94fe1e0
[api-draft][catalog] jdbc catalog (#2042)
ashulin Jun 27, 2022
682c58d
[bugfix] Change the JDBC data type (#2065)
ic4y Jun 27, 2022
da09499
[feature] Add postgres support to jdbcSource (#2066)
ic4y Jun 28, 2022
def84d9
[API-Draft][DOC] Add jdbc connector doc (#2069)
ic4y Jun 28, 2022
02ff960
[Cherry-pick#2029]Improve CI jobs to reduce waiting time (#2070)
CalvinKirs Jun 28, 2022
3057ba2
[API-Draft] [Connector] Add Clickhouse source and sink connector (#2051)
Hisoka-X Jun 28, 2022
f79e311
[api-draft][Optimize] Optimize module name (#2062)
EricJoy2048 Jun 28, 2022
5dae9c1
[API-DRAFT] [MERGE] Fix obvious bugs that don't work properly before …
Hisoka-X Jun 29, 2022
02a4190
[api-draft][connector] support Rsync to transfer clickhouse data file…
Emor-nj Jun 29, 2022
369bb0e
[api-draft][flink] The FinkCommitter's commit info class could not be…
ashulin Jun 29, 2022
fc640b5
add assert sink to Api draft (#2071)
lhyundeadsoul Jun 30, 2022
032906f
add clickhouse source and sink docs (#2072)
Hisoka-X Jun 30, 2022
d265597
merge dev to api-draft
Hisoka-X Jun 30, 2022
736ac01
[API-DRAFT] [MERGE] fix merge error
Hisoka-X Jun 30, 2022
0656b50
[API-Draft] [Doc]add common option (#2095)
Hisoka-X Jun 30, 2022
6a492d7
Update Clickhouse.md (#2096)
Hisoka-X Jun 30, 2022
3c0e984
[API-DRAFT] [MERGE] fix merge error
Hisoka-X Jun 30, 2022
10ac17e
Merge remote-tracking branch 'origin/api-draft' into api-draft
Hisoka-X Jun 30, 2022
2bad99d
[api-draft][doc] move Assert.md to new-connector file https://github…
lhyundeadsoul Jun 30, 2022
5ae8865
[API-DRAFT] [MERGE] update license and pom.xml
Hisoka-X Jun 30, 2022
1b706c1
Merge branch 'dev' into api-draft
Hisoka-X Jul 1, 2022
2 changes: 1 addition & 1 deletion .github/workflows/backend.yml
Original file line number Diff line number Diff line change
@@ -88,7 +88,7 @@ jobs:
java: [ '8', '11' ]
os: [ 'ubuntu-latest', 'windows-latest' ]
runs-on: ${{ matrix.os }}
timeout-minutes: 30
timeout-minutes: 50
steps:
- uses: actions/checkout@v3
with:
2 changes: 1 addition & 1 deletion .licenserc.yaml
@@ -28,7 +28,7 @@ header:
- mvnw.cmd
- .mvn
- .gitattributes
- '**/known-dependencies.txt'
- '**/known-dependencies-*.txt'
- '**/*.md'
- '**/*.mdx'
- '**/*.json'
1 change: 1 addition & 0 deletions LICENSE
@@ -209,6 +209,7 @@ The text of each license is the standard Apache 2.0 license.

tools/dependencies/checkLicense.sh files from https://github.com/apache/skywalking
mvnw files from https://github.com/apache/maven-wrapper Apache 2.0
seatunnel-api/src/main/java/org/apache/seatunnel/api/table/type/RowKind.java from https://github.com/apache/flink
seatunnel-config/seatunnel-config-shade/src/main/java/org/apache/seatunnel/shade/com/typesafe/config/impl/ConfigNodePath.java from https://github.com/lightbend/config
seatunnel-config/seatunnel-config-shade/src/main/java/org/apache/seatunnel/shade/com/typesafe/config/impl/ConfigParser.java from https://github.com/lightbend/config
seatunnel-config/seatunnel-config-shade/src/main/java/org/apache/seatunnel/shade/com/typesafe/config/impl/Path.java from https://github.com/lightbend/config
2 changes: 2 additions & 0 deletions config/flink.batch.conf.template
@@ -26,8 +26,10 @@ env {

source {
# This is a example input plugin **only for test and demonstrate the feature input plugin**

FakeSource {
result_table_name = "test"
field_name = "name,age"
}

# If you would like to get more information about how to configure seatunnel and see full list of input plugins,
2 changes: 1 addition & 1 deletion config/spark.streaming.conf.template
@@ -31,7 +31,7 @@ env {

source {
# This is a example input plugin **only for test and demonstrate the feature input plugin**
fakeStream {
FakeStream {
content = ["Hello World, SeaTunnel"]
}

47 changes: 24 additions & 23 deletions docs/en/connector/sink/Assert.md
@@ -10,51 +10,51 @@ A flink sink plugin which can assert illegal data by user defined rules

Engine Supported and plugin name

* [ ] Spark
* [x] Flink: AssertSink
* [x] Spark: Assert
* [x] Flink: Assert

:::

## Options

| name | type | required | default value |
| ----------------------- | -------- | -------- | ------------- |
| rules | ConfigList | yes | - |
|  field_name | `String` | yes | - |
|  field_type | `String` | no | - |
|  field_value | ConfigList | no | - |
|   rule_type | `String` | no | - |
|   rule_value | double | no | - |
| name | type | required | default value |
| ----------------------------- | ---------- | -------- | ------------- |
|rules | ConfigList | yes | - |
|rules.field_name | string | yes | - |
|rules.field_type | string | no | - |
|rules.field_value | ConfigList | no | - |
|rules.field_value.rule_type | string | no | - |
|rules.field_value.rule_value | double | no | - |


### rules
### rules [ConfigList]

Rule definitions used to validate the data. Each rule represents one field validation.

### field_name
### field_name [string]

The name of the field to validate.

### field_type
### field_type [string]

field type (string), e.g. `string,boolean,byte,short,int,long,float,double,char,void,BigInteger,BigDecimal,Instant`

### field_value
### field_value [ConfigList]

A list of rules that define how the field's value is validated.

### rule_type
### rule_type [string]

The following rules are supported for now
`
NOT_NULL,
MIN,
MAX,
MIN_LENGTH,
MAX_LENGTH
NOT_NULL, // value can't be null
MIN, // define the minimum value of data
MAX, // define the maximum value of data
MIN_LENGTH, // define the minimum string length of a string data
MAX_LENGTH // define the maximum string length of a string data
`

### rule_value
### rule_value [double]

The value associated with the rule type.

@@ -63,8 +63,9 @@
The whole config follows the `hocon` style.

```hocon
AssertSink {
rules =

Assert {
rules =
[{
field_name = name
field_type = string
Binary file added docs/en/images/seatunnel_architecture.png
Binary file added docs/en/images/seatunnel_starter.png
94 changes: 94 additions & 0 deletions docs/en/new-connector/sink/Assert.md
@@ -0,0 +1,94 @@
# Assert

## Description

A Flink sink plugin that asserts data legality according to user-defined rules.

## Options

| name | type | required | default value |
| ----------------------------- | ---------- | -------- | ------------- |
|rules | ConfigList | yes | - |
|rules.field_name | string | yes | - |
|rules.field_type | string | no | - |
|rules.field_value | ConfigList | no | - |
|rules.field_value.rule_type | string | no | - |
|rules.field_value.rule_value | double | no | - |


### rules [ConfigList]

Rule definitions used to validate the data. Each rule represents one field validation.

### field_name [string]

The name of the field to validate.

### field_type [string]

field type (string), e.g. `string,boolean,byte,short,int,long,float,double,char,void,BigInteger,BigDecimal,Instant`

### field_value [ConfigList]

A list of rules that define how the field's value is validated.

### rule_type [string]

The following rules are supported for now
`
NOT_NULL, // value can't be null
MIN, // define the minimum value of data
MAX, // define the maximum value of data
MIN_LENGTH, // define the minimum string length of a string data
MAX_LENGTH // define the maximum string length of a string data
`

### rule_value [double]

The value associated with the rule type.
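Taken together, the rule semantics above can be sketched as a small validator. This is a hypothetical illustration only; the class, enum, and method names are invented here and are not SeaTunnel's actual implementation:

```java
// Hypothetical sketch of how a single field rule might be evaluated;
// not the actual SeaTunnel Assert sink implementation.
public class RuleSketch {

    public enum RuleType { NOT_NULL, MIN, MAX, MIN_LENGTH, MAX_LENGTH }

    /** Returns true when the value satisfies the rule. */
    public static boolean check(RuleType type, double ruleValue, Object value) {
        switch (type) {
            case NOT_NULL:
                return value != null;
            case MIN: // minimum numeric value of the data
                return value instanceof Number
                        && ((Number) value).doubleValue() >= ruleValue;
            case MAX: // maximum numeric value of the data
                return value instanceof Number
                        && ((Number) value).doubleValue() <= ruleValue;
            case MIN_LENGTH: // minimum string length
                return value instanceof String
                        && ((String) value).length() >= ruleValue;
            case MAX_LENGTH: // maximum string length
                return value instanceof String
                        && ((String) value).length() <= ruleValue;
            default:
                return false;
        }
    }

    public static void main(String[] args) {
        // Mirrors the example config below: name must be 3..5 chars, age 10..20.
        System.out.println(check(RuleType.MIN_LENGTH, 3, "abc"));    // true
        System.out.println(check(RuleType.MAX_LENGTH, 5, "abcdef")); // false
        System.out.println(check(RuleType.MIN, 10, 9));              // false
    }
}
```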


## Example
The whole config follows the `hocon` style.

```hocon
Assert {
rules =
[{
field_name = name
field_type = string
field_value = [
{
rule_type = NOT_NULL
},
{
rule_type = MIN_LENGTH
rule_value = 3
},
{
rule_type = MAX_LENGTH
rule_value = 5
}
]
},{
field_name = age
field_type = int
field_value = [
{
rule_type = NOT_NULL
},
{
rule_type = MIN
rule_value = 10
},
{
rule_type = MAX
rule_value = 20
}
]
}
]

}

```
104 changes: 104 additions & 0 deletions docs/en/new-connector/sink/Clickhouse.md
@@ -0,0 +1,104 @@
# Clickhouse

## Description

Used to write data to Clickhouse. Supports Batch and Streaming mode.

:::tip

Writing data to ClickHouse can also be done using JDBC.

:::

## Options

| name | type | required | default value |
|----------------|--------|----------|---------------|
| host | string | yes | - |
| database | string | yes | - |
| table | string | yes | - |
| username | string | yes | - |
| password | string | yes | - |
| clickhouse.* | string | no | |
| bulk_size      | number  | no       | 20000         |
| split_mode     | boolean | no       | false         |
| sharding_key | string | no | - |
| common-options | string | no | - |

### host [string]

`ClickHouse` cluster address in `host:port` format; multiple hosts may be specified, such as `"host1:8123,host2:8123"`.
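Splitting the multi-host string into individual endpoints can be sketched as follows (a hypothetical helper for illustration, not part of the connector's API):

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical sketch: split "host1:8123,host2:8123" into endpoint strings.
public class HostParser {
    public static List<String> parse(String hosts) {
        return Arrays.stream(hosts.split(","))
                .map(String::trim)               // tolerate spaces after commas
                .filter(s -> !s.isEmpty())
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(parse("host1:8123, host2:8123")); // [host1:8123, host2:8123]
    }
}
```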

### database [string]

The `ClickHouse` database

### table [string]

The table name

### username [string]

`ClickHouse` user username

### password [string]

`ClickHouse` user password

### clickhouse [string]

In addition to the above mandatory parameters, users can also specify multiple optional parameters, which cover all the [parameters](https://github.com/ClickHouse/clickhouse-jdbc/tree/master/clickhouse-client#configuration) provided by `clickhouse-jdbc`.

The way to specify the parameter is to add the prefix `clickhouse.` to the original parameter name. For example, the way to specify `socket_timeout` is: `clickhouse.socket_timeout = 50000` . If these non-essential parameters are not specified, they will use the default values given by `clickhouse-jdbc`.
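The prefix convention can be sketched like this: keys starting with `clickhouse.` are stripped and forwarded to the JDBC driver. This is a hypothetical illustration; the class and method names are invented and do not reflect the connector's internals:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;

// Hypothetical sketch of the "clickhouse." prefix convention described above.
public class PrefixProps {
    public static Properties extract(Map<String, String> config) {
        Properties props = new Properties();
        for (Map.Entry<String, String> e : config.entrySet()) {
            if (e.getKey().startsWith("clickhouse.")) {
                // Strip the prefix; the remainder is the driver option name.
                props.setProperty(
                        e.getKey().substring("clickhouse.".length()),
                        e.getValue());
            }
        }
        return props;
    }

    public static void main(String[] args) {
        Map<String, String> config = new HashMap<>();
        config.put("clickhouse.socket_timeout", "50000");
        config.put("host", "localhost:8123"); // not forwarded to the driver
        System.out.println(extract(config)); // {socket_timeout=50000}
    }
}
```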

### bulk_size [number]

The number of rows written through [Clickhouse-jdbc](https://github.com/ClickHouse/clickhouse-jdbc) in each batch; the default is `20000`.

### split_mode [boolean]

This mode only supports ClickHouse tables whose engine is `Distributed`, and the `internal_replication` option should be `true`. When enabled, SeaTunnel splits the distributed table data and writes directly to each shard, taking the shard weights defined in ClickHouse into account.

### sharding_key [string]

When `split_mode` is enabled, the connector must decide which shard each row is sent to. By default a shard is selected at random, but the `sharding_key` parameter can specify the field used by the sharding algorithm. This option only takes effect when `split_mode` is `true`.
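The shard-selection behaviour just described can be sketched as follows. This is a hypothetical illustration (invented class and method, uniform shards, plain `hashCode`-based sharding), not the connector's actual algorithm:

```java
import java.util.concurrent.ThreadLocalRandom;

// Hypothetical sketch of shard selection under split_mode: random by default,
// or hash the sharding_key field value to pick a shard deterministically.
public class ShardSelector {
    public static int selectShard(Object shardingKeyValue, int shardCount) {
        if (shardingKeyValue == null) {
            // No sharding_key configured: fall back to random selection.
            return ThreadLocalRandom.current().nextInt(shardCount);
        }
        // Same key value always maps to the same shard.
        return Math.floorMod(shardingKeyValue.hashCode(), shardCount);
    }

    public static void main(String[] args) {
        System.out.println(selectShard(25, 4) == selectShard(25, 4)); // true
    }
}
```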

### common options [string]

Sink plugin common parameters, please refer to [Sink Common Options](common-options.md) for details

## Examples

```hocon
sink {

Clickhouse {
host = "localhost:8123"
database = "default"
table = "fake_all"
username = "default"
password = ""
split_mode = true
sharding_key = "age"
}

}
```

```hocon
sink {

Clickhouse {
host = "localhost:8123"
database = "default"
table = "fake_all"
username = "default"
password = ""
}

}
```