[Improve][connector-file] update e2e config
liunaijie committed Oct 30, 2023
1 parent f0607bd commit 45e156e
Showing 25 changed files with 186 additions and 43 deletions.
10 changes: 6 additions & 4 deletions docs/en/connector-v2/source/CosFile.md
@@ -52,7 +52,7 @@ To use this connector you need put hadoop-cos-{hadoop.version}-{version}.jar and
| secret_key | string | yes | - |
| region | string | yes | - |
| read_columns | list | yes | - |
- | field_delimiter | string | no | \001 |
+ | delimiter/field_delimiter | string | no | \001 |
| parse_partition_from_path | boolean | no | true |
| skip_header_row_number | long | no | 0 |
| date_format | string | no | yyyy-MM-dd |
@@ -133,13 +133,13 @@ If you do not assign data schema connector will treat the upstream data as the f
|-----------------------|
| tyrantlucifer#26#male |

- If you assign data schema, you should also assign the option `delimiter` too except CSV file type
+ If you assign data schema, you should also assign the option `field_delimiter`, except for the CSV file type

You should assign the schema and delimiter as follows:

```hocon
- delimiter = "#"
+ field_delimiter = "#"
schema {
fields {
name = string
@@ -176,7 +176,9 @@ The region of cos file system.

The read column list of the data source; users can use it to implement field projection (see the sketch below).

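For illustration only (this example is not part of the commit): a minimal sketch of field projection with `read_columns`, assuming a schema that defines three hypothetical fields:

```hocon
# Hypothetical schema fields: name, age, gender.
# Only two of them are materialized from each row.
read_columns = ["name", "age"]
```
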
- ### field_delimiter [string]
+ ### delimiter/field_delimiter [string]

+ The **delimiter** parameter will be deprecated after version 2.3.5; please use **field_delimiter** instead.

This option only needs to be configured when file_format is text.

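As a hedged illustration (not part of the commit), migrating a text-source config to the new key is a one-line rename; the `path` value is a placeholder and the required connection options (bucket, credentials, region) are omitted for brevity:

```hocon
source {
  CosFile {
    # bucket, secret_id, secret_key and region omitted for brevity
    path = "/seatunnel/read/text"   # placeholder path
    file_format_type = "text"
    # delimiter = "#"               # deprecated spelling, still read as a fallback
    field_delimiter = "#"           # preferred spelling from 2.3.5 onwards
    schema {
      fields {
        name = string
        age = int
        gender = string
      }
    }
  }
}
```
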
12 changes: 7 additions & 5 deletions docs/en/connector-v2/source/FtpFile.md
@@ -44,7 +44,7 @@ If you use SeaTunnel Engine, It automatically integrated the hadoop jar when you
| password | string | yes | - |
| path | string | yes | - |
| file_format_type | string | yes | - |
- | field_delimiter | string | no | \001 |
+ | delimiter/field_delimiter | string | no | \001 |
| read_columns | list | no | - |
| parse_partition_from_path | boolean | no | true |
| date_format | string | no | yyyy-MM-dd |
@@ -131,13 +131,13 @@ If you do not assign data schema connector will treat the upstream data as the f
|-----------------------|
| tyrantlucifer#26#male |

- If you assign data schema, you should also assign the option `delimiter` too except CSV file type
+ If you assign data schema, you should also assign the option `field_delimiter`, except for the CSV file type

You should assign the schema and delimiter as follows:

```hocon
- delimiter = "#"
+ field_delimiter = "#"
schema {
fields {
name = string
@@ -154,7 +154,9 @@ connector will generate data as the following:
|---------------|-----|--------|
| tyrantlucifer | 26 | male |

- ### field_delimiter [string]
+ ### delimiter/field_delimiter [string]

+ The **delimiter** parameter will be deprecated after version 2.3.5; please use **field_delimiter** instead.

This option only needs to be configured when file_format is text.

@@ -253,7 +255,7 @@ Source plugin common parameters, please refer to [Source Common Options](common-
name = string
age = int
}
- delimiter = "#"
+ field_delimiter = "#"
}
```
6 changes: 5 additions & 1 deletion docs/en/connector-v2/source/HdfsFile.md
@@ -46,7 +46,7 @@ Read data from hdfs file system.
| fs.defaultFS | string | yes | - | The hadoop cluster address that starts with `hdfs://`, for example: `hdfs://hadoopcluster` |
| read_columns | list | yes | - | The read column list of the data source; users can use it to implement field projection. The following file types support column projection: text, json, csv, orc, parquet, excel. Tip: if you want to use this feature when reading `text`, `json`, or `csv` files, the schema option must be configured. |
| hdfs_site_path | string | no | - | The path of `hdfs-site.xml`, used to load the HA configuration of the namenodes |
- | field_delimiter | string | no | \001 | Field delimiter, used to tell connector how to slice and dice fields when reading text files. default `\001`, the same as hive's default delimiter |
+ | delimiter/field_delimiter | string | no | \001 | Field delimiter, used to tell the connector how to split fields when reading text files. Defaults to `\001`, the same as Hive's default delimiter |
| parse_partition_from_path | boolean | no | true | Controls whether to parse the partition keys and values from the file path. For example, if you read a file from `hdfs://hadoop-cluster/tmp/seatunnel/parquet/name=tyrantlucifer/age=26`, every record read from the file gains the two fields name=tyrantlucifer and age=26 (see the sketch after this table). Tip: do not define partition fields in the schema option. |
| date_format | string | no | yyyy-MM-dd | Date type format, used to tell the connector how to convert a string to a date. Supported formats: `yyyy-MM-dd` `yyyy.MM.dd` `yyyy/MM/dd`, default `yyyy-MM-dd` |
| datetime_format | string | no | yyyy-MM-dd HH:mm:ss | Datetime type format, used to tell the connector how to convert a string to a datetime. Supported formats: `yyyy-MM-dd HH:mm:ss` `yyyy.MM.dd HH:mm:ss` `yyyy/MM/dd HH:mm:ss` `yyyyMMddHHmmss`, default `yyyy-MM-dd HH:mm:ss` |
@@ -59,6 +59,10 @@ Read data from hdfs file system.
| compress_codec | string | no | none | The compress codec of files |
| common-options | | no | - | Source plugin common parameters, please refer to [Source Common Options](common-options.md) for details. |

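A hedged sketch of the partition-parsing behaviour described in the table above; the path and field values come from the table's own example, everything else is a placeholder:

```hocon
# Reading hdfs://hadoop-cluster/tmp/seatunnel/parquet/name=tyrantlucifer/age=26
# with parse_partition_from_path = true adds name=tyrantlucifer and age=26
# to every record parsed from that file.
HdfsFile {
  fs.defaultFS = "hdfs://hadoop-cluster"
  path = "/tmp/seatunnel/parquet"    # partition directories live under this path
  file_format_type = "parquet"
  parse_partition_from_path = true
  # Do not declare name/age in the schema option; they are derived from the path.
}
```
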
+ ### delimiter/field_delimiter [string]

+ The **delimiter** parameter will be deprecated after version 2.3.5; please use **field_delimiter** instead.

### compress_codec [string]

The compress codec of files; the supported codecs are shown below:
10 changes: 6 additions & 4 deletions docs/en/connector-v2/source/LocalFile.md
@@ -46,7 +46,7 @@ If you use SeaTunnel Engine, It automatically integrated the hadoop jar when you
| path | string | yes | - |
| file_format_type | string | yes | - |
| read_columns | list | no | - |
- | field_delimiter | string | no | \001 |
+ | delimiter/field_delimiter | string | no | \001 |
| parse_partition_from_path | boolean | no | true |
| date_format | string | no | yyyy-MM-dd |
| datetime_format | string | no | yyyy-MM-dd HH:mm:ss |
@@ -127,13 +127,13 @@ If you do not assign data schema connector will treat the upstream data as the f
|-----------------------|
| tyrantlucifer#26#male |

- If you assign data schema, you should also assign the option `delimiter` too except CSV file type
+ If you assign data schema, you should also assign the option `field_delimiter`, except for the CSV file type

You should assign the schema and delimiter as follows:

```hocon
- delimiter = "#"
+ field_delimiter = "#"
schema {
fields {
name = string
@@ -154,7 +154,9 @@ connector will generate data as the following:

The read column list of the data source; users can use it to implement field projection.

- ### field_delimiter [string]
+ ### delimiter/field_delimiter [string]

+ The **delimiter** parameter will be deprecated after version 2.3.5; please use **field_delimiter** instead.

This option only needs to be configured when file_format is text.

10 changes: 6 additions & 4 deletions docs/en/connector-v2/source/OssFile.md
@@ -53,7 +53,7 @@ It only supports hadoop version **2.9.X+**.
| access_secret | string | yes | - |
| endpoint | string | yes | - |
| read_columns | list | yes | - |
- | field_delimiter | string | no | \001 |
+ | delimiter/field_delimiter | string | no | \001 |
| parse_partition_from_path | boolean | no | true |
| skip_header_row_number | long | no | 0 |
| date_format | string | no | yyyy-MM-dd |
@@ -134,13 +134,13 @@ If you do not assign data schema connector will treat the upstream data as the f
|-----------------------|
| tyrantlucifer#26#male |

- If you assign data schema, you should also assign the option `delimiter` too except CSV file type
+ If you assign data schema, you should also assign the option `field_delimiter`, except for the CSV file type

You should assign the schema and delimiter as follows:

```hocon
- delimiter = "#"
+ field_delimiter = "#"
schema {
fields {
name = string
@@ -177,7 +177,9 @@ The endpoint of oss file system.

The read column list of the data source; users can use it to implement field projection.

- ### field_delimiter [string]
+ ### delimiter/field_delimiter [string]

+ The **delimiter** parameter will be deprecated after version 2.3.5; please use **field_delimiter** instead.

This option only needs to be configured when file_format is text.

10 changes: 6 additions & 4 deletions docs/en/connector-v2/source/OssJindoFile.md
@@ -56,7 +56,7 @@ It only supports hadoop version **2.9.X+**.
| access_secret | string | yes | - |
| endpoint | string | yes | - |
| read_columns | list | no | - |
- | field_delimiter | string | no | \001 |
+ | delimiter/field_delimiter | string | no | \001 |
| parse_partition_from_path | boolean | no | true |
| date_format | string | no | yyyy-MM-dd |
| datetime_format | string | no | yyyy-MM-dd HH:mm:ss |
@@ -137,13 +137,13 @@ If you do not assign data schema connector will treat the upstream data as the f
|-----------------------|
| tyrantlucifer#26#male |

- If you assign data schema, you should also assign the option `delimiter` too except CSV file type
+ If you assign data schema, you should also assign the option `field_delimiter`, except for the CSV file type

You should assign the schema and delimiter as follows:

```hocon
- delimiter = "#"
+ field_delimiter = "#"
schema {
fields {
name = string
@@ -180,7 +180,9 @@ The endpoint of oss file system.

The read column list of the data source; users can use it to implement field projection.

- ### field_delimiter [string]
+ ### delimiter/field_delimiter [string]

+ The **delimiter** parameter will be deprecated after version 2.3.5; please use **field_delimiter** instead.

This option only needs to be configured when file_format is text.

10 changes: 7 additions & 3 deletions docs/en/connector-v2/source/S3File.md
@@ -111,13 +111,13 @@ If you do not assign data schema connector will treat the upstream data as the f
|-----------------------|
| tyrantlucifer#26#male |

- If you assign data schema, you should also assign the option `delimiter` too except CSV file type
+ If you assign data schema, you should also assign the option `field_delimiter`, except for the CSV file type

You should assign the schema and delimiter as follows:

```hocon
- delimiter = "#"
+ field_delimiter = "#"
schema {
fields {
name = string
@@ -205,7 +205,7 @@ If you assign file type to `parquet` `orc`, schema option not required, connecto
| access_key | string | no | - | Only used when `fs.s3a.aws.credentials.provider = org.apache.hadoop.fs.s3a.SimpleAWSCredentialsProvider` |
| access_secret | string | no | - | Only used when `fs.s3a.aws.credentials.provider = org.apache.hadoop.fs.s3a.SimpleAWSCredentialsProvider` |
| hadoop_s3_properties | map | no | - | If you need to set other S3 options, you can add them here; refer to this [link](https://hadoop.apache.org/docs/stable/hadoop-aws/tools/hadoop-aws/index.html) and see the sketch after this table |
- | field_delimiter | string | no | \001 | Field delimiter, used to tell connector how to slice and dice fields when reading text files. Default `\001`, the same as hive's default delimiter. |
+ | delimiter/field_delimiter | string | no | \001 | Field delimiter, used to tell the connector how to split fields when reading text files. Default `\001`, the same as Hive's default delimiter. |
| parse_partition_from_path | boolean | no | true | Controls whether to parse the partition keys and values from the file path. For example, if you read a file from `s3n://hadoop-cluster/tmp/seatunnel/parquet/name=tyrantlucifer/age=26`, every record read from the file gains the two fields name="tyrantlucifer" and age=26 |
| date_format | string | no | yyyy-MM-dd | Date type format, used to tell connector how to convert string to date, supported as the following formats:`yyyy-MM-dd` `yyyy.MM.dd` `yyyy/MM/dd`. default `yyyy-MM-dd` |
| datetime_format | string | no | yyyy-MM-dd HH:mm:ss | Datetime type format, used to tell connector how to convert string to datetime, supported as the following formats:`yyyy-MM-dd HH:mm:ss` `yyyy.MM.dd HH:mm:ss` `yyyy/MM/dd HH:mm:ss` `yyyyMMddHHmmss` |
@@ -216,6 +216,10 @@ If you assign file type to `parquet` `orc`, schema option not required, connecto
| compress_codec | string | no | none |
| common-options | | no | - | Source plugin common parameters, please refer to [Source Common Options](common-options.md) for details. |

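A hedged sketch of passing extra Hadoop S3A options through `hadoop_s3_properties`; the property name and value below are illustrative, taken from the hadoop-aws documentation linked in the table:

```hocon
# Extra S3A client options are forwarded to the underlying Hadoop filesystem.
hadoop_s3_properties {
  "fs.s3a.buffer.dir" = "/data/st_test/s3a"   # illustrative property and value
}
```
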
+ ### delimiter/field_delimiter [string]

+ The **delimiter** parameter will be deprecated after version 2.3.5; please use **field_delimiter** instead.

### compress_codec [string]

The compress codec of files; the supported codecs are shown below:
10 changes: 6 additions & 4 deletions docs/en/connector-v2/source/SftpFile.md
@@ -44,7 +44,7 @@ If you use SeaTunnel Engine, It automatically integrated the hadoop jar when you
| password | string | yes | - |
| path | string | yes | - |
| file_format_type | string | yes | - |
- | field_delimiter | string | no | \001 |
+ | delimiter/field_delimiter | string | no | \001 |
| parse_partition_from_path | boolean | no | true |
| date_format | string | no | yyyy-MM-dd |
| skip_header_row_number | long | no | 0 |
@@ -132,13 +132,13 @@ If you do not assign data schema connector will treat the upstream data as the f
|-----------------------|
| tyrantlucifer#26#male |

- If you assign data schema, you should also assign the option `delimiter` too except CSV file type
+ If you assign data schema, you should also assign the option `field_delimiter`, except for the CSV file type

You should assign the schema and delimiter as follows:

```hocon
- delimiter = "#"
+ field_delimiter = "#"
schema {
fields {
name = string
@@ -155,7 +155,9 @@ connector will generate data as the following:
|---------------|-----|--------|
| tyrantlucifer | 26 | male |

- ### field_delimiter [string]
+ ### delimiter/field_delimiter [string]

+ The **delimiter** parameter will be deprecated after version 2.3.5; please use **field_delimiter** instead.

This option only needs to be configured when file_format is text.

@@ -44,6 +44,7 @@ public class BaseSourceConfig {
Options.key("field_delimiter")
.stringType()
.defaultValue(TextFormatConstant.SEPARATOR[0])
+ .withFallbackKeys("delimiter")
.withDescription(
"The separator between columns in a row of data. Only needed by `text` file format");

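The user-facing effect of the added fallback key, as a hedged sketch (assuming the usual fallback semantics, where the legacy key is consulted only when the primary key is absent):

```hocon
# An older job that still sets only the legacy key keeps resolving, because
# "delimiter" now falls back to "field_delimiter" (assumed semantics):
FtpFile {
  delimiter = "#"           # read through the fallback key
  # field_delimiter = "#"   # preferred key; assumed to take precedence when both are set
}
```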
