[Improve][connector-file] update e2e config
liunaijie committed Oct 30, 2023
1 parent f0607bd commit 45e156e
Showing 25 changed files with 186 additions and 43 deletions.
10 changes: 6 additions & 4 deletions docs/en/connector-v2/source/CosFile.md
@@ -52,7 +52,7 @@ To use this connector you need put hadoop-cos-{hadoop.version}-{version}.jar and
| secret_key | string | yes | - |
| region | string | yes | - |
| read_columns | list | yes | - |
- | field_delimiter | string | no | \001 |
+ | delimiter/field_delimiter | string | no | \001 |
| parse_partition_from_path | boolean | no | true |
| skip_header_row_number | long | no | 0 |
| date_format | string | no | yyyy-MM-dd |
@@ -133,13 +133,13 @@ If you do not assign data schema connector will treat the upstream data as the f
|-----------------------|
| tyrantlucifer#26#male |

- If you assign data schema, you should also assign the option `delimiter` too except CSV file type
+ If you assign data schema, you should also assign the option `field_delimiter`, except for the CSV file type

You should assign the schema and delimiter as follows:

```hocon
- delimiter = "#"
+ field_delimiter = "#"
schema {
fields {
name = string
@@ -176,7 +176,9 @@ The region of cos file system.

The read column list of the data source; users can use it to implement field projection (see the sketch below).

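For illustration only (this example is not part of the commit): a minimal sketch of field projection with `read_columns`, assuming a schema that defines three hypothetical fields:

```hocon
# Hypothetical schema fields: name, age, gender.
# Only two of them are materialized from each row.
read_columns = ["name", "age"]
```
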
- ### field_delimiter [string]
+ ### delimiter/field_delimiter [string]

+ The **delimiter** parameter will be deprecated after version 2.3.5; please use **field_delimiter** instead.

This option only needs to be configured when file_format is text.

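As a hedged illustration (not part of the commit), migrating a text-source config to the new key is a one-line rename; the `path` value is a placeholder and the required connection options (bucket, credentials, region) are omitted for brevity:

```hocon
source {
  CosFile {
    # bucket, secret_id, secret_key and region omitted for brevity
    path = "/seatunnel/read/text"   # placeholder path
    file_format_type = "text"
    # delimiter = "#"               # deprecated spelling, still read as a fallback
    field_delimiter = "#"           # preferred spelling from 2.3.5 onwards
    schema {
      fields {
        name = string
        age = int
        gender = string
      }
    }
  }
}
```
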
12 changes: 7 additions & 5 deletions docs/en/connector-v2/source/FtpFile.md
@@ -44,7 +44,7 @@ If you use SeaTunnel Engine, It automatically integrated the hadoop jar when you
| password | string | yes | - |
| path | string | yes | - |
| file_format_type | string | yes | - |
- | field_delimiter | string | no | \001 |
+ | delimiter/field_delimiter | string | no | \001 |
| read_columns | list | no | - |
| parse_partition_from_path | boolean | no | true |
| date_format | string | no | yyyy-MM-dd |
@@ -131,13 +131,13 @@ If you do not assign data schema connector will treat the upstream data as the f
|-----------------------|
| tyrantlucifer#26#male |

- If you assign data schema, you should also assign the option `delimiter` too except CSV file type
+ If you assign data schema, you should also assign the option `field_delimiter`, except for the CSV file type

You should assign the schema and delimiter as follows:

```hocon
- delimiter = "#"
+ field_delimiter = "#"
schema {
fields {
name = string
@@ -154,7 +154,9 @@ connector will generate data as the following:
|---------------|-----|--------|
| tyrantlucifer | 26 | male |

- ### field_delimiter [string]
+ ### delimiter/field_delimiter [string]

+ The **delimiter** parameter will be deprecated after version 2.3.5; please use **field_delimiter** instead.

This option only needs to be configured when file_format is text.

@@ -253,7 +255,7 @@ Source plugin common parameters, please refer to [Source Common Options](common-
name = string
age = int
}
- delimiter = "#"
+ field_delimiter = "#"
}
```
6 changes: 5 additions & 1 deletion docs/en/connector-v2/source/HdfsFile.md
@@ -46,7 +46,7 @@ Read data from hdfs file system.
| fs.defaultFS | string | yes | - | The hadoop cluster address that starts with `hdfs://`, for example: `hdfs://hadoopcluster` |
| read_columns | list | yes | - | The read column list of the data source; users can use it to implement field projection. The following file types support column projection: text, json, csv, orc, parquet, excel. Tip: if you want to use this feature when reading `text`, `json`, or `csv` files, the schema option must be configured. |
| hdfs_site_path | string | no | - | The path of `hdfs-site.xml`, used to load the HA configuration of the namenodes |
- | field_delimiter | string | no | \001 | Field delimiter, used to tell connector how to slice and dice fields when reading text files. default `\001`, the same as hive's default delimiter |
+ | delimiter/field_delimiter | string | no | \001 | Field delimiter, used to tell the connector how to split fields when reading text files. Defaults to `\001`, the same as Hive's default delimiter |
| parse_partition_from_path | boolean | no | true | Controls whether to parse the partition keys and values from the file path. For example, if you read a file from `hdfs://hadoop-cluster/tmp/seatunnel/parquet/name=tyrantlucifer/age=26`, every record read from the file gains the two fields name=tyrantlucifer and age=26 (see the sketch after this table). Tip: do not define partition fields in the schema option. |
| date_format | string | no | yyyy-MM-dd | Date type format, used to tell the connector how to convert a string to a date. Supported formats: `yyyy-MM-dd` `yyyy.MM.dd` `yyyy/MM/dd`, default `yyyy-MM-dd` |
| datetime_format | string | no | yyyy-MM-dd HH:mm:ss | Datetime type format, used to tell the connector how to convert a string to a datetime. Supported formats: `yyyy-MM-dd HH:mm:ss` `yyyy.MM.dd HH:mm:ss` `yyyy/MM/dd HH:mm:ss` `yyyyMMddHHmmss`, default `yyyy-MM-dd HH:mm:ss` |
@@ -59,6 +59,10 @@ Read data from hdfs file system.
| compress_codec | string | no | none | The compress codec of files |
| common-options | | no | - | Source plugin common parameters, please refer to [Source Common Options](common-options.md) for details. |

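A hedged sketch of the partition-parsing behaviour described in the table above; the path and field values come from the table's own example, everything else is a placeholder:

```hocon
# Reading hdfs://hadoop-cluster/tmp/seatunnel/parquet/name=tyrantlucifer/age=26
# with parse_partition_from_path = true adds name=tyrantlucifer and age=26
# to every record parsed from that file.
HdfsFile {
  fs.defaultFS = "hdfs://hadoop-cluster"
  path = "/tmp/seatunnel/parquet"    # partition directories live under this path
  file_format_type = "parquet"
  parse_partition_from_path = true
  # Do not declare name/age in the schema option; they are derived from the path.
}
```
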
+ ### delimiter/field_delimiter [string]

+ The **delimiter** parameter will be deprecated after version 2.3.5; please use **field_delimiter** instead.

### compress_codec [string]

The compress codec of files; the supported codecs are shown below:
10 changes: 6 additions & 4 deletions docs/en/connector-v2/source/LocalFile.md
@@ -46,7 +46,7 @@ If you use SeaTunnel Engine, It automatically integrated the hadoop jar when you
| path | string | yes | - |
| file_format_type | string | yes | - |
| read_columns | list | no | - |
- | field_delimiter | string | no | \001 |
+ | delimiter/field_delimiter | string | no | \001 |
| parse_partition_from_path | boolean | no | true |
| date_format | string | no | yyyy-MM-dd |
| datetime_format | string | no | yyyy-MM-dd HH:mm:ss |
@@ -127,13 +127,13 @@ If you do not assign data schema connector will treat the upstream data as the f
|-----------------------|
| tyrantlucifer#26#male |

- If you assign data schema, you should also assign the option `delimiter` too except CSV file type
+ If you assign data schema, you should also assign the option `field_delimiter`, except for the CSV file type

You should assign the schema and delimiter as follows:

```hocon
- delimiter = "#"
+ field_delimiter = "#"
schema {
fields {
name = string
@@ -154,7 +154,9 @@ connector will generate data as the following:

The read column list of the data source; users can use it to implement field projection.

- ### field_delimiter [string]
+ ### delimiter/field_delimiter [string]

+ The **delimiter** parameter will be deprecated after version 2.3.5; please use **field_delimiter** instead.

This option only needs to be configured when file_format is text.

10 changes: 6 additions & 4 deletions docs/en/connector-v2/source/OssFile.md
@@ -53,7 +53,7 @@ It only supports hadoop version **2.9.X+**.
| access_secret | string | yes | - |
| endpoint | string | yes | - |
| read_columns | list | yes | - |
- | field_delimiter | string | no | \001 |
+ | delimiter/field_delimiter | string | no | \001 |
| parse_partition_from_path | boolean | no | true |
| skip_header_row_number | long | no | 0 |
| date_format | string | no | yyyy-MM-dd |
@@ -134,13 +134,13 @@ If you do not assign data schema connector will treat the upstream data as the f
|-----------------------|
| tyrantlucifer#26#male |

- If you assign data schema, you should also assign the option `delimiter` too except CSV file type
+ If you assign data schema, you should also assign the option `field_delimiter`, except for the CSV file type

You should assign the schema and delimiter as follows:

```hocon
- delimiter = "#"
+ field_delimiter = "#"
schema {
fields {
name = string
@@ -177,7 +177,9 @@ The endpoint of oss file system.

The read column list of the data source; users can use it to implement field projection.

- ### field_delimiter [string]
+ ### delimiter/field_delimiter [string]

+ The **delimiter** parameter will be deprecated after version 2.3.5; please use **field_delimiter** instead.

This option only needs to be configured when file_format is text.

10 changes: 6 additions & 4 deletions docs/en/connector-v2/source/OssJindoFile.md
@@ -56,7 +56,7 @@ It only supports hadoop version **2.9.X+**.
| access_secret | string | yes | - |
| endpoint | string | yes | - |
| read_columns | list | no | - |
- | field_delimiter | string | no | \001 |
+ | delimiter/field_delimiter | string | no | \001 |
| parse_partition_from_path | boolean | no | true |
| date_format | string | no | yyyy-MM-dd |
| datetime_format | string | no | yyyy-MM-dd HH:mm:ss |
@@ -137,13 +137,13 @@ If you do not assign data schema connector will treat the upstream data as the f
|-----------------------|
| tyrantlucifer#26#male |

- If you assign data schema, you should also assign the option `delimiter` too except CSV file type
+ If you assign data schema, you should also assign the option `field_delimiter`, except for the CSV file type

You should assign the schema and delimiter as follows:

```hocon
- delimiter = "#"
+ field_delimiter = "#"
schema {
fields {
name = string
@@ -180,7 +180,9 @@ The endpoint of oss file system.

The read column list of the data source; users can use it to implement field projection.

- ### field_delimiter [string]
+ ### delimiter/field_delimiter [string]

+ The **delimiter** parameter will be deprecated after version 2.3.5; please use **field_delimiter** instead.

This option only needs to be configured when file_format is text.

10 changes: 7 additions & 3 deletions docs/en/connector-v2/source/S3File.md
@@ -111,13 +111,13 @@ If you do not assign data schema connector will treat the upstream data as the f
|-----------------------|
| tyrantlucifer#26#male |

- If you assign data schema, you should also assign the option `delimiter` too except CSV file type
+ If you assign data schema, you should also assign the option `field_delimiter`, except for the CSV file type

You should assign the schema and delimiter as follows:

```hocon
- delimiter = "#"
+ field_delimiter = "#"
schema {
fields {
name = string
@@ -205,7 +205,7 @@ If you assign file type to `parquet` `orc`, schema option not required, connecto
| access_key | string | no | - | Only used when `fs.s3a.aws.credentials.provider = org.apache.hadoop.fs.s3a.SimpleAWSCredentialsProvider` |
| access_secret | string | no | - | Only used when `fs.s3a.aws.credentials.provider = org.apache.hadoop.fs.s3a.SimpleAWSCredentialsProvider` |
| hadoop_s3_properties | map | no | - | If you need to set other S3 options, you can add them here; refer to this [link](https://hadoop.apache.org/docs/stable/hadoop-aws/tools/hadoop-aws/index.html) and see the sketch after this table |
- | field_delimiter | string | no | \001 | Field delimiter, used to tell connector how to slice and dice fields when reading text files. Default `\001`, the same as hive's default delimiter. |
+ | delimiter/field_delimiter | string | no | \001 | Field delimiter, used to tell the connector how to split fields when reading text files. Default `\001`, the same as Hive's default delimiter. |
| parse_partition_from_path | boolean | no | true | Controls whether to parse the partition keys and values from the file path. For example, if you read a file from `s3n://hadoop-cluster/tmp/seatunnel/parquet/name=tyrantlucifer/age=26`, every record read from the file gains the two fields name="tyrantlucifer" and age=26 |
| date_format | string | no | yyyy-MM-dd | Date type format, used to tell connector how to convert string to date, supported as the following formats:`yyyy-MM-dd` `yyyy.MM.dd` `yyyy/MM/dd`. default `yyyy-MM-dd` |
| datetime_format | string | no | yyyy-MM-dd HH:mm:ss | Datetime type format, used to tell connector how to convert string to datetime, supported as the following formats:`yyyy-MM-dd HH:mm:ss` `yyyy.MM.dd HH:mm:ss` `yyyy/MM/dd HH:mm:ss` `yyyyMMddHHmmss` |
@@ -216,6 +216,10 @@ If you assign file type to `parquet` `orc`, schema option not required, connecto
| compress_codec | string | no | none |
| common-options | | no | - | Source plugin common parameters, please refer to [Source Common Options](common-options.md) for details. |

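A hedged sketch of passing extra Hadoop S3A options through `hadoop_s3_properties`; the property name and value below are illustrative, taken from the hadoop-aws documentation linked in the table:

```hocon
# Extra S3A client options are forwarded to the underlying Hadoop filesystem.
hadoop_s3_properties {
  "fs.s3a.buffer.dir" = "/data/st_test/s3a"   # illustrative property and value
}
```
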
+ ### delimiter/field_delimiter [string]

+ The **delimiter** parameter will be deprecated after version 2.3.5; please use **field_delimiter** instead.

### compress_codec [string]

The compress codec of files; the supported codecs are shown below:
10 changes: 6 additions & 4 deletions docs/en/connector-v2/source/SftpFile.md
@@ -44,7 +44,7 @@ If you use SeaTunnel Engine, It automatically integrated the hadoop jar when you
| password | string | yes | - |
| path | string | yes | - |
| file_format_type | string | yes | - |
- | field_delimiter | string | no | \001 |
+ | delimiter/field_delimiter | string | no | \001 |
| parse_partition_from_path | boolean | no | true |
| date_format | string | no | yyyy-MM-dd |
| skip_header_row_number | long | no | 0 |
@@ -132,13 +132,13 @@ If you do not assign data schema connector will treat the upstream data as the f
|-----------------------|
| tyrantlucifer#26#male |

- If you assign data schema, you should also assign the option `delimiter` too except CSV file type
+ If you assign data schema, you should also assign the option `field_delimiter`, except for the CSV file type

You should assign the schema and delimiter as follows:

```hocon
- delimiter = "#"
+ field_delimiter = "#"
schema {
fields {
name = string
@@ -155,7 +155,9 @@ connector will generate data as the following:
|---------------|-----|--------|
| tyrantlucifer | 26 | male |

- ### field_delimiter [string]
+ ### delimiter/field_delimiter [string]

+ The **delimiter** parameter will be deprecated after version 2.3.5; please use **field_delimiter** instead.

This option only needs to be configured when file_format is text.

@@ -44,6 +44,7 @@ public class BaseSourceConfig {
Options.key("field_delimiter")
.stringType()
.defaultValue(TextFormatConstant.SEPARATOR[0])
+ .withFallbackKeys("delimiter")
.withDescription(
"The separator between columns in a row of data. Only needed by `text` file format");

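The user-facing effect of the added fallback key, as a hedged sketch (assuming the usual fallback semantics, where the legacy key is consulted only when the primary key is absent):

```hocon
# An older job that still sets only the legacy key keeps resolving, because
# "delimiter" now falls back to "field_delimiter" (assumed semantics):
FtpFile {
  delimiter = "#"           # read through the fallback key
  # field_delimiter = "#"   # preferred key; assumed to take precedence when both are set
}
```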
