Commit c66bff7: Update Documentation

wgzhao committed Dec 1, 2020 (parent: 457d768)

Showing 3 changed files with 162 additions and 157 deletions.
CHANGELOG.md (46 additions, 31 deletions)

# Changelog

## 3.1.5

### General Changes

* Various code cleanups

### DBF reader

* Rebuild this plugin on top of a third-party jar package
* Add support for the `Date` type
* Fix an occasional null pointer exception

### DBF writer

* Add support for the `Date` type

## 3.1.4

This is an emergency release that fixes a serious problem in a previous release ([#62](https://github.com/wgzhao/DataX/issues/62)).

## 3.1.3

### Redis reader

* Delete the temporary local file
* Only parse the Redis `String` data type; other types are ignored

### HDFS reader

* Add support for reading Parquet files (#54)

## 3.1.2

### General Changes

* No longer parse the `-m` command-line argument; it does not actually do anything


* Add support for the `json` data type

## 3.1.1

### General Changes

* Add basic column operations to Transformer
* Use prestosql's hadoop and hive jars instead of apache's
* Various miscellaneous code optimizations

### dbffilereader

* Remove support for reading compressed DBF files

### jsonreader

* Fix parsing of non-string values

### dbffilewriter

* Fix an error when writing boolean values

### hdfswriter

* Use the keyword `parquest` to indicate Parquet format support; the old keyword `par` is no longer used

docs/src/main/sphinx/reader/dbffilereader.md (53 additions, 59 deletions)

```json
{
  "job": {
    "setting": {
      "speed": {
        "channel": 2
      }
    },
    "content": [
      {
        "reader": {
          "name": "dbffilereader",
          "parameter": {
            "column": [
              {
                "index": 0,
                "type": "string"
              },
              {
                "index": 1,
                "type": "long"
              },
              {
                "index": 2,
                "type": "string"
              },
              {
                "index": 3,
                "type": "boolean"
              },
              {
                "index": 4,
                "type": "string"
              },
              {
                "value": "dbf",
                "type": "string"
              }
            ],
            "path": ["/tmp/out"],
            "encoding": "GBK"
          }
        },
        "writer": {
          "name": "streamwriter",
          "parameter": {
            "print": "true"
          }
        }
      }
    ]
  }
}
```

## Parameter Description

The `parameter` section supports the following settings:

| Item       | Required | Default            | Description |
| :--------- | :------: | :----------------- | :---------- |
| path       | yes      | none               | Path(s) to the DBF file(s); multiple paths may be specified, see below |
| column     | yes      | String by default  | The list of columns to synchronize, given as `{type: value}` or `{type: index}` entries; see the sketch below for details |
| encoding   | no       | GBK                | DBF file encoding, e.g. `GBK`, `UTF-8` |
| nullFormat | no       | `\N`               | The string that represents a null value |

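For instance, a minimal sketch of a `column` list that reads the first two physical fields and appends a constant tag to every record; the indexes, types, and the constant value are illustrative only, not taken from a real job:

```json
"column": [
  { "index": 0, "type": "string" },
  { "index": 1, "type": "long" },
  { "value": "dbf", "type": "string" }
]
```

Entries carrying `index` are read from the DBF file by position, while entries carrying `value` emit the given constant for every row.
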
### path

Description: path(s) on the local file system; note that multiple paths may be specified.
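
As an illustration (the concrete file names are hypothetical, assuming each entry points to readable DBF data):

```json
"path": ["/tmp/data1.dbf", "/tmp/data2.dbf"]
```
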
docs/src/main/sphinx/writer/dbffilewriter.md (63 additions, 67 deletions)

DbfFileWriter writes one or more tables in DBF format to local files.

Each output file stores a single DBF table.

## 2 Features

### 2.1 Configuration Example

```json
{
  "job": {
    "setting": {
      "speed": {
        "batchSize": 20480,
        "bytes": -1,
        "channel": 1
      }
    },
    "content": [
      {
        "reader": {
          "name": "streamreader",
          "parameter": {
            "column": [
              {
                "value": "DataX",
                "type": "string"
              },
              {
                "value": 19880808,
                "type": "long"
              },
              {
                "value": "1989-06-04 00:00:00",
                "type": "date"
              },
              {
                "value": true,
                "type": "bool"
              },
              {
                "value": "中文测试",
                "type": "string"
              }
            ],
            "sliceRecordCount": 10
          }
        },
        "writer": {
          "name": "dbffilewriter",
          "parameter": {
            "column": [
              {
                "name": "col1",
                "type": "char",
                "length": 100
              },
              {
                "name": "col2",
                "type": "numeric",
                "length": 18,
                "scale": 0
              },
              {
                "name": "col3",
                "type": "date"
              },
              {
                "name": "col4",
                "type": "logical"
              },
              {
                "name": "col5",
                "type": "char",
                "length": 100
              }
            ],
            "fileName": "test.dbf",
            "path": "/tmp/out",
            "writeMode": "truncate",
            "encoding": "GBK"
          }
        }
      }
    ]
  }
}
```

| column     | yes | String by default | The list of columns to synchronize, given as `{type: value}` or `{type: index}` entries |
| fileName   | yes | none  | Name of the file that DbfFileWriter writes |
| writeMode  | yes | none  | How existing data is handled before writing; supports `truncate`, `append`, and `nonConflict`, detailed below |
| encoding   | no  | UTF-8 | DBF file encoding, e.g. `GBK`, `UTF-8` |
| nullFormat | no  | `\N`  | The string that represents a null value |
| dateFormat | no  | none  | Serialization format for date values written to the file, e.g. `"dateFormat": "yyyy-MM-dd"`; see the sketch below |

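To make these options concrete, here is a minimal, hypothetical writer `parameter` fragment; the column layout, file name, and format string are illustrative, not prescriptive:

```json
"parameter": {
  "column": [
    { "name": "birthday", "type": "date" }
  ],
  "fileName": "people.dbf",
  "path": "/tmp/out",
  "writeMode": "append",
  "dateFormat": "yyyy-MM-dd"
}
```
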
#### writeMode
