
Commit

Add seatunnel datatype and convert origin value into seatunnel data type (#1797)

* Add seatunnel datatype
ruanwenjun authored May 5, 2022
1 parent 4639ba1 commit c340795
Showing 127 changed files with 1,776 additions and 627 deletions.
108 changes: 108 additions & 0 deletions docs/en/concept/config.md
@@ -0,0 +1,108 @@
---
sidebar_position: 2
---

# Intro to config file

In SeaTunnel, the most important thing is the config file, through which users describe their data
synchronization requirements and get the most out of SeaTunnel. Next, I will show you how to write a
config file.

## Example

Before you read on, you can find config file
examples [here](https://github.com/apache/incubator-seatunnel/tree/dev/config) and in the `config`
directory of the distribution package.

## Config file structure

The Config file will be similar to the one below.

```hocon
env {
  execution.parallelism = 1
}

source {
  FakeSource {
    result_table_name = "fake"
    field_name = "name,age"
  }
}

transform {
  sql {
    sql = "select name,age from fake"
  }
}

sink {
  Clickhouse {
    host = "clickhouse:8123"
    database = "default"
    table = "seatunnel_console"
    fields = ["name"]
    username = "default"
    password = ""
  }
}
```

As you can see, the config file contains several sections: `env`, `source`, `transform`, and `sink`. Each module
has a different function. Once you understand these modules, you will understand how SeaTunnel works.

### env

Used to set optional engine parameters. No matter which engine is used (Spark or Flink), its
optional parameters are configured here.

<!-- TODO add supported env parameters -->

### source

The source module defines where SeaTunnel fetches data from; the fetched data is passed on to the next step.
Multiple sources can be defined at the same time. For the currently supported sources,
check [Source of SeaTunnel](../connector/source). Each source has its own specific parameters that define how to
fetch data, and SeaTunnel also defines parameters shared by every source, such as
the `result_table_name` parameter, which names the data produced by the current source so that it can be
referenced by other modules later.
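For example, two sources can be defined side by side, each registering its output under its own `result_table_name` (a sketch based on the example config above; the second source and its fields are illustrative):

```hocon
source {
  FakeSource {
    # Downstream modules can reference this data as "fake".
    result_table_name = "fake"
    field_name = "name,age"
  }
  # A second, hypothetical source registered under a different name.
  FakeSource {
    result_table_name = "fake2"
    field_name = "city,score"
  }
}
```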

### transform

Once we have a data source, we may want to process the data further, which is what the transform module is
for. The word 'may' is deliberate: the transform step is optional, and you can go directly from source to
sink, like below.

```hocon
transform {
  // nothing here
}
```

Like source, each transform has its own specific parameters.
For the currently supported transforms, check [Transform of SeaTunnel](../transform).

### sink

Our purpose with SeaTunnel is to synchronize data from one place to another, so it is critical to define how
and where data is written. The sink module provided by SeaTunnel lets you complete this operation quickly
and efficiently. Sink and source are very similar; the difference is reading versus writing. Go check out
our [supported sinks](../connector/sink).

### Other

When multiple sources and multiple sinks are defined, how do you know which data each sink writes and
which data each transform reads? SeaTunnel uses two key configurations: `result_table_name` and
`source_table_name`. Each source module can be configured with a `result_table_name`, which names the
data it produces; transform and sink modules can then use `source_table_name` to refer to that name,
indicating which data they want to read. A transform, as an intermediate processing module, can use both
`result_table_name` and `source_table_name` at the same time. You will notice that in the example config
above, not every module sets these two parameters. That is because SeaTunnel has a default convention: if
they are not configured, a module consumes the data produced by the last module of the previous step.
This is much more convenient when there is only one source.
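To make this convention concrete, here is a sketch that wires the modules together explicitly by name (based on the example config above; the `Console` sink and the filter condition are illustrative assumptions):

```hocon
source {
  FakeSource {
    result_table_name = "fake"        # this source registers its data as "fake"
    field_name = "name,age"
  }
}

transform {
  sql {
    source_table_name = "fake"        # read the data registered by the source
    result_table_name = "fake_adult"  # register the transformed data under a new name
    sql = "select name, age from fake where age >= 18"
  }
}

sink {
  Console {
    source_table_name = "fake_adult"  # write the transformed data
  }
}
```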

## What's More

If you want to know the details of this configuration format, please
see [HOCON](https://github.com/lightbend/config/blob/main/HOCON.md).
8 changes: 0 additions & 8 deletions docs/en/connector/config-example.md

This file was deleted.

5 changes: 1 addition & 4 deletions docs/en/faq.md
@@ -1,11 +1,8 @@
# FAQ

-why-i-should-install-computing-engine-like-spark-or-flink

## Why I should install computing engine like Spark or Flink

-<!-- We should add the reason -->
-TODO
+Now SeaTunnel uses computing engines such as Spark and Flink to complete resource scheduling and node communication, so we can focus on the ease of use of data synchronization and the development of high-performance components. But this is only temporary.

## I have a question, but I can not solve it by myself

2 changes: 1 addition & 1 deletion docs/en/start/local.mdx
@@ -11,7 +11,7 @@ import TabItem from '@theme/TabItem';

Before you get started with the local run, you need to make sure you have already installed the following software that SeaTunnel requires:

-* [Java](https://www.java.com/en/download/) (only JDK 8 supported by now) installed and `JAVA_HOME` set.
+* [Java](https://www.java.com/en/download/) (Java 8 or 11, other versions greater than Java 8 can theoretically work as well) installed and `JAVA_HOME` set.
* Download the engine. You can choose and download one of the options below as you like; you can see more information about [why we need an engine in SeaTunnel](../faq.md#why-i-should-install-computing-engine-like-spark-or-flink)
* Spark: please [download Spark](https://spark.apache.org/downloads.html) first (**required version >= 2** and version < 3.x). For more information you can
see [Getting Started: standalone](https://spark.apache.org/docs/latest/spark-standalone.html#installing-spark-standalone-to-a-cluster)
10 changes: 5 additions & 5 deletions docs/en/transform/replace.md
@@ -1,4 +1,4 @@
-# Json
+# Replace

## Description

@@ -33,13 +33,13 @@ The name of the field to be replaced.

The string to match.

-### is_regex [string]
+### replacement [string]

-Whether or not to interpret the pattern as a regex (true) or string literal (false).
+The replacement pattern (is_regex is true) or string literal (is_regex is false).

-### replacement [boolean]
+### is_regex [boolean]

-The replacement pattern (is_regex is true) or string literal (is_regex is false).
+Whether or not to interpret the pattern as a regex (true) or string literal (false).

### replace_first [boolean]

62 changes: 62 additions & 0 deletions docs/en/transform/uuid.md
@@ -0,0 +1,62 @@
# UUID

## Description

Generate a universally unique identifier on a specified field.

:::tip

This transform is **ONLY** supported by Spark.

:::

## Options

| name | type | required | default value |
| -------------- | ------ | -------- | ------------- |
| fields | string | yes | - |
| prefix | string | no | - |
| secure | boolean| no | false |

### fields [string]

The name of the field on which to generate the UUID.

### prefix [string]

The prefix string constant to prepend to each generated UUID.

### secure [boolean]

Whether to use a cryptographically secure random generator. The secure algorithm can be comparatively
slow; the non-secure algorithm uses a secure random seed but is otherwise deterministic (and faster).
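For example, to trade cryptographic strength for speed (a sketch using only the options documented above):

```hocon
UUID {
  fields = "u"
  prefix = "uuid-"
  # non-secure: securely seeded but otherwise deterministic, and faster
  secure = false
}
```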

### common options [string]

For transform plugin common parameters, please refer to [Transform Plugin](common-options.mdx) for details.

## Examples

```hocon
UUID {
  fields = "u"
  prefix = "uuid-"
  secure = true
}
```

Use `UUID` as a UDF in SQL.

```hocon
UUID {
  fields = "u"
  prefix = "uuid-"
  secure = true
}

# Use the uuid function (confirm that the fake table exists)
sql {
  sql = "select * from (select raw_message, UUID() as info_row from fake) t1"
}
```
10 changes: 8 additions & 2 deletions docs/sidebars.js
@@ -68,14 +68,20 @@ const sidebars = {
items: [
'start/local',
'start/docker',
-'start/kubernetes',
+'start/kubernetes'
],
},
+{
+  type: 'category',
+  label: 'Concept',
+  items: [
+    'concept/config',
+  ],
+},
{
type: 'category',
label: 'Connector',
items: [
-'connector/config-example',
{
type: 'category',
label: 'Source',
@@ -17,5 +17,9 @@

package org.apache.seatunnel.api.table.catalog;

/**
 * Interface for reading and writing table metadata from SeaTunnel. Each connector needs to contain
 * an implementation of Catalog.
 */
public interface Catalog {
}
@@ -21,26 +21,41 @@
import java.util.List;
import java.util.Map;

/**
 * Represents the table metadata in SeaTunnel.
 */
public final class CatalogTable implements Serializable {

private static final long serialVersionUID = 1L;

/**
* Used to identify the table.
*/
private final TableIdentifier tableId;

/**
* The table schema metadata.
*/
private final TableSchema tableSchema;

private final Map<String, String> options;

private final List<String> partitionKeys;

private final String comment;

public static CatalogTable of(
TableIdentifier tableId,
TableSchema tableSchema,
Map<String, String> options,
List<String> partitionKeys,
String comment) {
TableIdentifier tableId,
TableSchema tableSchema,
Map<String, String> options,
List<String> partitionKeys,
String comment) {
return new CatalogTable(
tableId,
tableSchema,
options,
partitionKeys,
comment);
tableId,
tableSchema,
options,
partitionKeys,
comment);
}

private CatalogTable(
@@ -26,8 +26,14 @@
@SuppressWarnings("PMD.AbstractClassShouldStartWithAbstractNamingRule")
public abstract class Column {

/**
* column name.
*/
protected final String name;

/**
* Data type of the column.
*/
protected final DataType dataType;

protected final String comment;
@@ -20,6 +20,9 @@
import java.io.Serializable;
import java.util.List;

/**
 * Represents a physical table schema.
 */
public final class TableSchema implements Serializable {
private static final long serialVersionUID = 1L;
private final List<Column> columns;
@@ -0,0 +1,23 @@
package org.apache.seatunnel.api.table.type;

public class BasicType<T> implements DataType<T> {

private final Class<T> typeClass;

public BasicType(Class<T> typeClass) {
if (typeClass == null) {
throw new IllegalArgumentException("typeClass cannot be null");
}
this.typeClass = typeClass;
}

@Override
public boolean isBasicType() {
return true;
}

@Override
public Class<T> getTypeClass() {
return this.typeClass;
}
}
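As a quick illustration of how this generic wrapper behaves, here is a self-contained sketch (the `DataType` interface below is a minimal stand-in for the real SeaTunnel one, reduced to the two methods used here):

```java
// Minimal stand-in for the SeaTunnel DataType interface (illustration only).
interface DataType<T> {
    boolean isBasicType();
    Class<T> getTypeClass();
}

// Mirrors the BasicType class from the diff above: wraps a Java class as a basic type.
class BasicType<T> implements DataType<T> {
    private final Class<T> typeClass;

    BasicType(Class<T> typeClass) {
        if (typeClass == null) {
            throw new IllegalArgumentException("typeClass cannot be null");
        }
        this.typeClass = typeClass;
    }

    @Override
    public boolean isBasicType() {
        return true;
    }

    @Override
    public Class<T> getTypeClass() {
        return typeClass;
    }
}

public class Main {
    public static void main(String[] args) {
        DataType<String> stringType = new BasicType<>(String.class);
        System.out.println(stringType.isBasicType());   // prints "true"
        System.out.println(stringType.getTypeClass());  // prints "class java.lang.String"
    }
}
```

Concrete types such as `BooleanType` below then only need to pass their class literal up to this constructor.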
@@ -0,0 +1,26 @@
/*
* Licensed to the Apache Software Foundation (ASF) under one or more
* contributor license agreements. See the NOTICE file distributed with
* this work for additional information regarding copyright ownership.
* The ASF licenses this file to You under the Apache License, Version 2.0
* (the "License"); you may not use this file except in compliance with
* the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/

package org.apache.seatunnel.api.table.type;

public class BooleanType extends BasicType<Boolean> {
private static final BooleanType INSTANCE = new BooleanType(Boolean.class);

private BooleanType(Class<Boolean> typeClass) {
super(typeClass);
}
}
