Transporter is an open-source data transfer tool developed by Trendyol with love.
We developed this tool because we needed to archive our high-throughput tables; in other words, we needed to move table data from one database to another. Some of our tables have a daily throughput of more than 1 million, and we use them constantly. To keep performance high, we archive the data we no longer actively need: some data goes stale after a month, some after 6 months. There are many tools that do just that, but they are not very flexible or extensible. So we developed our own tool, which has proven to be very useful, and we decided to share it with the community.
Transporter has two important concepts: Source and Target. As the names imply, the goal is to move data from the Source to the Target. Each transfer operation is described by a JSON object, and Transporter uses Quartz to create a cron job from each JSON object.
Two jobs run in parallel: the Polling Job and the Transfer Job.
Polling Job Steps
- Select IDs from the source table according to the given conditions.
- Write the selected IDs to the interim table.
Transfer Job Steps
- Select IDs from the interim table.
- Select data from the source table that corresponds to selected IDs.
- Write data to the target.
- Delete transferred data from the source table.
- Delete the corresponding entries from the interim table.
Data safety is our first concern in any database operation. Because data is deleted from the source only after it has been written to the target, the data stays protected even if Transporter fails during any of the transfer steps.
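The two jobs and their write-before-delete ordering can be sketched as follows. This is a minimal illustration and not Transporter's actual implementation: it uses Python with two in-memory SQLite databases as stand-ins for the source and target, the table names `orders`, `interim`, and `orders_archive` are hypothetical, and placing the interim table in the source database is an assumption made for brevity.

```python
import sqlite3


def polling_job(source, condition, batch_size):
    """Select IDs that match the condition and record them in the interim table."""
    # The condition comes from trusted configuration (the "where" clause setting).
    rows = source.execute(
        f"SELECT id FROM orders WHERE {condition} LIMIT ?", (batch_size,)
    ).fetchall()
    # OR IGNORE keeps polling safe to repeat before the transfer job has run.
    source.executemany("INSERT OR IGNORE INTO interim (id) VALUES (?)", rows)
    source.commit()
    return len(rows)


def transfer_job(source, target, batch_size):
    """Copy the rows listed in the interim table, then delete them from the source."""
    ids = [r[0] for r in source.execute("SELECT id FROM interim LIMIT ?", (batch_size,))]
    if not ids:
        return 0
    marks = ",".join("?" * len(ids))
    rows = source.execute(
        f"SELECT id, payload, created_at FROM orders WHERE id IN ({marks})", ids
    ).fetchall()
    # Write to the target first and commit, so a crash after this point
    # leaves the data in both places rather than in neither.
    target.executemany(
        "INSERT OR REPLACE INTO orders_archive (id, payload, created_at) VALUES (?, ?, ?)",
        rows,
    )
    target.commit()
    # Only now remove the rows from the source, then from the interim table.
    source.execute(f"DELETE FROM orders WHERE id IN ({marks})", ids)
    source.execute(f"DELETE FROM interim WHERE id IN ({marks})", ids)
    source.commit()
    return len(rows)


def demo():
    source = sqlite3.connect(":memory:")
    target = sqlite3.connect(":memory:")
    source.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, payload TEXT, created_at TEXT)")
    source.execute("CREATE TABLE interim (id INTEGER PRIMARY KEY)")
    target.execute(
        "CREATE TABLE orders_archive (id INTEGER PRIMARY KEY, payload TEXT, created_at TEXT)"
    )
    source.executemany(
        "INSERT INTO orders VALUES (?, ?, ?)",
        [(1, "a", "2020-01-01"), (2, "b", "2020-02-01"), (3, "c", "2099-01-01")],
    )
    source.commit()
    polling_job(source, "created_at < '2021-01-01'", 100)
    moved = transfer_job(source, target, 100)
    remaining = source.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
    archived = target.execute("SELECT COUNT(*) FROM orders_archive").fetchone()[0]
    return moved, remaining, archived
```

If the process stops between the target commit and the source delete, the next run finds the same IDs in the interim table and copies them again; the `INSERT OR REPLACE` in this sketch makes that repeat harmless.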
- Moves data from a source database to a target database.
- Performs transfers at given time intervals.
- Dockerizability: Can run in a container.
- Extensibility: Adding support for a new database type is easy, as it requires writing only one adapter.
- Configurability: The user can specify many settings such as the “where” clause, cron job interval, excluded columns, and batch quantity.
- Scalability: The number of pods can be adjusted according to throughput, since the project has Docker support.
- Interoperability: The project can transfer data in either direction between different database types.
| | Couchbase | MSSQL |
|---|---|---|
| Couchbase | ✓ | ✓ |
| MSSQL | ✓ | ✓ |
- Schedulability: Can run at a given interval.
The main config properties are explained below. Please note that the config properties are case sensitive. Examples can be found in the “examples/configs” folder.
- Name: Must be unique.
- Type: Specifies the database type. For now, you can give "Couchbase" or "mssql".
- Cron: Specifies the schedule on which Transporter runs, as a Quartz cron expression. For example, "0/20 * * ? * *" fires every 20 seconds.
- Condition: Specifies which data is to be transferred.
- KeyProperty: When transferring data from MSSQL to Couchbase, you can use any MSSQL column as the Couchbase key, not just the Id property. When transferring data from Couchbase to Couchbase, you must use "id", as in the examples.
For Couchbase: The key is generated from the id and the name of the data source. Example: {id}_{dataSourceName}
For mssql: A unique index on the id and the data source name must be created on the interim table.
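Both rules serve the same purpose: the same id coming from two different data sources must not collide. A minimal sketch of the idea follows, in Python. The function names and the column names `id` and `data_source_name` are hypothetical, and SQLite stands in for MSSQL purely for illustration.

```python
import sqlite3


def interim_key(record_id, data_source_name):
    """Build a key in the {id}_{dataSourceName} format described above."""
    return f"{record_id}_{data_source_name}"


def make_interim_table():
    """Create an interim table with a unique index on (id, data source name)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE interim (id TEXT NOT NULL, data_source_name TEXT NOT NULL)"
    )
    conn.execute(
        "CREATE UNIQUE INDEX ux_interim ON interim (id, data_source_name)"
    )
    return conn


def try_insert(conn, record_id, data_source_name):
    """Return True if the row was inserted, False if the unique index rejected it."""
    try:
        conn.execute(
            "INSERT INTO interim VALUES (?, ?)", (str(record_id), data_source_name)
        )
        return True
    except sqlite3.IntegrityError:
        return False
```

With this setup, inserting id 42 for a data source named `orders` succeeds once and is rejected the second time, while id 42 for a different data source is still accepted.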
To get started with Transporter, see the config examples in the "examples/configs" folder. Transporter is a dockerized project, so you can deploy it to a Kubernetes cluster if you want. Two config keys must be given to Transporter, "PollingJobSettings" and "TransferJobSettings", each as a string. You can use an online tool such as JSON Online Converter to convert your JSON file to a string.
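If you would rather not paste configuration into an online tool, the conversion can also be done locally. The following Python sketch uses only the standard `json` module. The flat layout of `example_settings` is hypothetical and only reuses the property names described above; the real structure is the one shown in "examples/configs".

```python
import json

# Hypothetical settings object; see "examples/configs" for the real layout.
example_settings = [
    {
        "Name": "orders-archive",
        "Type": "mssql",
        "Cron": "0/20 * * ? * *",
        "Condition": "CreatedDate < DATEADD(MONTH, -1, GETDATE())",
    }
]


def settings_to_string(settings):
    """Serialize settings to a compact single-line JSON string."""
    return json.dumps(settings, separators=(",", ":"), ensure_ascii=False)


def escaped_for_embedding(settings):
    """Return the JSON string as a quoted, escaped string literal.

    Useful when the value must itself be embedded inside another JSON
    document, for example as the value of a string config key.
    """
    return json.dumps(settings_to_string(settings))


def load_settings_file(path):
    """Read a JSON file and return its compact single-line string form."""
    with open(path, encoding="utf-8") as handle:
        return settings_to_string(json.load(handle))
```

The compact form is what you would pass as the value of "PollingJobSettings" or "TransferJobSettings"; whether the extra escaping step is needed depends on how your deployment stores configuration values.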
See the open issues for a list of proposed features (and known issues).
Currently, the project is not open to contributions.
Distributed under the MIT License. See LICENSE for more information.
Fatiha Beqirovski Polattimur - Github - fatiha.beqirovski@trendyol.com
Mehmet Fırat Kömürcü - Github - firat.komurcu@trendyol.com