14 Jun 19:30

chathurawidanage

885a631

Twister2 Release 0.2.2

This is a patch release of Twister2 that includes the first versions of a few major features.

You can download the source code from GitHub.

First version of major features

Streaming windowing support
Join operations included in Task API
Run OpenMPI programs inside task graph
Checkpointing for streaming and batch applications
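
To illustrate what streaming windowing means in general, here is a minimal sketch of a tumbling count window: elements are buffered and emitted in fixed-size, non-overlapping groups. The class `TumblingCountWindow` and its methods are hypothetical names chosen for this illustration. They are not the Twister2 windowing API, which may look quite different.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical helper, NOT part of the Twister2 API: groups a stream of
// elements into fixed-size, non-overlapping (tumbling) windows.
final class TumblingCountWindow<T> {
    private final int size;
    private final Consumer<List<T>> onWindow;
    private List<T> buffer = new ArrayList<>();

    TumblingCountWindow(int size, Consumer<List<T>> onWindow) {
        if (size <= 0) {
            throw new IllegalArgumentException("window size must be positive");
        }
        this.size = size;
        this.onWindow = onWindow;
    }

    // Called once per incoming element; hands a full window to the callback
    // and starts a fresh buffer for the next window.
    void add(T element) {
        buffer.add(element);
        if (buffer.size() == size) {
            onWindow.accept(buffer);
            buffer = new ArrayList<>();
        }
    }
}
```

With a window size of 3 and the inputs 1 through 7, the callback receives [1, 2, 3] and then [4, 5, 6]; the element 7 stays buffered until two more elements arrive.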

Minor features

Apart from these, we have refactored the APIs and made many performance improvements.

Next Release

We are working to consolidate the features introduced in this release. We are also continuing to
improve the code and fix bugs.

Components in Twister2

We support the following components in Twister2

Resource provisioning component to bring up and manage parallel workers in cluster environments
1. Standalone
2. Kubernetes
3. Mesos
4. Slurm
5. Nomad
Parallel and Distributed Operators in HPC and Cloud Environments
1. Twister2:Net - a data level dataflow operator library for streaming and large scale batch analysis
2. Harp - a BSP (Bulk Synchronous Processing) innovative collective framework for parallel applications and machine learning at message level
3. OpenMPI (HPC Environments only) at message level
Task System
1. Task Graph
  - Create dataflow graphs for streaming and batch analysis including iterative computations
2. Task Scheduler - Schedule the task graph into cluster resources supporting different scheduling algorithms
  - Data locality scheduling
  - Round robin scheduling
  - First fit scheduling
3. Executor - Execution of task graph
  - Batch executor
  - Streaming executor
TSet for distributed data representation (Similar to Spark RDD, Flink DataSet and Heron Streamlet)
1. Iterative computations
2. Data caching
APIs for streaming and batch applications
1. Operator API
2. Task Graph based API
3. TSet API
Support for storage systems
1. HDFS
2. Local file systems
3. NFS for persistent storage
Web UI for monitoring Twister2 Jobs
Apache Storm Compatibility API
Connected DataFlow (Experimental)
1. Supports creation of multiple dataflow graphs executing in a single job
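
As a rough illustration of two of the scheduling algorithms named in the Task Scheduler item, the sketch below assigns tasks to workers with round robin and first fit strategies. `SchedulingSketch`, its method names, and the use of a single integer memory requirement per task are assumptions made for this example. They do not reflect the actual Twister2 scheduler classes or its resource model.

```java
import java.util.Arrays;

// Hypothetical sketch, NOT the Twister2 scheduler API.
final class SchedulingSketch {
    private SchedulingSketch() { }

    // Round robin: task i is placed on worker (i mod workerCount).
    static int[] roundRobin(int taskCount, int workerCount) {
        int[] assignment = new int[taskCount];
        for (int task = 0; task < taskCount; task++) {
            assignment[task] = task % workerCount;
        }
        return assignment;
    }

    // First fit: each task is placed on the first worker that still has
    // enough free memory. A task that fits on no worker is marked with -1.
    static int[] firstFit(int[] taskMemory, int[] workerCapacity) {
        int[] free = Arrays.copyOf(workerCapacity, workerCapacity.length);
        int[] assignment = new int[taskMemory.length];
        for (int task = 0; task < taskMemory.length; task++) {
            assignment[task] = -1;
            for (int worker = 0; worker < free.length; worker++) {
                if (taskMemory[task] <= free[worker]) {
                    free[worker] -= taskMemory[task];
                    assignment[task] = worker;
                    break;
                }
            }
        }
        return assignment;
    }
}
```

For example, `SchedulingSketch.roundRobin(5, 2)` places the five tasks on workers 0, 1, 0, 1, 0. Round robin balances task counts without looking at resources, while first fit packs tasks onto the earliest worker with room.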

10 May 18:14

chathurawidanage

0.2.1

7b308a2

Twister2 Release 0.2.1

Twister2 0.2.1 is a patch release of Twister2 in which we improve its performance and fix bugs.

We have added streaming windowing support as a new beta feature in this release.

You can download the source code from GitHub.

Major Features

This release includes the following core components.

Resource provisioning component to bring up and manage parallel workers in cluster environments
1. Standalone
2. Kubernetes
3. Mesos
4. Slurm
5. Nomad
Parallel and Distributed Operators in HPC and Cloud Environments
1. Twister2:Net - a data level dataflow operator library for streaming and large scale batch analysis
2. Harp - a BSP (Bulk Synchronous Processing) innovative collective framework for parallel applications and machine learning at message level
3. OpenMPI (HPC Environments only) at message level
Task System
1. Task Graph
  - Create dataflow graphs for streaming and batch analysis including iterative computations
2. Task Scheduler - Schedule the task graph into cluster resources supporting different scheduling algorithms
  - Data locality scheduling
  - Round robin scheduling
  - First fit scheduling
3. Executor - Execution of task graph
  - Batch executor
  - Streaming executor
TSet for distributed data representation (Similar to Spark RDD, Flink DataSet and Heron Streamlet)
1. Iterative computations
2. Data caching
APIs for streaming and batch applications
1. Operator API
2. Task Graph based API
3. TSet API
Support for storage systems
1. HDFS
2. Local file systems
3. NFS for persistent storage
Web UI for monitoring Twister2 Jobs
Apache Storm Compatibility API
Connected DataFlow (Experimental)
1. Supports creation of multiple dataflow graphs executing in a single job

These features translate to running the following types of applications natively with high performance.

Streaming computations
Data operations in batch mode
Iterative computations

Examples

With this release we include several examples to demonstrate various features of Twister2.

A Hello World example
Communication examples - how to use communications for streaming and batch
Task examples - how to create task graphs with different operators for streaming and batch
K-Means
Sorting of records
Word count
Iterative examples
Harp example
SVM

Road map

We have started working on our next major release that will connect the core components we have developed
into a full data analytics environment. In particular it will focus on providing APIs around the core
capabilities of Twister2 and integration of applications in a single dataflow.

Next Major Release (End of June 2019)

Connected DataFlow
Fault tolerance
Supporting more APIs including Beam
More example applications

Beyond next release

Python API
Implementing core parts of Twister2 with C/C++ for high performance
Direct use of RDMA
SQL interface
Native MPI support for cloud deployments
More resource managers - Pilot Jobs, Yarn

License

Licensed under the Apache License, Version 2.0: http://www.apache.org/licenses/LICENSE-2.0

28 Mar 20:54

chathurawidanage

v0.2.0

506bde4

Twister2 Release 0.2.0

Twister2 0.2.0 is the second open source public release of Twister2. We are excited to bring another release of our
high performance data analytics hosting environment that can work in both cloud and HPC environments.

You can download the source code from GitHub.

Major Features

This release includes the following core components.

Resource provisioning component to bring up and manage parallel workers in cluster environments
1. Standalone
2. Kubernetes
3. Mesos
4. Slurm
5. Nomad
Parallel and Distributed Communications in HPC and Cloud Environments
1. Twister2:Net - a data level dataflow communication library for streaming and large scale batch analysis
2. Harp - a BSP (Bulk Synchronous Processing) innovative collective framework for parallel applications and machine learning at message level
3. OpenMPI (HPC Environments only) at message level
Task System
1. Task Graph
  - Create dataflow graphs for streaming and batch analysis including iterative computations
2. Task Scheduler - Schedule the task graph into cluster resources supporting different scheduling algorithms
  - Data locality scheduling
  - Round robin scheduling
  - First fit scheduling
3. Executor - Execution of task graph
  - Batch executor
  - Streaming executor
API for creating Task Graph and Communication
1. Communication API
2. Task based API
3. Data API (TSet API)
Support for storage systems
1. HDFS
2. Local file systems
3. NFS for persistent storage
Web UI for monitoring Twister2 Jobs
Apache Storm Compatibility API

These features translate to running the following types of applications natively with high performance.

Streaming computations
Data operations in batch mode
Iterative computations

Examples

With this release we include several examples to demonstrate various features of Twister2.

A Hello World example
Communication examples - how to use communications for streaming and batch
Task examples - how to create task graphs with different operators for streaming and batch
K-Means
Sorting of records
Word count
Iterative examples
Harp example
SVM

Road map

Next Major Release (End of June 2019)

Connected DataFlow
Fault tolerance
Supporting more APIs including Beam
Python API
More resource managers - Pilot Jobs, Yarn
More example applications

Beyond next release

Implementing core parts of Twister2 with C/C++ for high performance
Direct use of RDMA
SQL interface
Native MPI support for cloud deployments

License

Licensed under the Apache License, Version 2.0: http://www.apache.org/licenses/LICENSE-2.0

05 Oct 20:30

chathurawidanage

v0.1.0

9cddbee

Twister2 v0.1.0 Pre-release

Twister2 Release 0.1.0

Twister2 0.1.0 is the first open source public release of Twister2. We are excited to bring a high performance data analytics
hosting environment that can work in both cloud and HPC environments. This is the first step towards
building a complete end to end high performance solution for data analytics ranging from streaming to batch analysis to
machine learning applications. Our vision is to make the system work seamlessly both in cloud and HPC environments ranging from single machines to large clusters.

You can download the source code from GitHub.

Major Features

This release includes the core components for realizing the above goals.

Resource provisioning component to bring up and manage parallel workers in cluster environments
1. Standalone
2. Kubernetes
3. Mesos
4. Slurm
5. Nomad
Parallel and Distributed Communications in HPC and Cloud Environments
1. Twister2:Net - a data level dataflow communication library for streaming and large scale batch analysis
2. Harp - a BSP (Bulk Synchronous Processing) innovative collective framework for parallel applications and machine learning at message level
3. OpenMPI (HPC Environments only) at message level
Task Graph - Create dataflow graphs for streaming and batch analysis including iterative computations
Task Scheduler - Schedule the task graph into cluster resources supporting different scheduling algorithms
1. Data locality scheduling
2. Round robin scheduling
3. First fit scheduling
Executor - Execution of task graph
1. Batch executor
2. Streaming executor
API for creating Task Graph and Communication
1. Communication API
2. Task based API
Support for storage systems
1. HDFS
2. Local file systems
3. NFS for persistent storage

These features translate to running the following types of applications natively with high performance.

Streaming computations
Data operations in batch mode
Iterative computations

Examples

With this release we include several examples to demonstrate various features of Twister2.

A Hello World example
Communication examples - how to use communications for streaming and batch
Task examples - how to create task graphs with different operators for streaming and batch
K-Means
Sorting of records
Word count
Iterative examples
Harp example

Road map

Next release (End of December 2018)

Hierarchical task scheduling - Ability to run different types of jobs in a single dataflow
Fault tolerance
Data API including DataSet similar to Spark RDD, Flink DataSet and Heron Streamlet
Supporting different APIs including Storm, Spark, Beam
Heterogeneous resources allocations
Web UI for monitoring Twister2 Jobs
More resource managers - Pilot Jobs, Yarn
More example applications

Beyond next release

Implementing core parts of Twister2 with C/C++ for high performance
Python APIs
Direct use of RDMA
FaaS APIs
SQL interface
Native MPI support for cloud deployments

License

Licensed under the Apache License, Version 2.0: http://www.apache.org/licenses/LICENSE-2.0

Releases: cylondata/twister2
