GitHub - skale-me/skale: High performance distributed data processing engine

Development activity is stopped, and this project is now archived.

High performance distributed data processing and machine learning.

Skale provides a high-level API in Javascript and an optimized parallel execution engine on top of NodeJS.

Features

Pure javascript implementation of a Spark like engine
Multiple data sources: filesystems, databases, cloud (S3, azure)
Multiple data formats: CSV, JSON, Columnar (Parquet)...
50 high level operators to build parallel apps
Machine learning: scalable classification, regression, clusterization
Run interactively in a nodeJS REPL shell
Docker ready, simple local mode or full distributed mode
Very fast, see benchmark

Quickstart

npm install skale

Word count example:

var sc = require('skale').context();

sc.textFile('/my/path/*.txt')
  .flatMap(line => line.split(' '))
  .map(word => [word, 1])
  .reduceByKey((a, b) => a + b, 0)
  .count(function (err, result) {
    console.log(result);
    sc.end();
  });

Local mode

In local mode, worker processes are automatically forked and communicate with app through child process IPC channel. This is the simplest way to operate, and it allows to use all machine available cores.

To run in local mode, just execute your app script:

node my_app.js

or with debug traces:

SKALE_DEBUG=2 node my_app.js

Distributed mode

In distributed mode, a cluster server process and worker processes must be started prior to start app. Processes communicate with each other via raw TCP or via websockets.

To run in distributed cluster mode, first start a cluster server on server_host:

./bin/server.js

On each worker host, start a worker controller process which connects to server:

./bin/worker.js -H server_host

Then run your app, setting the cluster server host in environment:

SKALE_HOST=server_host node my_app.js

The same with debug traces:

SKALE_HOST=server_host SKALE_DEBUG=2 node my_app.js

Resources

Contributing guide
Documentation
Gitter for support and discussion
Mailing list for discussion about use and development

Authors

The original authors of skale are Cedric Artigue and Marc Vertes.

List of all contributors

License

Apache-2.0

Credits

Logo Icon made by Smashicons from www.flaticon.com is licensed by CC 3.0 BY

Name		Name	Last commit message	Last commit date
Latest commit History 1,718 Commits
benchmark		benchmark
bin		bin
docker		docker
docs		docs
examples		examples
lib		lib
ml		ml
test		test
.eslintignore		.eslintignore
.gitignore		.gitignore
.npmignore		.npmignore
.travis.yml		.travis.yml
CHANGELOG.md		CHANGELOG.md
CODE_OF_CONDUCT.md		CODE_OF_CONDUCT.md
CONTRIBUTING.md		CONTRIBUTING.md
LICENSE		LICENSE
Makefile		Makefile
README.md		README.md
Roadmap.md		Roadmap.md
appveyor.yml		appveyor.yml
index.js		index.js
mkdocs.yml		mkdocs.yml
package-lock.json		package-lock.json
package.json		package.json
yarn.lock		yarn.lock

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Features

Quickstart

Local mode

Distributed mode

Resources

Authors

License

Credits

About

Releases 6

Packages

Contributors 7

Languages

License

skale-me/skale

Folders and files

Latest commit

History

Repository files navigation

Features

Quickstart

Local mode

Distributed mode

Resources

Authors

License

Credits

About

Topics

Resources

License

Code of conduct

Stars

Watchers

Forks

Releases 6

Packages 0

Contributors 7

Languages

Packages