Spark-HPCC

Spark classes for working with HPCC clusters

There are two projects, DataAccess and Examples.

The DataAccess project contains the classes that support reading data from a THOR cluster into a Spark RDD. In addition, the HPCC data is exposed as a DataFrame for the convenience of the Spark developer.

The Examples project contains Scala examples that use data from an HPCC THOR cluster in a machine learning application.

Spark-hpcc (sparkthor) roadmap

3Q19

Support Spark 2.4.x
Provide Zeppelin usage guidelines (basic setup, Spark/PySpark interpreter setup, linking to HPCC JARs via Maven, ESP credential masking, dependency repo setup, code repo setup)
DataSource API 2 support
Interface improvements (including structured file filter support)
PySpark HPCC write support

4Q19

Sparkthor docker container (full integration with HPCC container environment)
Security improvements
Variable file format support
