The scripts and tools in this repository demonstrate automation for a Spark Standalone cluster that uses FlashBlade NFS/S3 for persistent storage.
config_s3.py : automates the creation of S3 users, keys, and buckets on the FlashBlade (see the first sketch after this list).
Dockerfile, build_image.sh : create a Docker image for running Spark.
control_sparkstandalone.sh : shell script to start and stop a dockerized Spark Standalone cluster.
{GenerateTestData|RunClustering}.ipynb : Python notebooks demonstrating a clustering algorithm (the s3a session setup they rely on is sketched below).
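
For context, config_s3.py drives the FlashBlade management REST API through the purity_fb client. The sketch below shows the general shape of those calls; the array address, the account/user/bucket names, and some keyword arguments are assumptions, and exact signatures vary across purity_fb releases, so check the SDK docs for your version rather than copying this verbatim.

```python
# Sketch only: addresses, names, and some keyword arguments are assumptions;
# exact call signatures differ between purity_fb releases.
from purity_fb import PurityFb, ObjectStoreAccessKey, Reference, rest

fb = PurityFb("flashblade.example.com")  # management VIP (placeholder)
fb.disable_verify_ssl()                  # typical for self-signed array certs
fb.login("T-xxxxxxxx")                   # FlashBlade management API token

try:
    # Account -> user -> access keys -> bucket: the steps config_s3.py automates.
    fb.object_store_accounts.create_object_store_accounts(names=["spark"])
    fb.object_store_users.create_object_store_users(names=["spark/sparkuser"])
    keys = fb.object_store_access_keys.create_object_store_access_keys(
        object_store_access_key=ObjectStoreAccessKey(user={"name": "spark/sparkuser"}))
    fb.buckets.create_buckets(names=["spark-working"], account=Reference(name="spark"))
except rest.ApiException as e:
    print("FlashBlade API error: %s" % e)
finally:
    fb.logout()
```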
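The notebooks reach FlashBlade S3 through Hadoop's s3a connector, so the Spark session needs the endpoint and keys wired in. A minimal sketch follows, assuming a placeholder data VIP and bucket name, keys exported as environment variables, and the hadoop-aws jar on the classpath.

```python
import os
from pyspark.sql import SparkSession

# Endpoint and bucket below are placeholders for your FlashBlade data VIP
# and a bucket created by config_s3.py; keys are read from the environment.
spark = (SparkSession.builder
         .appName("fb-clustering-demo")
         .config("spark.hadoop.fs.s3a.endpoint", "http://192.0.2.10")
         .config("spark.hadoop.fs.s3a.access.key", os.environ["S3_ACCESS_KEY"])
         .config("spark.hadoop.fs.s3a.secret.key", os.environ["S3_SECRET_KEY"])
         # Address buckets by path rather than virtual-host style.
         .config("spark.hadoop.fs.s3a.path.style.access", "true")
         .getOrCreate())

df = spark.read.parquet("s3a://spark-working/testdata")  # bucket/path are placeholders
```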
Requirements for using these scripts:
- Docker installed and access to a Docker repository.
- purity_fb Python SDK installed (via pip).
- FlashBlade management token for accessing the REST API.
- Ansible configured with a host group containing all Spark worker nodes.
- credentials: a file containing the S3 access and secret keys; it can be created with config_s3.py (a quick smoke test of this file is sketched below).
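
To confirm the credentials file and data VIP are usable before launching the cluster, a boto3 smoke test like the one below can list buckets. The two-line KEY=VALUE file layout and the endpoint are assumptions; adjust the parsing to whatever format config_s3.py actually writes.

```python
import boto3

# Assumed credentials file layout: ACCESS_KEY=... and SECRET_KEY=... lines;
# adjust the parsing if config_s3.py writes a different format.
creds = {}
with open("credentials") as f:
    for line in f:
        if "=" in line:
            key, _, value = line.strip().partition("=")
            creds[key] = value

s3 = boto3.client("s3",
                  endpoint_url="http://192.0.2.10",  # FlashBlade data VIP (placeholder)
                  aws_access_key_id=creds["ACCESS_KEY"],
                  aws_secret_access_key=creds["SECRET_KEY"])
print([b["Name"] for b in s3.list_buckets()["Buckets"]])
```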