# Instructions for setting up a Spark Cluster for the TextReuse ETL pipeline on CSC Rahti
- Create a project on CSC Rahti
- Add a `spark-credentials` secret with:
  - `username`: the username for Spark
  - `password`: the password for Spark and the Jupyter Lab login
  - `nbpassword`: the password used internally by Jupyter
- Install OpenShift CLI and Helm on local machine
- Create a `values.yaml` following the `values-template.yaml`
- Log into OpenShift project by getting login command from Rahti
- Run `helm install spark-cluster all-spark`
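
The steps above can be sketched as a command sequence. The secret key names come from the list above; the token, server URL, project name, and credential values are placeholders to fill in from your own Rahti console:

```shell
# Log into the OpenShift cluster (copy the actual login command from the
# Rahti web console; token and server below are placeholders)
oc login --token=<your-token> --server=<rahti-api-server-url>

# Switch to the project created for the pipeline
oc project <your-project>

# Create the spark-credentials secret with the three keys described above
oc create secret generic spark-credentials \
  --from-literal=username=<spark-username> \
  --from-literal=password=<spark-and-jupyter-password> \
  --from-literal=nbpassword=<jupyter-internal-password>

# Create your values.yaml from the template, then edit it as needed
cp values-template.yaml values.yaml

# Install the chart (if your values.yaml lives outside the chart directory,
# you may need to pass it explicitly with -f values.yaml)
helm install spark-cluster all-spark

# Check that the pods come up
oc get pods
```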
Create a key for GitHub SSH in the persistent volume of the spark-notebook service. Then, in the `values-template.yaml`, add the location of this SSH key so that it is added to the SSH ConfigMap defined in `configmap.yaml`. When the notebook pod starts up, run `mkdir -p ~/.ssh && cp /etc/ssh-config/config ~/.ssh/config` to copy the SSH config file from the ConfigMap to the correct location.
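
A minimal sketch of the SSH key setup described above. The key path and the `Host` entry are assumptions; adjust them to where your persistent volume is mounted and to what `configmap.yaml` actually expects:

```shell
# Generate an SSH key pair on the persistent volume of the spark-notebook
# service (the path below is an assumption; use your actual mount point)
ssh-keygen -t ed25519 -f /home/jovyan/work/.keys/id_ed25519 -N ""

# Add the public key (id_ed25519.pub) to your GitHub account or as a deploy
# key, and point the SSH ConfigMap in configmap.yaml at the private key, e.g.:
#
#   Host github.com
#     IdentityFile /home/jovyan/work/.keys/id_ed25519
#
# Then, inside the freshly started notebook pod, copy the config into place:
mkdir -p ~/.ssh && cp /etc/ssh-config/config ~/.ssh/config
```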