Rahti Spark Cluster

Instructions for setting up a Spark cluster for the TextReuse ETL pipeline on CSC Rahti.

Steps

1. Create a project on CSC Rahti.
2. Add a spark-credentials secret with the following keys:
   - username: the username for Spark
   - password: the password for Spark and the Jupyter Lab login
   - nbpassword: the password used internally by Jupyter
3. Install the OpenShift CLI and Helm on your local machine.
4. Create a values.yaml based on values-template.yaml.
5. Log in to the OpenShift project using the login command from Rahti.
6. Run helm install spark-cluster all-spark
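
The command-line part of the steps above can be sketched as a short script. This is only an illustration under stated assumptions: the server URL, token, and credential values are placeholders, the run helper and the DRY_RUN variable are hypothetical conveniences, and creating the secret with oc rather than through the Rahti web console is one possible route. By default the script only prints the commands it would run.

```shell
#!/bin/sh
# Sketch of the CLI part of the steps above. Every value below is a
# placeholder (assumption); substitute the ones for your own Rahti project.
set -eu

RAHTI_SERVER="${RAHTI_SERVER:-https://rahti.example.invalid:6443}"  # hypothetical URL
RAHTI_TOKEN="${RAHTI_TOKEN:-sha256~placeholder}"  # taken from the Rahti login command
SPARK_USER="${SPARK_USER:-spark}"
SPARK_PASSWORD="${SPARK_PASSWORD:-change-me}"
NB_PASSWORD="${NB_PASSWORD:-change-me-too}"

# Hypothetical helper: print each command by default,
# execute it only when DRY_RUN=0 is set.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Log in first so that the secret is created in the right project.
run oc login --token="$RAHTI_TOKEN" --server="$RAHTI_SERVER"

# Create the secret with the three keys listed in the steps.
run oc create secret generic spark-credentials \
  --from-literal=username="$SPARK_USER" \
  --from-literal=password="$SPARK_PASSWORD" \
  --from-literal=nbpassword="$NB_PASSWORD"

# Install the chart; run this from the directory that contains all-spark.
# If values.yaml lives outside the chart directory, pass it with -f.
run helm install spark-cluster all-spark
```

Running the script as-is prints the three commands for review; running it with DRY_RUN=0 and real values in the environment executes them.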

Additional Setup

Create an SSH key for GitHub on the persistent volume of the spark-notebook service. Then add the location of this key to values-template.yaml so that it is included in the SSH configmap defined in configmap.yaml.

When the notebook pod starts up, run mkdir ~/.ssh && cp /etc/ssh-config/config ~/.ssh/config to copy the SSH configmap file to the correct location.
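
The SSH setup described above can be sketched as two small shell functions to run inside the notebook pod. This is a minimal sketch under assumptions: the function names, the key path argument, the ed25519 key type, and the empty passphrase are illustrative choices, not something the chart prescribes. Only /etc/ssh-config/config and ~/.ssh/config come from the instructions above. Unlike the plain mkdir, mkdir -p does not fail when ~/.ssh already exists, so the copy step can be repeated on every pod start.

```shell
#!/bin/sh
# Sketch of the SSH setup inside the notebook pod. Paths other than
# /etc/ssh-config/config and ~/.ssh/config are assumptions.
set -eu

# Generate a GitHub key at the given path, e.g. somewhere on the
# persistent volume (the exact location is up to you).
make_github_key() {
  key_path="$1"
  mkdir -p "$(dirname "$key_path")"
  # -N "" creates the key without a passphrase; set one if you prefer.
  ssh-keygen -t ed25519 -f "$key_path" -N "" -C "spark-notebook"
}

# Copy the mounted configmap file into place.
install_ssh_config() {
  src="${1:-/etc/ssh-config/config}"
  dest_dir="${2:-$HOME/.ssh}"
  mkdir -p "$dest_dir"        # -p: no error if the directory already exists
  chmod 700 "$dest_dir"
  cp "$src" "$dest_dir/config"
  chmod 600 "$dest_dir/config"
}

# Only act when the configmap is actually mounted, i.e. inside the pod.
if [ -f /etc/ssh-config/config ]; then
  install_ssh_config
fi
```

The key would be generated once with make_github_key followed by a path on the persistent volume, and its public half added to GitHub; install_ssh_config then needs to run again whenever the pod is recreated, because the home directory contents outside the persistent volume may not survive a restart.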
