This document explains how to install Delight when using the Apache Livy REST API, through either the /sessions or the /batches route.
This document assumes that you have created an account and generated an access token on the Delight website.
Add the following key-value pairs to the conf (Spark configuration properties) field:
spark.jars.packages=co.datamechanics:delight_<replace-with-your-scala-version-2.11-or-2.12>:latest-SNAPSHOT
spark.jars.repositories=https://oss.sonatype.org/content/repositories/snapshots
spark.delight.accessToken.secret=<your access token>
spark.extraListeners=co.datamechanics.delight.DelightListener
A real-world example of submission instrumented with Delight would look like this:
POST /batches
{
  "file": "/test/spark-examples.jar",
  "className": "org.apache.spark.examples.SparkPi",
  "driverMemory": "1G",
  "driverCores": 1,
  "executorCores": 3,
  "executorMemory": "20G",
  "numExecutors": 2,
  "name": "application-name",
  "conf": {
    "spark.jars.packages": "co.datamechanics:delight_<replace-with-your-scala-version-2.11-or-2.12>:latest-SNAPSHOT",
    "spark.jars.repositories": "https://oss.sonatype.org/content/repositories/snapshots",
    "spark.delight.accessToken.secret": "<your access token>",
    "spark.extraListeners": "co.datamechanics.delight.DelightListener"
  },
  "args": ["1000"]
}
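The same submission can be scripted. The following is a minimal Python sketch, not an official client: the helper names delight_conf and build_batch_request are hypothetical, the Livy address http://localhost:8998 is an assumption (adjust it to your deployment), and the Scala version 2.12 is only an example. It builds the conf entries listed above and prepares the POST /batches request using the standard library only.

```python
import json
import urllib.request

# Snapshot repository taken from the configuration shown above.
DELIGHT_SNAPSHOT_REPO = "https://oss.sonatype.org/content/repositories/snapshots"


def delight_conf(access_token, scala_version="2.12"):
    """Return the four Spark properties that enable Delight (hypothetical helper)."""
    if scala_version not in ("2.11", "2.12"):
        raise ValueError("scala_version must be '2.11' or '2.12'")
    return {
        "spark.jars.packages": f"co.datamechanics:delight_{scala_version}:latest-SNAPSHOT",
        "spark.jars.repositories": DELIGHT_SNAPSHOT_REPO,
        "spark.delight.accessToken.secret": access_token,
        "spark.extraListeners": "co.datamechanics.delight.DelightListener",
    }


def build_batch_request(livy_url, payload):
    """Prepare (but do not send) a POST /batches request (hypothetical helper)."""
    return urllib.request.Request(
        url=livy_url.rstrip("/") + "/batches",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


payload = {
    "file": "/test/spark-examples.jar",
    "className": "org.apache.spark.examples.SparkPi",
    "name": "application-name",
    "conf": delight_conf("<your access token>", "2.12"),
    "args": ["1000"],
}

# Assumed Livy address; replace with the address of your Livy server.
request = build_batch_request("http://localhost:8998", payload)

# Uncomment to actually submit the batch to a running Livy server:
# with urllib.request.urlopen(request) as response:
#     print(json.load(response))
```

Keeping the Delight properties in one helper makes it easy to reuse the same conf for the /sessions route, which accepts a conf field as well.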
Delight provides information about memory usage for Spark version 3.0.0 and above. For this feature to work, you'll need the proc filesystem (procfs) and the command pgrep available in your runtime. If you're running Apache Livy on AWS EMR, Google Dataproc, or Databricks, procfs and pgrep are available. On other systems, you may have to install them. pgrep is usually part of the procps package on UNIX operating systems.
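To check these prerequisites on a given machine, a small script along the following lines may help. It is only a sketch: the function name memory_metrics_prerequisites is hypothetical, it assumes procfs is mounted at /proc, and it reports on the machine where it runs, so it would need to be executed on the nodes that host your Spark driver and executors.

```python
import os
import shutil


def memory_metrics_prerequisites():
    """Report whether procfs and pgrep appear to be available locally."""
    return {
        # Assumes procfs is mounted at the conventional /proc location.
        "procfs": os.path.isdir("/proc") and os.path.exists("/proc/self/status"),
        # shutil.which returns None when the command is not on the PATH.
        "pgrep": shutil.which("pgrep") is not None,
    }


if __name__ == "__main__":
    for name, available in memory_metrics_prerequisites().items():
        print(f"{name}: {'available' if available else 'missing'}")
```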