Spark Quick Reference

This document gathers Spark-related tips and hints.
Abbreviation | Description
---|---
AES | Advanced Encryption Standard
DAG | Directed Acyclic Graph
HDFS | Hadoop Distributed File System
ML | Machine Learning
OLAP | Online Analytical Processing
RDD | Resilient Distributed Dataset
UDF | User-Defined Function
UDTF | User-Defined Table Function
YARN | Yet Another Resource Negotiator
Spark Properties
Spark configuration can be specified in three ways:

- Using a properties file (either the option `--properties-file FILE`, or the file `conf/spark-defaults.conf` as the default location).
- Programmatically, with the setter methods of `org.apache.spark.SparkConf`.
- Using dedicated command-line options, or `-c PROP=VALUE` (equivalently `--conf PROP=VALUE`) for arbitrary properties.
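As an illustration of the first approach, a `conf/spark-defaults.conf` file might look like the following sketch (the property values are arbitrary examples, not recommendations):

```properties
# conf/spark-defaults.conf -- whitespace-separated key/value pairs.
# These serve as defaults; spark-submit options override them.
spark.master            yarn
spark.executor.cores    4
spark.executor.memory   4g
spark.serializer        org.apache.spark.serializer.KryoSerializer
```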
For instance (see also the article "Spark submit --num-executors --executor-cores --executor-memory", March 2022):
Programmatically | Command line option
---|---
`.set("spark.executor.cores", "8")` | `--executor-cores 8`
`.set("spark.executor.memory", "128m")` | `--executor-memory 128m`
`.setAppName("name")` | `--name "name"`
`.setMaster("local[2]")` | `--master "local[2]"`
`.setSparkHome("<some path>")` | --
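The setters in the table above can be chained on a single `SparkConf`; a minimal sketch, assuming the Spark libraries are on the classpath (the app name and property values are hypothetical examples):

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Equivalent to: --name "demo" --master "local[2]" --executor-memory 128m
val conf = new SparkConf()
  .setAppName("demo")
  .setMaster("local[2]")                 // local mode with 2 threads
  .set("spark.executor.memory", "128m")

// Properties set programmatically on a SparkConf take precedence over
// spark-submit options, which in turn override spark-defaults.conf.
val sc = new SparkContext(conf)
sc.stop()
```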
Note: The `spark-submit` command internally uses the `org.apache.spark.deploy.SparkSubmit` class with the options and command-line arguments you specify.
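Putting the pieces together, a `spark-submit` invocation mixing dedicated options with `-c` properties might look like this sketch (the class name and jar are placeholders):

```shell
spark-submit \
  --master yarn \
  --name "demo" \
  --executor-cores 8 \
  --executor-memory 128m \
  -c spark.sql.shuffle.partitions=64 \
  --class com.example.Main \
  app.jar
```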