Spark Quick Reference

This document gathers Spark-related tips and hints.
Abbreviation | Description
---|---
AES | Advanced Encryption Standard
DAG | Directed Acyclic Graph
HDFS | Hadoop Distributed File System
ML | Machine Learning
OLAP | Online Analytical Processing
RDD | Resilient Distributed Dataset
UDF | User-Defined Function
UDTF | User-Defined Table Function
YARN | Yet Another Resource Negotiator
Spark Properties
Spark configuration can be specified in three ways:

- Using a properties file (either the option `--properties-file FILE`, or the file `conf/spark-defaults.conf` as the default location).
- Programmatically, with the setter methods of `org.apache.spark.SparkConf`.
- Using dedicated command-line options, or `-c PROP=VALUE` (equivalently `--conf PROP=VALUE`) for arbitrary properties.
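As an illustration of the first approach, a `conf/spark-defaults.conf` file might look like the following sketch (the property values are arbitrary examples, not recommendations):

```properties
# conf/spark-defaults.conf -- whitespace-separated key/value pairs.
# These serve as defaults; spark-submit options override them.
spark.master            yarn
spark.executor.cores    4
spark.executor.memory   4g
spark.serializer        org.apache.spark.serializer.KryoSerializer
```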
For instance (see also the article "Spark submit --num-executors --executor-cores --executor-memory", March 2022):
Programmatically | Command line option
---|---
`.set("spark.executor.cores", "8")` | `--executor-cores 8`
`.set("spark.executor.memory", "128m")` | `--executor-memory 128m`
`.setAppName("name")` | `--name "name"`
`.setMaster("local[2]")` | `--master "local[2]"`
`.setSparkHome("<some path>")` | --
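The setters in the table above can be chained on a single `SparkConf`; a minimal sketch, assuming the Spark libraries are on the classpath (the app name and property values are hypothetical examples):

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Equivalent to: --name "demo" --master "local[2]" --executor-memory 128m
val conf = new SparkConf()
  .setAppName("demo")
  .setMaster("local[2]")                 // local mode with 2 threads
  .set("spark.executor.memory", "128m")

// Properties set programmatically on a SparkConf take precedence over
// spark-submit options, which in turn override spark-defaults.conf.
val sc = new SparkContext(conf)
sc.stop()
```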
Note: The `spark-submit` command internally uses the `org.apache.spark.deploy.SparkSubmit` class with the options and command-line arguments you specify.
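Putting the pieces together, a `spark-submit` invocation mixing dedicated options with `-c` properties might look like this sketch (the class name and jar are placeholders):

```shell
spark-submit \
  --master yarn \
  --name "demo" \
  --executor-cores 8 \
  --executor-memory 128m \
  -c spark.sql.shuffle.partitions=64 \
  --class com.example.Main \
  app.jar
```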