Spark Setup
This document gathers information on the setup of the Spark framework.
Spark installation requires two steps on Microsoft Windows:

- Download the archive file `spark-3.5.1-bin-hadoop3.2-scala2.13.tgz`, either from the Spark download page or from the Spark Distribution Directory.
- Download and install the binary files from the GitHub repository `cdarlint/winutils`.
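After unpacking the archive and the matching `winutils` binaries, Spark expects `SPARK_HOME` and `HADOOP_HOME` to point at those locations, with both `bin` directories on the `PATH`. A minimal sketch for a Windows `cmd` session (the install paths and the `hadoop-3.3.6` winutils version are assumptions; adjust them to your machine):

```shell
:: Assumed install locations -- adjust to your setup
set SPARK_HOME=C:\opt\spark-3.5.1-bin-hadoop3.2-scala2.13
set HADOOP_HOME=C:\opt\hadoop-3.3.6
set PATH=%SPARK_HOME%\bin;%HADOOP_HOME%\bin;%PATH%
```

`winutils.exe` must end up in `%HADOOP_HOME%\bin`, which is where Spark's Hadoop layer looks for it on Windows.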
🔎 Spark and Hadoop are two separate Apache projects: Spark uses Hadoop’s client libraries for HDFS and YARN.
We give below the actual dependencies between Spark and Hadoop 3 (as of October 2023):

| Spark \ Hadoop | 3.3.0 | 3.3.1 | 3.3.2 | 3.3.3 | 3.3.4 | 3.3.5 | 3.3.6 | 3.4.0 |
|----------------|-------|-------|-------|-------|-------|-------|-------|-------|
| 3.3.0          |   -   |   -   |   ✓   |   -   |   -   |   -   |   -   |   -   |
| 3.3.1          |   -   |   -   |   ✓   |   -   |   -   |   -   |   -   |   -   |
| 3.3.2          |   -   |   -   |   ✓   |   -   |   -   |   -   |   -   |   -   |
| 3.4.2          |   -   |   -   |   -   |   -   |   ✓   |   -   |   -   |   -   |
| 3.5.0          |   -   |   -   |   -   |   -   |   ✓   |   -   |   -   |   -   |
| 3.5.1          |   -   |   -   |   -   |   -   |   ✓   |   -   |   -   |   -   |

(✓ marks the Hadoop client-library version bundled with each Spark release.)
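One way to check which Hadoop version a given Spark distribution bundles, without starting a REPL, is to look at the `hadoop-client-api-<version>.jar` file shipped in `%SPARK_HOME%\jars`. A small sketch in Python (the helper name and the jar-naming pattern are our assumptions, not part of Spark's API):

```python
import re

def bundled_hadoop_version(jar_name):
    """Extract the Hadoop version from a hadoop-client-api jar file name.

    Spark distributions ship Hadoop as hadoop-client-api-<version>.jar
    under %SPARK_HOME%\\jars; returns None if the name does not match.
    """
    m = re.fullmatch(r"hadoop-client-api-(\d+\.\d+\.\d+)\.jar", jar_name)
    return m.group(1) if m else None

# Jar name as found in the spark-3.5.1 distribution's jars directory
print(bundled_hadoop_version("hadoop-client-api-3.3.4.jar"))  # → 3.3.4
```

The result matches the `3.3.4` reported by `org.apache.hadoop.util.VersionInfo.getVersion()` in the footnote sessions.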
Footnotes
[1] `spark-shell` and `pyspark` sessions
```
> echo %JAVA_HOME%
C:\opt\jdk-temurin-11.0.24_8
> %SPARK_HOME%\bin\spark-shell
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 3.5.1
      /_/

Using Scala version 2.13.8 (OpenJDK 64-Bit Server VM, Java 11.0.24)
[...]
Spark context available as 'sc' (master = local[*], app id = local-1683397418428).
Spark session available as 'spark'.

scala> print(spark.version)
3.5.1

scala> print(org.apache.hadoop.util.VersionInfo.getVersion())
3.3.4

scala> :quit
```
```
> echo %PYSPARK_PYTHON%
C:\opt\Python-3.10.10\python.exe
> %SPARK_HOME%\bin\pyspark
Python 3.10.10 (tags/v3.10.10:aad5f6a, Feb 7 2023, 17:20:36) [MSC v.1929 64 bit (AMD64)] on win32
[...]
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /__ / .__/\_,_/_/ /_/\_\   version 3.5.1
      /_/

Using Python version 3.10.10 (tags/v3.10.10:aad5f6a, Feb 7 2023 17:20:36)
Spark context Web UI available at http://192.168.0.103:4040
Spark context available as 'sc' (master = local[*], app id = local-1683399374689).
SparkSession available as 'spark'.
>>> print(spark.version)
3.5.1
>>> print(sc._jvm.org.apache.hadoop.util.VersionInfo.getVersion())
3.3.4
>>> exit()
```