# Spark Setup

This document gathers information on the setup of the Spark framework.

## Installation Steps

Spark installation requires two steps on Microsoft Windows:

  1. Download the archive file spark-3.5.1-bin-hadoop3-scala2.13.tgz either from the Spark download page or from the Spark Distribution Directory.
  2. Download and install the Hadoop binary files (e.g. winutils.exe and hadoop.dll) from the GitHub repository cdarlint/winutils; a sketch of the resulting environment settings follows this list.
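
For instance, with both downloads extracted under C:\opt\ (the install paths below are assumptions; adjust them to your machine), a Windows session could define the following environment variables:

```
> set SPARK_HOME=C:\opt\spark-3.5.1-bin-hadoop3-scala2.13
> set HADOOP_HOME=C:\opt\hadoop-3.3.6
> set PATH=%SPARK_HOME%\bin;%HADOOP_HOME%\bin;%PATH%
```

Here HADOOP_HOME points to the directory whose bin\ subdirectory contains the winutils.exe and hadoop.dll files copied from cdarlint/winutils.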

🔎 Spark and Hadoop are two separate Apache projects: Spark uses Hadoop’s client libraries for HDFS and YARN.
The table below gives the actual dependencies between Spark and Hadoop 3 releases (as of September 2024):

| Hadoop | 3.3.0 | 3.3.1 | 3.3.2 | 3.3.3 | 3.3.4 | 3.3.5 | 3.3.6 | 3.4.0 |
|:------:|:-----:|:-----:|:-----:|:-----:|:-----:|:-----:|:-----:|:-----:|
| Spark  | 3.3.0 | 3.3.1 | 3.3.2 | -     | 3.4.2<br/>3.5.0<br/>3.5.1 | - | - | - |
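
The Hadoop version actually bundled with a given Spark distribution can also be read off the Hadoop client jars shipped in its jars directory; for the spark-3.5.1 archive above, a quick check looks as follows (the jar names may differ in other distributions):

```
> dir /b %SPARK_HOME%\jars | findstr hadoop-client
hadoop-client-api-3.3.4.jar
hadoop-client-runtime-3.3.4.jar
```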

## Footnotes

[1] spark-shell and pyspark sessions

```
> echo %JAVA_HOME%
C:\opt\jdk-temurin-11.0.24_8

> %SPARK_HOME%\bin\spark-shell
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 3.5.1
      /_/

Using Scala version 2.13.8 (OpenJDK 64-Bit Server VM, Java 11.0.24)
[...]
Spark context available as 'sc' (master = local[*], app id = local-1683397418428).
Spark session available as 'spark'.

scala> print(spark.version)
3.5.1
scala> print(org.apache.hadoop.util.VersionInfo.getVersion())
3.3.4
scala> :quit
```
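
A version check alone does not exercise the winutils binaries from installation step 2. As a complementary smoke test, here is a minimal sketch to paste into the spark-shell session above (the output path C:/tmp/spark-smoke is an arbitrary choice):

```scala
// Minimal smoke test for a fresh Spark installation on Windows.
import spark.implicits._   // 'spark' is the SparkSession predefined by spark-shell

val df = Seq(("spark", "3.5.1"), ("hadoop", "3.3.4")).toDF("project", "version")
df.show()                                             // prints the two rows
df.write.mode("overwrite").csv("C:/tmp/spark-smoke")  // fails if winutils is missing
```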

```
> echo %PYSPARK_PYTHON%
C:\opt\Python-3.10.10\python.exe

> %SPARK_HOME%\bin\pyspark
Python 3.10.10 (tags/v3.10.10:aad5f6a, Feb  7 2023, 17:20:36) [MSC v.1929 64 bit (AMD64)] on win32
[...]
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /__ / .__/\_,_/_/ /_/\_\   version 3.5.1
      /_/

Using Python version 3.10.10 (tags/v3.10.10:aad5f6a, Feb  7 2023 17:20:36)
Spark context Web UI available at http://192.168.0.103:4040
Spark context available as 'sc' (master = local[*], app id = local-1683399374689).
SparkSession available as 'spark'.
>>> print(spark.version)
3.5.1
>>> print(sc._jvm.org.apache.hadoop.util.VersionInfo.getVersion())
3.3.4
>>> exit()
```

mics/September 2024