1. Set up Spark on Windows
1. JDK 11
2. Python 3.10
3. Hadoop WinUtils (Windows binaries)
4. Spark binaries
5. Environment variables
6. Python IDE
In a cmd terminal:

setx JAVA_HOME "C:\Program Files\Java"

(Note: JAVA_HOME should point at the JDK install directory itself, e.g. a jdk-11.x subfolder under C:\Program Files\Java, not the parent Java folder. Also, setx only takes effect in newly opened terminals, so open a fresh cmd window before verifying.)

--- to verify the above step:
echo %JAVA_HOME%

setx PATH "%PATH%;%JAVA_HOME%\bin"

--- to verify that Java 11 is installed:
java -version
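If you want to check the Java version from a script rather than by eye, a minimal sketch like the following can parse the `java -version` banner (note that `java -version` prints to stderr, not stdout). The function name and regex here are illustrative, not part of any standard API:

```python
import re
import subprocess


def java_major_version(version_line: str) -> int:
    """Extract the major Java version from a `java -version` banner line.

    Handles both the legacy "1.8.0_292" scheme (Java 8) and the
    modern "11.0.16" scheme (Java 9+).
    """
    m = re.search(r'version "(\d+)(?:\.(\d+))?', version_line)
    if not m:
        raise ValueError(f"unrecognized version line: {version_line!r}")
    major = int(m.group(1))
    # Legacy scheme: "1.8.x" means Java 8.
    if major == 1 and m.group(2):
        major = int(m.group(2))
    return major


if __name__ == "__main__":
    import shutil
    if shutil.which("java"):
        # The banner goes to stderr.
        lines = subprocess.run(
            ["java", "-version"], capture_output=True, text=True
        ).stderr.splitlines()
        if lines:
            print("Detected Java major version:", java_major_version(lines[0]))
```

For the setup above, the detected major version should be 11.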
Checklist:
1. JAVA_HOME - env variable set and pointing to Java 11?
2. JAVA_HOME\bin - included in the PATH env variable?
3. java -version - cmd shows Java 11?
Install Python 3.10.3, ticking the "Add Python 3.10 to PATH" option in the installer.
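A quick sanity check that the interpreter you ended up with is new enough for recent PySpark releases (Spark 3.3 requires Python 3.7 or later). This is a sketch; the helper name and the minimum tuple are my own choices:

```python
import sys


def meets_minimum(version: tuple, minimum: tuple) -> bool:
    """True if an installed version tuple is at least the required minimum."""
    return version >= minimum


if __name__ == "__main__":
    # Spark 3.3 needs Python 3.7+; these notes install 3.10.
    required = (3, 7)
    ok = meets_minimum(sys.version_info[:2], required)
    print(f"Python {sys.version.split()[0]} -> {'OK' if ok else 'too old'}")
```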
Hadoop WinUtils:
Without the WinUtils binaries, Hadoop/Spark on Windows throws errors such as:
1. "unable to load native-hadoop library"
2. an UnsatisfiedLinkError mentioning NativeIO$Windows.access0
Download the binaries matching your Hadoop version from:
https://github.com/cdarlint/winutils
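After downloading, winutils.exe must sit in the bin folder under HADOOP_HOME. A small sketch to confirm that (helper names are mine; the bin\winutils.exe layout is what Hadoop expects on Windows):

```python
import os
from pathlib import Path


def winutils_path(hadoop_home: str) -> Path:
    """Expected location of winutils.exe under a HADOOP_HOME directory."""
    return Path(hadoop_home) / "bin" / "winutils.exe"


def winutils_present(hadoop_home: str) -> bool:
    """True if winutils.exe exists where Hadoop will look for it."""
    return winutils_path(hadoop_home).is_file()


if __name__ == "__main__":
    home = os.environ.get("HADOOP_HOME")
    if home:
        print("winutils.exe found:", winutils_present(home))
    else:
        print("HADOOP_HOME is not set")
```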
setx HADOOP_HOME "C:\Program Files\Hadoop\hadoop-3.2.2"
setx SPARK_HOME "C:\Program Files\Spark\Spark-3.2.2"
setx PYTHONPATH "E:\demo\spark-3.3.2\python;E:\demo\spark-3.3.2\python\lib\py4j-0.10.9.5-src.zip"
setx PYSPARK_PYTHON "C:\Users\alih4\AppData\Local\Programs\Python\Python310\python.exe"

Adjust all paths to your actual install locations. As written, SPARK_HOME (Spark-3.2.2 on C:) and PYTHONPATH (spark-3.3.2 on E:) point at different Spark installs; make them reference the same one, and use the py4j zip version actually shipped in SPARK_HOME\python\lib. Also add %HADOOP_HOME%\bin (containing winutils.exe) to PATH.
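Once the setx commands above have run (remember: in a fresh terminal), a minimal sketch like this can check that every expected variable is actually visible. The variable names come from the notes above; the helper itself is illustrative:

```python
import os

# The variables the setx commands above are expected to define
# (names from these notes; the values are machine-specific).
REQUIRED_VARS = [
    "JAVA_HOME",
    "HADOOP_HOME",
    "SPARK_HOME",
    "PYTHONPATH",
    "PYSPARK_PYTHON",
]


def missing_vars(env: dict, required=REQUIRED_VARS) -> list:
    """Return the required variable names that are absent or empty in env."""
    return [name for name in required if not env.get(name)]


if __name__ == "__main__":
    gaps = missing_vars(os.environ)
    print("all set" if not gaps else "missing: " + ", ".join(gaps))
```

If this reports "all set", the environment side of the install is done and a PySpark session should be able to start from your IDE.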