What is PySpark? (Simply Explained in 1 Minute)
Python is a high-level programming language.
Spark (Apache Spark) is a high-level data processing and data analysis framework.
- Spark is written mainly in the Scala programming language
- Spark was created to analyze and process big data faster than plain Python on a single machine, by spreading the work across many machines
- Spark was created to do what's called in-memory processing for big data
- In-memory processing means intermediate data is kept in RAM instead of being written to disk between steps, which makes repeated processing much faster
- Processing here means cleaning and formatting the data so it is ready for analysis
- PySpark is the Python API for Spark, so you can use Python and Spark together
- This is done by writing Python code that imports the pyspark package and runs on the Spark framework
- It is mainly used for processing structured and semi-structured datasets
- On large datasets it is usually faster than plain Python for data cleaning, data processing, and machine learning