What is PySpark? (Simply Explained in 1 Minute)

Python is a high-level programming language.

Spark (Apache Spark) is a high-level data processing and data analysis framework.

  • Spark is built mainly in the Scala programming language
  • Spark was created to analyze and process big data faster than plain Python running on a single machine can
  • Spark was created to do what's called in-memory processing for big data
  • In-memory processing means Spark keeps data in RAM across the machines of a cluster instead of writing it to disk between steps, which makes repeated processing much faster
  • Processing means cleaning and formatting the data so it is ready for analysis
  • PySpark is the Python API for Spark, so you can use Python and Spark together
  • You write ordinary Python code and import the Spark framework as a library (the pyspark package)
  • It is mainly used for processing structured and semi-structured datasets
  • On large datasets, it is faster for data cleaning, data processing, and machine learning than plain Python alone

Recommended Books:

HOW TO FIND A CAREER IN DATA SCIENCE: The Expert Guide to become a 6 Figure Data Scientist in 12 months. 

Learn Python the Hard Way: A Very Simple Introduction to the Terrifyingly Beautiful World of Computers and Code (Zed Shaw's Hard Way Series)
