Pandas API on Apache Spark - Part 1: Introduction
Apache Spark has revolutionized the data science field with its support for big data. By supporting multiple languages, such as Scala and Python, it has made big data analysis available to a wide variety of developers.
Python is the language most preferred by the data science community. Even within the Spark community, the Python API has seen a tremendous upsurge in the last few years. According to Databricks, the company behind Apache Spark, 60% of the commands written in their notebooks are in Python, compared to 23% in Scala.
Spark has excellent support for Python through the PySpark project. PySpark allows developers to access the different parts of Spark, such as SQL and ML, from Python.
Still, PySpark has not yet reached the wider Python community. The reason is that the majority of Python data developers prefer the Pandas API.