Different Ways of Creating a DataFrame in PySpark
Nikhil G R
Senior Data Engineer (Apache Spark Developer) @ SAP Labs India | Ex-TCS | 3x Microsoft Azure Cloud Certified | Python, PySpark, Azure Databricks, SAP BDC, Datasphere, ADLS, Azure Data Factory, MySQL, Delta Lake
Using spark.read
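spark.read returns a DataFrameReader, which loads data from external files in formats such as CSV, JSON, Parquet, and ORC. A minimal sketch, assuming a hypothetical CSV file at /data/orders.csv (the path and reader options are illustrative):

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("create-df-demo").getOrCreate()

# Read a CSV file; header and inferSchema are optional reader settings
df = (spark.read
      .format("csv")
      .option("header", "true")        # first line holds the column names
      .option("inferSchema", "true")   # sample the data to guess datatypes
      .load("/data/orders.csv"))       # hypothetical path

df.show(5)
```

The sketches below reuse this spark session.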
Using spark.sql
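spark.sql runs a SQL query against tables registered in the catalog and returns the result as a DataFrame. A sketch assuming a table named orders already exists (the table and column names are hypothetical):

```python
# Any SELECT result is itself a DataFrame
df = spark.sql("""
    SELECT order_id, order_status
    FROM orders
    WHERE order_status = 'CLOSED'
""")
df.show(5)
```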
Using spark.table
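spark.table loads an entire catalog table as a DataFrame without writing any SQL. Again assuming the hypothetical orders table:

```python
# Equivalent to spark.sql("SELECT * FROM orders")
df = spark.table("orders")
df.printSchema()
```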
Using spark.range
spark.range gives a single-column DataFrame; the column is named id and has type long.
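A quick sketch of spark.range, which is handy for generating test data:

```python
df = spark.range(1, 10, 2)  # start=1, end=10 (exclusive), step=2

df.printSchema()
# root
#  |-- id: long (nullable = false)

df.show()  # rows: 1, 3, 5, 7, 9
```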
Creating a DataFrame from a local list
Two-step process of creating a DataFrame
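The two steps are: parallelize the local list into an RDD, then convert the RDD to a DataFrame with toDF. A sketch with made-up sample data (reused in the sketches below):

```python
data = [(1, "kapil"), (2, "satish"), (3, "rahul")]  # hypothetical sample data

rdd = spark.sparkContext.parallelize(data)  # step 1: local list -> RDD
df = rdd.toDF()                             # step 2: RDD -> DataFrame

df.show()  # columns get the default names _1 and _2
```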
Use this when we want to specify the column names explicitly instead of going with the default values (_1, _2, and so on).
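Passing a list of names to toDF replaces the default _1, _2 names:

```python
df = spark.sparkContext.parallelize(data).toDF(["user_id", "name"])
df.show()
```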
One-step process of creating a DataFrame
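spark.createDataFrame builds the DataFrame directly from the local list, with no intermediate RDD step:

```python
df = spark.createDataFrame(data)
df.show()  # columns again default to _1 and _2
```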
Enforcing the schema explicitly
Approach 1 - Fixing only the column names
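One way to fix only the names, as I read this approach: pass a list of column names as the schema and let Spark infer the datatypes:

```python
df = spark.createDataFrame(data, schema=["user_id", "name"])
df.printSchema()  # datatypes (long, string) are inferred from the data
```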
Approach 2 - Fixing the column names and datatypes
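To pin down both names and datatypes, pass a full schema, either as a StructType or as the more concise DDL-style string:

```python
from pyspark.sql.types import StructType, StructField, LongType, StringType

schema = StructType([
    StructField("user_id", LongType(), True),
    StructField("name", StringType(), True),
])
df = spark.createDataFrame(data, schema=schema)

# Equivalent DDL-style schema string
df = spark.createDataFrame(data, schema="user_id long, name string")
df.printSchema()
```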
Creating a DataFrame from an RDD
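An existing RDD can be handed straight to spark.createDataFrame, or converted with toDF, optionally with an explicit schema:

```python
rdd = spark.sparkContext.parallelize(data)

df = spark.createDataFrame(rdd, schema="user_id long, name string")
# or: df = rdd.toDF(["user_id", "name"])
df.show()
```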
Credits - Sumit Mittal sir