Different Ways of Creating a DataFrame in Spark
Sachin D N
Data Consultant @ Lumen Technologies | Data Engineer | Big Data Engineer | AWS | Azure | Apache Spark | Databricks | Delta Lake | Agile | PySpark | Hadoop | Python | SQL | Hive | Data Lake | Data Warehousing | ADF
Apache Spark is a powerful open-source processing engine built around speed, ease of use, and sophisticated analytics. One of its core data structures is the DataFrame, a distributed collection of data organized into named columns. Here are different ways to create a DataFrame in Spark. The examples use PySpark and assume an existing SparkSession named spark:
Using spark.read
We can create a DataFrame from a data source file like CSV, JSON, or Parquet. Here's an example using CSV:
df = spark.read.format("csv").option("header", "true").load(filePath)
Using spark.sql
We can create a DataFrame as a result of a Spark SQL query:
df = spark.sql("select * from table_name")
Using spark.table
We can create a DataFrame from a table in Spark's catalog:
df = spark.table("table_name")
Using spark.range
We can create a DataFrame with a single long column named id, containing the elements of a range. The start is inclusive and the end is exclusive:
df = spark.range(start_range, end_range, increment)
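Creating a DataFrame from a Local List
We can create a DataFrame from a local Python list. Each element should be a row-like object such as a tuple. Here data is a list of one-element tuples, so toDF takes a single column name:
df = spark.createDataFrame(data).toDF("column_name")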
Creating a DataFrame with an Explicit Schema
We can create a DataFrame with an explicit schema, where data is a list of tuples whose values match the schema's fields in order:
from pyspark.sql.types import StructType, StructField, StringType
schema = StructType([
    StructField("column_name_1", StringType(), True),
    StructField("column_name_2", StringType(), True)
])
df = spark.createDataFrame(data, schema)
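Creating a DataFrame from an RDD
We can create a DataFrame from an RDD (Resilient Distributed Dataset), another fundamental data structure in Spark. The RDD's elements should be row-like objects such as tuples, here taken from a local list named data:
rdd = spark.sparkContext.parallelize(data)
df = rdd.toDF()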
In conclusion, Spark provides various ways to create DataFrames to suit different needs, making it a versatile tool for big data processing and analytics.