Read a JSON of complex structure in Microsoft Fabric
Armando García Gama
Tech Lead @APEX Systems | Data Analytics | Business Intelligence
When reading a JSON file with a complex struct type, it can be difficult to extract the array you actually need, so let's explore a simple way to do it in a Microsoft Fabric notebook.
Read JSON with Microsoft Fabric Notebook
Open the Lakehouse you want to use, add a new notebook, and enter the following code in the first block:
#reading JSON file with pyspark
df = spark.read.json("ABFS File Path")
# df now is a Spark DataFrame containing JSON data, display the data
display(df)
Getting DataFrame Schema
Next, obtain the schema of the DataFrame by adding a new code block.
df.printSchema()
This prints the inferred schema. For the purposes of this demo we will work with the following schema as output.
root
|-- eTag: string (nullable = true)
|-- id: string (nullable = true)
|-- location: string (nullable = true)
|-- name: string (nullable = true)
|-- properties: struct (nullable = true)
| |-- columns: array (nullable = true)
| | |-- element: struct (containsNull = true)
| | | |-- name: string (nullable = true)
| | | |-- type: string (nullable = true)
| |-- nextLink: string (nullable = true)
| |-- rows: array (nullable = true)
| | |-- element: string (containsNull = true)
|-- sku: string (nullable = true)
|-- type: string (nullable = true)
Creating a schema for my data
Return to the first code block and modify the read so that it applies the schema we just obtained instead of inferring it.
from pyspark.sql.types import *
#schema for data
orderSchema = StructType([
    StructField("eTag", StringType()),
    StructField("id", StringType()),
    StructField("location", StringType()),
    StructField("name", StringType()),
    StructField("properties", StructType()
        .add("columns", ArrayType(StructType()
            .add("name", StringType())
            .add("type", StringType())))
        .add("nextLink", StringType())
        .add("rows", ArrayType(StringType()))
    ),
    StructField("sku", StringType()),
    StructField("type", StringType())
])
#reading JSON file with pyspark, applying the explicit schema
df = spark.read.schema(orderSchema).json("ABFS File Path")
# df now is a Spark DataFrame containing JSON data, display the data
display(df)
Get the desired array
Now we can easily obtain the data array we want to work with.
#Select row attribute from properties column
exploded_df = df.select("properties.*").select("rows")
#display column
display(exploded_df)
Now we can manipulate our array without any further inconvenience.
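Since each element of `rows` is a plain string in this schema, a common next step is to decode those strings once they are collected to the driver. The values below are invented purely for illustration, assuming each row string is itself JSON-encoded:

```python
import json

# Hypothetical collected values — in this schema each element of the
# "rows" array arrives as a plain JSON-encoded string.
rows = ['["2024-01-01", 42]', '["2024-01-02", 17]']

# Decode each string into a Python list
parsed = [json.loads(r) for r in rows]
print(parsed)  # [['2024-01-01', 42], ['2024-01-02', 17]]
```

For larger datasets, decoding on the cluster with `pyspark.sql.functions.from_json` and a row schema would avoid collecting the raw strings to the driver.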