Read a JSON of complex structure in Microsoft Fabric

When reading a JSON file with a complex struct type, it can be tricky to get at the array you actually need, so let's explore a simple way to do it in a Microsoft Fabric notebook.

Read JSON with Microsoft Fabric Notebook

Open the Lakehouse you want to use, add a new notebook, and enter the following code in the first cell:

# reading the JSON file with PySpark
df = spark.read.json("ABFS File Path")
# df is now a Spark DataFrame containing the JSON data; display it
display(df)
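"ABFS File Path" is a placeholder for the OneLake path of your JSON file. Purely as an illustration, with hypothetical workspace, lakehouse, and file names, and assuming the JSON document spans multiple lines, the read could look like this:

# hypothetical OneLake path; replace the workspace, lakehouse, and file names with your own
path = "abfss://MyWorkspace@onelake.dfs.fabric.microsoft.com/MyLakehouse.Lakehouse/Files/data.json"
# the multiLine option is needed when a single JSON document spans several lines
df = spark.read.option("multiLine", "true").json(path)
display(df)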

Getting DataFrame Schema

Next, we obtain the schema of our DataFrame by adding a new code cell:


df.printSchema()        


Running it prints the schema. For the purposes of this demo we will work with the following output:


root
 |-- eTag: string (nullable = true)
 |-- id: string (nullable = true)
 |-- location: string (nullable = true)
 |-- name: string (nullable = true)
 |-- properties: struct (nullable = true)
 |    |-- columns: array (nullable = true)
 |    |    |-- element: struct (containsNull = true)
 |    |    |    |-- name: string (nullable = true)
 |    |    |    |-- type: string (nullable = true)
 |    |-- nextLink: string (nullable = true)
 |    |-- rows: array (nullable = true)
 |    |    |-- element: string (containsNull = true)
 |-- sku: string (nullable = true)
 |-- type: string (nullable = true)


Creating a schema for my data

Now we return to the first code cell and modify the read so that it uses the schema we just obtained.


from pyspark.sql.types import *

#schema for data
orderSchema = StructType([
    StructField("eTag", StringType()),
    StructField("id", StringType()),
    StructField("location", StringType()),
    StructField("name", StringType()),
    StructField("properties", StructType() 
        .add("columns",ArrayType(StructType()
            .add("name",StringType())
            .add("type",StringType())))
        .add("nextLink", StringType())
        .add("rows",ArrayType(StringType()))
    ),
    StructField("sku", StringType()),
    StructField("type", StringType())
    ])

# reading the JSON file with PySpark, this time applying the schema defined above
df = spark.read.schema(orderSchema).json("ABFS File Path")
# df is now a Spark DataFrame containing the JSON data; display it
display(df)
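Passing an explicit schema spares Spark a schema-inference pass over the file and keeps the column types stable between runs. As an aside (not from the original article), spark.read.schema() also accepts a DDL-formatted string, so the same schema could be written more compactly; a minimal sketch:

# same schema expressed as a DDL string
orderSchemaDDL = """
    eTag STRING, id STRING, location STRING, name STRING,
    properties STRUCT<columns: ARRAY<STRUCT<name: STRING, type: STRING>>, nextLink: STRING, rows: ARRAY<STRING>>,
    sku STRING, type STRING
"""
df = spark.read.schema(orderSchemaDDL).json("ABFS File Path")
display(df)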

Get the desired array

Now we can easily obtain the data array we want to work with.

# select the rows attribute from the properties column
exploded_df = df.select('properties.*').select("rows")
# display the resulting column
display(exploded_df)

Now we can manipulate the array however we need, for example by exploding it into one row per element, as sketched below.
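Since rows is an array of strings, a minimal sketch of that follow-up step (not from the original article), assuming exploded_df from the previous cell:

from pyspark.sql.functions import explode

# turn the rows array into one DataFrame row per array element
rows_df = exploded_df.select(explode("rows").alias("row"))
display(rows_df)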
