Read a JSON of complex structure in Microsoft Fabric
Armando García Gama
Tech Lead @APEX Systems | Data Analytics | Business Intelligence
When reading a JSON file with a complex struct type, it can be difficult to extract the array you actually need, so let's explore a simple way to do it in a Microsoft Fabric notebook.
Read JSON with Microsoft Fabric Notebook
Open the Lakehouse you want to use, add a new notebook, and enter the following code in the first block:
#reading JSON file with pyspark
df = spark.read.json("ABFS File Path")
# df now is a Spark DataFrame containing JSON data, display the data
display(df)
Getting DataFrame Schema
Next, obtain the schema of the DataFrame by adding a new code block.
df.printSchema()
This prints the inferred schema. For the purposes of this demo we will work with the following schema as output.
root
|-- eTag: string (nullable = true)
|-- id: string (nullable = true)
|-- location: string (nullable = true)
|-- name: string (nullable = true)
|-- properties: struct (nullable = true)
| |-- columns: array (nullable = true)
| | |-- element: struct (containsNull = true)
| | | |-- name: string (nullable = true)
| | | |-- type: string (nullable = true)
| |-- nextLink: string (nullable = true)
| |-- rows: array (nullable = true)
| | |-- element: string (containsNull = true)
|-- sku: string (nullable = true)
|-- type: string (nullable = true)
Creating a schema for my data
Return to the first code block and modify the read so that it applies the schema we just obtained instead of inferring it.
from pyspark.sql.types import *
#schema for data
orderSchema = StructType([
    StructField("eTag", StringType()),
    StructField("id", StringType()),
    StructField("location", StringType()),
    StructField("name", StringType()),
    StructField("properties", StructType()
        .add("columns", ArrayType(StructType()
            .add("name", StringType())
            .add("type", StringType())))
        .add("nextLink", StringType())
        .add("rows", ArrayType(StringType()))
    ),
    StructField("sku", StringType()),
    StructField("type", StringType())
])
#reading JSON file with pyspark, applying the explicit schema
df = spark.read.schema(orderSchema).json("ABFS File Path")
# df now is a Spark DataFrame containing JSON data, display the data
display(df)
Get the desired array
Now we can easily obtain the data array we want to work with.
#Select row attribute from properties column
exploded_df = df.select("properties.*").select("rows")
#display column
display(exploded_df)
Now we can manipulate our array without any further inconvenience.
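Since each element of `rows` is a plain string in this schema, a common next step is to decode those strings once they are collected to the driver. The values below are invented purely for illustration, assuming each row string is itself JSON-encoded:

```python
import json

# Hypothetical collected values — in this schema each element of the
# "rows" array arrives as a plain JSON-encoded string.
rows = ['["2024-01-01", 42]', '["2024-01-02", 17]']

# Decode each string into a Python list
parsed = [json.loads(r) for r in rows]
print(parsed)  # [['2024-01-01', 42], ['2024-01-02', 17]]
```

For larger datasets, decoding on the cluster with `pyspark.sql.functions.from_json` and a row schema would avoid collecting the raw strings to the driver.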