Running an Open Footprint Lakehouse on Microsoft Fabric

Written in partnership with Peter Kowalchuk

Why Open Footprint on Microsoft Fabric?

Microsoft Fabric is an end-to-end, unified analytics platform that integrates various data and analytics tools, enabling organizations to create, share, and visualize data with ease.

OneLake is often likened to "the OneDrive for data": a single, unified SaaS data lake for the entire organization, with no need to build one from scratch. OneLake organizes data into domains, indexes it for discovery, and ensures compliance with governance standards, all while providing full access through industry-standard APIs.

Tables within OneLake are based on Delta Lake technology, allowing for the creation of advanced analytics solutions. These tables are part of the lakehouse architecture, which combines the benefits of data lakes and data warehouses.

Enabling Open Footprint data on Delta tables in Fabric OneLake has numerous advantages, including:

1. A flexible Delta Lake architecture that brings the power of Notebooks and Spark Jobs to processing emissions data in near real time and at massive scale.

2. Out-of-the-box connectivity with Microsoft tools and business products such as Azure AI, Copilot Studio, Power BI, Power Apps, Dynamics 365, Office 365, and Teams. There is no need to build connectors, as Fabric connectivity is supported natively.

3. Connectivity with third-party data platforms such as Snowflake and Databricks, and across all major clouds.

4. Real-time data streaming capabilities in Fabric that enable reading activity from all major sources out of the box.

5. Out-of-the-box integration with Microsoft Sustainability Manager through the ESG Data Estate solution on Microsoft Fabric.

Open Footprint and OneLake, a match made in heaven

The Open Footprint Forum, under The Open Group, focuses on developing open, vendor-neutral industry standards for consistent and accurate measurement and reporting of environmental footprint data. The forum aims to develop a single set of data and metadata definitions, known as the Open Footprint Data Model, so that emissions data can be more easily shared and aggregated. The model is platform-independent and can be deployed on OSDU, relational databases, and data lakes.

We converted the DDL script provided by the Open Footprint community for relational databases to SparkSQL to create the tables. In our assessment, we observed a couple of things in the DDL script that need to be updated due to differences in DDL syntax and the objects supported by OneLake Delta tables. We can't provide the full script here due to licensing limitations; if you are an Open Footprint member, feel free to reach out to your Microsoft representative. The changes we made were as follows (a sketch of this conversion pass appears after the example table below):

1. Removed schema creation and usage statements, as they are not supported.

2. Removed primary and foreign keys, as they are not supported.

3. Removed comments and metadata defined at the table and column levels.

4. Changed all names to lowercase where they weren't already.

5. Removed the schema prefix from the table name in each CREATE statement.

6. Removed the schema version suffix from the table name in each CREATE statement.

7. Replaced data types such as varchar(10) with the equivalent Spark data types.

For example, the converted productfootprint table looks like this:

CREATE TABLE productfootprint (
    productfootprintid STRING,
    comment STRING,
    companyname STRING,
    companyids INTEGER,
    created STRING,
    environmentalproductdeclarationid STRING,
    pcf INTEGER,
    precedingproductfootprintid STRING,
    productcategorycpc INTEGER,
    productdescription STRING,
    productids INTEGER,
    productnamecompany STRING,
    specversion STRING,
    status STRING,
    statuscomment STRING,
    updated STRING,
    validityperiodend STRING,
    validityperiodstart STRING,
    validfrom STRING,
    validto STRING,
    version STRING
);
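For illustration only (this is not the licensed script; the file names, comment syntax, and version-suffix pattern are all assumptions), the conversion pass described in the list above can be sketched as a small Python script:

    import re

    # minimal sketch of the DDL conversion pass described in the list above
    with open("openfootprint_relational_ddl.sql") as src:   # assumed input file name
        ddl = src.read()

    # 1. drop schema creation/usage statements
    ddl = re.sub(r"(?im)^\s*(create schema|use)\b.*?;", "", ddl)
    # 2. drop primary/foreign key constraints
    # (a real pass would also tidy any dangling commas this leaves behind)
    ddl = re.sub(r"(?im)^\s*(primary key|foreign key|constraint)\b.*", "", ddl)
    # 3. drop table/column comments (assumed inline COMMENT '...' syntax)
    ddl = re.sub(r"(?is)comment\s+'[^']*'", "", ddl)
    # 4. lowercase everything (simplification: Spark SQL accepts lowercase keywords too)
    ddl = ddl.lower()
    # 5. strip the schema prefix from table names
    ddl = re.sub(r"create table \w+\.", "create table ", ddl)
    # 6. strip an assumed _v<N> version suffix from table names
    ddl = re.sub(r"(create table \w+)_v\d+", r"\1", ddl)
    # 7. map varchar(n) to the Spark STRING type
    ddl = re.sub(r"varchar\(\d+\)", "STRING", ddl)

    with open("openfootprint_spark_ddl.sql", "w") as dst:   # assumed output file name
        dst.write(ddl)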

Running the script in the lakehouse creates the following tables in Microsoft Fabric (only partially shown due to licensing constraints).

Creating the Open Footprint Tables in Fabric with Spark SQL
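As a minimal sketch of running such a script from a Fabric notebook (the file path is an assumption, and the spark session is the one the notebook provides):

    # minimal sketch: execute each statement of the converted DDL in a Fabric notebook
    rows = spark.read.text("Files/openfootprint_spark_ddl.sql").collect()   # assumed path
    script = "\n".join(row.value for row in rows)

    # split on semicolons and run each CREATE TABLE statement
    for statement in script.split(";"):
        if statement.strip():
            spark.sql(statement)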

Once the tables were created, we took the further step of uploading the csv files for the reference data; first, they are uploaded to the Fabric Lakehouse as files. Our experience is that some of the csv files have missing columns or differences in naming.

Uploading Reference Data csv Files

To test the loading, we changed the column and table names to all lowercase and ran the following PySpark code to load the data, which went smoothly.

PySpark Code to Load Reference Data
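The code itself appears in the screenshot above; as a minimal sketch of that kind of load (the Files/referencedata folder and file name are assumptions), it looks roughly like this:

    # minimal sketch: load one reference-data csv into its matching delta table
    df = (spark.read
              .option("header", True)
              .csv("Files/referencedata/productfootprint.csv"))   # assumed path

    # normalize column names to lowercase so they match the table schema
    df = df.toDF(*[c.lower() for c in df.columns])

    # append the rows into the lakehouse delta table
    df.write.mode("append").format("delta").saveAsTable("productfootprint")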

Fabric automatically provides a SQL analytics endpoint once the Delta Lake tables are created, which can be accessed as if it were a SQL Server relational database.

SQL Analytics Endpoint for Delta Tables
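As an example, here is a minimal sketch of querying that endpoint from Python with pyodbc; the server and database values are placeholders you would copy from the endpoint's connection details:

    import pyodbc

    # placeholders: copy the real values from the lakehouse's SQL analytics endpoint page
    conn = pyodbc.connect(
        "Driver={ODBC Driver 18 for SQL Server};"
        "Server=<your-endpoint>.datawarehouse.fabric.microsoft.com;"
        "Database=<your-lakehouse>;"
        "Authentication=ActiveDirectoryInteractive;"
    )

    # read a few rows back through the relational endpoint
    for row in conn.execute("SELECT TOP 5 productfootprintid, status FROM productfootprint"):
        print(row)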

Next Steps

1. The reference data csv's have inconsistencies with the table schemas in terms of column naming (_ vs -), missing columns, etc.; these should be cleaned up.

2. Sample data needs to be loaded and tested; this can easily be done using parquet or csv files, which are natively supported by Spark Jobs (a minimal sketch follows this list).

3. We are looking for good AI use cases; please add a comment if you have any ideas you want us to test.
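As a minimal sketch of such a sample-data load (the parquet file and its location are hypothetical):

    # minimal sketch: load hypothetical sample data from parquet into a delta table
    sample = spark.read.parquet("Files/sampledata/productfootprint.parquet")   # assumed path
    sample.write.mode("append").format("delta").saveAsTable("productfootprint")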

Dzmitry Krotau

BA Technical Lead | AI & Digital Health Expert | Driving Innovation in Healthcare, Biotech & Life Sciences | #AI #HealthcareTechnology #DigitalHealth #AIinHealthcare #Innovation

9 months

Great news, Kadri Umay! It looks like a very smart integration of Open Footprint Data model with Microsoft Fabric. This appears to be a killer feature for OFP as the platform could be engaged with MS ecosystem easily!

Sun Maria Lehmann

Team and people first! Technology Geek, Data Enthusiast, Chair of Boards

9 months

What an achievement, great news and progress! Excited to follow the forward development too…

Arminder Singh

Director at KPMG Climate Data & Technology | Delivery Lead | Digital Transformation | Energy Efficiency and GHG Emissions Management | The Wharton School

9 months

This is super interesting! Thanks Kadri Umay for sharing

Bertrand Rioux

Technology Consultant and Strategist

10 months

Hi Kadri Umay I produced the OFP DDLs from the OFP modeling tool I am maintaining. I’ve reached out to Peter to schedule a call to discuss how we can streamline the integration with Delta Lake for Microsoft Fabric.
