Outputting Tables via Outlook in Microsoft Fabric Pipelines: A Step-by-Step Guide

Pipeline runs often need to provide some form of tabular feedback: that could be an ETL run where error handling requires human input, or summary data that needs to be delivered to a sales director at the end of every month. As much as I enjoy Microsoft Fabric, I was hoping this would be as easy as wiring a Script activity output into an Outlook activity in Data Factory, but after an hour or two of messing around with pipeline expressions (which aren’t my friend) it was clear another solution was required. The good news is that there is a range of solutions, so you’re likely to find something that works, and we’ll cover the alternatives later in the post. The one that suited my use case, and today’s focus, is using a notebook to query a table, wrap the result in HTML, output it on notebook exit, and drop it into the body of an email.

In this blog post, we'll walk through a script that captures validation errors from a data pipeline and presents them in a neatly formatted HTML table. If you don’t fancy reading through every step today, then the good news is that you can copy and reuse the code here by just changing the SQL query to your own.


Overview

The overall idea is that we have a pipeline which runs a notebook. When that notebook runs, the Lakehouse is queried and a DataFrame is created. In the notebook we then use a Python library to wrap that table in HTML, which is passed to an Outlook activity on exit. We have all sorts of options for display customisation, so it makes for a flexible approach.

In the notebook we have one cell which does the following:

1. Query the error details from a database using Spark SQL

2. Convert the query results into a Pandas DataFrame

3. Transform the DataFrame into an HTML table with custom styling

4. Output the HTML content using the notebook exit function

Here is that cell:
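
(The version below is condensed from the steps that follow, with the CSS trimmed for brevity; it uses the SilverLakehouse tables from my example, so swap the query and table names for your own.)

import pandas as pd
from notebookutils import mssparkutils

# Query the latest batch's errors from the Lakehouse
query = """
SELECT 
    e.RunNumber, 
    e.FileName, 
    DATE_TRUNC('second', e.ErrorTimestamp) AS ErrorTimestamp, 
    e.ErrorType, 
    e.ErrorMessage
FROM SilverLakehouse.error_event_detail e
WHERE e.RunNumber IN (
    SELECT RunNumber
    FROM SilverLakehouse.load_audit_history
    WHERE BatchID = (SELECT MAX(BatchID) FROM SilverLakehouse.load_audit_history)
)
ORDER BY e.RunNumber DESC
"""

# Run the query with Spark and convert to a Pandas DataFrame
df = spark.sql(query).toPandas()

# Turn the DataFrame into an HTML table and strip stray whitespace
html_table = df.to_html(index=False, border=1, classes='dataframe')
html_table = html_table.replace("\n", "").replace("  ", "")

# Wrap the table in a styled HTML page (full styling shown in Step 6)
html_template = f"""
<html>
<head>
    <style>
        body {{ font-family: Arial, sans-serif; }}
        table {{ border-collapse: collapse; width: 100%; font-size: 14px; }}
        th, td {{ border: 1px solid #dddddd; padding: 8px; text-align: left; }}
        th {{ background-color: #f2f2f2; font-weight: bold; }}
    </style>
</head>
<body>
    <h2>Errors in Latest Data Load</h2>
    {html_table}
</body>
</html>
"""

# Collapse to a single string and hand it back to the pipeline on exit
html_template_clean = html_template.replace("\n", "")
mssparkutils.notebook.exit(html_template_clean)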


Let's dive into each step to understand how it works.

Prerequisites

Before you begin, make sure you have:

- A basic understanding of HTML, Python, and SQL.

- Familiarity with Fabric notebooks.

- Optionally, an LLM like Copilot on hand to help quickly adjust your HTML or Python. Honestly, it’s okay to do this rather than spend eight hours relearning HTML; use the advanced tools available to you.


Okay let’s dig into the actual steps on how to do this:

Step 1: Import Necessary Libraries

import pandas as pd        

We start by importing the Pandas library, which we'll use to manipulate the query results and convert them to HTML.


Step 2: Define the SQL Query

query = """
SELECT 
    e.RunNumber, 
    e.FileName, 
    DATE_TRUNC('second', e.ErrorTimestamp) AS ErrorTimestamp, 
    e.ErrorType, 
    e.ErrorMessage
FROM 
    SilverLakehouse.error_event_detail e
WHERE 
    e.RunNumber IN (
        SELECT RunNumber
        FROM SilverLakehouse.load_audit_history
        WHERE BatchID = (SELECT MAX(BatchID) FROM SilverLakehouse.load_audit_history)
    )
ORDER BY 
    e.RunNumber DESC
"""        

This particular SQL query we’ve mocked up retrieves error details from the SilverLakehouse.error_event_detail table, filtering for errors that occurred in the latest batch run for an ETL pipeline. You can replace this with your own query.
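
As a second illustration, the sales-summary scenario from the intro might use something along these lines (the table and column names here are invented, purely to show the shape of a replacement query):

query = """
SELECT 
    Region, 
    SUM(SalesAmount) AS TotalSales
FROM SilverLakehouse.monthly_sales
GROUP BY Region
ORDER BY TotalSales DESC
"""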


Step 3: Execute the Query and Convert to Pandas DataFrame

df = spark.sql(query).toPandas()        

We execute the SQL query using Spark's sql method and convert the resulting Spark DataFrame to a Pandas DataFrame using toPandas(). This conversion allows us to leverage Pandas' data manipulation capabilities.


Step 4: Convert the DataFrame to an HTML Table

html_table = df.to_html(index=False, border=1, classes='dataframe')        

We use Pandas' to_html() method to convert the DataFrame into an HTML table. The parameters are:

- index=False: Excludes the DataFrame index from the table.

- border=1: Adds a border around the table cells.

- classes='dataframe': Assigns a CSS class to the table for styling purposes.
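
If you want to sanity-check what to_html() produces before wiring it into the pipeline, you can try it on a small made-up DataFrame straight in the notebook (the column names and displayHTML call here are just for illustration; if displayHTML isn't available in your environment, print the string instead):

preview_df = pd.DataFrame({
    "RunNumber": [101, 101],
    "ErrorType": ["SchemaMismatch", "NullKey"],
    "ErrorMessage": ["Unexpected column in source file", "CustomerID is null"]
})
# Same call as above: no index column, bordered cells, 'dataframe' CSS class
preview_html = preview_df.to_html(index=False, border=1, classes='dataframe')
displayHTML(preview_html)  # or print(preview_html) to inspect the raw markup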


Step 5: Clean Up the HTML Table

html_table = html_table.replace("\n", "").replace("  ", "")        

We remove unnecessary newline characters and extra spaces from the HTML string to ensure the HTML content is clean and compact. I had to play around with this a bit until things displayed correctly.


Step 6: Define the HTML Template with Custom Styling

html_template = f"""
<html>
<head>
    <style>
        body {{
            font-family: Arial, sans-serif;
        }}
        table {{
            border-collapse: collapse;
            width: 100%;
            font-size: 14px;
        }}
        th {{
            border: 1px solid #dddddd;
            background-color: #f2f2f2;
            font-weight: bold;
            padding: 8px;
            text-align: left;
            font-size: 16px;
        }}
        td {{
            border: 1px solid #dddddd;
            padding: 8px;
            text-align: left;
            font-size: 13px;
        }}
        h2 {{
            font-size: 18px;
        }}
    </style>
</head>
<body>
    <h2> Errors in Latest Data Load</h2>
    {html_table}
</body>
</html>
"""        

Here, we create a multi-line string (`html_template`) that includes:

- HTML Structure - Basic HTML tags like <html>, <head>, and <body>.

- CSS Styling - Custom styles to enhance the appearance of the table and text.

- Table Insertion - We embed the html_table string into the template using an f-string. An f-string lets you drop the value of a variable straight into a string; see the short example below.
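
A quick illustration (the name and age values are obviously made up):

name = "Priya"
age = 30
print(f"My name is {name} and I am {age} years old.")
# prints: My name is Priya and I am 30 years old.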


Custom Styling Explained

Fonts: We set the font family to Arial.

Table: The border-collapse property ensures borders are shared between cells, and width: 100% makes the table span the full width of the container.

Headers (`th`): We apply a background color, padding, and larger font size for headers.

Data Cells (`td`): Similar styling with slightly smaller font size.

Heading (`h2`): Styles the main heading of the report.


Step 7: Clean Up the Final HTML Template

html_template_clean = html_template.replace("\n", "").replace("\r", "")        

We remove any remaining unwanted characters from the HTML template to ensure it's a single continuous string.


Step 8: Output the HTML Using the Notebook Exit Function

from notebookutils import mssparkutils

mssparkutils.notebook.exit(html_template_clean)        

We import mssparkutils from notebookutils and use the notebook.exit() function to output the HTML content. This function terminates the notebook execution and returns the HTML string.


How It Fits into the Data Pipeline

With the output complete, we can use it in an email just by adding an Outlook activity and referencing the notebook exit content in the body. Give it a few tests to make sure it works well for you, but all being well this should now be a working solution!
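
For reference, the notebook's exit value surfaces in the notebook activity's output, so the Outlook activity's Body can reference it with a pipeline expression along these lines ('Run Error Notebook' is a placeholder for whatever your notebook activity is called, and it's worth checking the activity's output JSON to confirm the exact path in your setup):

@activity('Run Error Notebook').output.result.exitValue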


Alternative Methods to Deliver Tabular Data via Email

If you’re just here for the script, then we’re done and good luck, but if you’re interested in how else this could be achieved then keep on reading.

So, beyond this solution there were quite a few others which could, to a greater or lesser degree, have achieved the objective of passing data to the users for error handling. Below, we'll explore these alternatives, their advantages, and their limitations. We'll also discuss why an email with a table is often the best solution.

I think it’s important to leave you with a real understanding of what this solution is compared to the alternatives so for each one I’ve detailed a few steps on how they can be implemented.

Click here to read more.


So Why Is an Email with a Table the Best Pipeline-Driven Solution?

After looking through the alternatives I’m still happy with how I chose to implement this in my last project, but it’s all down to context. I was largely stuck within Fabric, with no access to Azure, or Power Automate, and with a clear objective of retrieving tabular data via email.

That said I think there are a few reasons why this solution stands out:

  • Immediate Notification: Stakeholders receive data as soon as the pipeline completes, enabling quicker decision-making.
  • Automation: Reduces manual intervention, minimising the risk of errors and saving time.
  • Direct Delivery: Emails are pushed to users, eliminating the need for them to access additional systems or platforms.
  • Universal Access: Emails can be accessed on various devices (desktops, tablets, smartphones), increasing accessibility.
  • Customisable Content: Embedding tables in emails allows for tailored content, highlighting the most critical data.
  • Integrated Security: Keeps data within your secure pipeline and email system, reducing exposure to external threats.
  • Pipeline Driven: This all happens as part of an ordered execution within an ETL pipeline, so we can make sure emails are sent at the right time and only after the right dependencies are fulfilled.
  • User Convenience: Users can archive, search, and reference emails easily within their inbox. (Okay, maybe this isn’t the most compelling positive, but let’s be honest, most people like things to be available in their inbox if they can be.)
  • Inbox Rules: Users can add their own rules on top for even more automation fun.


Conclusion

I hope this post has been helpful; it’s been a big one to put together, but also a lot of fun learning and testing a solution I hadn’t seen before. Feel free to take and reuse this code to deliver some data to whoever needs it, or to follow one of the other routes. Whichever way you choose, feel free to reach out to us if you have any questions. Even better, if you have any tips to improve this then I would love to hear them!
