Outputting Tables via Outlook in Microsoft Fabric Pipelines: A Step-by-Step Guide

Pipeline runs often need to provide some form of tabular feedback: that could be an ETL run where error handling requires human input, or summary data that needs to be delivered to a sales director at the end of every month. As much as I enjoy Microsoft Fabric, I was hoping this would be as easy as wiring a Script activity output into an Outlook activity in Data Factory, but after an hour or two of messing around with pipeline expressions (which aren’t my friend) it was clear another solution was required. The good news is that there is a range of solutions, so you’re likely to find something that works, and we’ll cover the alternatives later in the post. The one that suited my use case, and today’s focus, is using a notebook to query a table, wrap the result in HTML, output it on notebook exit, and drop it into the body of an email.

In this blog post, we'll walk through a script that captures validation errors from a data pipeline and presents them in a neatly formatted HTML table. If you don’t fancy reading through every step today, then the good news is that you can copy and reuse the code here by just changing the SQL query to your own.


Overview

The overall idea is that we have a pipeline which runs a notebook. When that notebook runs, the Lakehouse is queried and a DataFrame is created. In the notebook we then use a Python library to wrap that table in HTML, which is passed to an Outlook activity on exit. We have all sorts of options for display customisation, so it makes for a flexible approach.

In the notebook we have one cell which does the following:

1. Query the error details from a database using Spark SQL

2. Convert the query results into a Pandas DataFrame

3. Transform the DataFrame into an HTML table with custom styling

4. Output the HTML content using the notebook exit function

Here is that cell:
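
(The version below is condensed from the steps that follow, with the CSS trimmed for brevity; it uses the SilverLakehouse tables from my example, so swap the query and table names for your own.)

import pandas as pd
from notebookutils import mssparkutils

# Query the latest batch's errors from the Lakehouse
query = """
SELECT 
    e.RunNumber, 
    e.FileName, 
    DATE_TRUNC('second', e.ErrorTimestamp) AS ErrorTimestamp, 
    e.ErrorType, 
    e.ErrorMessage
FROM SilverLakehouse.error_event_detail e
WHERE e.RunNumber IN (
    SELECT RunNumber
    FROM SilverLakehouse.load_audit_history
    WHERE BatchID = (SELECT MAX(BatchID) FROM SilverLakehouse.load_audit_history)
)
ORDER BY e.RunNumber DESC
"""

# Run the query with Spark and convert to a Pandas DataFrame
df = spark.sql(query).toPandas()

# Turn the DataFrame into an HTML table and strip stray whitespace
html_table = df.to_html(index=False, border=1, classes='dataframe')
html_table = html_table.replace("\n", "").replace("  ", "")

# Wrap the table in a styled HTML page (full styling shown in Step 6)
html_template = f"""
<html>
<head>
    <style>
        body {{ font-family: Arial, sans-serif; }}
        table {{ border-collapse: collapse; width: 100%; font-size: 14px; }}
        th, td {{ border: 1px solid #dddddd; padding: 8px; text-align: left; }}
        th {{ background-color: #f2f2f2; font-weight: bold; }}
    </style>
</head>
<body>
    <h2>Errors in Latest Data Load</h2>
    {html_table}
</body>
</html>
"""

# Collapse to a single string and hand it back to the pipeline on exit
html_template_clean = html_template.replace("\n", "")
mssparkutils.notebook.exit(html_template_clean)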


Let's dive into each step to understand how it works.

Prerequisites

Before you begin, make sure you have:

- A basic understanding of HTML, Python, and SQL.

- Familiarity with Fabric notebooks.

- Optionally, an LLM like Copilot on hand to help quickly adjust your HTML or Python. Honestly, it’s okay to do this rather than spend eight hours relearning HTML; use the advanced tools available to you.


Okay let’s dig into the actual steps on how to do this:

Step 1: Import Necessary Libraries

import pandas as pd        

We start by importing the Pandas library, which we'll use to manipulate the query results and convert them to HTML.


Step 2: Define the SQL Query

query = """
SELECT 
    e.RunNumber, 
    e.FileName, 
    DATE_TRUNC('second', e.ErrorTimestamp) AS ErrorTimestamp, 
    e.ErrorType, 
    e.ErrorMessage
FROM 
    SilverLakehouse.error_event_detail e
WHERE 
    e.RunNumber IN (
        SELECT RunNumber
        FROM SilverLakehouse.load_audit_history
        WHERE BatchID = (SELECT MAX(BatchID) FROM SilverLakehouse.load_audit_history)
    )
ORDER BY 
    e.RunNumber DESC
"""        

This particular SQL query we’ve mocked up retrieves error details from the SilverLakehouse.error_event_detail table, filtering for errors that occurred in the latest batch run for an ETL pipeline. You can replace this with your own query.
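
As a second illustration, the sales-summary scenario from the intro might use something along these lines (the table and column names here are invented, purely to show the shape of a replacement query):

query = """
SELECT 
    Region, 
    SUM(SalesAmount) AS TotalSales
FROM SilverLakehouse.monthly_sales
GROUP BY Region
ORDER BY TotalSales DESC
"""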


Step 3: Execute the Query and Convert to Pandas DataFrame

df = spark.sql(query).toPandas()        

We execute the SQL query using Spark's sql method and convert the resulting Spark DataFrame to a Pandas DataFrame using toPandas(). This conversion allows us to leverage Pandas' data manipulation capabilities.


Step 4: Convert the DataFrame to an HTML Table

html_table = df.to_html(index=False, border=1, classes='dataframe')        

We use Pandas' to_html() method to convert the DataFrame into an HTML table. The parameters are:

- index=False: Excludes the DataFrame index from the table.

- border=1: Adds a border around the table cells.

- classes='dataframe': Assigns a CSS class to the table for styling purposes.
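
If you want to sanity-check what to_html() produces before wiring it into the pipeline, you can try it on a small made-up DataFrame straight in the notebook (the column names and displayHTML call here are just for illustration; if displayHTML isn't available in your environment, print the string instead):

preview_df = pd.DataFrame({
    "RunNumber": [101, 101],
    "ErrorType": ["SchemaMismatch", "NullKey"],
    "ErrorMessage": ["Unexpected column in source file", "CustomerID is null"]
})
# Same call as above: no index column, bordered cells, 'dataframe' CSS class
preview_html = preview_df.to_html(index=False, border=1, classes='dataframe')
displayHTML(preview_html)  # or print(preview_html) to inspect the raw markup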


Step 5: Clean Up the HTML Table

html_table = html_table.replace("\n", "").replace("  ", "")        

We remove unnecessary newline characters and extra spaces from the HTML string to ensure the HTML content is clean and compact. I had to play around with this a bit until things displayed correctly.


Step 6: Define the HTML Template with Custom Styling

html_template = f"""
<html>
<head>
    <style>
        body {{
            font-family: Arial, sans-serif;
        }}
        table {{
            border-collapse: collapse;
            width: 100%;
            font-size: 14px;
        }}
        th {{
            border: 1px solid #dddddd;
            background-color: #f2f2f2;
            font-weight: bold;
            padding: 8px;
            text-align: left;
            font-size: 16px;
        }}
        td {{
            border: 1px solid #dddddd;
            padding: 8px;
            text-align: left;
            font-size: 13px;
        }}
        h2 {{
            font-size: 18px;
        }}
    </style>
</head>
<body>
    <h2> Errors in Latest Data Load</h2>
    {html_table}
</body>
</html>
"""        

Here, we create a multi-line string (`html_template`) that includes:

- HTML Structure - Basic HTML tags like <html>, <head>, and <body>.

- CSS Styling - Custom styles to enhance the appearance of the table and text.

- Table Insertion - We embed the html_table string into the template using an f-string. An f-string lets you drop the value of a variable straight into a string; see the short example below.
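
A quick illustration (the name and age values are obviously made up):

name = "Priya"
age = 30
print(f"My name is {name} and I am {age} years old.")
# prints: My name is Priya and I am 30 years old.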


Custom Styling Explained

Fonts: We set the font family to Arial.

Table: The border-collapse property ensures borders are shared between cells, and width: 100% makes the table span the full width of the container.

Headers (`th`): We apply a background color, padding, and larger font size for headers.

Data Cells (`td`): Similar styling with slightly smaller font size.

Heading (`h2`): Styles the main heading of the report.


Step 7: Clean Up the Final HTML Template

html_template_clean = html_template.replace("\n", "").replace("\r", "")        

We remove any remaining unwanted characters from the HTML template to ensure it's a single continuous string.


Step 8: Output the HTML Using the Notebook Exit Function

from notebookutils import mssparkutils

mssparkutils.notebook.exit(html_template_clean)        

We import mssparkutils from notebookutils and use the notebook.exit() function to output the HTML content. This function terminates the notebook execution and returns the HTML string.


How It Fits into the Data Pipeline

With the output complete, we can use it in an email just by adding an Outlook activity and referencing the notebook exit content in the body. Give it a few tests to make sure it works well for you, but all being well this should now be a working solution!
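
For reference, the notebook's exit value surfaces in the notebook activity's output, so the Outlook activity's Body can reference it with a pipeline expression along these lines ('Run Error Notebook' is a placeholder for whatever your notebook activity is called, and it's worth checking the activity's output JSON to confirm the exact path in your setup):

@activity('Run Error Notebook').output.result.exitValue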


Alternative Methods to Deliver Tabular Data via Email

If you’re just here for the script, then we’re done and good luck, but if you’re interested in how else this could be achieved then keep on reading.

So, beyond this solution there were quite a few others which could, to a greater or lesser degree, have achieved the objective of passing data to the users for error handling. Below, we'll explore these alternatives, their advantages, and their limitations. We'll also discuss why an email with a table is often the best solution.

I think it’s important to leave you with a real understanding of what this solution is compared to the alternatives so for each one I’ve detailed a few steps on how they can be implemented.

Click here to read more.


So Why Is an Email with a Table the Best Pipeline-Driven Solution?

After looking through the alternatives I’m still happy with how I chose to implement this in my last project, but it’s all down to context. I was largely stuck within Fabric, with no access to Azure, or Power Automate, and with a clear objective of retrieving tabular data via email.

That said I think there are a few reasons why this solution stands out:

  • Immediate Notification: Stakeholders receive data as soon as the pipeline completes, enabling quicker decision-making.
  • Automation: Reduces manual intervention, minimising the risk of errors and saving time.
  • Direct Delivery: Emails are pushed to users, eliminating the need for them to access additional systems or platforms.
  • Universal Access: Emails can be accessed on various devices (desktops, tablets, smartphones), increasing accessibility.
  • Customisable Content: Embedding tables in emails allows for tailored content, highlighting the most critical data.
  • Integrated Security: Keeps data within your secure pipeline and email system, reducing exposure to external threats.
  • Pipeline Driven: This all happens as part of an ordered execution within an ETL pipeline, so we can make sure emails are sent at the right time and only after the right dependencies are fulfilled.
  • User Convenience: Users can archive, search, and reference emails easily within their inbox. (Okay, maybe this isn’t the most compelling positive, but let’s be honest, most people like things to be available in their inbox if they can be.)
  • Inbox Rules: Users can add their own rules on top for even more automation fun.


Conclusion

I hope this post has been helpful; it’s been a big one to put together, but also a lot of fun learning and testing a solution I hadn’t seen before. Feel free to take and reuse this code to deliver some data to whoever needs it, or to follow one of the other routes. Whichever way you choose, feel free to reach out to us if you have any questions. Even better, if you have any tips to improve this then I would love to hear them!
