Fabric AI Skills enable a better development process
After the announcement of the AI Skills preview, I have had the chance to play around with the Fabric equivalent of Databricks Genie and want to share my thoughts.
What is this new technology?
AI Skills and Genie are technologies for asking questions directly of your data by leveraging existing large language models (LLMs). The data engineer can use prompt-engineering techniques to tune the model, minimizing incorrect analyses by the system.
Why is that important?
I believe this new technology can greatly impact how we do analytics and possibly usher in a shift in how data professionals deliver value to the end-users. In essence, AI Skills can bridge that age-old gap of how to let end users work with the data after the semantic model is created. I think that AI Skills and similar tools can help data teams avoid getting stuck in "IT helpdesk mode" as described at length by Ergest Xheblati in his post:
What I have taken from the post is that the data team often takes in requirements from the business, assuming the business knows the data in depth and knows what it wants from it, whereas in reality the business uses the project more as data exploration. This means many more (and actually useful) questions arrive just as the project is about to run out of money, because the visual layer is built at the end of the project. With the budget running out, these questions can't be answered, since remodeling the data accordingly would overspend. It is a vicious circle that is hard to break with today's tools, which rely on giving end users access to raw and semi-raw data.
How we do it today
Today, giving end users access to the underlying data typically takes one of three forms:
None of these approaches provides a good solution, as they all require strong technical skills on the end user's side and often leave the data team acting as an IT helpdesk.
This is where tools like AI Skills come in: they serve end users by making them less dependent on learning new skills or relying on the data engineer, and they serve the analytics team by reducing requests to add untested functionality.
How we can do it in the future
First, I want to show how simple it is to build an AI Skill that can be used by end users. Then I'll dive into how this new approach can alter, for the better, how we build data solutions going forward.
Building an AI Skill
If you have Parquet or CSV files in the lakehouse, you can load them to Delta Lake tables thereby making them available to an AI Skill. After creating the skill, the tables can be loaded to the model:
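As a minimal sketch, loading a CSV file from the lakehouse Files area into a Delta table might look like this in a Fabric notebook. The file path and table name are hypothetical, and `spark` is the session object that Fabric notebooks provide automatically:

```python
# Sketch of a Fabric notebook cell (hypothetical path and table name).
# `spark` is the SparkSession injected by the Fabric notebook runtime.

# Read a raw CSV file from the lakehouse Files area.
df = (spark.read
      .option("header", "true")
      .option("inferSchema", "true")
      .csv("Files/raw/orders.csv"))

# Saving as a managed table writes it in Delta Lake format,
# which is what makes it selectable from an AI Skill.
df.write.mode("overwrite").format("delta").saveAsTable("orders")
```

The same pattern applies to Parquet files via `spark.read.parquet(...)`; once the data lands in a Delta table, it shows up when you pick tables for the AI Skill.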
There are many recommendations on how to optimize the model, but now end users will be able to query the data using plain English:
And this is exactly the point! End users can now query the data with little or no transformation done by a data engineer, and the data engineer can then work with end users to optimize the model so it provides more accurate answers.
How this can change how we build data products
With this new technology, we can change the development process from user acceptance tests of the Power BI report/app near the end of the project, to testing the (almost) raw data and the underlying assumptions around the data at the very beginning of the project.
This can usher in a much more iterative process for building data products, where the data engineer publishes a first draft model and then iterates over it, fine-tuning the responses based on user feedback. As a positive side effect, the Power BI report that eventually gets built should be much more aligned with what the end users need and what the data can provide, avoiding the app becoming this meme:
Before the hype train carries us away, let's look at some of the pros and cons.
Final thoughts
Used correctly, I think this new technology can seriously improve how we develop data products, getting the right product to the end users faster, but we still need to keep these things in mind:
The data model must still be optimized
Don't be fooled by the ability to expose near-raw data to end users like this: it's not a silver bullet, and the model will not optimize itself. That still needs to be done, perhaps continuously, by a data engineer based on the questions asked and the answers returned by the model.
Frontend applications won't die because of this
Yes, that includes Power BI, but hopefully, this will help lessen the sprawl of unused reports scattered around most organizations. End users still need some reports, but using tools like AI Skills can reduce the demand for visualizations as ad hoc queries can be answered directly from the model.
Monitoring
As of this writing, I have yet to see how to monitor an AI Skill. For the model to be optimized, it's preferable to be able to see the actual questions asked, the SQL generated, and the answers provided. That way, instructions and SQL examples can be crafted to improve the quality and accuracy of the answers.
How do semantic models fit with AI Skills?
I hope that Microsoft will enable AI Skills to run on semantic models. This would make perfect sense, as we could optimize one model that serves both the AI Skill and the Power BI app. With DAX we can also handle semi- and non-additive measures more easily than in SQL, and the relationships and cardinality in semantic models should make queries return more valid data. If the LLM can write SQL, it can write DAX, right?
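To illustrate why semi-additive measures trip up plain SQL aggregation, consider inventory balances: they sum across warehouses but must not sum across time. For a quarter, you want the closing balance, not the total of every snapshot, which is the pattern DAX expresses with functions like LASTNONBLANK. A toy pandas sketch (the warehouses and numbers are made up):

```python
import pandas as pd

# Hypothetical monthly inventory snapshots for two warehouses.
df = pd.DataFrame({
    "date": pd.to_datetime(["2024-01-31", "2024-02-29", "2024-03-31"] * 2),
    "warehouse": ["A"] * 3 + ["B"] * 3,
    "balance": [100, 120, 110, 50, 40, 60],
})

# Wrong for the time dimension: a plain SUM over the quarter
# double-counts every monthly snapshot.
naive_sum = df["balance"].sum()  # 480 -- meaningless as a quarter figure

# Correct: take the last snapshot per warehouse, then sum across
# warehouses (the semi-additive pattern).
closing = (df.sort_values("date")
             .groupby("warehouse")["balance"]
             .last()
             .sum())  # 110 + 60 = 170
```

A text-to-SQL model that only knows `SUM(balance)` will happily report 480; a semantic model with the measure defined once would hand both Power BI and an AI Skill the correct 170.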
I have high hopes for the positive impact AI Skills can have on our way of working, and can surely see a future where data engineers spend more time optimizing models based on usage patterns instead of adding to the Power BI report sprawl of today.
Specialist - Data Engineer, Arla Foods