Fabric AI Skills enable a better development process
After the announcement of the AI Skills preview, I have had the chance to play around with the Fabric equivalent of Databricks Genie and want to share my thoughts.
What is this new technology?
AI Skills and Genie are technologies for asking questions directly of your data by leveraging existing large language models (LLMs). The data engineer can use prompt-engineering techniques to tune the model, minimizing incorrect analyses by the system.
Why is that important?
I believe this new technology can greatly impact how we do analytics and possibly usher in a shift in how data professionals deliver value to the end-users. In essence, AI Skills can bridge that age-old gap of how to let end users work with the data after the semantic model is created. I think that AI Skills and similar tools can help data teams avoid getting stuck in "IT helpdesk mode" as described at length by Ergest Xheblati in his post:
What I have taken from the post is that the data team often takes in requirements from the business, assuming the business knows the data in depth and knows what it wants from it, whereas in reality the business uses the project more as data exploration. This means many more (and actually useful) questions arrive just as the project is about to run out of money, because the visual layer is built at the end of the project. With the budget running out, these questions can't be answered, since remodeling the data accordingly would overspend. It is a vicious circle that is hard to break with today's tools, which rely on giving end users access to raw and semi-raw data.
How we do it today
Today, giving end users access to the underlying data typically takes one of three forms:
None of these approaches provides a good solution, as they all require strong technical skills on the end user's side and often leave the data team acting as an IT helpdesk.
This is where tools like AI Skills come in: they serve end users by making them less dependent on learning new skills or relying on the data engineer, and they serve the analytics team by reducing requests to add untested functionality.
How we can do it in the future
First, I want to show how simple it is to build an AI Skill that can be used by end users. Then I'll dive into how this new approach can alter, for the better, how we build data solutions going forward.
Building an AI Skill
If you have Parquet or CSV files in the lakehouse, you can load them to Delta Lake tables thereby making them available to an AI Skill. After creating the skill, the tables can be loaded to the model:
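As a minimal sketch, loading a CSV file from the lakehouse Files area into a Delta table might look like this in a Fabric notebook. The file path and table name are hypothetical, and `spark` is the session object that Fabric notebooks provide automatically:

```python
# Sketch of a Fabric notebook cell (hypothetical path and table name).
# `spark` is the SparkSession injected by the Fabric notebook runtime.

# Read a raw CSV file from the lakehouse Files area.
df = (spark.read
      .option("header", "true")
      .option("inferSchema", "true")
      .csv("Files/raw/orders.csv"))

# Saving as a managed table writes it in Delta Lake format,
# which is what makes it selectable from an AI Skill.
df.write.mode("overwrite").format("delta").saveAsTable("orders")
```

The same pattern applies to Parquet files via `spark.read.parquet(...)`; once the data lands in a Delta table, it shows up when you pick tables for the AI Skill.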
There are many recommendations on how to optimize the model, but now end users will be able to query the data using plain English:
And this is exactly the point! End users can now query the data with little or no transformation done by a data engineer, and the data engineer can then work with end users to optimize the model so it provides more accurate answers.
How this can change how we build data products
With this new technology, we can change the development process from user acceptance tests of the Power BI report/app near the end of the project, to testing the (almost) raw data and the underlying assumptions around the data at the very beginning of the project.
This can usher in a much more iterative process for building data products, where the data engineer publishes a first draft model and then iterates over it, fine-tuning the responses based on user feedback. As a positive side effect, the Power BI report that eventually gets built should be much more aligned with what the end users need and what the data can provide, avoiding the app becoming this meme:
Before the hype train carries us away, let's look at some of the pros and cons.
Final thoughts
Used correctly, I think this new technology can seriously improve how we develop data products, getting the right product to the end users faster, but we still need to keep these things in mind:
The data model must still be optimized
Don't be fooled by the ability to expose near-raw data to end users like this: it's not a silver bullet, and the model will not optimize itself. That still needs to be done, perhaps continuously, by a data engineer based on the questions asked and the answers returned by the model.
Frontend applications won't die because of this
Yes, that includes Power BI, but hopefully, this will help lessen the sprawl of unused reports scattered around most organizations. End users still need some reports, but using tools like AI Skills can reduce the demand for visualizations as ad hoc queries can be answered directly from the model.
Monitoring
As of this writing, I have yet to see how to monitor an AI Skill. For the model to be optimized, it's preferable to be able to see the actual questions asked, the SQL generated, and the answers provided. That way, instructions and SQL examples can be crafted to improve the quality and accuracy of the answers.
How do semantic models fit with AI Skills?
I hope that Microsoft will enable AI Skills to run on semantic models. This would make perfect sense, as we could optimize one model that serves both the AI Skill and the Power BI app. With DAX we can also handle semi- and non-additive measures more easily than in SQL, and the relationships and cardinality in semantic models should make queries return more valid data. If the LLM can write SQL, it can write DAX, right?
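To illustrate why semi-additive measures trip up plain SQL aggregation, consider inventory balances: they sum across warehouses but must not sum across time. For a quarter, you want the closing balance, not the total of every snapshot, which is the pattern DAX expresses with functions like LASTNONBLANK. A toy pandas sketch (the warehouses and numbers are made up):

```python
import pandas as pd

# Hypothetical monthly inventory snapshots for two warehouses.
df = pd.DataFrame({
    "date": pd.to_datetime(["2024-01-31", "2024-02-29", "2024-03-31"] * 2),
    "warehouse": ["A"] * 3 + ["B"] * 3,
    "balance": [100, 120, 110, 50, 40, 60],
})

# Wrong for the time dimension: a plain SUM over the quarter
# double-counts every monthly snapshot.
naive_sum = df["balance"].sum()  # 480 -- meaningless as a quarter figure

# Correct: take the last snapshot per warehouse, then sum across
# warehouses (the semi-additive pattern).
closing = (df.sort_values("date")
             .groupby("warehouse")["balance"]
             .last()
             .sum())  # 110 + 60 = 170
```

A text-to-SQL model that only knows `SUM(balance)` will happily report 480; a semantic model with the measure defined once would hand both Power BI and an AI Skill the correct 170.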
I have high hopes for the positive impact AI Skills can have on our way of working, and can surely see a future where data engineers spend more time optimizing models based on usage patterns instead of adding to the Power BI report sprawl of today.
Specialist - Data Engineer, Arla Foods