Exploring Fabric: putting Microsoft's new analytics platform to the test
Can we really do end-to-end analytics in Microsoft Fabric?

It's been a couple of days since Microsoft let the world know about Fabric, its new unified, end-to-end data analytics platform. It was quite the announcement! Fabric's proposition is bold. Shiny visuals and slick presentations have made data practitioners excited and curious, myself included. But a shiny new analytics platform is like a shiny new car: you test-drive it before you buy. In this post I share my first hands-on experience.

Unification?

Before diving into the tool, let's first understand the problem Microsoft aims to solve. Core to Fabric's value proposition is delivering a unified, end-to-end experience: no need to use any other analytics tool, because all the functionality is available within a single platform. That sounds lovely, as today's reality is very different. The number of specialized tools is enormous, and data engineering often feels more like systems engineering. Just have a look at the MAD (ML/AI/Data) Landscape for 2023 and you'll understand. The added value of a data team would be so much bigger if it could focus on building data products rather than integrating different tools.

The number of specialized tools is enormous, and data engineering often feels more like systems engineering.
Fabric is all about unification

Free trial :)

Fabric is in public preview, and Microsoft offers a free trial. Great! It's easy enough to sign up and activate your trial.

You get a 60-day trial and a fixed amount of capacity to explore Fabric

Exploring end-to-end capabilities

More than anything, I'm curious whether Fabric can deliver on its promise of unification. End-to-end analytics requires many different ingredients, and I explored some of them:

  • data ingestion,
  • visualization (reporting),
  • MLOps.

I will now walk through the steps I took and share relevant insights.

Data ingestion

I first created a lakehouse using Fabric's UI.

Creating a lakehouse in Fabric

Next step: getting data into the lakehouse. I chose to work with the popular diabetes sample dataset.

Getting data in with a new data pipeline
Choosing the diabetes sample data
A Data Factory pipeline gets created automatically
After running the pipeline the diabetes data was in my lakehouse

Well, that was simple enough. All UI work, not a single line of code written.
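
For comparison, the same ingestion could also be done with a few lines of code in a Fabric notebook. The snippet below is not what the pipeline generated for me, just a minimal PySpark sketch: it assumes the notebook is attached to the lakehouse and that the diabetes CSV has been uploaded to a hypothetical path under the Files area.

# Minimal sketch of code-based ingestion (file path is hypothetical).
# In a Fabric notebook, "spark" is a ready-to-use SparkSession.
df = (
    spark.read
    .option("header", True)        # first row holds column names
    .option("inferSchema", True)   # let Spark infer numeric types
    .csv("Files/raw/diabetes.csv") # hypothetical location in the Files area
)

# Save as a Delta table so it appears under Tables/ in the lakehouse
df.write.mode("overwrite").format("delta").saveAsTable("diabetes")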

Visualization (reporting)

Getting data in was as simple as ABC. Now for the visualization. With Power BI taking a prominent place in Fabric (I heard it's the Power BI team that leads the Fabric efforts, maybe that has to do with it), I expected this to be a first-class experience. I was right. For every lakehouse you make, a Power BI "dataset" gets created automatically. After a few clicks you're in the Power BI interface, ready to build your report.

Creating a Power BI report using data in the lakehouse

MLOps

So far so good: Fabric made ingestion and visualization super easy. What about Machine Learning? After all, I didn't load the diabetes data just to create some simple charts. I found some "Data Science" functionality in the UI.

Seems like Fabric has ML functionality as well; I clicked "Model (Preview)"
Then "Start with a template"
A notebook with boilerplate code pops up

Aha, there's some actual code! This marks the end of my UI-only experience. I'm not surprised that a more advanced use case like Machine Learning isn't fully captured by a user interface (yet). I'm also happy that Fabric still lets me do what I like most: writing code. Another delight: Fabric's live Spark pools really work...no waiting for a pool to start!

I had to change the code a little to load the diabetes data from the lakehouse into Spark, run an experiment, and train and register a model. Microsoft uses the open-source MLflow framework to manage the ML lifecycle within Fabric. The results can be inspected interactively.

Fabric integrates MLflow into its platform
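
To give an idea of what the notebook ends up doing, here is a minimal sketch of the training code. It is not the exact template code: I assume the diabetes table from the ingestion step, a target column named "Y", a plain scikit-learn regressor, and arbitrary experiment and model names.

import mlflow
import mlflow.sklearn
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Pull the Delta table written during ingestion into pandas for scikit-learn
data = spark.read.table("diabetes").toPandas()

# "Y" is assumed to be the numeric target column in the sample data
X = data.drop(columns=["Y"])
y = data["Y"]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

mlflow.set_experiment("diabetes-experiment")  # experiment name is arbitrary

with mlflow.start_run():
    model = Ridge(alpha=1.0)
    model.fit(X_train, y_train)

    # Track a simple quality metric for the run
    mse = mean_squared_error(y_test, model.predict(X_test))
    mlflow.log_metric("mse", mse)

    # Log the model and register it in one call (model name is arbitrary)
    mlflow.sklearn.log_model(model, "model", registered_model_name="diabetes-model")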

The last step was to load my newly created model and use it to make predictions against the diabetes data.

Loading and applying the MLflow model
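
Again a minimal sketch rather than the exact code I ran: it loads one version of the registered model with plain MLflow and scores the diabetes table. The model name, version number, and target column carry over from the assumptions in the training sketch above.

import mlflow.sklearn

# Load a specific version of the registered model (version 1 is hypothetical)
loaded_model = mlflow.sklearn.load_model("models:/diabetes-model/1")

# Score the same diabetes data that was used for training
data = spark.read.table("diabetes").toPandas()
predictions = loaded_model.predict(data.drop(columns=["Y"]))
print(predictions[:5])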

Conclusion

This was my first interaction with Fabric and it was pretty good. I was able to easily ingest data, visualize it, train an ML model on it, and use that model to make predictions...all within a single tool! I can't say the experience was completely flawless; there are definitely some glitches here and there. But Fabric is still in preview, so I call that acceptable. I don't think general availability is near, but I'm excited to keep track of the developments!

#microsoftfabric #dataengineering #analytics #mlflow #powerbi

Roel Peters

Co-founder of Gatekeeper | Easy hard skill assessments

Have you encountered and/or tried any enterprise features? E.g. cooperative features, branching and version control, staging environments, ...?
