What is Mirroring in MS Fabric and why do I consider it the next big thing?
Nikola Ilic
I make music from the data (Data Mozart) | MVP Data Platform | O'Reilly Author | Pluralsight Author | MCT
A few weeks ago, my good friend Tom Martens asked me what my top 5 “not-so-obvious” features in Microsoft Fabric are. You know, we are not talking about Direct Lake, lakehouses, notebooks, etc., although I already covered most of these in my introductory Microsoft Fabric article. If you are completely new to Fabric, I encourage you to check that article first.
So, I sent my list to Tom, and then he sent me his picks. Aside from the fact that our choices matched 3 out of 5 (“Great minds always think in sync”, as Tom likes to say), the thing that made me smile is that one of the three “matching” features was Mirroring.
This is something I’ve been eagerly anticipating since it was announced, so I couldn’t wait to get my hands on it and check how it works in real life. Therefore, this article examines the feature from different angles, which should help you evaluate possible use cases and real-life implementations.
DISCLAIMER: At the moment of writing, this feature is still in public preview, which means that things can change before Mirroring becomes generally available.
Understanding the context
First things first. Before I show you how to leverage this feature in Microsoft Fabric, let’s explain the feature itself. And before that, we need to go one step back and examine the key logic behind Microsoft Fabric workloads, so that you understand the full context of Mirroring’s importance.
One of Fabric’s pillars is OneLake. Explaining OneLake is out of the scope of this article, but you can think of it as “OneDrive for all your organizational data”. All your organizational data should be stored in one central location (OneLake), and even if your data resides somewhere else (say, in ADLS Gen2, GCP, Dataverse, or AWS), you can create OneLake shortcuts and have this data available for processing by all Fabric analytical engines, just as if the data were stored physically in OneLake. So, no data copying or movement to Fabric: you access it via shortcuts, while OneLake manages all permissions and credentials. How cool is that! By the way, shortcuts were also on my top 5 features list…
I believe you are starting to get the idea :) Microsoft Fabric is being sold as a “unified end-to-end analytics solution”, so how could it be unified if your data is stored somewhere else and you then need to establish complex ETL/ELT processes to bring this data physically into Fabric?
“All right, this is cool, but I have my data stored in different sources, such as Snowflake, Azure SQL DB, or Cosmos DB… As far as I can see in your illustration above, shortcuts can’t be created to these data sources, right?”
Unfortunately, you’re right (at least, at this moment)! All these databases (and many others) rely on their proprietary storage formats, and Fabric is all about Delta format. Therefore, it’s not possible to simply create shortcuts to these data sources. So, welcome back to reality and the world of creating ETL/ELT pipelines to make this data available for analytics…
What if I told you that reality can be much better? So much better that you can have your data from Snowflake or Azure SQL DB available in Fabric in near real-time, without the need to build a single ETL/ELT process?!
Welcome to “mirrored reality”!
Remember the idea of the “unified experience”? That’s exactly what the Mirroring feature provides for data sources that don’t support shortcuts! You simply provide the connection details of the “mirrored” database, and after the initial snapshot has been created, data is synchronized in near real-time! Whenever someone performs an insert/update/delete on the source database, the change is automatically propagated to Fabric, so your users will always have the latest data available for their Fabric workloads!
Before I show how this works in real life, a little more theory :) (Thanks to Idris Motiwala from the Microsoft team working on the Mirroring feature for the clarification.) Mirroring uses special change feed technology in the background to write directly to OneLake, instead of creating “changed data” tables in the source database. Data stored in the database’s proprietary format is thereby “translated” to the Delta format and stored as Delta tables in OneLake.
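To make the change-feed idea more tangible, here is a minimal, purely illustrative Python sketch. The event shape, keys, and function name are my own assumptions, not Fabric’s actual format or API; the point is just the mechanism: after an initial snapshot, each captured insert/update/delete from the source is applied to the mirrored copy, keeping it in sync without any ETL/ELT job.

```python
# Conceptual sketch of change-feed replication (NOT Fabric's real API):
# each change event captured on the source is applied to the mirror.

def apply_change(mirror: dict, event: dict) -> None:
    """Apply a single insert/update/delete event to the mirrored table."""
    op, key = event["op"], event["key"]
    if op in ("insert", "update"):
        mirror[key] = event["row"]
    elif op == "delete":
        mirror.pop(key, None)

# Initial snapshot of the source table, keyed by primary key
mirror = {1: {"name": "Alice"}, 2: {"name": "Bob"}}

# A stream of changes captured on the source database after the snapshot
change_feed = [
    {"op": "insert", "key": 3, "row": {"name": "Carol"}},
    {"op": "update", "key": 1, "row": {"name": "Alicia"}},
    {"op": "delete", "key": 2},
]

for event in change_feed:
    apply_change(mirror, event)

print(mirror)  # {1: {'name': 'Alicia'}, 3: {'name': 'Carol'}}
```

In the real feature, of course, the “mirror” side is a set of Delta tables in OneLake rather than an in-memory dictionary, and Microsoft handles the capture and translation for you.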
Once the data is in OneLake, you can do everything you do with “regular” Fabric workloads: query the mirrored data, or even write cross-database queries that combine data from the mirrored database, an existing Fabric warehouse, or a Fabric lakehouse.
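In Fabric you would express such a cross-database query in T-SQL with three-part names. As a rough stdlib analogy (this is SQLite, not Fabric), the sketch below attaches a second database, standing in for the mirrored source, and joins its table with a local one in a single query:

```python
import sqlite3

# Analogy only: "mirrored" plays the role of a mirrored database whose
# tables can be joined with local ones in one cross-database query.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE orders (id INTEGER, customer_id INTEGER)")
con.executemany("INSERT INTO orders VALUES (?, ?)", [(1, 10), (2, 20)])

# Attach a second (in-memory) database and populate it
con.execute("ATTACH DATABASE ':memory:' AS mirrored")
con.execute("CREATE TABLE mirrored.customers (id INTEGER, name TEXT)")
con.executemany("INSERT INTO mirrored.customers VALUES (?, ?)",
                [(10, "Alice"), (20, "Bob")])

# One query spanning both databases, the way a Fabric warehouse query
# can span a mirrored database and a lakehouse/warehouse
rows = con.execute("""
    SELECT o.id, c.name
    FROM orders AS o
    JOIN mirrored.customers AS c ON c.id = o.customer_id
    ORDER BY o.id
""").fetchall()
print(rows)  # [(1, 'Alice'), (2, 'Bob')]
```

The design point is the same in both systems: once everything is addressable from one query surface, joining “remote” and “local” data needs no data movement.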
I hear you, I hear you… ”What if we want to leverage Direct Lake mode in Power BI reports? No chance we can include mirrored database data, right?” You couldn’t be more wrong! Don’t forget, the mirrored data is now in OneLake in the Delta format, so nothing prevents Direct Lake mode from reading it exactly as it reads Fabric’s “native” Delta tables! And it’s not only about Direct Lake: all Fabric capabilities, notebooks for example, can be leveraged over mirrored data.
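Why does the origin of the data not matter to Direct Lake? Because a Delta table is just a folder of Parquet files plus a `_delta_log` of JSON commits, and an engine only replays that log to find which files are currently “live”. The stdlib sketch below (with hand-written, hypothetical commit actions, heavily simplified from the real Delta protocol) illustrates that replay:

```python
# Simplified illustration of reading a Delta transaction log.
# Each commit is a list of actions; "add" makes a data file live,
# "remove" retires it. File names here are made up for the example.
commits = [
    [{"add": {"path": "part-0001.parquet"}},
     {"add": {"path": "part-0002.parquet"}}],
    [{"remove": {"path": "part-0001.parquet"}},
     {"add": {"path": "part-0003.parquet"}}],
]

live = set()
for commit in commits:           # replay commits in order
    for action in commit:
        if "add" in action:
            live.add(action["add"]["path"])
        elif "remove" in action:
            live.discard(action["remove"]["path"])

print(sorted(live))  # ['part-0002.parquet', 'part-0003.parquet']
```

Whether those Parquet files were written by a Fabric notebook or by the Mirroring change feed, the log looks the same, which is exactly why Direct Lake (and every other Fabric engine) treats mirrored tables like native ones.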
Mirroring in action
Let’s now dive deep and examine how Mirroring works in Microsoft Fabric.
First, Mirroring must be enabled within your Fabric tenant.
Once in the Data Warehouse experience in Microsoft Fabric, you can choose to create a mirrored database for Azure SQL DB, Snowflake, or Cosmos DB.
Once you select the desired database for replication, there are certain prerequisites to complete before the mirrored data shows up within your Fabric tenant. I won’t go into the details of setting everything up, since there is a great step-by-step tutorial on Microsoft Learn. Also, my fellow MVP Gilbert Quevauvilliers created a fantastic overview of how to quickly get up and running with a mirrored Azure SQL DB, which you can check here.
Things to keep in mind…
Let me briefly introduce some of the (in my opinion) key facts when it comes to mirroring:
What about SQL Server?!
I know that most of us are interested to hear whether mirroring supports SQL Server as a data source… I got this question at least 10 times during my Ask the Experts sessions at the Microsoft Fabric Community Conference in Las Vegas this March. As of today, SQL Server is NOT supported for mirroring, but according to the official Fabric release plan, it’s planned for this year! So, fingers crossed that we can soon have our on-prem SQL Server data easily replicated into Fabric.
Conclusion
In my opinion, mirroring was the missing piece of the “One Copy for All Data” puzzle, and I’m sure it will immensely help Fabric adoption across organizations that have already implemented and developed mature data platform solutions. Once Mirroring becomes available for on-prem data sources, such as SQL Server or Oracle, Microsoft Fabric will become a no-brainer choice for organizations looking to modernize their data estate.
Thanks for reading!
Comments
Data Engineering | Business Intelligence | Data Platform
5 months ago: Great article! I have seen quite a bit of difference in performance between mirrored tables (with a shortcut to a lakehouse) and actual managed Delta tables in my lakehouse. The diagnostic output from the notebook that runs transformations using several of the mirrored tables suggests optimizing the tables because of many small Delta files. But from what I can find online or in the documentation, nothing is mentioned about table maintenance. Have you had any experience with this?
Senior Technical Specialist at CitiusTech
8 months ago: Hi Nikola Ilic, currently Fabric detects my SQL Server on a VM as an Azure SQL instance and then says that mirroring from on-prem SQL isn’t currently supported. If we were to use SQL Server on a VM, why does it first treat it as Azure SQL and then give the above error? Thoughts?
BI Analyst
9 months ago: Does the DP-600 exam include questions about database mirroring in Fabric? Thank you for your support and guidance!
Data Engineering Group Manager at Avanade
10 months ago: Big fan of mirroring as well. Nikola Ilic, are you aware if anyone has tested and published more detailed stats between mirrored data sources? What is the expected/SLA latency for near real-time for the currently supported data sources? I’m curious about what is planned for how governance will be managed when it comes to GDPR compliance and similar requirements. I’m also excited to see how Microsoft Purview will complement these features.