Hybrid tables in Power BI: extending beyond time-related scenarios!
Nikola Ilic
I make music from the data??Data Mozart ??| MVP Data Platform | Pluralsight Author | O'Reilly Instructor | MCT
If you regularly follow my blog, you might’ve noticed that I already wrote about the?Hybrid tables feature in Power BI. From my perspective, Hybrid tables are one of the most powerful features as they open up a whole spectrum of possible implementations. In Microsoft’s?official announcement, there is a showcase to demonstrate how to automatically turn on the Hybrid table feature, to keep the “hot” data in?DirectQuery mode?(with assistance from the Incremental Refresh feature), while keeping “cold” historical data in Import mode.
By leveraging this concept, you should get the best of both worlds – blazing fast performance of?VertiPaq columnstore database, while at the same time having the latest data served from DirectQuery partition. I’ll not go into details, explaining again how this works, as I’ll assume that you’ve already read my?previous article on this topic. If not, please refer to it before proceeding further…
Extending Scenario #2
In the aforementioned article, I’ve shown you in scenario #2 how you can tweak the original feature and, instead of having DirectQuery partition for the latest data, use DirectQuery to keep historical data in the original data source (the logic is that historical data is not so frequently queried in the reports), while using Import mode for the “hot” data (assuming that most of the queries target the most recent data).
However, why should we limit ourselves to just separating “hot” and “cold” data?! Being able to create multiple partitions within one single table, and mix the storage modes of these partitions, opens up a whole range of possible implementations in real-life cases.
Imagine a scenario like this: based on different criteria, for example, how much money they spent, how many orders they placed, and so on, you may want to distinguish between different “levels” of customers – some of them are more valuable to your business, and let’s consider them as “VIP” customers. To keep things simple, I’ll place a certain number of customers in the “VIP” category, while all the others will remain in the “Regular”.
VIPs vs Regulars
Now, the idea is to enable a more thorough insight into “VIP” customers – let’s say that you want to react immediately if your VIPs face problems with ordering/payments, while you keep regular customers in the queue. Up until Hybrid tables were introduced, the only viable solution was to use Import mode and then refresh the data as frequently as possible. But, still, there are limitations for dataset refresh, which can make this solution not good enough. The other solution would be to create a separate fact table for VIP customers only while keeping the “regulars” in another fact table – and leveraging the Composite model feature to keep the VIPs fact table in DirectQuery mode, the regular fact table in Import mode, and dimensions in Dual mode. This solution, however, brings other caveats and potential downsides…
With Hybrid tables, we can create two partitions within one same fact table – the “regulars” will stay in Import mode, while the VIPs partition will be in DirectQuery mode, thus enabling business analysts to have real-time insight into VIP data!
Let me quickly show you how it’s done!
Setting the stage
Here is the original query that will return all the customers that don’t belong to the VIP class:
SELECT o.*
,c.category
FROM Sales.Orders o
INNER JOIN Sales.Customers c ON c.custid = o.custid
WHERE c.category <> 'VIP'
I’ll use that query to Import the data into Power BI. Once I find myself in the Power BI Desktop, I’m going to create two measures – the first will calculate the total number of orders placed by regular customers…
Regular Orders = CALCULATE(
COUNT(Orders[orderid]),
Orders[category] <> "VIP"
)
…while the other will calculate the total number of orders of our VIPs:
VIP Orders = CALCULATE(
COUNT(Orders[orderid]),
Orders[category] = "VIP"
)
Notice that the only distinction is in the logical operator used in the filter of the CALCULATE function. I’ll also add the Customer dimension to my data model, to enable report users to drill down on customers’ data (find the phone number, email address, etc.). This dimension table should be in Dual mode, so it can serve both DirectQuery and Import partitions:
If you switch back to your report, you may be surprised to see that there are no VIP orders displayed, even though you know that they exist in the source database! Don’t worry, we’ll fix this, we’re just warming up:)
It’s completely understandable that no VIP orders are in there – don’t forget that we used the SQL query for the dataset, that excluded all the orders of the VIP customers.
Let’s go and publish this report to Power BI Service:
There is our newly published dataset (by the way, I like the refreshed date, haha). And, let’s check how the report itself looks:
So, we can confirm that it looks exactly the same as in Desktop (no VIP orders). Now, it’s time to do some magic:)
Let’s do some magic…
As I already explained in the previous article, you can’t rely on Power BI Desktop for creating custom partitions (by default, every table in the data model has only one partition). We need to use external tools to manipulate TOM (Tabular Object Model) via the XMLA endpoint. As usual, I’ll use?Tabular Editor, which you should use for Power BI development nevertheless of this specific case.
领英推荐
Let me stop here for the moment and explain the steps from the illustration above. Under the File tab in Tabular Editor, I’ll choose Open, then from DB, and then I’ll choose my Hybrid Customers dataset. The key thing for Hybrid tables to work (this is more of a reminder to my future self) is to change the compatibility level of the tabular model from 1550 to 1565). Click on Model in TOM Explorer and under Database properties set the compatibility level to 1565! Without changing it, you’ll be getting an error.
I’ll then rename the default partition to: Regular Orders – Import:
You may also see the source query used to create our default partition. Now comes the key part of designing our hybrid table solution:
Let’s go and explain step-by-step what we’ve just done:
Once I saved the changes, let’s go again to our report and refresh it:
Yes! We can now see our VIP orders in the report! But, not just that: we should be able to see the changes in real time!!! I’ll mimic the query that will insert the data for VIP orders every 5 seconds in the Orders table, while regular customers’ orders will be inserted every 12 seconds:
WHILE 1=1
BEGIN
WAITFOR DELAY '00:00:05' -- Wait 5 seconds
INSERT INTO sales.Orders
(
custid,
empid,
orderdate,
requireddate,
shippeddate,
shipperid,
freight,
shipname,
shipaddress,
shipcity,
shipregion,
shippostalcode,
shipcountry
)
VALUES
( 7, -- custid - int VIP customer
1, -- empid - int
GETDATE(), -- orderdate - datetime
GETDATE(), -- requireddate - datetime
NULL, -- shippeddate - datetime
1, -- shipperid - int
DEFAULT, -- freight - money
N'', -- shipname - nvarchar(40)
N'', -- shipaddress - nvarchar(60)
N'', -- shipcity - nvarchar(15)
NULL, -- shipregion - nvarchar(15)
NULL, -- shippostalcode - nvarchar(10)
N'' -- shipcountry - nvarchar(15)
)
END
WHILE 1=1
BEGIN
WAITFOR DELAY '00:00:12' -- Wait 12 seconds
INSERT INTO sales.Orders
(
custid,
empid,
orderdate,
requireddate,
shippeddate,
shipperid,
freight,
shipname,
shipaddress,
shipcity,
shipregion,
shippostalcode,
shipcountry
)
VALUES
( 10, -- custid - int Regular Customer
1, -- empid - int
GETDATE(), -- orderdate - datetime
GETDATE(), -- requireddate - datetime
NULL, -- shippeddate - datetime
1, -- shipperid - int
DEFAULT, -- freight - money
N'', -- shipname - nvarchar(40)
N'', -- shipaddress - nvarchar(60)
N'', -- shipcity - nvarchar(15)
NULL, -- shipregion - nvarchar(15)
NULL, -- shippostalcode - nvarchar(10)
N'' -- shipcountry - nvarchar(15)
)
END
And, look what happens with our report:
As you may see, we have data for our VIP orders updated in real-time! That’s AMAZING!
Look behind the scenes
The last thing I wanted to test is what happens if I have a slicer for customer category, and select to see only orders from regular customers, which should be served from the Import mode partition:
Let’s turn on SQL Server Profiler (hint: insert dataset name in the Options window) and capture the queries generated by the Power BI:
As you may notice, DirectQuery queries are still being generated, despite only data from Import mode being needed in the report. It runs quite fast though, but my dataset is very small, to be honest – so, not sure how it would perform on larger datasets.
Conclusion
I’ll repeat again – hybrid tables are one of the most powerful features in Power BI, especially when it comes to extending standard data modeling capabilities.
It’s not just about keeping “hot” and “cold” data separately within the same table – hybrid tables basically give you infinite possibilities to split more important data (the part that needs to be managed and analyzed in real-time) from the less important one, or maybe it’s better to say – the part of data that doesn’t need to be analyzed in near real-time.
As hybrid tables are still in public preview, I expect that this feature will be even more polished once it becomes generally available (hopefully DirectQuery queries will not be generated if only Import partition data is needed).
Special thanks to?Krystian Sakowski?from Microsoft for the inspiration for this article.
Thanks for reading!
This is a valid business use case -- very interesting... Can we have ( more than 1 partitions either in direct query ) ? If yes , there are lot of additional real-time value adds we can think of based on the business needs..
Business Analyst
1 年Great one!
Abhishek Lad Kinisha Shah Srinivasan Ravishankar - FYI
Identification and Valuation Manager (CEE) /PowerBI Developer Freelancer
1 年always great reading Nikola, please keep it up ??
Power BI Training and Consulting
1 年Very cool use case!