Dedicated vs Serverless SQL in Azure Synapse – Which should I use?!
Nikola Ilic
I make music from the data??Data Mozart ??| MVP Data Platform | Pluralsight Author | O'Reilly Instructor | MCT
Your company is planning to adopt (or already adopted) Azure Synapse Analytics, as a?one-stop-shop for all the analytic workloads, and now you’re wondering how to leverage certain parts of this robust platform to best serve your analytic needs? Keep reading and I promise that by the end of this article, you’ll have a clear picture, in which business scenarios to choose one (or more) of Synapse’s pools.
10.000 Feet high overview of the Synapse Pools
In one of my?previous articles, I’ve already explained the Synapse architecture from a high-level perspective.?The core part of the platform are two SQL-based pools:?Dedicated?and?Serverless SQL?pools, while there is also a?Spark?pool for developers familiar with this open source engine mostly used for data engineering and machine learning tasks.
In the meantime, Synapse got a new “analytic buddy” in?Azure Synapse Data Explorer?(which is still in preview at this moment), that enables you to analyze log and telemetry data at scale.
In this series of articles published previously, I’ve already explained the technical aspects of both Dedicated and Serverless SQL pools. Not just that, I’ve also examined how each of these pools can be used in synergy with Power BI to enable a seamless reporting experience over large amounts of data. So, I strongly encourage you to walk through?these articles?if you want to understand how Synapse SQL pools work behind the scenes, especially in conjunction with Power BI.
Finally, in this series for mastering the DP-500 exam, I’ve also written articles on how you can quickly turn data into insight, by?creating visualizations directly within the Synapse Studio?– a web-based interface used for working with Synapse pools!
So, the main idea of this article is?NOT?to provide you with the technical aspects of Dedicated and Serverless SQL – instead, the focus will be on?understanding in which business scenarios to use each of them.
However, before we dive into use cases for Synapse SQL pools, let’s try to understand the key differences between these two options:
Obviously, this is a simplified comparison and you should definitely check the?official docs?for a more detailed overview of the features that are currently (not) supported in each of the pools. You should also check the?blog and YouTube channel of my friend Andy Cutler, who shares fantastic resources about Azure Synapse Analytics.
Ok, let’s now focus on various real-life scenarios and consider which Synapse pool to choose for each of them…
#1 Quick data exploration of the data in ADLS
Answer:?Serverless SQL pool?– If you want to get a quick insight into the data stored within the ADLS, Serverless SQL enables you to do just that – even more, by writing a familiar T-SQL code! This way, you can avoid time-consuming ETL processes and reduce time-to-analysis
#2 Quick data exploration of the data stored in Dataverse or Cosmos DB
Answer:?Serverless SQL pool?– If you need to analyze the data stored in either Microsoft Dataverse or Azure Cosmos DB (analytical storage in Cosmos DB), you can query this data directly using Azure Synapse Link. On the flip side, to be able to query this data using a Dedicated SQL pool, you have to load it in the first place
#3 Querying data stored in Delta Lake format
Answer:?Serverless SQL pool?– Delta Lake format is still not supported in Dedicated pool, while you can read these files using Serverless
领英推荐
#4 Creating persistent database objects
Answer:?Dedicated SQL pool?– Since a Serverless pool comes with no dedicated storage, you can’t create persistent database objects, such as tables or materialized views. Serverless supports creating only metadata objects (for example, views or external tables, where the data resides outside of Synapse). Additionally, creating temporary tables, albeit possible, comes with many limitations compared to “traditional” temporary tables that you maybe know from SQL Server or similar RDBMSs
#5 Many concurrent users querying the data
Answer:?Dedicated SQL pool?– As we’ve already learned, in a Serverless SQL pool, you’re paying for the amount of data processed. So, if your workload assumes that there are many concurrent users querying the data, you should be better off storing that data in the Dedicated pool and paying only predictable costs
#6 DirectQuery requirements in Power BI
Answer:?Dedicated SQL pool?– Let’s say that one of the business requirements when planning your Power BI architecture is using DirectQuery storage mode. You can read more about the?DirectQuery storage mode in this article, and in which scenarios it makes sense to go this way. Since Power BI will generate a separate query for every single visual on the report page, it can easily happen that there are dozens of queries targeting the underlying data source. If you recall that Serverless SQL is a typical PAY-PER-QUERY model (meaning, more queries, more money out of the pocket), it’s quite obvious why you should stick with a Dedicated pool in this scenario
#7 Performance
Answer:?It depends?– This one is the most complex consideration, so obviously there is no single correct answer. Depending on the various factors, the same query may perform better on the Dedicated or Serverless pool respectively. So, let’s try to demystify some of the most common performance considerations (again, please keep in mind that this is just a general and simplified overview, and reliable conclusions can be made only after thorough evaluation and testing) for each of the pools:
Dedicated SQL pool performance enhancements
Serverless SQL pool performance enhancements
Conclusion
Of course, these are all just recommended practices. In other words, it doesn’t necessarily mean that you should blindly follow any of these recommendations without taking into account additional factors.
A decision on which Synapse pool to use in a certain real-life scenario is usually dependent on the combination of the multiple consideration points we elaborated on in this article. In the end, it’s up to you as an enterprise data analyst or architect to evaluate the significance of each of these items and, depending on the overall conclusion (performance vs cost-efficiency) determine the best value-for-money implementation strategy.
Thanks for reading!
Next Trend Realty LLC./wwwHar.com/Chester-Swanson/agent_cbswan
1 年Thank you for Posting.