Troubleshooting performance on serverless Synapse SQL pool using QPI library
Jovan Popovic
Principal Program Manager at Microsoft, working on Microsoft Fabric Warehouse. Worked on Azure Synapse, Azure SQL Azure SQL Managed Instance, and SQL Server.
Query performance insights
Four years ago, I created a Query performance insights library (QPI). This is a T-SQL library that enables troubleshooting and monitoring performance on #AzureSQL #ManagedInstance. While I worked on Managed Instance performance, this set of scripts helped me to troubleshoot various performance issues. I also created separate versions of QPI library for #AzureSQL database and #SQLServer.
Now, I created another QPI version for the serverless SQL pools in #AzureSynapseAnalytics. In this article, I will explain how to use this library to monitor and optimize your workload on serverless SQL pools.
QPI is an open-source library that is not part of the Azure service.
Setup QPI
The first thing that you need to do is to install QPI scripts on your serverless SQL pool database (it cannot be created on master). Follow this link, pick a version for the serverless SQL pools, and execute the script on a database. You might review the script before you execute it to ensure that there are no dangerous actions. It will create the views and functions that are using system/catalog views, and they are not reading your data.
Once you create these scripts, you would be able to use T-SQL views and functions that can help you to:
Analyze the queries
QPI enables you to easily see the queries that are running on your serverless SQL pool using qpi.queries view:
SELECT *
FROM qpi.queries
As a result, you will get a list of the currently executing queries.
Another useful query will return the queries that are executed in the past:
SELECT *
FROM qpi.query_history
This query will return all finished (completed, canceled, and failed) queries that are kept in history. Note that some queries might expire from the history.
Every query has its own request id (or distributed statement id that you might see in the message window when you execute a query). Make sure that you provide this information to Azure support if you are reporting some performance issues with the queries.
领英推荐
If you are running the queries using Synapse Studio, SSMs, ADS, or some other query tool, you can take this distributed statement id from the messages window. However, if you need to report an issue with a query executed from Power BI or some other tool where you cannot see the info messages, find the query execution in this view and include start_time and request_id from this view.
A query_text_id column is another interesting column that can group the "identical" query texts. It is a hash value created on the query text with removed constants/literals. Note that the queries might not be identical, but this column might be useful when you need to find the executions of the same query:
The serverless SQL pool discards the literals in TOP, file paths, and other parts of the queries so you might have the same hash for different query shapes. But this is still the best way to find similar queries.
You can compare the characteristics of two query executions if you provide the request ids to the function qpi.cmp_queries():
This function might be useful if you need to compare two different executions of a query.
Optimize schema
You need to create tables with optimized schemas to get the best performance. There are many rules for schema optimization, such as:
You can find a list of best practices on the Synapse SQL documentation page.
Sometimes, it is hard to inspect all tables or views to apply the best practices. With the QPI library, you can use qpi.recommendations view to get the possible improvements that you might make in your database:
This view inspects your schema and provides recommendations. The score column informs you how important is the change. The recommendations with a score of 1 are the most critical.
Note that these are the best-effort recommendations and some of them might not be applicable to your schema. However, it would be good to review them and ensure that you don't have some hidden issue in your schema that might impact your performance.
Conclusion
Query performance insight is a simple T-SQL library that enables you to analyze query performance and get some recommendations that might optimize your workload. It is a lightweight and free library that can help you if you are using serverless SQL pools to analyze your data.
Note that this is an open-source library that is not a part of Azure services. You can use it if it can help you, and you can report bugs/feature requests on the GitHub project site. However, Azure support will not handle any issue related to this library. Use this library to get the additional information that might help you to optimize your databases or provide additional troubleshooting information to support.
Azure Data Architect, Azure Data Engineer: Lead Consultant, Microsoft Certified Trainer (MCT) and Speaker on Azure Data, AI, Microsoft Fabric, Databricks, Power BI.
2 年Thanks for sharing, Jovan Popovic
Technical Specialist, Architect - Data & AI at Microsoft
3 年Very useful information.. Thanks for sharing Jovan Popovic !! Appreciate the efforts.
Azure DW/BI Analytics Tech. Arch.
3 年Hello Jovan Popovic, We have created sql view in synapse serverless pool to query JSON files over azure blob storage. Is there a feature in synapse serverless to mask sensitive data before displaying to client tools/apps?? Thanks, Sanjay
Empowering Businesses with Scalable IT Solutions | Digital Transformation | Microsoft Technologies | STARLIMS | Nearshore Innovation | Director of Business Development US | LATAM
3 年Thank you for sharing this will definitely be of help