Leveraging sp_WhoIsActive to Pinpoint and Resolve Your Query Memory Waits
Picture this:
You have a new beefy server with plenty of CPU and memory, and you've just installed the latest and greatest version of SQL Server. You've followed the best practices, updated service packs, firmware, and drivers, and your disks are optimally configured.
Fast forward to the test deployment phase: everything is going smoothly. The backups and restores have completed successfully, and the QA testers have connected to the system without a hitch. As the QA testers start their systems and run the test plans, the system remains stable, even as more users join. However, when you initiate one of the ETL jobs necessary for testing, the system suddenly slows down, processes freeze, and performance drops. The QA testers start to report that the system is freezing and becoming unresponsive.
It's time to put on your DBA hat and start troubleshooting! The first step is to choose the right tool for the job. You dig through your DBA toolbox, assuming the most obvious issue is the problem, and pull out the reliable sp_WhoIsActive by Adam Machanic. This is a common occurrence; typically, some process must be blocked. So you run the procedure, but surprisingly, the results show no blocked processes.
However, after further review, you notice the words RESOURCE_SEMAPHORE in the [wait_info] column of several processes.
A RESOURCE_SEMAPHORE what?
What Is A Resource Semaphore?
Resource semaphores are synchronization objects that control access to memory grants in SQL Server, managed through gateways. Any time a query needs to execute, it requests a query workspace memory grant. Grants are handed out on a first-come, first-served basis: if enough memory is available, the grant is given and the query executes. If not, the query goes into a queue, where it waits with other requests, processed from smallest to largest, until enough memory becomes available.
What's crucial to understand about the executing statement is how much memory it is requesting, what its cost is, and whether it is currently waiting for the memory to be granted. It's also important to know what else is competing for memory before your query can execute.
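If you want to see those raw numbers straight from the engine, the sys.dm_exec_query_memory_grants DMV exposes the grant queue directly. A minimal sketch, with the column list trimmed to the essentials:

```sql
-- Pending and active memory grants; rows with grant_time = NULL are still waiting.
SELECT session_id,
       request_time,
       grant_time,            -- NULL = still waiting on the resource semaphore
       requested_memory_kb,
       granted_memory_kb,
       wait_time_ms,
       queue_id,
       wait_order,            -- position in the wait queue
       is_next_candidate
FROM sys.dm_exec_query_memory_grants
ORDER BY is_next_candidate DESC, wait_order;
```

This is handy for a quick spot check, but the DMV only shows the current instant; it won't tell you what was happening ten minutes ago.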
If you don’t know where to start, don't worry, sp_WhoIsActive has got your back!
The @get_memory_info parameter is a useful feature that can help you monitor and optimize the query workspace memory usage of your queries. To use it, set the parameter to 1 when you run sp_WhoIsActive, like this:
EXEC sp_WhoIsActive @get_memory_info = 1
This will add four new columns to the output: requested_memory, granted_memory, used_memory, and max_used_memory. These columns show the amount of memory that each query requested, was granted, is currently using, and has used at its peak, respectively. This data can help pinpoint queries that are either using excessive memory or are inefficiently using minimal memory, allowing you to optimize them as needed.
If you delve deeper into the output of sp_WhoIsActive, you'll discover a new column labeled [memory_info]. For active queries that require workspace memory, clicking the XML link returns additional information on memory grants, resource semaphores, and the Resource Governor settings that are impacting the allocation.
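Once you log sp_WhoIsActive output to a table (as we'll do below), you can also shred a few values out of that XML with T-SQL instead of clicking links. A sketch, assuming the logging table and column names used later in this post and that [memory_info] was created with the XML datatype; the XPath follows the element names shown in the sample output further down:

```sql
-- Pull a few headline numbers out of the logged [memory_info] XML.
SELECT session_id,
       collection_time,
       memory_info.value('(//memory_grant/requested_memory_kb)[1]', 'bigint') AS requested_memory_kb,
       memory_info.value('(//memory_grant/wait_time_ms)[1]',        'bigint') AS wait_time_ms,
       memory_info.value('(//resource_semaphore/waiter_count)[1]',  'int')    AS waiter_count
FROM DBA.dbo.WhoIsActive_ResultSets
WHERE memory_info IS NOT NULL;
```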
Now Back To The Problem At Hand
A quick look at the sp_WhoIsActive execution results doesn't exactly inspire confidence; we are in some serious trouble here.
We have several processes with memory requests, but they all have a grant type of NULL. As you know from earlier, this indicates that the process is waiting on a memory grant, and as you can see by the number of requests, we are all out of query workspace memory.
How could this be? I have 128 GB of RAM!
Why is this happening? Did I miss something during setup?
No, it's by design: SQL Server's default behavior is to allow a maximum memory grant of 25% of the default workload group's memory pool, which means you can only run three heavy queries at a time on the default configuration.
Any heavy query thereafter generates a RESOURCE_SEMAPHORE wait type and begins to tally the milliseconds it must wait before it can be granted the memory to execute.
The rationale for this is to prevent a single process from monopolizing all the memory and consequently depriving the remaining processes of necessary resources. This works great most of the time: several queries run in parallel, and the one or two memory-hog processes go largely unnoticed.
With the default workload group setting, we reach a critical point after three substantial 25% memory grants have been allocated. There's a built-in hard limit here: a query won't start unless the server's available grant memory is 150% of the memory the query requested. This ensures there's enough room for other queries to run concurrently while this query executes. Once we hit this threshold, any subsequent large memory request simply waits, bringing all high-cost queries to a screeching halt. Worse yet, queries will remain on hold until either memory becomes available or they hit the default memory grant timeout of 24 hours! Yikes!
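You can watch this tipping point happen through sys.dm_exec_query_resource_semaphores, which is the same data surfaced in the <resource_semaphore> XML element we'll review shortly. A quick sketch:

```sql
-- One row per resource semaphore per pool; when waiter_count climbs while
-- available_memory_kb stays low, large grants are queuing.
SELECT pool_id,
       resource_semaphore_id,
       target_memory_kb,
       available_memory_kb,
       granted_memory_kb,
       grantee_count,   -- queries currently holding a grant
       waiter_count     -- queries stuck in RESOURCE_SEMAPHORE waits
FROM sys.dm_exec_query_resource_semaphores;
```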
It's apparent from our result set that we added the straws that broke the camel's back.
Looking a Little Deeper
Let's begin by gathering some metrics. I want to identify when this is occurring, and which queries need attention. If you haven't already set up sp_WhoIsActive logging, let's quickly do that.
This was originally posted on https://whoisactive.com/, then enhanced and shared again; below is the third iteration. Enjoy!
DECLARE @PurgeAfterWeeks INT = 4,
@CollectionTable VARCHAR(500) = 'WhoIsActive_ResultSets',
@CollectionDatabase sysname = 'DBA',
@TableDDL VARCHAR(MAX),
@Schema sysname = 'dbo',
@SQL NVARCHAR(4000),
@Params NVARCHAR(500),
@MinID INT,
@IndexCreated BIT;
SET @CollectionTable = @CollectionDatabase + '.' + @schema + '.' + @CollectionTable;
--create the logging table
IF OBJECT_ID(@CollectionTable) IS NULL
BEGIN;
EXEC dbo.sp_WhoIsActive @get_transaction_info = 1,
@get_full_inner_text = 1,
@get_outer_command = 1,
@get_memory_info = 1,
@get_plans = 1,
@get_additional_info = 1,
@return_schema = 1,
@schema = @TableDDL OUTPUT;
SET @TableDDL = REPLACE(@TableDDL, '<table_name>', @CollectionTable);
SET @SQL = REPLACE(@TableDDL, ');', ',[ID] INT IDENTITY(1,1) NOT NULL);');
EXEC sys.sp_executesql @SQL
END;
--create clustered index on ID column (flag defaults to 1 = "exists"; the check flips it to 0 only when the index is missing)
SET @IndexCreated = 1;
SET @SQL= 'IF NOT EXISTS (SELECT * FROM ' + @CollectionDatabase + '.sys.indexes WHERE object_id = OBJECT_ID(''' + @CollectionTable+ ''') AND name = N''PK_ID'') SET @index_created = 0';
SET @params = N'@index_created bit OUTPUT';
EXEC sys.sp_executesql @SQL, @params, @index_created = @IndexCreated OUTPUT;
IF @IndexCreated = 0
BEGIN;
SET @SQL = 'CREATE UNIQUE CLUSTERED INDEX PK_ID ON ' +
@CollectionTable + '(ID ASC) WITH(FILLFACTOR = 95)';
EXEC sp_executesql @SQL;
END;
--create non-clustered index on collection_time (reset the flag so a 0 from the previous check doesn't carry over)
SET @IndexCreated = 1;
SET @SQL= 'IF NOT EXISTS (SELECT * FROM ' + @CollectionDatabase + '.sys.indexes WHERE object_id = OBJECT_ID(''' + @CollectionTable+ ''') AND name = N''IDX_collection_time'') SET @index_created = 0';
SET @params = N'@index_created bit OUTPUT';
EXEC sys.sp_executesql @SQL, @params, @index_created = @IndexCreated OUTPUT
IF @IndexCreated = 0
BEGIN;
SET @SQL = 'CREATE NONCLUSTERED INDEX IDX_collection_time ON ' + @CollectionTable + '(collection_time ASC)';
EXEC sp_executesql @SQL;
END;
--add this to a SQL job step to collect activity into the logging table
EXEC dbo.sp_WhoIsActive
@get_transaction_info = 1,
@get_full_inner_text = 1,
@get_outer_command = 1,
@get_memory_info = 1,
@get_plans = 1,
@get_additional_info = 1,
@destination_table = @CollectionTable;
--purge older data based on the clustered key
SET @SQL = 'SELECT @Min = ISNULL(MIN(ID), 1) FROM ' + @CollectionTable + ' WHERE collection_time < DATEADD(week, -' + CAST(@PurgeAfterWeeks AS VARCHAR(10))
+ ', GETDATE());';
EXEC sp_executesql @SQL, N'@Min INT OUT', @MinID out
SET @SQL = 'DELETE FROM ' + @CollectionTable + ' WHERE ID < ' + CAST(@MinID AS VARCHAR);
EXEC sp_executesql @SQL;
Now that we have some logging in place, let's identify those queries that are causing us distress.
As you can see below, I am simply searching for queries where the requested grant is not null and the grant is null, sorted by ID in descending order.
SELECT TOP 50
RS1.[dd hh:mm:ss.mss],
RS1.[session_id],
RS1.[wait_info],
RS1.[sql_text],
RS1.[sql_command],
RS1.[login_name],
RS1.[query_plan],
RS1.[used_memory],
RS1.[max_used_memory],
RS1.[requested_memory],
RS1.[granted_memory],
RS1.[host_name],
RS1.[database_name],
RS1.[program_name],
RS1.[additional_info],
RS1.[memory_info],
(
SELECT TOP 1
[memory_info]
FROM
[DBA].[dbo].[WhoIsActive_ResultSets]
WHERE RS1.[session_id] = [session_id] AND
RS1.[start_time] = [start_time] AND
RS1.[login_time] = [login_time] AND
[granted_memory] IS NOT NULL AND
[memory_info] IS NOT NULL
ORDER BY [ID] DESC
) AS [rsg_memory_info],
RS1.[start_time],
RS1.[login_time],
RS1.[request_id],
RS1.[collection_time]
FROM [DBA].[dbo].[WhoIsActive_ResultSets] RS1
WHERE RS1.[requested_memory] IS NOT NULL AND
RS1.[granted_memory] IS NULL
ORDER BY [ID] DESC
Results
Let’s review each of these XML element groupings from [memory_info], utilizing our WhoIsActive_ResultSets query from above:
Memory Grant
This is where all the memory grant information lives. As you can see, our requested memory is about 20 GB, and when memory is available SQL Server honors the request and the grant is issued. When no more memory is available, instead of a grant you see the [wait_time_ms] and your assigned [queue_id].
<memory_grant>
<request_time>2024-05-22T10:54:24.853</request_time>
<wait_time_ms>8187</wait_time_ms>
<requested_memory_kb>21034000</requested_memory_kb>
<ideal_memory_kb>52585000</ideal_memory_kb>
<required_memory_kb>45712</required_memory_kb>
<queue_id>5</queue_id>
<wait_order>0</wait_order>
<is_next_candidate>1</is_next_candidate>
<dop>8</dop>
<query_cost>86766.2944</query_cost>
</memory_grant>
Resource Semaphore
This pertains to memory management, the pool where our requests are served. Here we can observe how much memory is available, how much has been granted, the number of grantees, and whether any processes are waiting for memory.
<resource_semaphore>
<timeout_error_count>0</timeout_error_count>
<target_memory_kb>84136000</target_memory_kb>
<max_target_memory_kb>84136000</max_target_memory_kb>
<total_memory_kb>84136000</total_memory_kb>
<available_memory_kb>64341920</available_memory_kb>
<granted_memory_kb>19794080</granted_memory_kb>
<used_memory_kb>4554512</used_memory_kb>
<grantee_count>4</grantee_count>
<waiter_count>7</waiter_count>
</resource_semaphore>
Workload Group
These settings correspond to what you would see in the Resource Governor for the default workload group. As can be observed, we have a maximum grant of 25% set for large queries, and there are no limits imposed on MAXDOP, CPU time, or the memory grant timeout.
<workload_group>
<name>internal</name>
<request_max_memory_grant_percent>25</request_max_memory_grant_percent>
<request_max_cpu_time_sec>0</request_max_cpu_time_sec>
<request_memory_grant_timeout_sec>0</request_memory_grant_timeout_sec>
<max_dop>0</max_dop>
</workload_group>
Resource Pool
As you can see, I only have one resource pool created: the default. You can think of resource pools as virtual mini SQL Server instances, each containerizing a finite set of resources. Each pool is associated with one or more workload groups, and the Resource Governor classifies user sessions into the appropriate workload group, helping you balance your resources more effectively.
<resource_pool>
<name>internal</name>
<min_memory_percent>0</min_memory_percent>
<max_memory_percent>100</max_memory_percent>
<min_cpu_percent>0</min_cpu_percent>
<max_cpu_percent>100</max_cpu_percent>
</resource_pool>
Now that we have a grasp of the situation, let's examine the second XML column, [rsg_memory_info]. This is from the same session, but it's the most recent log entry captured where the memory was granted, allowing us to observe those counters. As evident from the output below, the maximum memory used by the process is about 1.75 GB, a fraction of the roughly 20 GB it requested. This phenomenon is known as memory bloating.
<memory_grant>
<request_time>2024-05-22T10:54:24.853</request_time>
<grant_time>2024-05-22T11:24:24.853</grant_time>
<requested_memory_kb>21034000</requested_memory_kb>
<granted_memory_kb>21034000</granted_memory_kb>
<used_memory_kb>1835008</used_memory_kb>
<max_used_memory_kb>1835008</max_used_memory_kb>
<ideal_memory_kb>52585000</ideal_memory_kb>
<required_memory_kb>45712</required_memory_kb>
<dop>8</dop>
<query_cost>86766.2944</query_cost>
</memory_grant>
Solutions
So what can we do to circumvent this scenario? Well, that depends. As any good consultant will tell you, several factors feed into that decision: IT budget, development staff, and the software you're running, to name a few.
But regardless, it really breaks down to five possible solutions:
Naturally, the optimal solution is always to rewrite the query to reduce its cost. The cost of a query directly influences the memory estimate; if a query has a high cost, it will require a lot of memory. If rewriting the query isn't feasible or can't be done immediately, you're left with only three options. I say three, because increasing the server memory might provide a temporary fix and some relief, but you're still constrained to executing three large queries under the default workload group. So you lose anyway, and it costs you a bunch of money for the same result.
Turn on Row Mode Memory Grant Feedback
If you're fortunate enough to be on SQL Server 2019 or greater, you can simply change the database compatibility level to 150 and, voilà, the magic happens. The Row Mode Memory Grant Feedback feature is activated, addressing both excessive memory grants and insufficient grants that lead to spills to TempDB.
Here are the benefits you can expect from this memory grant feedback:
- If SQL Server detects that the granted memory exceeds the used memory, memory grant feedback recalculates the grant for the next run, resulting in less memory usage in subsequent runs.
- In spill-to-disk scenarios, memory grant feedback likewise recalculates the memory and grants the appropriate amount on further runs.
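Enabling it is just a compatibility-level change; a sketch, where [YourDatabase] is a placeholder name:

```sql
-- Row Mode Memory Grant Feedback comes along with compatibility level 150 (SQL Server 2019).
-- [YourDatabase] is hypothetical; run this per user database you want covered.
ALTER DATABASE [YourDatabase] SET COMPATIBILITY_LEVEL = 150;

-- Verify the change took effect.
SELECT name, compatibility_level
FROM sys.databases
WHERE name = N'YourDatabase';
```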
Resource Governor
This is a feature that I've come to appreciate greatly. On my larger servers, I typically enable this and adjust the default workload group [request max memory grant percent] down to 2 or 3% to start, which is still substantial when you consider servers with hundreds of GB to a TB of RAM. Occasionally, I'll create a classifier function and categorize sessions into workload groups. This allows me to modify the pool and workload thresholds based on the session load characteristics.
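As a sketch, dropping the default workload group's per-query grant cap from 25% to 3% looks like this (pick a percentage that fits your workload, and note that Resource Governor requires Enterprise Edition):

```sql
-- Lower the per-query memory grant cap on the default workload group.
ALTER WORKLOAD GROUP [default]
WITH (REQUEST_MAX_MEMORY_GRANT_PERCENT = 3);

-- Apply the pending configuration change.
ALTER RESOURCE GOVERNOR RECONFIGURE;
```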
Memory Hints
This is the solution I am going to focus on today: adding memory hints to cap the upper limit of a query's memory request. In this case, hints are good, and as long as you're on at least SQL Server 2012 SP3, they are available.
Since we do not have SQL Server 2019, we will just play the role of Row Mode Memory Grant Feedback ourselves and reduce that memory request to a more practical level.
To do this we tack on the memory grant limiter hint.
The syntax is as follows:
OPTION (min_grant_percent = 1, max_grant_percent = 10)
Keep in mind that these query options (min_grant_percent and max_grant_percent) apply to the maximum grant size, not the entire shared workspace memory.
By doing this you're effectively saying that the query can only request up to the stated percentage of the maximum memory grant available; in this case, on Standard Edition, that is 10 percent of the maximum grant size, which defaults to 25 percent of the query workspace memory. For example, if you had 80 GB of workspace memory, 25% would be 20 GB, and 10% of that would be 2 GB, which is plenty of space for our bloated query above, given everything stays the same. But that's never the case, so leave room for growth, and monitor frequently.
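The arithmetic above can be sanity-checked in a scratch query; the 80 GB figure is just the example number from this paragraph:

```sql
-- Worked example: 80 GB of workspace memory, default 25% grant cap, 10% hint.
DECLARE @workspace_gb   DECIMAL(10, 2) = 80;
DECLARE @default_cap_gb DECIMAL(10, 2) = @workspace_gb * 0.25;   -- default max grant: 20 GB
DECLARE @hinted_cap_gb  DECIMAL(10, 2) = @default_cap_gb * 0.10; -- with max_grant_percent = 10: 2 GB

SELECT @default_cap_gb AS default_max_grant_gb,  -- 20.00
       @hinted_cap_gb  AS hinted_max_grant_gb;   -- 2.00
```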
If you’re unable to access the code, you can use a SQL plan guide and incorporate the hint into the guide. If you’re unfamiliar with them, I would recommend exploring the numerous tutorials available online that delve into their complexities. It’s important to remember when working with plan guides that they are confined to the context of the database in which they are executed. In simpler terms, a plan guide only influences the queries that are executed within the same database context where the plan guide was created.
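For reference, attaching the hint through a plan guide looks roughly like this. The guide name and statement text below are made up for illustration; in practice, @stmt must match the text the application sends, character for character:

```sql
-- Hypothetical plan guide pinning the grant hint onto a query we cannot edit.
EXEC sp_create_plan_guide
    @name = N'PG_LimitEtlMemoryGrant',               -- made-up name
    @stmt = N'SELECT * FROM dbo.BigEtlStagingTable', -- made-up statement text
    @type = N'SQL',
    @module_or_batch = NULL,
    @params = NULL,
    @hints = N'OPTION (MIN_GRANT_PERCENT = 1, MAX_GRANT_PERCENT = 10)';
```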
In summary, the @get_memory_info parameter of sp_WhoIsActive is a valuable feature that can assist you in pinpointing the source of your resource semaphore waits. I’ve shown several strategies to address and alleviate this situation, with the most effective being to rewrite the query in a more cost-efficient way.
Furthermore, an understanding of the inner workings of memory can help you identify performance issues, such as memory pressure, spills, and waits. By acquiring this knowledge, you can boost the efficiency of your queries, which will subsequently enhance the performance and scalability of your SQL Server.
To learn more about sp_WhoIsActive and its other parameters, you can visit the official website, https://whoisactive.com/, or email Erik Darling with any questions you may have. Happy coding!