Leveraging sp_WhoIsActive to Pinpoint and Resolve Your Query Memory Waits
Picture this:
You have a new beefy server with plenty of CPU and memory, and you've just installed the latest and greatest version of SQL Server. You've followed the best practices, updated service packs, firmware, and drivers, and your disks are optimally configured.
Fast forward to the test deployment phase: everything is going smoothly. The backups and restores have completed successfully, and the QA testers have connected to the system without a hitch. As the QA testers start their systems and run the test plans, the system remains stable, even as more users join. However, when you initiate one of the ETL jobs necessary for testing, the system suddenly slows down, processes freeze, and performance drops. The QA testers start to report that the system is freezing and becoming unresponsive.
It's time to put on your DBA hat and start troubleshooting! The first step is to choose the right tool for the job. You dig through your DBA toolbox, assuming the most obvious issue is the problem, and pull out the reliable sp_WhoIsActive by Adam Machanic. This is a common occurrence; typically, some process must be blocked. So you run the procedure, but surprisingly, the results show no blocked processes.
However, after further review, you notice the words RESOURCE_SEMAPHORE in the [wait_info] column of several processes.
A RESOURCE_SEMAPHORE what?
What Is A Resource Semaphore?
Resource semaphores are synchronization objects that control access to memory grants in SQL Server, managed through gateways. Any time a query needs to execute, it requests a query workspace memory grant. Grants are handed out on a first-come, first-served basis: if enough memory is available, the grant is given and the query executes. If not, the query goes into a queue, where it waits with other requests, processed from smallest to largest, until enough memory becomes available.
What's crucial to understand about the executing statement is how much memory it is requesting, what its cost is, and whether it is currently waiting for the memory to be granted. It's also important to know what else is competing for memory before your query can execute.
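If you want to see those raw numbers straight from the engine, the sys.dm_exec_query_memory_grants DMV exposes the grant queue directly. A minimal sketch, with the column list trimmed to the essentials:

```sql
-- Pending and active memory grants; rows with grant_time = NULL are still waiting.
SELECT session_id,
       request_time,
       grant_time,            -- NULL = still waiting on the resource semaphore
       requested_memory_kb,
       granted_memory_kb,
       wait_time_ms,
       queue_id,
       wait_order,            -- position in the wait queue
       is_next_candidate
FROM sys.dm_exec_query_memory_grants
ORDER BY is_next_candidate DESC, wait_order;
```

This is handy for a quick spot check, but the DMV only shows the current instant; it won't tell you what was happening ten minutes ago.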
If you don’t know where to start, don't worry, sp_WhoIsActive has got your back!
The @get_memory_info parameter is a useful feature that can help you monitor and optimize the query workspace memory usage of your queries. To use it, set the parameter to 1 when you run sp_WhoIsActive, like this:
EXEC sp_WhoIsActive @get_memory_info = 1
This will add four new columns to the output: requested_memory, granted_memory, used_memory, and max_used_memory. These columns show the amount of memory that each query requested, was granted, is currently using, and has used at its peak, respectively. This data can help pinpoint queries that are either using excessive memory or are inefficiently using minimal memory, allowing you to optimize them as needed.
If you delve deeper into the output of sp_WhoIsActive, you'll discover a new column labeled [memory_info]. For active queries that require workspace memory, clicking the XML link returns additional information on memory grants, resource semaphores, and the Resource Governor settings that are impacting the allocation.
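Once you log sp_WhoIsActive output to a table (as we'll do below), you can also shred a few values out of that XML with T-SQL instead of clicking links. A sketch, assuming the logging table and column names used later in this post and that [memory_info] was created with the XML datatype; the XPath follows the element names shown in the sample output further down:

```sql
-- Pull a few headline numbers out of the logged [memory_info] XML.
SELECT session_id,
       collection_time,
       memory_info.value('(//memory_grant/requested_memory_kb)[1]', 'bigint') AS requested_memory_kb,
       memory_info.value('(//memory_grant/wait_time_ms)[1]',        'bigint') AS wait_time_ms,
       memory_info.value('(//resource_semaphore/waiter_count)[1]',  'int')    AS waiter_count
FROM DBA.dbo.WhoIsActive_ResultSets
WHERE memory_info IS NOT NULL;
```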
Now Back To The Problem At Hand
A quick look at the sp_WhoIsActive execution results doesn't exactly inspire confidence; we are in some serious trouble here.
We have several processes with memory requests, but they all have a grant type of NULL. As you know from earlier, this indicates that the process is waiting on a memory grant, and as you can see by the number of requests, we are all out of query workspace memory.
How could this be? I have 128 GB of RAM!
Why is this happening? Did I miss something during setup?
No, it's by design: SQL Server's default behavior is to allow a maximum memory grant of 25% of the default workload group's memory pool, which means you can only run three heavy queries at a time on the default configuration.
Any heavy query thereafter generates a RESOURCE_SEMAPHORE wait type and begins to tally the milliseconds it must wait before it can be granted the memory to execute.
The rationale for this is to prevent a single process from monopolizing all the memory and consequently depriving the remaining processes of necessary resources. This works great most of the time: several queries run in parallel, and the one or two memory-hog processes go largely unnoticed.
With the default workload group setting, we reach a critical point after three substantial 25% memory grants have been allocated. There's a built-in hard limit here: a query won't start unless the server's available grant memory is 150% of the memory the query requested. This ensures there's enough room for other queries to run concurrently while this query executes. Once we hit this threshold, any subsequent large memory request simply waits, bringing all high-cost queries to a screeching halt. Worse yet, queries will remain on hold until either memory becomes available or they hit the default memory grant timeout of 24 hours! Yikes!
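You can watch this tipping point happen through sys.dm_exec_query_resource_semaphores, which is the same data surfaced in the <resource_semaphore> XML element we'll review shortly. A quick sketch:

```sql
-- One row per resource semaphore per pool; when waiter_count climbs while
-- available_memory_kb stays low, large grants are queuing.
SELECT pool_id,
       resource_semaphore_id,
       target_memory_kb,
       available_memory_kb,
       granted_memory_kb,
       grantee_count,   -- queries currently holding a grant
       waiter_count     -- queries stuck in RESOURCE_SEMAPHORE waits
FROM sys.dm_exec_query_resource_semaphores;
```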
It's apparent from our result set that we added the straws that broke the camel's back.
Looking a Little Deeper
Let's begin by gathering some metrics. I want to identify when this is occurring, and which queries need attention. If you haven't already set up sp_WhoIsActive logging, let's quickly do that.
This was originally posted on https://whoisactive.com/, then enhanced and shared again; below is the third iteration. Enjoy!
DECLARE @PurgeAfterWeeks INT = 4,
@CollectionTable VARCHAR(500) = 'WhoIsActive_ResultSets',
@CollectionDatabase sysname = 'DBA',
@TableDDL VARCHAR(MAX),
@Schema sysname = 'dbo',
@SQL NVARCHAR(4000),
@Params NVARCHAR(500),
@MinID INT,
@IndexCreated BIT;
SET @CollectionTable = @CollectionDatabase + '.' + @schema + '.' + @CollectionTable;
--create the logging table
IF OBJECT_ID(@CollectionTable) IS NULL
BEGIN;
EXEC dbo.sp_WhoIsActive @get_transaction_info = 1,
@get_full_inner_text = 1,
@get_outer_command = 1,
@get_memory_info = 1,
@get_plans = 1,
@get_additional_info = 1,
@return_schema = 1,
@schema = @TableDDL OUTPUT;
SET @TableDDL = REPLACE(@TableDDL, '<table_name>', @CollectionTable);
SET @SQL = REPLACE(@TableDDL, ');', ',[ID] INT IDENTITY(1,1) NOT NULL);');
EXEC sys.sp_executesql @SQL
END;
--create clustered index on ID column (flag defaults to 1 = "exists"; the check flips it to 0 only when the index is missing)
SET @IndexCreated = 1;
SET @SQL= 'IF NOT EXISTS (SELECT * FROM ' + @CollectionDatabase + '.sys.indexes WHERE object_id = OBJECT_ID(''' + @CollectionTable+ ''') AND name = N''PK_ID'') SET @index_created = 0';
SET @params = N'@index_created bit OUTPUT';
EXEC sys.sp_executesql @SQL, @params, @index_created = @IndexCreated OUTPUT;
IF @IndexCreated = 0
BEGIN;
SET @SQL = 'CREATE UNIQUE CLUSTERED INDEX PK_ID ON ' +
@CollectionTable + '(ID ASC) WITH(FILLFACTOR = 95)';
EXEC sp_executesql @SQL;
END;
--create non-clustered index on collection_time (reset the flag so a 0 from the previous check doesn't carry over)
SET @IndexCreated = 1;
SET @SQL= 'IF NOT EXISTS (SELECT * FROM ' + @CollectionDatabase + '.sys.indexes WHERE object_id = OBJECT_ID(''' + @CollectionTable+ ''') AND name = N''IDX_collection_time'') SET @index_created = 0';
SET @params = N'@index_created bit OUTPUT';
EXEC sys.sp_executesql @SQL, @params, @index_created = @IndexCreated OUTPUT
IF @IndexCreated = 0
BEGIN;
SET @SQL = 'CREATE NONCLUSTERED INDEX IDX_collection_time ON ' + @CollectionTable + '(collection_time ASC)';
EXEC sp_executesql @SQL;
END;
--add this to a SQL job step to collect activity into the logging table
EXEC dbo.sp_WhoIsActive
@get_transaction_info = 1,
@get_full_inner_text = 1,
@get_outer_command = 1,
@get_memory_info = 1,
@get_plans = 1,
@get_additional_info = 1,
@destination_table = @CollectionTable;
--purge older data based on the clustered key
SET @SQL = 'SELECT @Min = ISNULL(MIN(ID), 1) FROM ' + @CollectionTable + ' WHERE collection_time < DATEADD(week, -' + CAST(@PurgeAfterWeeks AS VARCHAR(10))
+ ', GETDATE());';
EXEC sp_executesql @SQL, N'@Min INT OUT', @MinID out
SET @SQL = 'DELETE FROM ' + @CollectionTable + ' WHERE ID < ' + CAST(@MinID AS VARCHAR);
EXEC sp_executesql @SQL;
Now that we have some logging in place, let's identify those queries that are causing us distress.
As you can see below, I am simply searching for queries where the requested grant is not null and the grant is null, sorted by ID in descending order.
SELECT TOP 50
RS1.[dd hh:mm:ss.mss],
RS1.[session_id],
RS1.[wait_info],
RS1.[sql_text],
RS1.[sql_command],
RS1.[login_name],
RS1.[query_plan],
RS1.[used_memory],
RS1.[max_used_memory],
RS1.[requested_memory],
RS1.[granted_memory],
RS1.[host_name],
RS1.[database_name],
RS1.[program_name],
RS1.[additional_info],
RS1.[memory_info],
(
SELECT TOP 1
[memory_info]
FROM
[DBA].[dbo].[WhoIsActive_ResultSets]
WHERE RS1.[session_id] = [session_id] AND
RS1.[start_time] = [start_time] AND
RS1.[login_time] = [login_time] AND
[granted_memory] IS NOT NULL AND
[memory_info] IS NOT NULL
ORDER BY [ID] DESC
) AS [rsg_memory_info],
RS1.[start_time],
RS1.[login_time],
RS1.[request_id],
RS1.[collection_time]
FROM [DBA].[dbo].[WhoIsActive_ResultSets] RS1
WHERE RS1.[requested_memory] IS NOT NULL AND
RS1.[granted_memory] IS NULL
ORDER BY [ID] DESC
Results
Let’s review each of these XML element groupings from [memory_info], utilizing our WhoIsActive_ResultSets query from above:
Memory Grant
This is where all the memory grant information lives. As you can see, our requested memory is about 20 GB, and when memory is available SQL Server honors the request and the grant is issued. When no more memory is available, instead of a grant you see the [wait_time_ms] and your assigned [queue_id].
<memory_grant>
<request_time>2024-05-22T10:54:24.853</request_time>
<wait_time_ms>8187</wait_time_ms>
<requested_memory_kb>21034000</requested_memory_kb>
<ideal_memory_kb>52585000</ideal_memory_kb>
<required_memory_kb>45712</required_memory_kb>
<queue_id>5</queue_id>
<wait_order>0</wait_order>
<is_next_candidate>1</is_next_candidate>
<dop>8</dop>
<query_cost>86766.2944</query_cost>
</memory_grant>
Resource Semaphore
This pertains to memory management, the pool where our requests are served. Here we can observe how much memory is available, how much has been granted, the number of grantees, and whether any processes are waiting for memory.
<resource_semaphore>
<timeout_error_count>0</timeout_error_count>
<target_memory_kb>84136000</target_memory_kb>
<max_target_memory_kb>84136000</max_target_memory_kb>
<total_memory_kb>84136000</total_memory_kb>
<available_memory_kb>64341920</available_memory_kb>
<granted_memory_kb>19794080</granted_memory_kb>
<used_memory_kb>4554512</used_memory_kb>
<grantee_count>4</grantee_count>
<waiter_count>7</waiter_count>
</resource_semaphore>
Workload Group
These settings correspond to what you would see in the Resource Governor for the default workload group. As can be observed, we have a maximum grant of 25% set for large queries, and there are no limits imposed on MAXDOP, CPU time, or the memory grant timeout.
<workload_group>
<name>internal</name>
<request_max_memory_grant_percent>25</request_max_memory_grant_percent>
<request_max_cpu_time_sec>0</request_max_cpu_time_sec>
<request_memory_grant_timeout_sec>0</request_memory_grant_timeout_sec>
<max_dop>0</max_dop>
</workload_group>
Resource Pool
As you can see, I only have one resource pool created: the default. You can think of resource pools as virtual mini SQL Server instances, each containerizing a finite set of resources. Each pool is associated with one or more workload groups, and the Resource Governor classifies user sessions into the appropriate workload group, helping you balance your resources more effectively.
<resource_pool>
<name>internal</name>
<min_memory_percent>0</min_memory_percent>
<max_memory_percent>100</max_memory_percent>
<min_cpu_percent>0</min_cpu_percent>
<max_cpu_percent>100</max_cpu_percent>
</resource_pool>
Now that we have a grasp of the situation, let's examine the second XML column, [rsg_memory_info]. This is from the same session, but it's the most recent log entry captured where the memory was granted, allowing us to observe those counters. As evident from the output below, the maximum memory used by the process is about 1.75 GB, a fraction of the roughly 20 GB it requested. This phenomenon is known as memory bloating.
<memory_grant>
<request_time>2024-05-22T10:54:24.853</request_time>
<grant_time>2024-05-22T11:24:24.853</grant_time>
<requested_memory_kb>21034000</requested_memory_kb>
<granted_memory_kb>21034000</granted_memory_kb>
<used_memory_kb>1835008</used_memory_kb>
<max_used_memory_kb>1835008</max_used_memory_kb>
<ideal_memory_kb>52585000</ideal_memory_kb>
<required_memory_kb>45712</required_memory_kb>
<dop>8</dop>
<query_cost>86766.2944</query_cost>
</memory_grant>
Solutions
So what can we do to circumvent this scenario? Well, that depends. As any good consultant will tell you, several factors feed into that decision: IT budget, development staff, and the software you're running, to name a few.
But regardless, it really breaks down to five possible solutions:
Naturally, the optimal solution is always to rewrite the query to reduce its cost. The cost of a query directly influences the memory estimate; if a query has a high cost, it will require a lot of memory. If rewriting the query isn't feasible or can't be done immediately, you're left with only three options. I say three, because increasing the server memory might provide a temporary fix and some relief, but you're still constrained to executing three large queries under the default workload group. So you lose anyway, and it costs you a bunch of money for the same result.
Turn on Row Mode Memory Grant Feedback
If you're fortunate enough to be on SQL Server 2019 or greater, you can simply change the database compatibility level to 150 and, voilà, the magic happens. The Row Mode Memory Grant Feedback feature is activated, addressing both excessive memory grants and insufficient grants that lead to spills to TempDB.
Here are the benefits you can expect from this memory grant feedback:
- If SQL Server detects that the granted memory exceeds the used memory, memory grant feedback recalculates the grant for the next run, resulting in less memory usage in subsequent runs.
- In spill-to-disk scenarios, memory grant feedback likewise recalculates the memory and grants the appropriate amount on further runs.
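Enabling it is just a compatibility-level change; a sketch, where [YourDatabase] is a placeholder name:

```sql
-- Row Mode Memory Grant Feedback comes along with compatibility level 150 (SQL Server 2019).
-- [YourDatabase] is hypothetical; run this per user database you want covered.
ALTER DATABASE [YourDatabase] SET COMPATIBILITY_LEVEL = 150;

-- Verify the change took effect.
SELECT name, compatibility_level
FROM sys.databases
WHERE name = N'YourDatabase';
```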
Resource Governor
This is a feature that I've come to appreciate greatly. On my larger servers, I typically enable this and adjust the default workload group [request max memory grant percent] down to 2 or 3% to start, which is still substantial when you consider servers with hundreds of GB to a TB of RAM. Occasionally, I'll create a classifier function and categorize sessions into workload groups. This allows me to modify the pool and workload thresholds based on the session load characteristics.
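As a sketch, dropping the default workload group's per-query grant cap from 25% to 3% looks like this (pick a percentage that fits your workload, and note that Resource Governor requires Enterprise Edition):

```sql
-- Lower the per-query memory grant cap on the default workload group.
ALTER WORKLOAD GROUP [default]
WITH (REQUEST_MAX_MEMORY_GRANT_PERCENT = 3);

-- Apply the pending configuration change.
ALTER RESOURCE GOVERNOR RECONFIGURE;
```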
Memory Hints
This is the solution I am going to focus on today: adding memory hints to cap the upper limit of a query's memory request. In this case, hints are good, and as long as you're on at least SQL Server 2012 SP3, they are available.
Since we do not have SQL Server 2019, we will just play the role of Row Mode Memory Grant Feedback ourselves and reduce that memory request to a more practical level.
To do this we tack on the memory grant limiter hint.
The syntax is as follows:
OPTION (min_grant_percent = 1, max_grant_percent = 10)
Keep in mind that these query options (min_grant_percent and max_grant_percent) apply to the maximum grant size, not the entire shared workspace memory.
By doing this you're effectively saying that the query can only request up to the stated percentage of the maximum memory grant available; in this case, on Standard Edition, that is 10 percent of the maximum grant size, which defaults to 25 percent of the query workspace memory. For example, if you had 80 GB of workspace memory, 25% would be 20 GB, and 10% of that would be 2 GB, which is plenty of space for our bloated query above, given everything stays the same. But that's never the case, so leave room for growth, and monitor frequently.
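The arithmetic above can be sanity-checked in a scratch query; the 80 GB figure is just the example number from this paragraph:

```sql
-- Worked example: 80 GB of workspace memory, default 25% grant cap, 10% hint.
DECLARE @workspace_gb   DECIMAL(10, 2) = 80;
DECLARE @default_cap_gb DECIMAL(10, 2) = @workspace_gb * 0.25;   -- default max grant: 20 GB
DECLARE @hinted_cap_gb  DECIMAL(10, 2) = @default_cap_gb * 0.10; -- with max_grant_percent = 10: 2 GB

SELECT @default_cap_gb AS default_max_grant_gb,  -- 20.00
       @hinted_cap_gb  AS hinted_max_grant_gb;   -- 2.00
```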
If you’re unable to access the code, you can use a SQL plan guide and incorporate the hint into the guide. If you’re unfamiliar with them, I would recommend exploring the numerous tutorials available online that delve into their complexities. It’s important to remember when working with plan guides that they are confined to the context of the database in which they are executed. In simpler terms, a plan guide only influences the queries that are executed within the same database context where the plan guide was created.
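For reference, attaching the hint through a plan guide looks roughly like this. The guide name and statement text below are made up for illustration; in practice, @stmt must match the text the application sends, character for character:

```sql
-- Hypothetical plan guide pinning the grant hint onto a query we cannot edit.
EXEC sp_create_plan_guide
    @name = N'PG_LimitEtlMemoryGrant',               -- made-up name
    @stmt = N'SELECT * FROM dbo.BigEtlStagingTable', -- made-up statement text
    @type = N'SQL',
    @module_or_batch = NULL,
    @params = NULL,
    @hints = N'OPTION (MIN_GRANT_PERCENT = 1, MAX_GRANT_PERCENT = 10)';
```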
In summary, the @get_memory_info parameter of sp_WhoIsActive is a valuable feature that can assist you in pinpointing the source of your resource semaphore waits. I’ve shown several strategies to address and alleviate this situation, with the most effective being to rewrite the query in a more cost-efficient way.
Furthermore, an understanding of the inner workings of memory can help you identify performance issues, such as memory pressure, spills, and waits. By acquiring this knowledge, you can boost the efficiency of your queries, which will subsequently enhance the performance and scalability of your SQL Server.
To learn more about sp_WhoIsActive and its other parameters, you can visit the official website, https://whoisactive.com/, or email Erik Darling with any questions you may have. Happy coding!