SQL Quarantine Behaviour When the Same SQL Executes in Different Resource Manager Consumer Groups With Different CPU Time Limits
David Kurtz
Oracle Database Performance Consultant @ Go-Faster Consultancy (London & Dublin)
You can read the full article on the /*+Go-Faster*/ Blog
What happens if I have two consumer groups with different CPU time limits, but the same long-running SQLs can be run in either group?
There is a use case for this question.? In my sample resource manager plan for PeopleSoft, there are consumer groups with different maximum CPU time limits.? 4 hours for scheduled queries, and 2 hours for SQL*Plus/SQL Developer sessions.??
The tests in this article are based on the examples in Tim Hall's article about SQL Quarantine in 19c.
N.B. SQL Quarantine is an Exadata-only feature.
I have created a simple plan with two consumer groups that have CPU time-outs of 30 and 60 seconds.? There are no group mappings; I will switch groups manually.
BEGIN
DBMS_RESOURCE_MANAGER.CLEAR_PENDING_AREA;
DBMS_RESOURCE_MANAGER.CREATE_PENDING_AREA;
DBMS_RESOURCE_MANAGER.CREATE_PLAN('GFC_TEST_PLAN');
DBMS_RESOURCE_MANAGER.CREATE_CONSUMER_GROUP('LOW_LIMITED30_GROUP');
DBMS_RESOURCE_MANAGER.CREATE_CONSUMER_GROUP('LOW_LIMITED60_GROUP');
DBMS_RESOURCE_MANAGER.CREATE_PLAN_DIRECTIVE(
'GFC_TEST_PLAN', 'LOW_LIMITED30_GROUP',
mgmt_p8 => 1,
switch_group => 'CANCEL_SQL',
switch_time => 30,
switch_for_call => TRUE);
DBMS_RESOURCE_MANAGER.CREATE_PLAN_DIRECTIVE(
'GFC_TEST_PLAN', 'LOW_LIMITED60_GROUP',
mgmt_p8 => 1,
switch_group => 'CANCEL_SQL',
switch_time => 60,
switch_for_call => TRUE);
DBMS_RESOURCE_MANAGER.CREATE_PLAN_DIRECTIVE(
'GFC_TEST_PLAN', 'OTHER_GROUPS');
DBMS_RESOURCE_MANAGER.VALIDATE_PENDING_AREA;
DBMS_RESOURCE_MANAGER.SUBMIT_PENDING_AREA;
DBMS_RESOURCE_MANAGER_PRIVS.grant_switch_consumer_group(
grantee_name => user,
consumer_group => 'LOW_LIMITED30_GROUP',
grant_option => FALSE);
DBMS_RESOURCE_MANAGER_PRIVS.grant_switch_consumer_group(
grantee_name => user,
consumer_group => 'LOW_LIMITED60_GROUP',
grant_option => FALSE);
END;
/
ALTER SYSTEM SET resource_manager_plan = 'GFC_TEST_PLAN';
I am going to create a PL/SQL function that will burn CPU as it goes around a loop.? It is just like Tim's, except that I specify the number of seconds for which I want mine to run, and it will return the number of times it has looped.
set serveroutput on timi on
CREATE OR REPLACE FUNCTION burn_cpu (p_secs IN NUMBER)
RETURN NUMBER
AS
l_start_time DATE;
l_number NUMBER := 1;
BEGIN
l_start_time := SYSDATE;
LOOP
EXIT WHEN SYSDATE - l_start_time > (p_secs/86400);
l_number := l_number + 1;
END LOOP;
RETURN l_number;
END;
/
I will start by switching to the consumer group LOW_LIMITED30_GROUP
DECLARE
l_session v$session%ROWTYPE;
l_consumer_group VARCHAR2(30):= 'LOW_LIMITED30_GROUP';
--l_consumer_group VARCHAR2(30):= 'LOW_LIMITED60_GROUP';
BEGIN
SELECT * INTO l_session FROM v$session WHERE sid = Sys_Context('USERENV', 'SID');
dbms_output.put_line(l_session.sid||'.'||l_session.serial#||':'||l_session.module||':'||l_session.action||':'||l_session.resource_Consumer_Group);
DBMS_RESOURCE_MANAGER.SWITCH_CONSUMER_GROUP_FOR_SESS(l_session.sid,l_session.serial#,consumer_group=>l_consumer_group);
SELECT * INTO l_session FROM v$session WHERE sid = Sys_Context('USERENV', 'SID');
dbms_output.put_line(l_session.sid||'.'||l_session.serial#||':'||l_session.module||':'||l_session.action||':'||l_session.resource_Consumer_Group);
END;
/
I have patch 30104721 installed on 19c to backport the new parameters in 23c, so I need to enable quarantine capture (see Oracle Doc ID 2635030.1: 19c New Feature SQL Quarantine - How To Stop Automatic SQL Quarantine).
ALTER SESSION SET OPTIMIZER_CAPTURE_SQL_QUARANTINE = TRUE;
ALTER SESSION SET OPTIMIZER_USE_SQL_QUARANTINE = TRUE;
I can run queries that usually run for 45 and 75 seconds, but they will be stopped when they have consumed 30 seconds of CPU time.
SELECT burn_cpu (45) FROM dual
Error report -
ORA-12777: A non-continuable error encountered. Check the error stack for additional information [40], [pfrruncae_check_async_events], [], [].
ORA-00040: active time limit exceeded - call aborted
Elapsed: 00:00:33.574
SELECT burn_cpu (75) FROM dual
Error report -
ORA-12777: A non-continuable error encountered. Check the error stack for additional information [40], [pfrruncae_check_async_events], [], [].
ORA-00040: active time limit exceeded - call aborted
Elapsed: 00:00:34.023
After a short period, I have 2 quarantine directives
select signature, name, sql_text, plan_hash_value, cpu_time, origin
from dba_sql_quarantine
where sql_text like '%burn_cpu%'
SIGNATURE NAME SQL_TEXT PLAN_HASH_VALUE CPU_TIME ORIGIN
---------------------- ---------------------------------------- ------------------------------ --------------- ---------- ----------------
12281544607895693562 SQL_QUARANTINE_anw6gs0gbta7u125daea2 SELECT burn_cpu (75) FROM dual 308129442 30 RESOURCE-MANAGER
3516266497211383477 SQL_QUARANTINE_31m29rz6y8gpp125daea2 SELECT burn_cpu (45) FROM dual 308129442 30 RESOURCE-MANAGER
Now, the statements are prevented from running immediately.
SELECT burn_cpu (45) FROM dual
Error report -
ORA-56955: quarantined plan used
Elapsed: 00:00:00.414
SELECT burn_cpu (75) FROM dual
Error report -
ORA-56955: quarantined plan used
Elapsed: 00:00:00.416
Next, I will switch the consumer group to LOW_LIMITED60_GROUP.? The CPU limit is now 60 seconds.? Now, neither command returns ORA-56955.
The 45-second process runs to a successful conclusion.
SELECT burn_cpu (45) FROM dual;
BURN_CPU(45)
------------
63264226
Elapsed: 00:00:45.846
The 75-second process runs for 65 seconds and is then terminated with ORA-00040 because it has reached the time limit, but not because the execution plan was quarantined.
SELECT burn_cpu (75) FROM dual
Error report -
ORA-12777: A non-continuable error encountered. Check the error stack for additional information [40], [pfrruncae_check_async_events], [], [].
ORA-00040: active time limit exceeded - call aborted
Elapsed: 00:01:05.952
A quarantine directive with a lower CPU_TIME limit than that of the current consumer group is not applied because the statement may run to completion in a time between the lower limit and the higher limit.? Oracle allows the query to execute; it will be aborted when it reaches the higher CPU time.??
After a while, as the documentation indicates, quarantine capture will update the CPU time limit on the existing quarantine definition limit to the higher limit in the current consumer group.
The query in our example runs for 20 minutes only once, and then never again—unless the resource limit increases or the plan changes. If the limit is increased to 25 minutes, then the Resource Manager permits the statement to run again with the quarantined plan. If the statement runs for 23 minutes, which is below the new threshold, then the Resource Manager removes the plan from quarantine. If the statement runs for 26 minutes, which is above the new threshold, the plan remains in quarantine unless the limit is increased.
However, for the statement that runs successfully (for 45 seconds) in the consumer group with the higher limit (60 seconds), I have not seen the database remove the quarantine directive, even if I remove LOW_LIMITED30_GROUP consumer group from all resource plans and drop it from the database.
SIGNATURE NAME SQL_TEXT PLAN_HASH_VALUE CPU_TIME ORIGIN
------------------- ---------------------------------------- ------------------------------ --------------- ---------- ----------------
12281544607895693562 SQL_QUARANTINE_anw6gs0gbta7u125daea2 SELECT burn_cpu (75) FROM dual 308129442 60 RESOURCE-MANAGER
3516266497211383477 SQL_QUARANTINE_31m29rz6y8gpp125daea2 SELECT burn_cpu (45) FROM dual 308129442 30 RESOURCE-MANAGER
The next time an attempt is made to execute the 75-second statement in either consumer group, it is quarantined as before.
SELECT burn_cpu (75) FROM dual
Error report -
ORA-56955: quarantined plan used
Conclusion
Therefore, it is safe to allow different consumer groups in which the same long-running SQL statements may be executed to have different timeouts.? The lower quarantine timeout will not apply to executions in consumer groups with higher timeouts.? The SQL quarantine directives will evolve to have higher timeouts as required.
Azure Cloud Architect & Oracle19c DBA
2 周I know you know the answer ?? I’m guessing the SQL will sometimes appear to be randomly cancelled. And it will be very difficult to find out why…