Management of Long Running PS/Queries Cancelled by Resource Manager CPU Limit
David Kurtz
Oracle Database Performance Consultant @ Go-Faster Consultancy (London & Dublin)
(you can also read this article on the PeopleSoft DBA Blog)
I have written previously about using the Oracle database resource manager to prioritise the allocation of CPU to different processes in a PeopleSoft system.? I proposed a sample resource plan that can be used as a starting point to build a resource plan that meets a system's specific objectives and requirements.
This post looks at?
Configuring SQL Cancellation in a Resource Manager Consumer Group
The sample resource plan, PSFT_PLAN, contains various server consumer groups.? It relies upon MODULE and ACTION being set by enabling PeopleSoft instrumentation (EnableAEMonitoring=1) and/or the psftapi_store_prcsinstance trigger on PSPRCSRQST (see?Effective PeopleSoft Performance Monitoring),
Recommendations:?
Plan Directives
Plan directives are created with DBMS_RESOURCE_MANAGER.CREATE_PLAN_DIRECTIVE.?
DBMS_RESOURCE_MANAGER.CREATE_PLAN_DIRECTIVE(
'PSFT_PLAN', 'PSQUERY_BATCH_GROUP'
,mgmt_p6 => 1
,switch_group => 'CANCEL_SQL'
,switch_time => 14400
,switch_estimate => TRUE
,switch_for_call => TRUE
);
DBMS_RESOURCE_MANAGER.CREATE_PLAN_DIRECTIVE(
'PSFT_PLAN', 'PSQUERY_ONLINE_GROUP'
,mgmt_p6 => 90
);
DBMS_RESOURCE_MANAGER.CREATE_PLAN_DIRECTIVE(
'PSFT_PLAN', 'LOW_LIMITED_GROUP'
,mgmt_p8 => 1
,switch_group => 'CANCEL_SQL'
,switch_time => 7200
,switch_elapsed_time => 7200
,switch_estimate => TRUE
,switch_for_call => TRUE
);
Four parameters control cancellation behaviour.
领英推荐
Cancellation Behaviour
The resource manager can cancel long-running queries in these consumer groups raising ORA-00040: active time limit exceeded - call aborted?
Querying the PeopleSoft Message Log
The full message text is stored in multiple pieces in PS_MESSAGE_LOGPARM and must be reconstructed so that it can be searched for the error code.? I demonstrated this technique in another blog post: Querying the PeopleSoft Message Log with SQL.
For this analysis, I have made some alterations to the message log query (see?message_log_checker-psquery.sql).
It is necessary to filter by message number because even in PS/Query users can write invalid SQL that produces other error messages.? However, all this text processing for each row retrieved makes the query quite slow.
Here is an example output.? User USR001 has run a private query MY_TEST2 with run control 42.? It ran for 20772s (5h 46m), until it was terminated by the resource manager.? As explained above, the 4-hour limit is on CPU time that will be less than the elapsed time.
Public/
Private Ru Exec Msg Msg Msg
P.I. DBNAME OPRID RUNCNTLID Query QRYNAME St Secs DTTM_STAMP_SEC Seq Set Nbr
--------- ------- -------- ------------------------------ ------- ------------------------------ -- ------ ---------------------------- --- --- ----
MSG
-----------------------------------------------------------------------------------------------
12395311 FSPROD USR001 42 Private MY_TEST2 10 20772 10-FEB-25 04.41.47.384694 PM 1 65 30
File: C:\PT860P13B_2403250500-retail\peopletools\src\pssys\qpm.cppSQL error. Stmt #: 8526 Error Position: 189 Return: 40 - ORA-00040: active time
limit exceeded - call abortedFailed SQL stmt: SELECT A.LEDGER, A.FISCAL_YEAR, A.BUSINESS_UNIT, …
10-FEB-25 04.41.47.421800 PM 2 50 380
Error in running query because of SQL Error, Code=40, Message=ORA-00040: active time limit exceeded - call aborted
…
From Oracle 19c on Exadata timed out statements are automatically quarantined.? If a quarantined statement is run and a quarantined execution plan is generated, then error ORA-56955 is generated immediately.? It can therefore also be detected in the logs.? So the query searches for both messages.
Public/
Private Ru Exec Msg Msg Msg
P.I. DBNAME OPRID RUNCNTLID Query QRYNAME St Secs DTTM_STAMP_SEC Seq Set Nbr
--------- ------- -------- ------------------------------ ------- ------------------------------ -- ------ ---------------------------- --- --- ----
MSG
-----------------------------------------------------------------------------------------------
12319513 FSPROD USR002 Transactions Public GBR_JRNL_LINE_DTL_ACCT 10 25 13-FEB-25 11.13.35.746644 PM 1 65 30
File: C:\PT860P13B_2403250500-retail\peopletools\src\pssys\qpm.cppSQL error. Stmt #: 8526 Error Position: 2783 Return: -8581 - ORA-56955: quarant
ined plan usedFailed SQL stmt: SELECT A.FISCAL_YEAR, A.ACCOUNTING_PERIOD, A.JRNL_HDR_STATUS, B.LED
…
13-FEB-25 11.13.35.814112 PM 2 50 380
Error in running query because of SQL Error, Code=-8581, Message=ORA-56955: quarantined plan used
I will discuss automatic SQL quarantine for PS/Query in a subsequent blog.
Opinion
So far I have explained how to set a maximum CPU time limit for PS/Queries in a resource manager consumer group, and then to detect the cancelled PS/queries by examining the message log.
The final stage is to close the feedback loop and go back to the users who are producing the queries, find out what they are trying to do, and why the queries are running for such a long time.