Oracle Datawarehouse Database Workload Monitoring Using AWR Report
Tamer SHALABY
Technical Director specializing in Oracle DBA and Analytics at Data & Analytics
#ETL #DATAWAREHOUSE #ORACLE #AWR #PERFROMANCE
From site experience, When the load of target (Datawarehouse database) is having issues during ETL (like ODI) loading or read via an analytics tool (OBIEE .. Tableau ... ), it is important to check database statistics that provides information about the type of load on the database and the internal and external resources used by the database .
By default, the database gathers statistics every hour and creates an AWR snapshot,
which is a set of data for a specific time that is used for performance comparisons and AWR retains data for the last eight days.
AWR is a rich source of information that every DBA must have a look in the first place as a start of performance troubleshooting process.
ASH (Active Session History) report puts more focus on session activity. Use it in transient DB performance issues to analyze a session activity .
AWR report contains much data – but contains no concrete recommendations for action.
In the next pages, I have tried to list the most common cases but not all, along with 2 AWR examples . Explaining how to go deeper for finding the root cause .
? Active sessions are sampled and persisted in-memory
– Sampling interval = 1 second
Note:
Before start extracting AWR, make sure that the database wan not shut down .
Have the following initialization parameters (Table) considered :
1. Extraction Methods
Many methods can be used to extract AWR, from which you can select the most convenient for you:
1. SQL scripts found under $ORACLE_HOME/rdbms/admin. Please review the following table for more details about functionalities of each report.
When to generate AWR report:
Generated AWR should cover the timeframe when the issue occurred.
Most Important Sections to Read in AWR :
Instance Efficiency Percentages (Target 100%)
DB Time :
– Total time in database calls by foreground sessions – Includes CPU time, IO time and non-idle wait time
Total DB time = sum of DB time for all active sessions
The target here is to reduce it .
Active session is the session currently spending time in a database call. The average session time is an indication of the current database load .
Average Active Sessions = (DB Time / Elapsed Time )
In the below example :
Average Active Sessions = 508.88 / 59.56 = (8.54 mins) . This database is highly utilized.
A) Report Summary :
Check Instance Efficiency Percentages :
Instance Efficiency Percentages (Target 100%)
-The above figures are in the acceptable margins for the instance efficiency as all the figures are almost 100%
-Another example :
Instance Efficiency Percentages (Target 100%)
Execute to Parse % is low as 38.11 %
Parse CPU to Parse Elapsd % is very low as 2.64 %
The application in this case is not using shareable SQL. In some other cases, the database has sub-optimal parameters that are reducing the effectiveness of cursor sharing. A problem like excessive parsing is likely to manifest itself as additional network traffic between the application server and clients
Parsing is fully CPU Bound Activity. In ideal case Parse CPU should be equal to Parse Elapsed.If the "Parse Elapsed is more, then that indicates the Parsing activity is waiting on some event(eg Acquiring a Latch).This would give how much wall clock time is elapsed for parsing activity has taken to complete the Parse request when it takes.
Low Value for this ratio is an indicator of latching problem. Investigate the Latch Sections in AWR report for contention on library cache and shared pool latches.
Check : Top 5 Timed Foreground Events , section :
A latch is a low-level internal lock used by Oracle to protect memory structures. Latches are similar to short duration locks that protect critical bits of code. This wait indicates that the process is waiting for a latch that is currently busy (held by another process).
Latch free wait here has 9.47 % of the DB total time .
They're accurately referred to as mutual exclusion mechanisms used to protect shared memory structures in the system global area (SGA). Latches are like locks on memory that are very quickly obtained and released. Latches are used to prevent concurrent access to a shared memory structure. If the latch is not available, a latch free miss is recorded.
Most latch problems are related to the failure to use bind variables (library cache latch), redo generation issues (redo allocation latch), buffer cache contention issues (cache buffers LRU chain), and hot blocks in the buffer cache (cache buffers chain).
The latch free event is updated when a server process attempts to get a latch, and the latch is unavailable on the first attempt.
There are also latch waits related to bugs; check MetaLink for bug reports if you suspect this is the case. When latch miss ratios are greater than 0.5 percent, you should investigate the issue.
You need to determine which latch is causing the highest amount of contention.
2-Check ‘Top 5 Timed Events’
Know how much of the database activity is consumed by waiting for a resource such as disk I/O , CPU .
The below database is showing 40.8 % for ‘CPU Time’ which we shall start with in order to know whether this database is CPU bound having serious bottlenecks and about to hang. As every system has its configuration and margins, the percentage is not enough to decide:
Go to "OPERATING SYSTEM STATISTICS" section on the AWR report
Check the following metrics :
BUSY_TIME = 1,402,085
The total amount of time that the CPU was busy for the elapsed time period. This will exceed
the wall-=clock time because we usually have multiple CPU's.
IDLE_TIME = 4,313,829
The amount of time that the database was idle.
NUM_CPUS = 16
The number of processors available to the Oracle database
Available CPU = NUM_CPU's * elapsed_time_bet_snapshots * 60(secs)
= 16 * 60 * 60 = 57,600 seconds
The used DB CPU (12,469 seconds) / The total available CPU (57,600 seconds) = 21.64 % , which is in a very acceptable margin.
Conclusion :
This CPU wait event is not considered as a bottleneck .
-Check Rest of Top 5 wait events are due user I/O activity :
In the above example : ‘db file sequential read’ and ‘db file scattered read’ which are similar , except that the session is reading multiple data blocks in the latter , are consuming 4 hours and 1:20 hours respectively which is a high wait time (compared to AWR snapshot 1 hour time).
The session waits while a sequential read from the database is performed.This is related to a non selective index use. Sequential read is an index read followed by table read because it is doing index lookups which tells exactly which block to go to. This event is also used for rebuilding the control file, dumping datafile headers, and getting the database file headers.
Examine the ‘SQL Ordered by Reads’ section of the report, to see if it might be helpful to tune the statements with the highest resource usage
This section reports the contents of the SQL area ordered by the number of reads from the data files and can be used to identify SQL causing IO bottlenecks which consume the following resources.
- CPU time needed to fetch unnecessary data.
- File IO resources to fetch unnecessary data.
- Buffer resources to hold unnecessary data.
- Additional CPU time to process the query once the data is retrieved into the buffer.
SQL ordered by Reads
- Total Disk Reads: 53,411,956
- Captured SQL account for 34.3% of Total
Action:
-Get the SQL full text for the top SQLs from the sections ‘Complete List of SQL Text’
-Investigate the top SQL execution plan , it can be run separately and SQL tuning advisor can be used to provide more actions , this will be discussed in later chapters.