Application performance: Using EXPLAIN PLAN (Developer and DBA part 2)
Nasser Shateri
Data Engineer | Data Architect | Database Administrator | Oracle RAC & Cloud Solutions | Delivering Scalable Solutions and Enhancing System Performance
In the following sections, I will explain some terms commonly used for tuning SQL statements, with which both DBAs and developers should be familiar. These terms will be elaborated upon further and utilized in the next advanced part of this article.
SQL Parsing
In Oracle SQL, parsing is a critical process that occurs before a query is executed on the database engine. Parsing involves several steps to ensure that the query is correctly formatted, semantically correct, and compliant with access rights. Here's a breakdown of the process:
1. Syntax and Semantic Checks: The first step involves checking the query for correct syntax (the structure of the SQL statement) and semantics (the meaning of the statement). This ensures that the SQL command is valid and makes sense to the database engine.
2. Access Rights Verification: The database verifies that the user or process executing the query has the necessary permissions to perform the requested operations.
3. Storing Parent Cursor in Shared SQL Area: If the query passes the syntax, semantic, and access rights checks, a parent cursor is created and stored in the shared SQL area.
4. Generating Execution Plan: The database engine generates an execution plan for the query. This plan outlines how the database will retrieve and process the data.
5. Storing Child Cursor in Shared SQL Area: After the execution plan is generated, a child cursor is created and stored in the shared SQL area.
In Oracle SQL parsing, there are two types of parses: soft parse and hard parse.
- Soft Parse: This occurs when shareable parent and child cursors are already available, and only the initial checks (syntax, semantic, and access rights) are required. Soft parses are less resource-intensive as they do not require the generation of a new execution plan.
- Hard Parse: This happens when all the steps, including the generation of the execution plan and storing new cursors, are carried out. Hard parses are more CPU-intensive and require more memory in the shared pool.
From a performance perspective, hard parses should be minimized because:
1. CPU Intensive: Generating an execution plan is a CPU-intensive operation, which can slow down the database performance if done frequently.
2. Memory Allocation in Shared Pool: Storing parent and child cursors in the library cache requires memory allocation in the shared pool, which is a resource shared across all sessions. Memory allocations in the shared pool are serialized, requiring shared pool latches. Frequent hard parses can lead to contention for these latches, affecting overall application performance.
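The soft/hard parse distinction above can be sketched as a toy cursor cache. This is a purely conceptual Python model, not Oracle internals: the "shared pool" is a dict keyed by the exact SQL text, a cache miss stands in for a hard parse (plan generation), and a hit stands in for a soft parse. It also shows why bind variables matter: literal values make every statement text unique.

```python
# Conceptual sketch: why bind variables turn hard parses into soft parses.
# The "shared pool" is modeled as a dict keyed by the exact SQL text.

shared_pool = {}
hard_parses = soft_parses = 0

def parse(sql_text):
    """Return a (simulated) cursor for sql_text, counting parse types."""
    global hard_parses, soft_parses
    if sql_text in shared_pool:          # shareable cursor found: soft parse
        soft_parses += 1
    else:                                # no cursor: hard parse, build a "plan"
        hard_parses += 1
        shared_pool[sql_text] = {"plan": f"plan for: {sql_text}"}
    return shared_pool[sql_text]

# Literal values: every statement text is unique, so every parse is hard.
for emp_id in range(100, 110):
    parse(f"SELECT * FROM employees WHERE employee_id = {emp_id}")

# Bind variable: one shared text, so one hard parse and nine soft parses.
for emp_id in range(100, 110):
    parse("SELECT * FROM employees WHERE employee_id = :id")

print(hard_parses, soft_parses)   # prints: 11 9
```

With literals the loop pays ten hard parses; with a bind variable the same workload pays one hard parse and reuses the cursor nine times.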
The Clustering Factor
The clustering factor is a measure of the ordered-ness of an index in comparison to the table that it is based on. It is used to check the cost of a table lookup following an index access (multiplying the clustering factor by the selectivity gives the cost of the operation). The clustering factor records the number of blocks that will be read when scanning the index.
If the clustering factor is close to the number of blocks in the table, the index is well ordered; if it is close to the number of rows in the table, the index is not well ordered. The CLUSTERING_FACTOR column in the DBA_INDEXES view gives an indication of how well organized the data is compared to the indexed columns: if its value is close to the number of leaf blocks in the index, the data in the table is well ordered; if not, the data in the table is not well ordered.
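The way the clustering factor "records the number of blocks read when scanning the index" can be illustrated with a minimal Python sketch (the row-to-block layout below is invented): walk the index in key order and count how often the table block changes between adjacent entries.

```python
# Conceptual sketch of computing a clustering factor: walk the index in key
# order and count block changes between adjacent entries.

def clustering_factor(index_entries):
    """index_entries: list of (key, block_id) tuples in index (key) order."""
    factor = 0
    prev_block = None
    for _key, block in index_entries:
        if block != prev_block:      # next row lives in a different table block
            factor += 1
            prev_block = block
    return factor

# Well-ordered table: rows stored in key order, 3 rows per block, 4 blocks.
ordered = [(k, k // 3) for k in range(12)]
# Poorly ordered table: each consecutive key sits in a different block.
scattered = [(k, k % 4) for k in range(12)]

print(clustering_factor(ordered))    # 4  -> close to the number of blocks
print(clustering_factor(scattered))  # 12 -> close to the number of rows
```

The same 12 rows over the same 4 blocks yield a factor of 4 when co-located in key order, but 12 when scattered, which is exactly the signal the optimizer uses to cost index range scans.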
The Binary Height
The binary height of an index plays a major role in the amount of I/O that needs to be performed to return the ROWID to the user process. Each level in the binary height adds an extra block that must be read, and because these blocks are not read sequentially, each requires a separate I/O operation. You can query the BLEVEL column of DBA_INDEXES to see an index's binary height. A large number of deleted rows in the index can also cause the height to increase; rebuilding the index may help to decrease it.
Histograms
Histograms record the distribution of data when you analyze a table or index. The main reason for generating histograms is to help the optimizer plan properly if the data in a table is heavily skewed. For example, if one or two values make up a large percentage of a table, the related indexes may not help to reduce the number of I/Os required to satisfy the query. The creation of a histogram will let the cost-based optimizer know when using the index is appropriate, or when 80 percent of the table is going to be returned because of the value in the WHERE clause. DBAs can also manually create histograms using the DBMS_STATS package, particularly the GATHER_TABLE_STATS procedure. This is useful in scenarios where specific knowledge of the data might lead to more efficient histogram creation.
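To make the skew point concrete, here is a small Python sketch (data invented) comparing the optimizer's uniform-distribution guess, total rows divided by distinct values, against a frequency histogram that records the actual counts:

```python
# Sketch of why a histogram matters on skewed data: without one, the
# cardinality estimate is num_rows / num_distinct for every value.

from collections import Counter

rows = ["CLERK"] * 800 + ["MANAGER"] * 150 + ["PRESIDENT"] * 50   # skewed column

num_rows = len(rows)
num_distinct = len(set(rows))

def uniform_estimate(value):
    """No histogram: same estimate for every value."""
    return num_rows / num_distinct

histogram = Counter(rows)             # frequency histogram: value -> row count

def histogram_estimate(value):
    """With a histogram: the actual count for this value."""
    return histogram.get(value, 0)

for v in ("CLERK", "PRESIDENT"):
    print(v, uniform_estimate(v), histogram_estimate(v))
```

Without the histogram both 'CLERK' (80 percent of the table, full scan territory) and 'PRESIDENT' (5 percent, a good index candidate) get the same estimate of about 333 rows; the histogram lets the optimizer tell them apart.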
Now let's jump in and understand the EXPLAIN PLAN.
So far, we have covered some basics of what to do to achieve better performance when running queries on a database. Let's dive deeper into how Oracle produces the result of a SQL statement and what we can do to achieve better performance. The first step is to find out what the EXPLAIN PLAN of a SQL query is and how to work with it.
An execution plan in Oracle is a crucial element for understanding and optimizing the performance of SQL queries. The EXPLAIN PLAN command displays an execution plan for a SQL statement without executing the statement. It outlines the detailed steps, or operations, the database will take to execute a SQL statement; these steps are represented as a set of database operators that process rows of data. Let's look at the different methods to display the execution plan and then delve into the details of the EXPLAIN PLAN output and how it can be utilized for performance enhancement.
Displaying the Execution Plan
1. EXPLAIN PLAN Command:
   - Usage: The EXPLAIN PLAN command is used to display an execution plan for a SQL statement without actually executing the statement.
   - Output: It shows how Oracle will execute the SQL statement, detailing the path it will take to access data and the operations it will perform.
- Using EXPLAIN PLAN
SQL> @?/rdbms/admin/utlxplan.sql
SQL> EXPLAIN PLAN FOR
     SELECT * FROM employees WHERE department_id = 10;
SQL> SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY());
Plan hash value: 123456789

--------------------------------------------------------------------------------
| Id  | Operation          | Name      | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |           |     1 |    52 |     2   (0)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL | EMPLOYEES |     1 |    52 |     2   (0)| 00:00:01 |
--------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("DEPARTMENT_ID"=10)
2. V$SQL_PLAN View:
   - Usage: This is a dynamic performance view that provides the execution plan for SQL statements that have been executed.
   - Benefit: It's useful for analyzing the actual execution plans as opposed to the estimated plans provided by the EXPLAIN PLAN command.
3. IDE Tools (TOAD, PL/SQL Developer, etc.):
   - Functionality: These integrated development environments provide graphical interfaces and additional tools to view and analyze execution plans.
   - Advantages: They often offer more user-friendly ways to interpret the plans, making it easier to spot performance issues.
4. Oracle Cloud Management:
   - Application: In cloud-based Oracle databases, you can view SQL plans through cloud management tools.
   - Convenience: These tools often provide advanced analytics and monitoring capabilities.
Understanding EXPLAIN PLAN Output
To determine whether you are looking at a good execution plan, you need to understand how the Optimizer determined the plan in the first place. You should also be able to look at the execution plan and assess whether the Optimizer has made any mistakes in its estimates or calculations, leading to a suboptimal plan. The components to assess are:
Cardinality
Cardinality refers to the estimated number of rows that will be returned by each operation in a query. A basic formula is often applied in scenarios involving a single equality predicate within a single table query that doesn't utilize a histogram. In such cases, the Optimizer assumes a uniform distribution of data. It calculates the cardinality by dividing the total number of rows in the table by the number of distinct values in the column specified in the WHERE clause predicate. However, there are several factors that can lead to inaccurate cardinality estimates, even when the basic table and column statistics are up to date. These factors include:
- Data Skew: Significant variations in the distribution of data values.
- Multiple Single Column Predicates on a Single Table: Several single-column predicates combined on one table can be correlated, which complicates cardinality estimation.
- Function-Wrapped Columns in the WHERE Clause Predicates: Use of functions in WHERE clause predicates can affect the estimation process.
- Complex Expressions: Complicated query expressions can challenge the Optimizer's ability to accurately predict cardinality.
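One of these factors, correlated single-column predicates, can be illustrated with a short Python sketch (data invented). With no extended statistics, the optimizer assumes the predicates are independent and multiplies their selectivities, which understates the row count when the columns are correlated:

```python
# Sketch of why correlated predicates mislead the basic cardinality model.

rows = [("CA", "San Francisco")] * 90 + [("NY", "New York")] * 10

num_rows = len(rows)
sel_state = sum(1 for s, _ in rows if s == "CA") / num_rows               # 0.9
sel_city = sum(1 for _, c in rows if c == "San Francisco") / num_rows     # 0.9

# Independence assumption: combined selectivity = product of the two,
# i.e. 100 * 0.9 * 0.9 = 81 estimated rows.
estimated = num_rows * sel_state * sel_city

# Reality: the two columns are perfectly correlated, so 90 rows match.
actual = sum(1 for s, c in rows if s == "CA" and c == "San Francisco")

print(estimated, actual)
```

The estimate of 81 is not far off here, but with more correlated predicates the multiplied selectivities shrink geometrically while the true count does not, and the gap can change the chosen plan.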
To accurately account for data skew, a histogram on the column may be required. Oracle typically determines the need for histograms automatically based on column usage statistics and the presence of data skew. If there's a need to create a histogram manually, the following command can be used:
SQL> EXEC DBMS_STATS.GATHER_TABLE_STATS('HR', 'EMPLOYEES', method_opt => 'FOR COLUMNS SIZE 254 JOB_ID');
To manually verify whether the Optimizer has estimated the correct cardinality, the GATHER_PLAN_STATISTICS hint can be used in the SQL statement. This hint enables the automatic collection of more comprehensive runtime statistics. In the resulting DBMS_XPLAN.DISPLAY_CURSOR output, the columns 'E-Rows' (Estimated Rows) and 'A-Rows' (Actual Rows) indicate the estimated and actual rows returned, respectively. For example:
SQL> SELECT /*+ GATHER_PLAN_STATISTICS */ employee_id, last_name, job_id
     FROM employees
     WHERE job_id = 'AD_VP';

SQL> SELECT plan_table_output
     FROM table(DBMS_XPLAN.DISPLAY_CURSOR(FORMAT => 'ALLSTATS LAST'));
Access Method
The access method, or access path, indicates how data is accessed from each table or index in an Oracle database. This method is detailed in the 'Operation' field of the EXPLAIN plan. Oracle supports several common access methods:
1. Full Table Scan:
   - Reads all rows from a table and filters out those not meeting the WHERE clause predicates.
   - Utilizes multi-block I/O, typically up to 1MB per I/O.
   - Chosen when a significant portion of the rows must be accessed, when no usable indexes exist, or if it is the least costly option.
   - Influenced by factors such as the DB_FILE_MULTIBLOCK_READ_COUNT initialization parameter, the parallel degree, hints, the absence of usable indexes, and cost comparisons with index usage.
2. Table Access by ROWID:
   - Accesses a row based on its ROWID, which specifies the data file, the data block within that file, and the row's location within that block.
3. Index Unique Scan:
   - Returns a single row from a unique index scan.
   - Used when there's an equality predicate on a unique B-tree index or a primary key constraint.
4. Index Range Scan:
   - Accesses adjacent index entries and uses the ROWID values to retrieve the corresponding table rows.
   - Applied for equality predicates on non-unique index keys, or for non-equality/range predicates ('<', '>', 'LIKE' provided the wildcard is not on the leading edge) on unique index keys.
5. Index Range Scan Descending:
   - Similar to an Index Range Scan, but used to satisfy 'ORDER BY ... DESCENDING' clauses.
6. Index Skip Scan:
   - Skips the leading column of the index and uses the remaining columns, particularly when all columns except the first are referenced in the query.
7. Full Index Scan:
   - Processes all leaf blocks of an index, and enough branch blocks to find the first leaf block.
   - Selected when all columns needed for the query are in the index and it is cheaper than a full table scan.
   - Situations favoring this include matching 'ORDER BY' clauses, sort merge joins, or queries where the order of referenced columns matches the leading index columns.
8. Fast Full Index Scan:
   - An alternative to a full table scan, used when the index contains all columns required by the query and at least one column in the index key has a NOT NULL constraint.
9. Index Join:
   - A join of several indexes on the same table that collectively contain all columns referenced in the query.
   - Eliminates the need for table access, as all relevant column values are retrieved from the joined indexes.
10. Bitmap Index:
   - Uses a set of bits for each key value and a mapping function to convert bit positions to ROWIDs.
   - Efficient for merging bitmap indexes corresponding to several predicates in a WHERE clause using Boolean operations.
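The bitmap-merging idea can be sketched in a few lines of Python (table data invented): each distinct value owns a bitmap with one bit per row, and a WHERE clause combining predicates becomes a Boolean AND or OR of bitmaps. Python integers serve as arbitrary-length bit strings here.

```python
# Sketch of bitmap index merging: bit i set in a value's bitmap means
# row i holds that value; combining predicates is bitwise AND/OR.

def build_bitmaps(column_values):
    """Map each distinct value to a bitmap over the row ids."""
    bitmaps = {}
    for row_id, value in enumerate(column_values):
        bitmaps[value] = bitmaps.get(value, 0) | (1 << row_id)
    return bitmaps

region = build_bitmaps(["EAST", "WEST", "EAST", "EAST", "WEST", "EAST"])
status = build_bitmaps(["OPEN", "OPEN", "CLOSED", "OPEN", "CLOSED", "OPEN"])

# WHERE region = 'EAST' AND status = 'OPEN'  ->  bitwise AND of two bitmaps
merged = region["EAST"] & status["OPEN"]

matching_rows = [i for i in range(6) if merged & (1 << i)]
print(matching_rows)   # prints: [0, 3, 5]
```

Only after the cheap bitwise merge are the surviving bit positions converted to ROWIDs for table access, which is why bitmap indexes shine on multi-predicate queries over low-cardinality columns.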
Join Method
The join method in a SQL query describes how data from two data-producing operations are combined. These methods can be identified in the operations column of the EXPLAIN plan. Common join methods include:
1. Hash Joins:
   - Used for joining large data sets.
   - The optimizer builds a hash table in memory from the smaller table or data source, keyed on the join key.
   - The larger table is then scanned, applying the same hashing algorithm to the join columns.
   - If a matching value is found in the hash table, a row is returned.
2. Nested Loops Joins:
   - Suitable for joining small subsets of data, especially when there is an efficient way to access the second table, such as an index lookup.
   - For each row in the first table (the outer table), Oracle accesses the matching rows in the second table (the inner table).
   - This method is analogous to two embedded FOR loops.
   - Notably, Oracle Database 11g's implementation of nested loop joins was updated to reduce latency for physical I/O.
3. Sort Merge Joins:
   - Effective when the join condition between two tables involves inequality operators such as '<', '<=', '>', or '>='.
   - Typically more efficient than nested loop joins for large data sets.
   - Involves two steps: sorting both inputs on the join key (the sort join operation) and then merging the sorted lists (the merge join operation).
   - More likely to be chosen if there is an index on one of the tables that can eliminate one of the sorts.
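The three join methods above can be sketched in Python on plain lists of (join_key, value) tuples. These mirror the shape of each algorithm, not Oracle's actual implementations, and the table data is invented:

```python
# Conceptual sketches of hash join, nested loops join, and sort merge join.

def hash_join(small, large):
    """Build a hash table on the smaller input, then probe it with the larger."""
    table = {}
    for key, val in small:
        table.setdefault(key, []).append(val)
    return [(key, s, l) for key, l in large for s in table.get(key, [])]

def nested_loops_join(outer, inner):
    """For each outer row, scan the inner input: two embedded FOR loops."""
    return [(k1, v1, v2) for k1, v1 in outer for k2, v2 in inner if k1 == k2]

def sort_merge_join(left, right):
    """Sort both inputs on the join key, then merge the sorted lists."""
    left, right = sorted(left), sorted(right)
    out, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i][0] < right[j][0]:
            i += 1
        elif left[i][0] > right[j][0]:
            j += 1
        else:
            key = left[i][0]
            lv = [v for k, v in left if k == key]
            rv = [v for k, v in right if k == key]
            out.extend((key, a, b) for a in lv for b in rv)
            while i < len(left) and left[i][0] == key:
                i += 1
            while j < len(right) and right[j][0] == key:
                j += 1
    return out

depts = [(10, "SALES"), (20, "IT")]
emps = [(10, "King"), (20, "Hunold"), (10, "Kochhar"), (30, "Grant")]

print(hash_join(depts, emps))
print(nested_loops_join(depts, emps))
print(sort_merge_join(depts, emps))
```

All three produce the same joined rows; they differ in cost profile, which is exactly the trade-off the optimizer weighs: hashing the smaller input, repeated inner lookups, or sorting both sides once.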
Join Types
Oracle supports various join types, including:
- Inner Join: The most common type, often implied in queries without the explicit use of the keyword "INNER".
- Outer Join: Returns all rows satisfying the join condition, plus unmatched rows from the specified table.
- Full Outer Join, Anti Join, Semi Join, Grouped Outer Join, etc.
Join Order
The join order in a multi-table SQL statement is essential. It can be deduced from the indentation of tables in the operation column of the execution plan and is influenced by factors like cost, cardinality estimates, and available access paths. The Optimizer adheres to rules such as:
- Joins resulting in at most one row go first, determined by UNIQUE and PRIMARY KEY constraints.
- In outer joins, the row-preserving table follows the table with the outer join operator.
- For antijoins or semijoins derived from subqueries, the tables from the subquery follow the connected tables from the outer query block.
- If view merging isn't feasible, all tables within the view are joined before those outside the view.
Partitioning
Partitioning in databases allows a table, an index, or an index-organized table to be subdivided into smaller, more manageable segments, each known as a Partition. This approach is particularly effective for improving performance, as it enables more efficient data access and management. A key technique associated with partitioning is known as partition pruning or partition elimination.
For instance, consider an application with an ORDERS table that records all orders from the last two years. If this table is partitioned by day, a query requesting orders for a single week would need to access only seven partitions of the ORDERS table, as opposed to all 730 partitions representing the full two years.
The effectiveness of partition pruning can be observed in an execution plan through the PSTART and PSTOP columns. The PSTART column indicates the number of the first partition to be accessed, while the PSTOP column shows the number of the last partition that will be accessed. This selective access to partitions significantly reduces the amount of data that needs to be scanned, thereby improving query performance and resource utilization.
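The ORDERS example above can be sketched in Python (partition layout invented): a daily-partitioned table holds 730 partitions, and a one-week predicate prunes the scan down to 7 of them, the range the plan would report via PSTART and PSTOP.

```python
# Sketch of partition pruning: a date-range predicate selects only the
# partitions whose partition key falls inside the range.

from datetime import date, timedelta

start = date(2022, 1, 1)
# 730 daily partitions, one per day over two years; partition key = order day.
partitions = {start + timedelta(days=i): [] for i in range(730)}

def pruned_partitions(query_from, query_to):
    """Return only the partitions inside the predicate range (PSTART..PSTOP)."""
    return [day for day in partitions if query_from <= day <= query_to]

week = pruned_partitions(date(2022, 6, 1), date(2022, 6, 7))
print(len(partitions), len(week))   # prints: 730 7
```

The scan touches under 1 percent of the segments, which is where the performance benefit of pruning comes from.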
Parallel Execution
Parallel execution in Oracle Database relies on the collaboration between a Query Coordinator (QC) and parallel server processes. The QC, which initiates the parallel SQL statement, delegates tasks to parallel server processes that perform work concurrently. While the QC handles mostly logistical tasks that cannot be executed in parallel, it also performs minimal work like aggregating sub-totals from parallel server processes, as seen in a parallel query using a SUM() operation.
Granules
A granule represents the smallest unit of work assignable to a parallel server process. Oracle Database primarily uses block ranges on disk, known as block-based granules, for work distribution in parallel execution. In the case of partition-based granules, a parallel server process handles all data within a single partition. The Oracle Optimizer opts for partition-based granules when the number of accessed (sub)partitions matches or exceeds the Degree of Parallelism (DOP). The term 'PX PARTITION RANGE ALL' in an execution plan indicates that each parallel server process is dedicated to a specific range partition in the table.
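Block-based granules can be sketched as follows (block counts, granule size, and the round-robin dealing below are invented for illustration; Oracle's actual granule sizing and assignment are internal):

```python
# Sketch of block-based granules: split the table's blocks into contiguous
# ranges and deal the ranges out to DOP parallel server processes.

def block_granules(total_blocks, granule_blocks):
    """Split [0, total_blocks) into contiguous block ranges."""
    return [(lo, min(lo + granule_blocks, total_blocks))
            for lo in range(0, total_blocks, granule_blocks)]

def assign_round_robin(granules, dop):
    """Deal granules to DOP parallel server processes."""
    workers = [[] for _ in range(dop)]
    for i, granule in enumerate(granules):
        workers[i % dop].append(granule)
    return workers

granules = block_granules(total_blocks=1000, granule_blocks=128)
workers = assign_round_robin(granules, dop=4)

for n, work in enumerate(workers):
    print(f"P{n:03d}", work)
```

With partition-based granules the unit dealt out would instead be a whole (sub)partition, which is why the Optimizer only chooses them when there are at least as many (sub)partitions as the DOP.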
Producers and Consumers
In efficient parallel execution, parallel server processes are organized into producers and consumers. Producers, identifiable by operations under any PX SEND operation, generate rows, while consumers, recognized by PX RECEIVE operations, process these rows.
Data Redistribution
Data redistribution is essential when processing joins in parallel, especially involving large tables like CUSTOMERS and SALES. Producers scan the tables based on block ranges, apply WHERE clause predicates, and send the resulting rows to consumers for join completion. The execution plan's IN-OUT and PQ Distrib columns provide insights into this redistribution process. The PQ Distrib column, in particular, is informative and has largely supplanted the IN-OUT column.
Common redistribution methods, reflected in the PQ Distrib column, include HASH (rows are distributed to consumers based on a hash function applied to the join key), BROADCAST (every row is sent to every consumer, typically when one input is much smaller than the other), RANGE (often used for parallel sort operations), KEY, and ROUND-ROBIN.
In Real Application Clusters (RAC) databases, the suffix 'LOCAL' indicates optimization to minimize interconnect traffic for inter-node parallel queries, distributing rows only to consumers on the same RAC node.
Notably, the IN-OUT column values 'P->P' and 'P->S' indicate data transfer from one parallel operation to another and from a parallel to a serial operation, respectively.
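A simplified model of HASH redistribution between producers and consumers can be sketched in Python (table shapes and row counts invented): each producer hashes the join key and routes the row to the consumer owning that hash bucket, so matching CUSTOMERS and SALES rows meet at the same consumer.

```python
# Sketch of HASH redistribution: rows are routed to consumers by
# hash(join_key) % n_consumers, co-locating matching join keys.

def redistribute_hash(rows, n_consumers, key_index=0):
    """Route each row to the consumer owning its join key's hash bucket."""
    consumers = [[] for _ in range(n_consumers)]
    for row in rows:
        consumers[hash(row[key_index]) % n_consumers].append(row)
    return consumers

customers = [(cust_id, f"cust-{cust_id}") for cust_id in range(8)]
sales = [(i % 8, amount) for i, amount in enumerate(range(100, 116))]

cust_buckets = redistribute_hash(customers, n_consumers=4)
sales_buckets = redistribute_hash(sales, n_consumers=4)

# Every sales row lands on the consumer that also holds its customer row,
# so each consumer can complete its share of the join independently.
for c_rows, s_rows in zip(cust_buckets, sales_buckets):
    assert {r[0] for r in s_rows} <= {r[0] for r in c_rows}
print("join keys co-located on", len(cust_buckets), "consumers")
```

Because both sides use the same hash function, no consumer ever needs rows held by another, which is the property that makes the parallel join correct.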
For the next part in our SQL tuning series, we're diving deep into some very advanced topics. Each one will be explored with practical examples to illuminate the concepts. Stay tuned for these insightful deep dives.
Feel free to ask questions or leave comments to spark a discussion about the article. Your input is invaluable as we explore these complex topics together. Stay connected for the upcoming advanced insights!
References
- Christian Antognini, Troubleshooting Oracle Performance, 2nd Edition
- Richard Niemiec, Oracle Database Performance Tuning Tips and Techniques
- Oracle white paper, April 2017: The Oracle Optimizer: Explain the Explain Plan