Understanding Oracle Histogram
Javed Akhtar
Infra Manager (Looking for next role upgrade, TAM, Program manager, Service Delivery manager, Project Manager)
What is histogram?
In Oracle, a histogram is a specialized column statistic that provides detailed information about data distribution within a column, aiding the Cost-Based Optimizer (CBO) in selecting the most efficient execution plan. When data in a column is unevenly distributed, or “skewed,” basic statistics alone may not give the optimizer enough information to make optimal choices. Histograms address this by dividing data into “buckets” that represent different frequency intervals, showing how often certain values or ranges appear.
For instance, if a column has a few values that appear frequently and many values that are rare, a histogram will capture this unevenness. This information helps the optimizer decide whether to use an index or a full-table scan and allows for more accurate cardinality estimates, particularly useful in complex queries and table joins.
Oracle supports several types of histograms, such as frequency histograms (for columns with fewer distinct values) and height-balanced histograms (for columns with many unique values). Each type is designed to handle different distribution patterns, ensuring that the optimizer makes data-driven decisions. By enabling better estimates and reducing query time, histograms are valuable tools for tuning database performance, especially in systems with complex data distributions.
What is Purpose of Histograms?
The purpose of histograms in Oracle is to allow the optimizer to accurately estimate row counts for columns with uneven data distribution, helping it choose the most efficient query execution plan. Typically, the optimizer assumes an even distribution across all values in a column, which can lead to inefficient plans when data is actually skewed.
Consider Amazon’s shipment data, where a large portion of orders are sent to densely populated regions like California, Texas, and New York, while fewer orders go to smaller states. Suppose the column representing shipping destinations has this skewed distribution. If a user queries shipments to a less common destination like Wyoming, without a histogram, the optimizer might assume an even distribution and estimate too many rows, leading to a less efficient full table scan.
With a histogram, however, the optimizer knows that only a small fraction of shipments go to Wyoming and adjusts its row estimate accordingly. This allows the optimizer to choose a faster index scan instead, minimizing query time and resource usage. Thus, histograms improve query performance by capturing the real data distribution, enabling the optimizer to make precise decisions for indexing, joining, and filtering, especially when handling highly skewed data, as seen in large-scale operations like Amazon’s shipments.
What are type of Histograms?
In Oracle, histograms are used to provide the optimizer with detailed information about the distribution of values in a column, aiding in more efficient query planning. The database automatically chooses the type of histogram based on the number of distinct values (NDV) and the data distribution. The main types of histograms in Oracle are:
Each type has specific use cases and is automatically selected by Oracle based on data characteristics, with frequency histograms being generally the most precise.
How Oracle Database Chooses the Histogram Type?
Oracle Database chooses the type of histogram based on the distribution characteristics of the column data and specific internal criteria, involving factors such as the number of distinct values (NDV), the number of histogram buckets (n), and a calculated percentage threshold (p). Here’s a breakdown of how Oracle determines the appropriate histogram type:
Decision Tree for Histogram Type
Oracle Database then applies the following logic to select the histogram type:
In summary, Oracle’s histogram selection process relies on data characteristics and calculated thresholds to ensure that the optimizer receives accurate data distribution information, leading to better query performance.
When Oracle Database Creates Histograms?
Oracle Database automatically creates histograms for columns as needed based on the query workload when DBMS_STATS gathers statistics for a table, and if queries have referenced the columns within that table.
Here’s the basic process:
SQL> CREATE TABLE SCOTT.ODBA_TAB AS SELECT * FROM DBA_OBJECTS;
Table created.
SQL> exec dbms_stats.gather_table_stats(ownname=>'SCOTT',tabname=>'ODBA_TAB',
> estimate_percent=>DBMS_STATS.AUTO_SAMPLE_SIZE,cascade=>TRUE,degree =>4);
PL/SQL procedure successfully completed.
SQL>
SQL> col table_name for a10
SQL> col column_name for a30
SQL> select TABLE_NAME,COLUMN_NAME,HISTOGRAM from dba_tab_columns
> where table_name = 'ODBA_TAB' AND OWNER='SCOTT';
TABLE_NAME COLUMN_NAME HISTOGRAM
---------- ------------------------------ ---------------
ODBA_TAB OWNER NONE
ODBA_TAB OBJECT_NAME NONE
ODBA_TAB SUBOBJECT_NAME NONE
ODBA_TAB OBJECT_ID NONE
ODBA_TAB DATA_OBJECT_ID NONE
ODBA_TAB OBJECT_TYPE NONE
ODBA_TAB CREATED NONE
ODBA_TAB LAST_DDL_TIME NONE
ODBA_TAB TIMESTAMP NONE
ODBA_TAB STATUS NONE
ODBA_TAB TEMPORARY NONE
ODBA_TAB GENERATED NONE
ODBA_TAB SECONDARY NONE
ODBA_TAB NAMESPACE NONE
ODBA_TAB EDITION_NAME NONE
15 rows selected.
SQL>
SQL> SELECT COUNT(*) FROM SCOTT.ODBA_TAB WHERE TEMPORARY='Y';
COUNT(*)
----------
247
SQL> select TABLE_NAME,COLUMN_NAME,HISTOGRAM from dba_tab_columns
> where table_name = 'ODBA_TAB' AND OWNER='SCOTT';
TABLE_NAME COLUMN_NAME HISTOGRAM
---------- ------------------------------ ---------------
ODBA_TAB OWNER NONE
ODBA_TAB OBJECT_NAME NONE
ODBA_TAB SUBOBJECT_NAME NONE
ODBA_TAB OBJECT_ID NONE
ODBA_TAB DATA_OBJECT_ID NONE
ODBA_TAB OBJECT_TYPE NONE
ODBA_TAB CREATED NONE
ODBA_TAB LAST_DDL_TIME NONE
ODBA_TAB TIMESTAMP NONE
ODBA_TAB STATUS NONE
ODBA_TAB TEMPORARY NONE
ODBA_TAB GENERATED NONE
ODBA_TAB SECONDARY NONE
ODBA_TAB NAMESPACE NONE
ODBA_TAB EDITION_NAME NONE
15 rows selected.
SQL> exec dbms_stats.gather_table_stats(ownname=>'SCOTT',tabname=>'ODBA_TAB',
> estimate_percent=>DBMS_STATS.AUTO_SAMPLE_SIZE,cascade=>TRUE,degree =>4);
PL/SQL procedure successfully completed.
SQL> select TABLE_NAME,COLUMN_NAME,HISTOGRAM from dba_tab_columns
> where table_name = 'ODBA_TAB' AND OWNER='SCOTT';
TABLE_NAME COLUMN_NAME HISTOGRAM
---------- ------------------------------ ---------------
ODBA_TAB OWNER NONE
ODBA_TAB OBJECT_NAME NONE
ODBA_TAB SUBOBJECT_NAME NONE
ODBA_TAB OBJECT_ID NONE
ODBA_TAB DATA_OBJECT_ID NONE
ODBA_TAB OBJECT_TYPE NONE
ODBA_TAB CREATED NONE
ODBA_TAB LAST_DDL_TIME NONE
ODBA_TAB TIMESTAMP NONE
ODBA_TAB STATUS NONE
ODBA_TAB TEMPORARY FREQUENCY <----
When to Use Histograms
Histograms should be created on columns that are frequently used in the WHERE clauses of queries and have a highly skewed data distribution. It is generally recommended to let Oracle determine the need for histograms automatically by using the METHOD_OPT AUTO setting. This allows Oracle to decide which columns should have histograms based on their data distribution and query workload.
When Not to Use Histograms
Histograms, like all optimizer statistics, are static. If the data distribution of a column changes frequently, it is important to recompute the histogram for that column. Histograms are not useful for columns with the following characteristics:
Useful Queries and View to check Histograms
select dbms_stats.get_prefs ('METHOD_OPT') from dual; -- At Database level.
SELECT TABLE_NAME,COLUMN_NAME,HISTOGRAM FROM DBA_TAB_COLUMNS WHERE TABLE_NAME='&TABLE_NAME' AND OWNER='&OWNER';
SELECT TABLE_NAME,COLUMN_NAME,HISTOGRAM FROM DBA_TAB_COL_STATISTICS WHERE TABLE_NAME='&TABLE_NAME' AND OWNER='&OWNER';
SELECT TABLE_NAME,COLUMN_NAME,HISTOGRAM,NOTES FROM USER_TAB_COL_STATISTICS WHERE TABLE_NAME='&TABLE_NAME';
SELECT TABLE_NAME,COLUMN_NAME,HISTOGRAM,NOTES FROM USER_TAB_COL_STATISTICSWHERE TABLE_NAME='&TABLE_NAME';
USER_HISTOGRAMS
USER_PART_HISTOGRAMS
USER_SUBPART_HISTOGRAMS
USER_TAB_HISTOGRAMS
Delete Histogram
SQL> exec dbms_stats.gather_table_stats('SCOTT','ODBA_TAB',
> METHOD_OPT => 'FOR COLUMNS TEMPORARY SIZE 1');
PL/SQL procedure successfully completed.
- Delete histograms on all columns
SELECT TABLE_NAME,COLUMN_NAME,HISTOGRAM FROM DBA_TAB_COLUMNS WHERE TABLE_NAME='&TABLE_NAME' AND OWNER='&OWNER';
exec dbms_stats.gather_table_stats('HR','EMPLOYEES', METHOD_OPT => 'FOR ALL COLUMNS SIZE 1');
SELECT TABLE_NAME,COLUMN_NAME,HISTOGRAM FROM DBA_TAB_COLUMNS WHERE TABLE_NAME='&TABLE_NAME' AND OWNER='&OWNER';
--Create Histogram on Column USERNAME
SELECT TABLE_NAME,COLUMN_NAME,HISTOGRAM FROM DBA_TAB_COLUMNS WHERE TABLE_NAME='&TABLE_NAME' AND OWNER='&OWNER';
exec dbms_stats.gather_table_stats('SCOTT','ODBA_TEST', METHOD_OPT => 'FOR COLUMNS USERNAME SIZE AUTO');
SELECT TABLE_NAME,COLUMN_NAME,HISTOGRAM FROM DBA_TAB_COLUMNS WHERE TABLE_NAME='&TABLE_NAME' AND OWNER='&OWNER';
--Gather STATS with REPEAT Option
Please note REPEAT option will re-create existing histograms without creating new histograms.
- For TEMPORARY Column
SELECT TABLE_NAME,COLUMN_NAME,HISTOGRAM FROM DBA_TAB_COLUMNS WHERE TABLE_NAME='&TABLE_NAME' AND OWNER='&OWNER';
exec dbms_stats.gather_table_stats('HR','EMPLOYEES', METHOD_OPT => 'FOR COLUMNS TEMPORARY SIZE REPEAT');
SELECT TABLE_NAME,COLUMN_NAME,HISTOGRAM FROM DBA_TAB_COLUMNS WHERE TABLE_NAME='&TABLE_NAME' AND OWNER='&OWNER';
- For All columns
SELECT TABLE_NAME,COLUMN_NAME,HISTOGRAM FROM DBA_TAB_COLUMNS WHERE TABLE_NAME='&TABLE_NAME' AND OWNER='&OWNER';
exec dbms_stats.gather_table_stats('HR','EMPLOYEES',METHOD_OPT=> 'FOR ALL COLUMNS SIZE REPEAT');
SELECT TABLE_NAME,COLUMN_NAME,HISTOGRAM FROM DBA_TAB_COLUMNS WHERE TABLE_NAME='&TABLE_NAME' AND OWNER='&OWNER';
-- How to change default METHOD_OPT value from AUTO to REPEAT at Database Level
-- Verify
select dbms_stats.get_prefs ('METHOD_OPT') from dual; -- At Database level.
-- set to REPEAT
-- This procedure is used to set the global statistics preferences.
exec dbms_stats.set_global_prefs ('METHOD_OPT','FOR ALL COLUMNS SIZE REPEAT');
-- Verify
select dbms_stats.get_prefs ('METHOD_OPT') from dual; -- At Database level.
--Since we set to REPEAT at Database level, hence going forwared gathering of statistics, manually or automatically will not create any new histograms. Collect histograms only on the columns that already have histograms.
Extended Statistics
In real-world data, there is often a correlation or relationship between the data stored in different columns of the same table. By creating extended statistics on a group of columns, the Oracle Optimizer can generate more accurate cardinality estimates when these columns are used together in a WHERE clause of a SQL query.
You can use DBMS_STATS.CREATE_EXTENDED_STATS to define a column group for which statistics should be collected collectively. Once the column group is established, Oracle will automatically maintain the statistics for that group whenever statistics are gathered for the table.
Create Extended Statistics
SELECT DBMS_STATS.CREATE_EXTENDED_STATS('HR', 'ODBA_TAB', '(OBJECT_TYPE,TEMPORARY)') FROM DUAL;
exec dbms_stats.gather_table_stats('HR','ODBA_TAB');
-- Display extended statistics extensions (Will show the relationship and extension name)
column extension format a30
select extension_name, extension from dba_stat_extensions where table_name='ODBA_TAB';
-- Display extended statistics distinct values and histograms
column col_group format a30
select e.extension col_group,
t.num_distinct,
t.histogram
from dba_stat_extensions e
join dba_tab_col_statistics t on e.extension_name=t.column_name
and t.table_name = 'ODBA_TAB';
Dropping Extended Statistics
exec dbms_stats.drop_extended_stats(ownname => 'HR',tabname => 'ODBA_TAB',extension => '(OBJECT_TYPE,TEMPORARY)');
Eg:
SQL> CREATE TABLE HR.ODBA_TAB AS SELECT * FROM DBA_OBJECTS;
Table created.
SQL> SELECT COUNT(*) FROM HR.ODBA_TAB WHERE TEMPORARY='Y' and OBJECT_TYPE='TABLE';
COUNT(*)
----------
115
SQL>
SQL> column extension format a30
select extension_name, extension from dba_stat_extensions where table_name='ODBA_TAB';SQL>
EXTENSION_NAME EXTENSION
------------------------------ ------------------------------
SYS_STUYM6W8ZX7#X$MP7$EJ_PVU9O ("OBJECT_TYPE","TEMPORARY")
SQL>
SQL> column col_group format a30
select e.extension col_group,
t.num_distinct,
t.histogram
from dba_stat_extensions e
join dba_tab_col_statistics t on e.extension_name=t.column_name
and t.table_name = 'ODBA_TAB';SQL> 2 3 4 5 6
COL_GROUP NUM_DISTINCT HISTOGRAM
------------------------------ ------------ ---------------
("OBJECT_TYPE","TEMPORARY") 47 FREQUENCY
SQL>
-- Dropping Extended Statistics
SQL> exec dbms_stats.drop_extended_stats(ownname => 'HR',tabname => 'ODBA_TAB',extension => '(OBJECT_TYPE,TEMPORARY)');
PL/SQL procedure successfully completed.
SQL>
-- Verify
SQL> column extension format a30
select extension_name, extension from dba_stat_extensions where table_name='ODBA_TAB';SQL>
no rows selected <----
SQL>