Understanding Oracle Histogram

What is histogram?

In Oracle, a histogram is a specialized column statistic that provides detailed information about data distribution within a column, aiding the Cost-Based Optimizer (CBO) in selecting the most efficient execution plan. When data in a column is unevenly distributed, or “skewed,” basic statistics alone may not give the optimizer enough information to make optimal choices. Histograms address this by dividing data into “buckets” that represent different frequency intervals, showing how often certain values or ranges appear.

For instance, if a column has a few values that appear frequently and many values that are rare, a histogram will capture this unevenness. This information helps the optimizer decide whether to use an index or a full-table scan and allows for more accurate cardinality estimates, particularly useful in complex queries and table joins.

Oracle supports several types of histograms, such as frequency histograms (for columns with fewer distinct values) and height-balanced histograms (for columns with many unique values). Each type is designed to handle different distribution patterns, ensuring that the optimizer makes data-driven decisions. By enabling better estimates and reducing query time, histograms are valuable tools for tuning database performance, especially in systems with complex data distributions.

What is Purpose of Histograms?

The purpose of histograms in Oracle is to allow the optimizer to accurately estimate row counts for columns with uneven data distribution, helping it choose the most efficient query execution plan. Typically, the optimizer assumes an even distribution across all values in a column, which can lead to inefficient plans when data is actually skewed.

Consider Amazon’s shipment data, where a large portion of orders are sent to densely populated regions like California, Texas, and New York, while fewer orders go to smaller states. Suppose the column representing shipping destinations has this skewed distribution. If a user queries shipments to a less common destination like Wyoming, without a histogram, the optimizer might assume an even distribution and estimate too many rows, leading to a less efficient full table scan.

With a histogram, however, the optimizer knows that only a small fraction of shipments go to Wyoming and adjusts its row estimate accordingly. This allows the optimizer to choose a faster index scan instead, minimizing query time and resource usage. Thus, histograms improve query performance by capturing the real data distribution, enabling the optimizer to make precise decisions for indexing, joining, and filtering, especially when handling highly skewed data, as seen in large-scale operations like Amazon’s shipments.

What are type of Histograms?

In Oracle, histograms are used to provide the optimizer with detailed information about the distribution of values in a column, aiding in more efficient query planning. The database automatically chooses the type of histogram based on the number of distinct values (NDV) and the data distribution. The main types of histograms in Oracle are:

  1. Height-Balanced Histogram: In a height-balanced histogram, each “bucket” contains the same number of rows. This type is often used when a column has many distinct values and a skewed distribution, as it adjusts bucket size to accommodate high-frequency values.
  2. Frequency Histogram: In a frequency histogram, each distinct value in the column has its own bucket, making it highly accurate. This type of histogram is ideal for columns with relatively few distinct values. Frequency histograms give precise information about each value, making them more accurate than height-balanced histograms.
  3. Top-Frequency Histogram: This type captures the distribution of the most frequently occurring values in the column, while treating less common values together. It’s useful when certain values are far more frequent than others, providing a balance between detail and storage efficiency.
  4. Hybrid Histogram: The hybrid histogram combines characteristics of height-balanced and top-frequency histograms. It provides detailed information for highly skewed values and is generally more accurate, especially for columns with many distinct values but with a small subset of highly frequent values.
  5. None: If the histogram type is “None,” no histogram is present for the column. This may occur if the column has evenly distributed data or when the database doesn’t deem a histogram necessary.

Each type has specific use cases and is automatically selected by Oracle based on data characteristics, with frequency histograms being generally the most precise.

How Oracle Database Chooses the Histogram Type?

Oracle Database chooses the type of histogram based on the distribution characteristics of the column data and specific internal criteria, involving factors such as the number of distinct values (NDV), the number of histogram buckets (n), and a calculated percentage threshold (p). Here’s a breakdown of how Oracle determines the appropriate histogram type:

  1. NDV (Number of Distinct Values): This is the total count of unique values in the column. For instance, if a column contains values like 100, 200, and 300, then its NDV is 3.
  2. Number of Buckets (n): This defines the maximum number of “buckets” (or intervals) into which column values can be divided for histogram purposes. Oracle sets the default bucket count to 254, although this can be adjusted based on the complexity of the data.
  3. Threshold Percentage (p): Oracle calculates an internal percentage threshold as p = (1 - (1 / n)) * 100. For the default n value of 254, this threshold (p) is 99.6%. This percentage helps Oracle decide if a more detailed histogram is warranted.
  4. Sampling Percentage (estimate_percent): The estimate_percent parameter, set in the DBMS_STATS package, also influences the histogram decision. When estimate_percent is set to AUTO_SAMPLE_SIZE (the default), Oracle dynamically determines an optimal sample size, improving histogram accuracy.

Decision Tree for Histogram Type

Oracle Database then applies the following logic to select the histogram type:

  • Frequency Histogram: Chosen if NDV is relatively low, allowing each distinct value to fit into its own bucket. This type is ideal for columns with few unique values, enabling precise cardinality estimates.
  • Top-Frequency Histogram: Selected if a column has many distinct values, but only a few values appear very frequently. This histogram focuses on the most common values, grouping infrequent values together for efficiency.
  • Height-Balanced Histogram: Used when the column has many distinct values and the distribution is moderately skewed. Oracle assigns equal row counts to each bucket, which captures value frequency patterns even when there are no very high-frequency values.
  • Hybrid Histogram: If the column has numerous distinct values with a significant skew, Oracle uses the hybrid histogram. It combines aspects of frequency and height-balanced histograms, focusing on highly skewed values for better query estimates.

In summary, Oracle’s histogram selection process relies on data characteristics and calculated thresholds to ensure that the optimizer receives accurate data distribution information, leading to better query performance.

When Oracle Database Creates Histograms?

Oracle Database automatically creates histograms for columns as needed based on the query workload when DBMS_STATS gathers statistics for a table, and if queries have referenced the columns within that table.

Here’s the basic process:

  1. DBMS_STATS is run for a table with the METHOD_OPT parameter set to the default value of SIZE AUTO.
  2. A user executes a query on the table.
  3. The database records the predicates from the query and updates the SYS.COL_USAGE$ data dictionary table.
  4. When DBMS_STATS is run again, it queries SYS.COL_USAGE$ to identify which columns require histograms, based on the previous query workload.

SQL> CREATE TABLE SCOTT.ODBA_TAB AS SELECT * FROM DBA_OBJECTS;

Table created.

SQL> exec dbms_stats.gather_table_stats(ownname=>'SCOTT',tabname=>'ODBA_TAB',
   > estimate_percent=>DBMS_STATS.AUTO_SAMPLE_SIZE,cascade=>TRUE,degree =>4);

PL/SQL procedure successfully completed.

SQL>

SQL> col table_name for a10
SQL> col column_name for a30
SQL> select TABLE_NAME,COLUMN_NAME,HISTOGRAM from dba_tab_columns 
   > where table_name = 'ODBA_TAB' AND OWNER='SCOTT';

TABLE_NAME COLUMN_NAME                    HISTOGRAM
---------- ------------------------------ ---------------
ODBA_TAB        OWNER                          NONE
ODBA_TAB        OBJECT_NAME                    NONE
ODBA_TAB        SUBOBJECT_NAME                 NONE
ODBA_TAB        OBJECT_ID                      NONE
ODBA_TAB        DATA_OBJECT_ID                 NONE
ODBA_TAB        OBJECT_TYPE                    NONE
ODBA_TAB        CREATED                        NONE
ODBA_TAB        LAST_DDL_TIME                  NONE
ODBA_TAB        TIMESTAMP                      NONE
ODBA_TAB        STATUS                         NONE
ODBA_TAB        TEMPORARY                      NONE
ODBA_TAB        GENERATED                      NONE
ODBA_TAB        SECONDARY                      NONE
ODBA_TAB        NAMESPACE                      NONE
ODBA_TAB        EDITION_NAME                   NONE

15 rows selected.

SQL>        
SQL> SELECT COUNT(*) FROM SCOTT.ODBA_TAB WHERE TEMPORARY='Y';

  COUNT(*)
----------
       247

SQL> select TABLE_NAME,COLUMN_NAME,HISTOGRAM from dba_tab_columns 
   > where table_name = 'ODBA_TAB' AND OWNER='SCOTT';

TABLE_NAME COLUMN_NAME                    HISTOGRAM
---------- ------------------------------ ---------------
ODBA_TAB        OWNER                          NONE
ODBA_TAB        OBJECT_NAME                    NONE
ODBA_TAB        SUBOBJECT_NAME                 NONE
ODBA_TAB        OBJECT_ID                      NONE
ODBA_TAB        DATA_OBJECT_ID                 NONE
ODBA_TAB        OBJECT_TYPE                    NONE
ODBA_TAB        CREATED                        NONE
ODBA_TAB        LAST_DDL_TIME                  NONE
ODBA_TAB        TIMESTAMP                      NONE
ODBA_TAB        STATUS                         NONE
ODBA_TAB        TEMPORARY                      NONE
ODBA_TAB        GENERATED                      NONE
ODBA_TAB        SECONDARY                      NONE
ODBA_TAB        NAMESPACE                      NONE
ODBA_TAB        EDITION_NAME                   NONE

15 rows selected.

SQL> exec dbms_stats.gather_table_stats(ownname=>'SCOTT',tabname=>'ODBA_TAB',
   > estimate_percent=>DBMS_STATS.AUTO_SAMPLE_SIZE,cascade=>TRUE,degree =>4);

PL/SQL procedure successfully completed.

SQL> select TABLE_NAME,COLUMN_NAME,HISTOGRAM from dba_tab_columns 
   > where table_name = 'ODBA_TAB' AND OWNER='SCOTT';

TABLE_NAME COLUMN_NAME                    HISTOGRAM
---------- ------------------------------ ---------------
ODBA_TAB        OWNER                          NONE
ODBA_TAB        OBJECT_NAME                    NONE
ODBA_TAB        SUBOBJECT_NAME                 NONE
ODBA_TAB        OBJECT_ID                      NONE
ODBA_TAB        DATA_OBJECT_ID                 NONE
ODBA_TAB        OBJECT_TYPE                    NONE
ODBA_TAB        CREATED                        NONE
ODBA_TAB        LAST_DDL_TIME                  NONE
ODBA_TAB        TIMESTAMP                      NONE
ODBA_TAB        STATUS                         NONE
ODBA_TAB        TEMPORARY                      FREQUENCY <----        

When to Use Histograms

Histograms should be created on columns that are frequently used in the WHERE clauses of queries and have a highly skewed data distribution. It is generally recommended to let Oracle determine the need for histograms automatically by using the METHOD_OPT AUTO setting. This allows Oracle to decide which columns should have histograms based on their data distribution and query workload.

When Not to Use Histograms

Histograms, like all optimizer statistics, are static. If the data distribution of a column changes frequently, it is important to recompute the histogram for that column. Histograms are not useful for columns with the following characteristics:

  • The column data is uniformly distributed.
  • The column is not used in WHERE clauses of queries.
  • The column is unique and used only with equality predicates.
  • Creating histograms on every column of every table is not recommended, as it increases the time required to analyze the table, boosts parse time, and can lead to poor execution plans generated by the optimizer

Useful Queries and View to check Histograms

select dbms_stats.get_prefs ('METHOD_OPT') from dual;  -- At Database level.

SELECT TABLE_NAME,COLUMN_NAME,HISTOGRAM FROM DBA_TAB_COLUMNS WHERE TABLE_NAME='&TABLE_NAME' AND OWNER='&OWNER';

SELECT TABLE_NAME,COLUMN_NAME,HISTOGRAM FROM DBA_TAB_COL_STATISTICS WHERE TABLE_NAME='&TABLE_NAME' AND OWNER='&OWNER';

SELECT TABLE_NAME,COLUMN_NAME,HISTOGRAM,NOTES FROM USER_TAB_COL_STATISTICS WHERE TABLE_NAME='&TABLE_NAME';

SELECT TABLE_NAME,COLUMN_NAME,HISTOGRAM,NOTES FROM USER_TAB_COL_STATISTICSWHERE TABLE_NAME='&TABLE_NAME';

USER_HISTOGRAMS
USER_PART_HISTOGRAMS
USER_SUBPART_HISTOGRAMS
USER_TAB_HISTOGRAMS        

Delete Histogram

SQL> exec dbms_stats.gather_table_stats('SCOTT','ODBA_TAB', 
   > METHOD_OPT => 'FOR COLUMNS TEMPORARY SIZE 1');

PL/SQL procedure successfully completed.        
- Delete histograms on all columns

SELECT TABLE_NAME,COLUMN_NAME,HISTOGRAM FROM DBA_TAB_COLUMNS WHERE TABLE_NAME='&TABLE_NAME' AND OWNER='&OWNER';

exec dbms_stats.gather_table_stats('HR','EMPLOYEES', METHOD_OPT => 'FOR ALL COLUMNS SIZE 1');

SELECT TABLE_NAME,COLUMN_NAME,HISTOGRAM FROM DBA_TAB_COLUMNS WHERE TABLE_NAME='&TABLE_NAME' AND OWNER='&OWNER';        
--Create Histogram on Column USERNAME

SELECT TABLE_NAME,COLUMN_NAME,HISTOGRAM FROM DBA_TAB_COLUMNS WHERE TABLE_NAME='&TABLE_NAME' AND OWNER='&OWNER';

exec dbms_stats.gather_table_stats('SCOTT','ODBA_TEST', METHOD_OPT => 'FOR COLUMNS USERNAME SIZE AUTO');

SELECT TABLE_NAME,COLUMN_NAME,HISTOGRAM FROM DBA_TAB_COLUMNS WHERE TABLE_NAME='&TABLE_NAME' AND OWNER='&OWNER';         
--Gather STATS with REPEAT Option

Please note REPEAT option will re-create existing histograms without creating new histograms.

- For TEMPORARY Column

SELECT TABLE_NAME,COLUMN_NAME,HISTOGRAM FROM DBA_TAB_COLUMNS WHERE TABLE_NAME='&TABLE_NAME' AND OWNER='&OWNER';

exec dbms_stats.gather_table_stats('HR','EMPLOYEES', METHOD_OPT => 'FOR COLUMNS TEMPORARY SIZE REPEAT');

SELECT TABLE_NAME,COLUMN_NAME,HISTOGRAM FROM DBA_TAB_COLUMNS WHERE TABLE_NAME='&TABLE_NAME' AND OWNER='&OWNER';

- For All columns 

SELECT TABLE_NAME,COLUMN_NAME,HISTOGRAM FROM DBA_TAB_COLUMNS WHERE TABLE_NAME='&TABLE_NAME' AND OWNER='&OWNER';

exec dbms_stats.gather_table_stats('HR','EMPLOYEES',METHOD_OPT=> 'FOR ALL COLUMNS SIZE REPEAT');

SELECT TABLE_NAME,COLUMN_NAME,HISTOGRAM FROM DBA_TAB_COLUMNS WHERE TABLE_NAME='&TABLE_NAME' AND OWNER='&OWNER';        
-- How to change default METHOD_OPT value from AUTO to REPEAT at Database Level

--  Verify 
select dbms_stats.get_prefs ('METHOD_OPT') from dual;  -- At Database level.

-- set to REPEAT 
-- This procedure is used to set the global statistics preferences.
exec dbms_stats.set_global_prefs ('METHOD_OPT','FOR ALL COLUMNS SIZE REPEAT');

-- Verify 
select dbms_stats.get_prefs ('METHOD_OPT') from dual;  -- At Database level.
--Since we set to REPEAT at Database level, hence going forwared gathering of statistics, manually or automatically will not create any new histograms. Collect histograms only on the columns that already have histograms.        

Extended Statistics

In real-world data, there is often a correlation or relationship between the data stored in different columns of the same table. By creating extended statistics on a group of columns, the Oracle Optimizer can generate more accurate cardinality estimates when these columns are used together in a WHERE clause of a SQL query.

You can use DBMS_STATS.CREATE_EXTENDED_STATS to define a column group for which statistics should be collected collectively. Once the column group is established, Oracle will automatically maintain the statistics for that group whenever statistics are gathered for the table.

Create Extended Statistics 

SELECT DBMS_STATS.CREATE_EXTENDED_STATS('HR', 'ODBA_TAB', '(OBJECT_TYPE,TEMPORARY)') FROM DUAL;
exec dbms_stats.gather_table_stats('HR','ODBA_TAB');

-- Display extended statistics extensions (Will show the relationship and extension name)

column extension format a30
select extension_name, extension from dba_stat_extensions where table_name='ODBA_TAB';

-- Display extended statistics distinct values and histograms

column col_group format a30
 select e.extension col_group,
       t.num_distinct,
       t.histogram
from   dba_stat_extensions e
       join dba_tab_col_statistics t on e.extension_name=t.column_name
and    t.table_name = 'ODBA_TAB';


Dropping Extended Statistics

exec dbms_stats.drop_extended_stats(ownname => 'HR',tabname => 'ODBA_TAB',extension => '(OBJECT_TYPE,TEMPORARY)');


Eg:

SQL> CREATE TABLE HR.ODBA_TAB AS SELECT * FROM DBA_OBJECTS;

Table created.

SQL> SELECT COUNT(*) FROM HR.ODBA_TAB WHERE TEMPORARY='Y' and OBJECT_TYPE='TABLE';

  COUNT(*)
----------
       115

SQL>
SQL> column extension format a30
select extension_name, extension from dba_stat_extensions where table_name='ODBA_TAB';SQL>

EXTENSION_NAME                 EXTENSION
------------------------------ ------------------------------
SYS_STUYM6W8ZX7#X$MP7$EJ_PVU9O ("OBJECT_TYPE","TEMPORARY")

SQL>
SQL> column col_group format a30
 select e.extension col_group,
       t.num_distinct,
       t.histogram
from   dba_stat_extensions e
       join dba_tab_col_statistics t on e.extension_name=t.column_name
and    t.table_name = 'ODBA_TAB';SQL>   2    3    4    5    6

COL_GROUP                      NUM_DISTINCT HISTOGRAM
------------------------------ ------------ ---------------
("OBJECT_TYPE","TEMPORARY")              47 FREQUENCY

SQL>


-- Dropping Extended Statistics

SQL> exec dbms_stats.drop_extended_stats(ownname => 'HR',tabname => 'ODBA_TAB',extension => '(OBJECT_TYPE,TEMPORARY)');

PL/SQL procedure successfully completed.

SQL> 

-- Verify 

SQL> column extension format a30
select extension_name, extension from dba_stat_extensions where table_name='ODBA_TAB';SQL>

no rows selected <----

SQL>        

要查看或添加评论,请登录

Javed Akhtar的更多文章

  • An Under-Sized PGA Detected by AWR Report

    An Under-Sized PGA Detected by AWR Report

    Setting the correct size for the Program Global Area (PGA) is crucial for optimizing Oracle database performance, this…

    5 条评论
  • The database instance behavior and the ab_.dat file

    The database instance behavior and the ab_.dat file

    Some people still think that the ASM is which performs all IO activity of our database, I have seen several DBAs…

  • Solving Communication problems between DB and ASM instances

    Solving Communication problems between DB and ASM instances

    Introduction Most of the time I write how-to articles or I am introducing a new feature of Oracle Database. Those…

  • Storage index Exadata

    Storage index Exadata

    What is Storage Index in Exadata System? Storage Index in Exadata is another important feature which helps to reduce…

    2 条评论
  • OCI Securities

    OCI Securities

    Migrating workloads to the cloud is one of the biggest challenges our customers face today. One major decision-making…

  • OCI Serverless functions

    OCI Serverless functions

    Oracle Cloud Infrastructure (OCI) Functions is a serverless platform that allows developers to create, run, and scale…

  • OCI Load Balancer

    OCI Load Balancer

    Oracle Cloud Infrastructure (OCI) Load Balancer is a service that provides automated traffic distribution from one…

    1 条评论
  • Why standby redo logs are required?

    Why standby redo logs are required?

    Introduction I’m always surprised to find the number of Oracle standby databases in use today that do not have Standby…

  • Dead Lock handling

    Dead Lock handling

    Whenever you have competing DML running against the same data, you run the risk of a deadlock. This deadlock condition…

  • Parent and Child Cursors

    Parent and Child Cursors

    A “cursor” is a memory area in the library cache that is allocated to the SQL statement which users execute. This…