Hybrid Columnar Compression In Autonomous Transaction Processing
Overview
In a nutshell, Hybrid Columnar Compression (HCC) is a compression feature provided by some Oracle smart storage systems, most notably the Exadata Storage Servers. Since Exadata Smart Storage is the underlying storage infrastructure used by Oracle Autonomous Database (ADB), HCC is also available on all ADB deployments. Both the compression algorithm and how it is automatically used in Autonomous Data Warehouse (ADW) are nicely described here; in this article, instead, I will focus on a specific use case on ATP and show some useful PL/SQL snippets.
Why This Article
HCC is most useful in Data Warehouse environments and for analytical and batch use cases; as a matter of fact, HCC is automatically enabled and used in ADW. And while HCC is not suitable or convenient for compressing hot data (data subject to DML operations), it can be very useful for managing cold historical data.
Many companies have strict requirements about the retention of certain data; for example, an insurance company might need to keep insurance policy documents for ten years after their expiration. Typically this data is never updated but must remain accessible for read-only activities, for example, custom queries or periodic reports. Furthermore, this data tends to accumulate over time, consuming a lot of precious storage in the company's OLTP databases. A traditional approach to this problem is to move the cold data to less expensive storage, for example, object storage or HDFS on Big Data platforms. That approach has some advantages but has downsides as well: it introduces complexity and scatters the data across multiple locations, with possible security or performance implications.
In this article I will show:
- how to assess the storage savings when using HCC
- how to enable HCC and verify the compression ratio
Once I have made sure that the cold table is a good candidate for HCC, I will demonstrate that, by using HCC in ATP, we can keep all that data together with the hot data, saving a lot of storage and, often, improving the performance of our queries too.
The code snippets provided here will show how to use the Oracle-supplied PL/SQL packages and SQL statements to assess, enable, and test the HCC feature. To keep things easy, I will consider a simple scenario involving a single table. In a more realistic scenario, we often need to operate at the partition level, with cold partitions where data is compressed and never updated, and hot partitions where data is not compressed.
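As a taste of the partition-level variant, here is a hypothetical sketch; the SALES table and the SALES_2015 partition name are illustrative and not part of the example that follows:

```sql
-- Hypothetical example: SALES is range-partitioned by year and SALES_2015 is cold.
-- Compress only the cold partition; the hot partitions stay uncompressed.
ALTER TABLE sales MOVE PARTITION sales_2015
  COLUMN STORE COMPRESS FOR QUERY HIGH;

-- A MOVE can leave local index partitions UNUSABLE, so rebuild them:
ALTER TABLE sales MODIFY PARTITION sales_2015
  REBUILD UNUSABLE LOCAL INDEXES;
```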
HCC Compression Assessment
As mentioned, the example scenario is quite simple: I have a table that stores historical data and that I want to compress. However, before doing that, I want to see whether there is a real advantage in terms of storage savings. The test table is the following:
desc my_test_table
Name       Null?    Type
_________  _______  _______________
YEAR                NUMBER
AGE                 NUMBER
ETHNIC              NUMBER
SEX                 VARCHAR2(10)
AREA                NUMBER
COUNT               NUMBER DEFAULT COLLATION "USING_NLS_COMP" ;
select count(*) from my_test_table;
COUNT(*)
___________
15710331
First, I want to know the current real storage usage, and to do that I will use the DBMS_SPACE package.
DECLARE
unf NUMBER;
unfb NUMBER;
fs1 NUMBER;
fs1b NUMBER;
fs2 NUMBER;
fs2b NUMBER;
fs3 NUMBER;
fs3b NUMBER;
fs4 NUMBER;
fs4b NUMBER;
full NUMBER;
fullb NUMBER;
BEGIN
dbms_space.space_usage('ADMIN',
'MY_TEST_TABLE',
'TABLE',
unf, unfb,
fs1, fs1b,
fs2, fs2b,
fs3, fs3b,
fs4, fs4b,
full, fullb);
DBMS_OUTPUT.put_line( 'Number of unformatted blocks : ' || unf);
DBMS_OUTPUT.put_line( 'Number of MB in unformatted blocks : ' || unfb/1024/1024);
DBMS_OUTPUT.put_line( 'Number of blocks with free space between 0% and 25% : ' || fs1);
DBMS_OUTPUT.put_line( 'Number of MB in blocks with free space between 0% and 25% : ' || fs1b/1024/1024);
DBMS_OUTPUT.put_line( 'Number of blocks with free space between 25% and 50% : ' || fs2);
DBMS_OUTPUT.put_line( 'Number of MB in blocks with free space between 25% and 50% : ' || fs2b/1024/1024);
DBMS_OUTPUT.put_line( 'Number of blocks with free space between 50% and 75% : ' || fs3);
DBMS_OUTPUT.put_line( 'Number of MB in blocks with free space between 50% and 75% : ' || fs3b/1024/1024);
DBMS_OUTPUT.put_line( 'Number of blocks with free space between 75% and 100% : ' || fs4);
DBMS_OUTPUT.put_line( 'Number of MB in blocks with free space between 75% and 100% : ' || fs4b/1024/1024);
DBMS_OUTPUT.put_line( 'Number of full blocks : ' || full);
DBMS_OUTPUT.put_line( 'Number of MB in full blocks : ' || fullb/1024/1024);
END;
/
In this case I get the following output:
Number of unformatted blocks : 0
Number of MB in unformatted blocks : 0
Number of blocks with free space between 0% and 25% : 0
Number of MB in blocks with free space between 0% and 25% : 0
Number of blocks with free space between 25% and 50% : 0
Number of MB in blocks with free space between 25% and 50% : 0
Number of blocks with free space between 50% and 75% : 0
Number of MB in blocks with free space between 50% and 75% : 0
Number of blocks with free space between 75% and 100% : 0
Number of MB in blocks with free space between 75% and 100% : 0
Number of full blocks : 67006
Number of MB in full blocks : 523.484375
As we can see, the table uses 523 MB, or 67,006 full database blocks (each block is 8 KB, so 67,006 × 8 KB ≈ 523.5 MB).
Now I want to estimate the compression ratio achievable with HCC and, more specifically, with the HCC Query High compression type. Query High is the most effective choice when we want both a good compression ratio and good query performance. To do the evaluation I use the GET_COMPRESSION_RATIO procedure of the DBMS_COMPRESSION package. This procedure executes an HCC dry run, compressing a subset of rows and computing the resulting compression ratio.
DECLARE
my_blkcnt_cmp PLS_INTEGER;
my_blkcnt_uncmp PLS_INTEGER;
my_row_cmp PLS_INTEGER;
my_row_uncmp PLS_INTEGER;
my_cmp_ratio NUMBER;
my_comptype_str VARCHAR2(32767);
BEGIN
DBMS_COMPRESSION.GET_COMPRESSION_RATIO (
scratchtbsname => 'DATA' ,
ownname => 'ADMIN' ,
objname => 'MY_TEST_TABLE' ,
subobjname => NULL ,
comptype => DBMS_COMPRESSION.COMP_QUERY_HIGH,
blkcnt_cmp => my_blkcnt_cmp,
blkcnt_uncmp => my_blkcnt_uncmp,
row_cmp => my_row_cmp,
row_uncmp => my_row_uncmp,
cmp_ratio => my_cmp_ratio,
comptype_str => my_comptype_str,
subset_numrows => DBMS_COMPRESSION.comp_ratio_minrows,
objtype => DBMS_COMPRESSION.objtype_table
);
DBMS_OUTPUT.put_line( 'Number of blocks used by the compressed sample of the object : ' || my_blkcnt_cmp);
DBMS_OUTPUT.put_line( 'Number of blocks used by the uncompressed sample of the object : ' || my_blkcnt_uncmp);
DBMS_OUTPUT.put_line( 'Number of rows in a block in compressed sample of the object : ' || my_row_cmp);
DBMS_OUTPUT.put_line( 'Number of rows in a block in uncompressed sample of the object : ' || my_row_uncmp);
DBMS_OUTPUT.put_line( 'Estimated Compression Ratio of Sample : ' || my_cmp_ratio);
DBMS_OUTPUT.put_line( 'Compression Type : ' || my_comptype_str);
END;
/
In this case I obtain:
Compression Advisor self-check validation successful. select count(*) on both Uncompressed and EHCC Compressed format = 1000001 rows
Number of blocks used by the compressed sample of the object : 110
Number of blocks used by the uncompressed sample of the object : 4182
Number of rows in a block in compressed sample of the object : 9091
Number of rows in a block in uncompressed sample of the object : 239
Estimated Compression Ratio of Sample : 38
Compression Type : "Compress Query High"
So the estimated compression ratio is 38x, which is a very good result for this simple case.
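Query High is only one of the available HCC levels; the same anonymous block can estimate the others by changing the comptype argument. The constants below come from the DBMS_COMPRESSION package; which level is preferable depends on the balance you want between compression ratio and access speed:

```sql
-- Other HCC levels accepted by GET_COMPRESSION_RATIO (swap into the
-- comptype argument of the anonymous block above):
--   DBMS_COMPRESSION.COMP_QUERY_LOW     -- fastest HCC level, lowest ratio
--   DBMS_COMPRESSION.COMP_ARCHIVE_LOW   -- higher ratio, slower access
--   DBMS_COMPRESSION.COMP_ARCHIVE_HIGH  -- highest ratio, for archival data
comptype => DBMS_COMPRESSION.COMP_ARCHIVE_HIGH,
```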
HCC Compression Validation
To verify whether the previous estimate was accurate, I will enable HCC compression on a table cloned from the original one.
CREATE TABLE my_comp_test_table AS SELECT * FROM my_test_table;
Table MY_COMP_TEST_TABLE created.
Elapsed: 00:00:11.198
ALTER TABLE my_comp_test_table MOVE COLUMN STORE COMPRESS FOR QUERY HIGH;
Table MY_COMP_TEST_TABLE altered.
Elapsed: 00:00:16.646
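At this point a couple of sanity checks are worthwhile: the data dictionary should record the new compression attributes, and since ALTER TABLE ... MOVE can invalidate indexes, any unusable ones should be rebuilt (my test table has none, but a real table likely will):

```sql
-- Check the compression attributes recorded in the data dictionary:
SELECT table_name, compression, compress_for
  FROM user_tables
 WHERE table_name = 'MY_COMP_TEST_TABLE';
-- Expected: COMPRESSION = 'ENABLED', COMPRESS_FOR = 'QUERY HIGH'

-- List indexes left UNUSABLE by the MOVE, then rebuild each one:
SELECT index_name FROM user_indexes WHERE status = 'UNUSABLE';
-- ALTER INDEX <index_name> REBUILD;
```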
The next step is to use DBMS_SPACE again to compute the space used by the compressed table:
DECLARE
unf NUMBER;
unfb NUMBER;
fs1 NUMBER;
fs1b NUMBER;
fs2 NUMBER;
fs2b NUMBER;
fs3 NUMBER;
fs3b NUMBER;
fs4 NUMBER;
fs4b NUMBER;
full NUMBER;
fullb NUMBER;
BEGIN
dbms_space.space_usage('ADMIN',
'MY_COMP_TEST_TABLE',
'TABLE',
unf, unfb,
fs1, fs1b,
fs2, fs2b,
fs3, fs3b,
fs4, fs4b,
full, fullb);
DBMS_OUTPUT.put_line( 'Number of unformatted blocks : ' || unf);
DBMS_OUTPUT.put_line( 'Number of MB in unformatted blocks : ' || unfb/1024/1024);
DBMS_OUTPUT.put_line( 'Number of blocks with free space between 0% and 25% : ' || fs1);
DBMS_OUTPUT.put_line( 'Number of MB in blocks with free space between 0% and 25% : ' || fs1b/1024/1024);
DBMS_OUTPUT.put_line( 'Number of blocks with free space between 25% and 50% : ' || fs2);
DBMS_OUTPUT.put_line( 'Number of MB in blocks with free space between 25% and 50% : ' || fs2b/1024/1024);
DBMS_OUTPUT.put_line( 'Number of blocks with free space between 50% and 75% : ' || fs3);
DBMS_OUTPUT.put_line( 'Number of MB in blocks with free space between 50% and 75% : ' || fs3b/1024/1024);
DBMS_OUTPUT.put_line( 'Number of blocks with free space between 75% and 100% : ' || fs4);
DBMS_OUTPUT.put_line( 'Number of MB in blocks with free space between 75% and 100% : ' || fs4b/1024/1024);
DBMS_OUTPUT.put_line( 'Number of full blocks : ' || full);
DBMS_OUTPUT.put_line( 'Number of MB in full blocks : ' || fullb/1024/1024);
END;
/
and I get:
Number of unformatted blocks : 0
Number of MB in unformatted blocks : 0
Number of blocks with free space between 0% and 25% : 0
Number of MB in blocks with free space between 0% and 25% : 0
Number of blocks with free space between 25% and 50% : 0
Number of MB in blocks with free space between 25% and 50% : 0
Number of blocks with free space between 50% and 75% : 0
Number of MB in blocks with free space between 50% and 75% : 0
Number of blocks with free space between 75% and 100% : 0
Number of MB in blocks with free space between 75% and 100% : 0
Number of full blocks : 1796
Number of MB in full blocks : 14.03125
Given the previous output, the actual compression ratio is about 37.3; in fact:
523.48 MB / 14.03 MB = 37.3
Finally, I can run a simple query to check whether there is any change in performance. On the non-compressed table I get:
select count(*),sex from my_test_table group by sex;
COUNT(*) SEX
___________ _____________
7183855 NOT DECLARED
4253247 MALE
4273229 FEMALE
Elapsed: 00:00:00.619
and on the compressed one:
select count(*),sex from my_comp_test_table group by sex;
COUNT(*) SEX
___________ _____________
7183855 NOT DECLARED
4253247 MALE
4273229 FEMALE
Elapsed: 00:00:00.071
So, this specific query is fast on both tables, but on the compressed one it is almost 10 times faster. In this case, the performance improvement is due to the fact that we are not using any index: the query is executed with a full table scan, so when scanning the compressed table we need to retrieve far fewer database blocks.
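To confirm that explanation, the access path can be checked with EXPLAIN PLAN; on Exadata-backed storage the plan should show a full scan step (typically TABLE ACCESS STORAGE FULL):

```sql
-- Explain the query against the compressed table:
EXPLAIN PLAN FOR
  SELECT COUNT(*), sex FROM my_comp_test_table GROUP BY sex;

-- Display the plan just explained:
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);
```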
Takeaways
In this short article I briefly explained how HCC can also be beneficial for some ATP workloads. The use case involves historical data and how we can keep it together with the live data. I also showed a simple method to estimate the final compression ratio for an existing table. The important takeaways are:
- HCC is also available in ATP but must be enabled and managed manually.
- Even in OLTP workloads, there are use cases where HCC adoption might be extremely beneficial; however, its adoption should be preceded by a careful evaluation.
- HCC compression is mostly suitable for cold tables and partitions. Furthermore, to get optimal performance and all the expected benefits, a dataset, once compressed, should only be queried; no DML operations should be executed on it.
For more information about Hybrid Columnar Compression, I suggest the following technical brief and the official documentation.