Snowflake Materialized View Query Auto-Rewrite
In this blog, we discuss the backbone of Snowflake Materialized View(MV) as MV Query Auto-rewrite including the architecture of how MV Rewrite works, rewrite rules involved, and conclude with some tips and limitations.
Problem Statement
A materialized view is a pre-computed data set derived from a query specification (the SELECT in the view definition) and stored for later use.
Without materialized view query auto rewrite implemented, we will need to query against the materialized view explicitly. For example, a query against table1 below will not use v1.
create materialized view v1 as
select * from table1 where column_1 between 100 and 400;
select * from table1 where column_1 between 200 and 300;
The above query needs to be rewritten by the user as below to use the defined materialized view.?
select * from v1 where column_1 between 200 and 300;
This is not scalable and error-prone as users need to tackle and analyze each query one by one and rewrite it if it is thought to be performing better with MVs. In many cases, it is not even practically possible if the queries are tool-generated.?
It’s important for the Snowflake Compiler/optimizer to rewrite the query to adopt the materialized view automatically under the hood.?
How does MV Query Automatical Rewrite Work?
In the query optimization phase during query compilation, we plug in materialized views query auto rewrite where we calculate and compare the cost of MV access path and base table access path, then choose the cheaper one as the access path selection.?
For a given query plan, we first find a query fragment that could be efficiently served by the existing materialized views, then we rewrite the query plan with the access plan fragment of the eligible and best choice of materialized views. As the simple example query plan illustrates below, the source query plan with table-based is rewritten into a query plan with a Materialized view-based access plan.?
Materialized view Query rewrite optimization goes through 3 different major phases: the collecting phase, the enumerating phase, and the Ranking phase. Let’s dive into the 3 phases to illustrate how materialized view Rewrite works.?
In the collecting phase, we use the MV Rewrite group#1(the left branch of the bottom table scan) as an example, see the collecting phase flow chart below.
In the collecting phase, we focus on the following tasks:
In the enumerating phase, we count which mv provides the logically equivalent query plan fragment to the source plan fragment.?
Let’s take a look at the enumerating flow chart above. We go through the following steps:
In the ranking phase, we go through each alternative compensated MV-accessing plan fragment and rank them based on their cost.
From the previous enumerating phase, there are 3 alternative plan fragments left, one with the original table scan, and 2 with MVs. Let’s see how we rank them.
MV Auto-Rewrite Rules
What are the rules to decide whether the MV rewrite plan is logically equivalent to the source plan??
Snowflake Optimizer is based on 2 methods to choose an access path:
We have described the basic steps to choose the access path using MV Auto-Rewrite in the above session of MV Auto-rewrite architecture. The following rules may be applied to MV Access path decision.
Rules for Predicate Subsumption
This rule is applied to decide whether an MV can be used based on the subsuming nature of the query filter, which means the MV’s filter in its definition is a superset of the query filter condition. If not, the MV can not be used.?
We are going to take a look at 4 different cases.
Case 1: Conjunction Filter (AND)
In this case #1, MV1 is defined on the filter (a = 1), and the query filter is the conjunction of (a = 1) and (b = 1). So MV’s filter is a superset of the query filter. As a result, the MV1 can be used for this query.?
Case 2: Disjunction Filter (OR)
In this case #2, MV2 is defined on the disjunction filters (a = 1 or a = 2), and the query filter is (a = 1). So MV’s filter is a superset of the query filter. As a result, the MV2 can be used for this query.?
领英推荐
Case 3: Range Filter?
In this case #3, MV3 is defined on the range filters (a > 0), and the query filter is (a in (2,3)). So MV’s filter is a superset of the query filter. As a result, the MV3 can be used for this query.?
Case 4: Combination of AND, OR, and Range filters?
In this case #4, MV4 is defined on the range filters ( (a=1 or a=2) and (b is not null or c is not null) ), and the query filter is (a = 1 and b > 0). So MV’s filter is a superset of the query filter. As a result, the MV3 can be used for this query.?
Rules for Aggregate Subsumption
This rule is applied to decide whether an MV can be used based on the subsuming nature of the query aggregation projection, which means the MV’s aggregation in its definition is a superset of the query aggregation. If not, the MV can not be used.?
We are going to take a look at 4 different cases.
Case 1: Exact Aggregation Subset
In this case #1, MV1 is defined on the aggregation projection (sum(c1), sum(c2) group by c3), and the query aggregation projection is (sum(c1) group by c3). So MV’s aggregation projection is a superset of the query aggregation on the exact grouping key. As a result, the MV1 can be used for this query.?
Case 2: Extra Keys in MV?
In this case #2, MV2 is defined on the aggregation projection (count(c1), sum(c2) group by c3,c4), and the query aggregation projection is (count(c1) group by c3). So MV’s aggregation projection is a superset of the query aggregation with an extra 2nd grouping key c4. As a result, the MV2 can be used for this query.?
Case 3: Extra Keys in MV + Filter subsumption for grouping keys
In this case #3, MV3 is defined on the aggregation projection (sum(c1), sum(c2) group by c3,c4), and range filter on grouping key(c3 > 3), and the query has the aggregation projection (sum(c1) group by c3) and filter (c3 > 4 and c4 > 5). So both MV’s aggregation projection and its filter are a superset of the query aggregation and query filter. As a result, the MV3 can be used for this query.?
Access Path: Choose this MV3 along with filter (c3 > 4 and c4 > 5) and the aggregation (sum(sum(c1)) group by c3 ).
Case 4: Extra Keys in MV + Filter subsumption on aggregation keys
In this case #4, MV4 is defined on the aggregation projection (sum(c1), sum(c2) group by c3,c4), and range filter on aggregation key(c1 > 3), and the query has the aggregation projection (sum(c1) group by c3) and filter (c1 > 4). Even though both MV’s aggregation projection and its filter are a superset of the query aggregation and query filter, its filter is on aggregation key c1, which can produce different results.? As a result, the MV4 is not legible to be used for this query.?
Access Path: No match, Choose the base table to avoid the wrong result.?
There is also a compilation time rule that any MV query rewrite that may take longer than 50% of the base table access path ( non-MV path) will not be considered.?
In summary, the MV Auto-Rewrite rules are as follows:
MV Auto-Rewrite Tips & Limitations
Below is a short list of tips and limitations regarding MV Query auto-rewrite that are often overlooked.?
For more information about MV limitations in general, please refer to the official documentation for limitation on working with MVs and limitations on Creating MV.?
References:
Disclaimer:
As my personal blog, any views or opinions, or advice represented in this blog are my personal views and belong solely to me.