Snowflake Materialized View Query Auto-Rewrite

Snowflake Materialized View Query Auto-Rewrite

In this blog, we discuss the backbone of Snowflake Materialized View(MV) as MV Query Auto-rewrite including the architecture of how MV Rewrite works, rewrite rules involved, and conclude with some tips and limitations.

Problem Statement

A materialized view is a pre-computed data set derived from a query specification (the SELECT in the view definition) and stored for later use.

Without materialized view query auto rewrite implemented, we will need to query against the materialized view explicitly. For example, a query against table1 below will not use v1.

create materialized view v1 as
    select * from table1 where column_1 between 100 and 400;

select * from table1 where column_1 between 200 and 300;        

The above query needs to be rewritten by the user as below to use the defined materialized view.?

select * from v1 where column_1 between 200 and 300;        

This is not scalable and error-prone as users need to tackle and analyze each query one by one and rewrite it if it is thought to be performing better with MVs. In many cases, it is not even practically possible if the queries are tool-generated.?

It’s important for the Snowflake Compiler/optimizer to rewrite the query to adopt the materialized view automatically under the hood.?

How does MV Query Automatical Rewrite Work?

In the query optimization phase during query compilation, we plug in materialized views query auto rewrite where we calculate and compare the cost of MV access path and base table access path, then choose the cheaper one as the access path selection.?

For a given query plan, we first find a query fragment that could be efficiently served by the existing materialized views, then we rewrite the query plan with the access plan fragment of the eligible and best choice of materialized views. As the simple example query plan illustrates below, the source query plan with table-based is rewritten into a query plan with a Materialized view-based access plan.?

MV Auto-Rewrite Overview

Materialized view Query rewrite optimization goes through 3 different major phases: the collecting phase, the enumerating phase, and the Ranking phase. Let’s dive into the 3 phases to illustrate how materialized view Rewrite works.?

In the collecting phase, we use the MV Rewrite group#1(the left branch of the bottom table scan) as an example, see the collecting phase flow chart below.

Collecting MVs

In the collecting phase, we focus on the following tasks:

  • Collecting the candidate MVs for the given table in the MV Rewrite Group #1. There are 3 candidate MVs in this table: mv1, mv2, and mv3.
  • Marking MVRewriteGroup #1 for the source plan fragment while traversing the Query Plan Node graph in a bottom-up fashion.?
  • Plan fragment has a lineage of TableScan - [Filter] - [GroupBy]

In the enumerating phase, we count which mv provides the logically equivalent query plan fragment to the source plan fragment.?

Enumerating MVs

Let’s take a look at the enumerating flow chart above. We go through the following steps:

  • Matching each candidate MV’s plan, comparing with the source plan fragment to check if the MV can produce a logically equivalent result. If not, we eliminate the MV option.?
  • Generating the compensated MV access plans for each remaining MV option from the previous step. if the MV needs some subsequent compensating filter/aggregation to provide an equivalent dataset with the source plan fragment, we generate the alternative compensated MV-accessing plans.

In the ranking phase, we go through each alternative compensated MV-accessing plan fragment and rank them based on their cost.

Ranking MVs

From the previous enumerating phase, there are 3 alternative plan fragments left, one with the original table scan, and 2 with MVs. Let’s see how we rank them.

  • First, we expand the MV for the alternative. If the MV is stale, we union the MV with the fresh data set delta of the base table.
  • Then, we apply the rewrite rules such as filter pushdown which provides the scan set pruning.? and we estimate the cost for each alternative. As you can see, the source plan fragment costs 10K, the plan fragment cost with the mv1 alternative is 6K, and the cost with the mv3 alternative is 600.
  • Finally, we pick the minimal-cost alternative which is the mv3 plan fragment, and plug in the query plan fragment, shown below.??

Final Plan for MV


MV Auto-Rewrite Rules

What are the rules to decide whether the MV rewrite plan is logically equivalent to the source plan??

Snowflake Optimizer is based on 2 methods to choose an access path:

  • Heuristic rules: Heuristic rules for MV Auto rewrite generally focus on the eligibility of using MVs instead of its base table. This can be described in the rest of the rules below.?
  • Cost-based optimization: Optimizer calculates the cost of each eligible access path and chooses the cheapest one.??

We have described the basic steps to choose the access path using MV Auto-Rewrite in the above session of MV Auto-rewrite architecture. The following rules may be applied to MV Access path decision.

Rules for Predicate Subsumption

This rule is applied to decide whether an MV can be used based on the subsuming nature of the query filter, which means the MV’s filter in its definition is a superset of the query filter condition. If not, the MV can not be used.?

We are going to take a look at 4 different cases.

Case 1: Conjunction Filter (AND)

Predicate Rule - Case 1

In this case #1, MV1 is defined on the filter (a = 1), and the query filter is the conjunction of (a = 1) and (b = 1). So MV’s filter is a superset of the query filter. As a result, the MV1 can be used for this query.?

  • Query Filter : a = 1 and b = 2
  • MV Filter: a = 1
  • Access Path: Choose this MV1 along with the filter (b = 2) for the next step.

Case 2: Disjunction Filter (OR)

Predicate Rule - Case 2

In this case #2, MV2 is defined on the disjunction filters (a = 1 or a = 2), and the query filter is (a = 1). So MV’s filter is a superset of the query filter. As a result, the MV2 can be used for this query.?

  • Query Filter: a = 1?
  • MV Filter : a = 1 or a = 2
  • Access Path: Choose this MV2 along with the filter (a = 1) for the next step.

Case 3: Range Filter?

Predicate Rule - Case 3

In this case #3, MV3 is defined on the range filters (a > 0), and the query filter is (a in (2,3)). So MV’s filter is a superset of the query filter. As a result, the MV3 can be used for this query.?

  • Query Filter: a in (2,3)?
  • MV Filter: a > 0
  • Access Path: Choose this MV3 along with the filter (a in (2,3)) for the next step.

Case 4: Combination of AND, OR, and Range filters?

Predicate Rule - Case 4

In this case #4, MV4 is defined on the range filters ( (a=1 or a=2) and (b is not null or c is not null) ), and the query filter is (a = 1 and b > 0). So MV’s filter is a superset of the query filter. As a result, the MV3 can be used for this query.?

  • Query Filter: a =1 and b > 0?
  • MV Filter: (a=1 or a=2) and (b is not null or c is not null)
  • Access Path: Choose this MV4 along with the filter (a = 1 and b > 0) for the next step.

Rules for Aggregate Subsumption

This rule is applied to decide whether an MV can be used based on the subsuming nature of the query aggregation projection, which means the MV’s aggregation in its definition is a superset of the query aggregation. If not, the MV can not be used.?

We are going to take a look at 4 different cases.

Case 1: Exact Aggregation Subset

Aggregation Rule - Case 1

In this case #1, MV1 is defined on the aggregation projection (sum(c1), sum(c2) group by c3), and the query aggregation projection is (sum(c1) group by c3). So MV’s aggregation projection is a superset of the query aggregation on the exact grouping key. As a result, the MV1 can be used for this query.?

  • Query Aggregation : sum(c1) group by c3
  • MV Aggregation: sum(c1), sum(c2) group by c3
  • Access Path: Choose this MV1 along with the projection (sum(c1)) for the next step.

Case 2: Extra Keys in MV?

Aggregation Rule - Case 2

In this case #2, MV2 is defined on the aggregation projection (count(c1), sum(c2) group by c3,c4), and the query aggregation projection is (count(c1) group by c3). So MV’s aggregation projection is a superset of the query aggregation with an extra 2nd grouping key c4. As a result, the MV2 can be used for this query.?

  • Query Aggregation : count(c1) group by c3
  • MV Aggregation: count(c1), sum(c2) group by c3,c4
  • Access Path: Choose this MV2 along with the projection (sum(count(c1)) group by c3 ) for the next step.

Case 3: Extra Keys in MV + Filter subsumption for grouping keys

Aggregation Rule - Case 3

In this case #3, MV3 is defined on the aggregation projection (sum(c1), sum(c2) group by c3,c4), and range filter on grouping key(c3 > 3), and the query has the aggregation projection (sum(c1) group by c3) and filter (c3 > 4 and c4 > 5). So both MV’s aggregation projection and its filter are a superset of the query aggregation and query filter. As a result, the MV3 can be used for this query.?

Access Path: Choose this MV3 along with filter (c3 > 4 and c4 > 5) and the aggregation (sum(sum(c1)) group by c3 ).

Case 4: Extra Keys in MV + Filter subsumption on aggregation keys

Aggregation Rule - Case 4

In this case #4, MV4 is defined on the aggregation projection (sum(c1), sum(c2) group by c3,c4), and range filter on aggregation key(c1 > 3), and the query has the aggregation projection (sum(c1) group by c3) and filter (c1 > 4). Even though both MV’s aggregation projection and its filter are a superset of the query aggregation and query filter, its filter is on aggregation key c1, which can produce different results.? As a result, the MV4 is not legible to be used for this query.?

Access Path: No match, Choose the base table to avoid the wrong result.?

There is also a compilation time rule that any MV query rewrite that may take longer than 50% of the base table access path ( non-MV path) will not be considered.?

In summary, the MV Auto-Rewrite rules are as follows:

  1. For Access Path, the logic of MV Rewrite should be applied at the leaf nodes (the bottommost part) of the query plan in order to change the access path from tables to MVs.
  2. For Predicate Subsumption, the rows that satisfy the conditions of the query should be a subset of the rows in MV.
  3. For Aggregate Subsumption, choose the MV from the available MVs, which has a superset data of query and returns a minimum scan-set after pruning.
  4. For MV Choice of multiple eligible MVs, the optimizer chooses the cheapest based on the cost.?
  5. Skip MVs if the compilation exceeds 50% of the usual compilation time of the base table.

MV Auto-Rewrite Tips & Limitations

Below is a short list of tips and limitations regarding MV Query auto-rewrite that are often overlooked.?

  1. Querying against MV directly gives you an access path of using the MV, in case querying on the base table doesn’t choose MV, so that you can compare the two access paths.
  2. Using EXPLAIN command to see whether existing MVs are chosen by the auto query rewrite.
  3. IN list predicate is equivalent to a series of OR expressions (Disjunction cases).
  4. Aggregate functions can NOT be nested.
  5. Aggregate functions used in complex expressions (e.g. (sum(salary)/10)) can only be used in the outer-most level of a query, not in a subquery or an in-line view.
  6. Window functions are not supported in MV.
  7. Non-deterministic functions are not supported in MV, such as CURRENT_TIMETSTAMP.
  8. Sequence with Flatten function is not supported as it may introduce non-deterministic results.
  9. The expression on the column may not be handled by MV auto-rewrite, for example,Case 1: MV has filter D >= ‘2019-10-10’. Query has filter YEAR(D) >= ‘2020’.Case 2: MV has filter YEAR(D) >= ‘2019’. Query has filter D > ‘2019-01-10’
  10. Most complex expressions on columns may not be handled by MV auto-rewrite, either on MV definition or on the query, such as regular expression, window functions, GEO functions, etc.

For more information about MV limitations in general, please refer to the official documentation for limitation on working with MVs and limitations on Creating MV.?


References:

https://docs.snowflake.com/en/user-guide/views-materialized.html?

https://docs.snowflake.com/en/user-guide/views-materialized#label-limitations-on-working-with-materialized-views/?

https://docs.snowflake.com/en/user-guide/views-materialized#limitations-on-creating-materialized-views?


Disclaimer:

As my personal blog, any views or opinions, or advice represented in this blog are my personal views and belong solely to me.

要查看或添加评论,请登录

Minzhen Yang的更多文章

社区洞察

其他会员也浏览了