Snowflake Shorts: Identifying Patterns and Trends from Queries
Snowflake Data Superhero - Divyansh Saxena

Snowflake Shorts: Identifying Patterns and Trends from Queries

If you are a data enthusiast, I am sure that you must have been in a phase where you spent hours identifying queries with similar execution plans to understand the optimization scope. Snowflake now has a feature in PuPr, allowing us to identify patterns and trends from queries. Let's discuss more about it in today's snowflake shorts.


Introduction of New Columns

query_hash and query_parameterized_hash are new output columns in the ACCOUNT_USAGE views and in the output of INFORMATION_SCHEMA table functions. These columns contain a hash of the query text. You can use this hash to analyze repeated queries.

These columns are available in the following views and in the output of the following table functions:

  • QUERY_HISTORY
  • TASK_HISTORY
  • QUERY_ACCELERATION_ELIGIBLE

As of today, they are part of the behavior bundle release 2023_06. You can enable it with the below query:

SELECT SYSTEM$ENABLE_BEHAVIOR_CHANGE_BUNDLE('2023_06');        

Understanding QUERY_HASH

The query_hash column contains a hash value that is computed, based on the canonicalized text of the SQL statement. Repeated queries that have exactly the same query text have the same query_hash values.

For instance, the below queries will have the same QUERY_HASH

SELECT * FROM TEST_DB.SF_SCH_203.CUSTOMER WHERE     NAME = 'DIVYANSH';        
SELECT * from test_db.sf_sch_203.customer where name= 'DIVYANSH';        

Queries will have the same hash if their text differs only in:

  • Case insensitive identifier, session variable, and stage name
  • White Space
  • Comments


Understanding QUERY_PARAMETERIZED_HASH

query_parameterized_hash contains a hash value that is computed based on the parameterized query, which means the version of the query after the literals are parameterized. These literals must be used in the query predicate and must be used with one of the following comparison operators:

  • = (equal to)
  • != (not equal to)
  • >= (greater than or equal to)
  • <= (less than or equal to)

For below 2 queries, the query_parameterized_hash will be the same:

SELECT * FROM TEST_DB.SF_SCH_203.CUSTOMER WHERE  NAME = 'DIVYANSH';        
SELECT * FROM TEST_DB.SF_SCH_203.CUSTOMER WHERE     NAME = 'PAUL';        

Queries will have the same parameterized hash if their text differs only in:

  • Case insensitive identifier, session variable, and stage name
  • White Space
  • Comments


Important Things to Note

Over time, the logic used by Snowflake to generate the query hash can change. Changes to this logic can result in different hashes produced for the same query.

The views and table function output that include the query_hash and query_parameterized_hash columns also include the following columns that specify the version of the logic used to produce the hashes:

  • query_hash_version
  • query_parameterized_hash_version

If these columns contain different version numbers for different periods of time, you can use these version columns to identify the different hashes for the same query.


About Me:

Hi there! I am Divyansh Saxena

I am an experienced Data Engineer with a proven track record of success in Snowflake Data Cloud technology. Highly skilled in designing, implementing, and maintaining data pipelines, ETL workflows, and data warehousing solutions. Possessing advanced knowledge of Snowflake’s features and functionality, I am a Snowflake Data superhero & and Snowflake Snowpro Core SME. With a major career in Snowflake Data Cloud, I have a deep understanding of cloud-native data architecture and can leverage it to deliver high-performing, scalable, and secure data solutions.

Follow me on Medium for regular updates on Snowflake Best Practices and other trending topics:




Sneha K ?

Data Engineer @ IBM | Microsoft Certified, Cloud Computing, Data Management

1 年

This inspires me to get one snowpro certificate ?? .

要查看或添加评论,请登录

Divyansh Saxena的更多文章

社区洞察

其他会员也浏览了