Catalyst Optimizer

At the core of Spark SQL is the Catalyst Optimizer. Let's explore its different phases.


  • Analysis
  • Logical Optimization
  • Physical Planning
  • Code Generation
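
All four phases can be seen in a query's execution plan. The following is a minimal sketch, assuming a local SparkSession and a made-up two-column dataset; `explain(true)` prints the parsed (unresolved) logical plan, the analyzed logical plan, the optimized logical plan, and the selected physical plan.

```scala
import org.apache.spark.sql.SparkSession

// Local session just for the sketch; any existing SparkSession works the same way.
val spark = SparkSession.builder()
  .appName("catalyst-phases")
  .master("local[*]")
  .getOrCreate()
import spark.implicits._

// Hypothetical tiny dataset.
val people = Seq(("Alice", 34), ("Bob", 45)).toDF("name", "age")

// explain(true) prints all four Catalyst stages: Parsed Logical Plan,
// Analyzed Logical Plan, Optimized Logical Plan, and Physical Plan.
people.filter($"age" > 40).select($"name").explain(true)
```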


Analysis

In this phase, the Catalyst Optimizer converts the unresolved logical plan into a resolved logical plan by resolving all references using the catalog.

Resolving references means checking whether each referenced column name is valid and whether the column's type matches the computation being performed. Spark SQL resolves these attributes using Catalyst rules and a catalog object that tracks the tables in all data sources.
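
As a quick illustration of analysis, here is a minimal sketch with hypothetical column names: a reference to a column that does not exist in the schema fails with an AnalysisException before any job runs, while a valid reference resolves cleanly.

```scala
import org.apache.spark.sql.{AnalysisException, SparkSession}

val spark = SparkSession.builder().master("local[*]").getOrCreate()
import spark.implicits._

val df = Seq((1, "a"), (2, "b")).toDF("id", "label")

// Valid reference: "id" is resolved against the schema, so the plan becomes analyzed.
df.select("id").explain(true)

// Invalid reference: "idd" cannot be resolved, so analysis fails immediately.
try {
  df.select("idd")
} catch {
  case e: AnalysisException => println(s"Analysis failed: ${e.getMessage}")
}
```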


Logical Optimization

In this stage, the Catalyst Optimizer converts the resolved logical plan into an optimized logical plan by applying standard rule-based optimizations. These include predicate pushdown, projection pruning, etc.
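
To make this concrete, here is a minimal sketch assuming a hypothetical Parquet path under /tmp and made-up column names. In the output of `explain(true)`, the optimized logical plan shows the two filters combined into a single predicate and the projection pruned to the columns actually needed; the Parquet scan in the physical plan lists the pushed predicates under PushedFilters.

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local[*]").getOrCreate()
import spark.implicits._

// Write a tiny Parquet file (hypothetical path) so there is a real source to push into.
Seq((1, "US", 250.0), (2, "DE", 80.0))
  .toDF("order_id", "country", "amount")
  .write.mode("overwrite").parquet("/tmp/orders_demo")

spark.read.parquet("/tmp/orders_demo")
  .filter($"amount" > 100)        // the two filters are combined into one predicate
  .filter($"country" === "US")
  .select("order_id")             // unused columns are pruned from the scan
  .explain(true)
```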

Physical Planning

In this phase, Spark SQL takes the optimized logical plan and generates one or more physical plans, using physical operators that match the Spark execution engine. It then selects the best plan using a cost model.

Spark also performs rule-based physical optimizations here, such as pipelining projections or filters into a single Spark map operation. In addition, it can push operations from the logical plan into data sources that support predicate or projection pushdown.
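
A common place to see the cost model at work is join selection. In this minimal sketch (made-up tables), the small dimension side falls under spark.sql.autoBroadcastJoinThreshold, so the planner normally picks a BroadcastHashJoin; disabling the threshold makes it fall back to a SortMergeJoin.

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local[*]").getOrCreate()
import spark.implicits._

val facts = spark.range(0, 1000000).toDF("id")                     // large side
val dims  = Seq((1L, "gold"), (2L, "silver")).toDF("id", "tier")   // tiny side

// The small side fits under the broadcast threshold, so the physical plan
// typically shows a BroadcastHashJoin.
facts.join(dims, "id").explain()

// Turning broadcast off makes the planner fall back to a SortMergeJoin.
spark.conf.set("spark.sql.autoBroadcastJoinThreshold", "-1")
facts.join(dims, "id").explain()
```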

Code Generation

In the end, Spark deals only with a lower-level construct, the RDD (Resilient Distributed Dataset). In this final phase, the Catalyst Optimizer generates Java bytecode that runs efficiently on each machine against these in-memory datasets.
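
To peek at this stage, here is a minimal sketch assuming Spark 3.x, where Dataset.explain accepts a mode string: `explain("codegen")` prints the whole-stage-generated Java source that is compiled to bytecode and executed on each executor.

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local[*]").getOrCreate()
import spark.implicits._

val df = spark.range(0, 1000)
  .filter($"id" % 2 === 0)
  .selectExpr("id * 10 AS scaled")

// Prints the fused, generated code for the whole stage (Spark 3.x explain mode).
df.explain("codegen")
```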

