SNOWFLAKE ARCHITECTURE
Pic credit-Umesh Patel(medium)


Snowflake is an evolutionary modern data platform that solves the scalability problem. Compared to traditional cloud data platform architectures, Snowflake enables data storage and processing that is significantly faster, easier to use, and more affordable. In addition, Snowflake's Data Cloud provides users with a unique experience by combining a new SQL query engine with an innovative architecture designed and built, from the ground up, specifically for the cloud.


Snowflake Features:

Some of the key features of Snowflake include:

  1. Cloud-based data warehousing: Snowflake is a fully cloud-based platform that offers on-demand data warehousing, enabling organizations to quickly and easily spin up new data warehousing resources as needed.
  2. Scalability: Snowflake's architecture is designed to be highly scalable, allowing organizations to easily scale their compute and storage resources up or down as needed.
  3. Security: Snowflake provides comprehensive security features, including encryption at rest and in transit, network security, and user and role-based access control.
  4. Data sharing: Snowflake's data sharing capabilities allow organizations to securely share data with partners, customers, and other stakeholders.
  5. Data integration: Snowflake supports a wide range of data integration options, including ETL/ELT, data pipelines, and connectors to other cloud-based and on-premises data sources.
  6. Analytics and visualization: Snowflake includes built-in support for popular analytics and visualization tools, including Tableau, Looker, and Power BI.
  7. Performance: Snowflake's columnar storage and query optimization capabilities enable fast query performance, even on large and complex datasets.


Snowflake Architecture


One of the key features of Snowflake is its unique architecture, which separates compute and storage, allowing for more flexible and scalable resource allocation. Snowflake also includes advanced security features, including encryption at rest and in transit, role-based access control, and network security.


Figure: Snowflake Architecture


As the diagram above shows, the Snowflake architecture consists of three main layers:


Database Storage: When data is loaded into Snowflake, Snowflake reorganizes it into its internal optimized, compressed columnar format. Snowflake stores this optimized data in cloud storage.

Snowflake manages all aspects of how this data is stored — the organization, file size, structure, compression, metadata, statistics, and other aspects of data storage are handled by Snowflake. However, the data objects stored by Snowflake are not directly visible nor accessible by customers; they are only accessible through SQL query operations run using Snowflake.


Query Processing: Query execution is performed in the processing layer. Snowflake processes queries using "virtual warehouses". Each virtual warehouse is an MPP compute cluster composed of multiple compute nodes allocated by Snowflake from a cloud provider.

Each virtual warehouse is an independent compute cluster that does not share compute resources with other virtual warehouses. As a result, each virtual warehouse has no impact on the performance of other virtual warehouses.


Cloud Services: The cloud services layer is a collection of services that coordinate activities across Snowflake. These services tie together all of the different components of Snowflake in order to process user requests, from login to query dispatch. The cloud services layer also runs on compute instances provisioned by Snowflake from the cloud provider.


Services managed in this layer include:

  • Authentication
  • Infrastructure management
  • Metadata management
  • Query parsing and optimization
  • Access control


Why Snowflake is fast:


Snowflake is fast for several reasons:


Separation of Storage and Compute: Snowflake separates storage and compute, so each can be scaled independently. Data is stored in cloud object storage, such as Amazon S3 or Azure Blob Storage, while Snowflake provides compute resources on demand. Because a workload can be given more compute without moving or reorganizing the stored data, queries can be processed more quickly and efficiently.


Columnar Storage: Snowflake uses columnar storage, optimized for analytic queries. Columnar storage is more efficient than row-based storage for analytic workloads because it only reads the required data for the query, reducing the amount of data that needs to be processed.
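
The difference between the two layouts can be sketched in a few lines of Python. This is a toy model only, not Snowflake's storage format: the table, its column names, and the two helper functions are made up for illustration. It counts how many values each layout has to read to answer the same aggregate query.

```python
# Toy comparison of row-oriented and column-oriented layouts.
# The table contents and column names are illustrative assumptions.

rows = [
    {"order_id": 1, "customer": "acme", "region": "EU", "amount": 120.0},
    {"order_id": 2, "customer": "bolt", "region": "US", "amount": 80.0},
    {"order_id": 3, "customer": "acme", "region": "EU", "amount": 50.0},
]

# Column-oriented layout: one list of values per column.
columns = {name: [row[name] for row in rows] for name in rows[0]}


def total_amount_row_store(table):
    """Sum `amount` from a row store; every field of every row is read."""
    values_read = 0
    total = 0.0
    for row in table:
        values_read += len(row)  # the whole row is fetched to reach one field
        total += row["amount"]
    return total, values_read


def total_amount_column_store(table):
    """Sum `amount` from a column store; only the `amount` column is read."""
    amounts = table["amount"]
    return sum(amounts), len(amounts)


row_total, row_reads = total_amount_row_store(rows)
col_total, col_reads = total_amount_column_store(columns)
print("row store:   ", row_total, "values read:", row_reads)
print("column store:", col_total, "values read:", col_reads)
```

Both layouts return the same total, but the column store touches only the values of the one column the query needs. On a wide table with many columns the gap grows accordingly.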


Multi-Cluster Architecture: Snowflake's multi-cluster architecture allows multiple compute clusters to access the same data without replicating or moving it. Concurrent workloads can therefore run in parallel without competing for the same compute resources.


Automatic Optimization: Snowflake automatically optimizes the use of compute resources based on workload and usage patterns. This provides better performance and reduces costs.


Advanced Query Processing: Snowflake uses advanced query processing techniques, such as vectorized query execution, in which each operator processes a batch of column values at a time rather than one row at a time. This reduces per-row overhead and provides faster query execution and processing times.
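
As a rough illustration of the batch-at-a-time idea behind vectorized execution, the sketch below computes the same result once per row and once per batch and counts the operator invocations. It is a simplified model, not Snowflake's engine: the data, the function names, and the batch size of 4 are assumptions chosen for the example.

```python
# Toy contrast between row-at-a-time and batch-at-a-time execution.
# Data, function names, and batch size are illustrative assumptions.

prices = [10.0, 20.0, 30.0, 40.0, 50.0]
quantities = [1, 2, 3, 4, 5]


def revenue_row_at_a_time(prices, quantities):
    """Invoke the multiply step once for every row."""
    calls = 0
    out = []
    for price, quantity in zip(prices, quantities):
        out.append(price * quantity)
        calls += 1
    return out, calls


def multiply_batch(price_batch, quantity_batch):
    """One operator invocation that handles a whole batch of column values."""
    return [p * q for p, q in zip(price_batch, quantity_batch)]


def revenue_batched(prices, quantities, batch_size=4):
    """Invoke the multiply step once per batch of column values."""
    calls = 0
    out = []
    for start in range(0, len(prices), batch_size):
        end = start + batch_size
        out.extend(multiply_batch(prices[start:end], quantities[start:end]))
        calls += 1
    return out, calls


row_result, row_calls = revenue_row_at_a_time(prices, quantities)
batch_result, batch_calls = revenue_batched(prices, quantities)
print("row at a time:  ", row_calls, "operator calls")
print("batch at a time:", batch_calls, "operator calls")
```

The results are identical. What changes is how often the per-operator overhead is paid, which is the cost that batch processing is meant to amortize.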


Caching: Snowflake caches frequently accessed data in the memory and local storage of the virtual warehouse, reducing the time it takes to access and process the data.


Caching in Snowflake can be implemented in different ways, depending on the specific use case and requirements. Some of the caching techniques used in Snowflake include:

Result caching: This involves caching the results of frequently executed queries so that they can be retrieved faster in subsequent executions. Result caching can help to reduce query latency and improve overall query performance.

Metadata caching: This involves caching the metadata of frequently accessed tables and views so that they can be retrieved faster in subsequent queries. Metadata caching can help to reduce the time it takes to compile and execute queries.

Query acceleration: Snowflake's Query Acceleration Service offloads portions of eligible queries, such as large table scans, to additional shared compute resources managed by Snowflake. Query acceleration can help to reduce query latency and improve overall query performance without resizing the warehouse.

Materialized views: This involves creating precomputed, often pre-aggregated, views of frequently accessed data so that they can be retrieved faster in subsequent queries. Materialized views can help to reduce the time it takes to execute complex queries.


Overall, caching techniques can be an effective way to improve the performance and efficiency of Snowflake data warehousing solutions, especially for frequently accessed data and queries.
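
The result caching idea described above can be sketched as a small lookup keyed on the query text and the state of the underlying data. This is a minimal model, not Snowflake's implementation: the class `ToyResultCache`, the `table_version` counter, and the stand-in `run_query` function are all hypothetical names introduced for the example.

```python
# Toy result cache: a result is reused only when the same query text is run
# against unchanged data. All names here are illustrative assumptions.

class ToyResultCache:
    def __init__(self, run_query):
        self._run_query = run_query  # function that actually executes a query
        self._cache = {}
        self.executions = 0          # how many times the query really ran

    def query(self, sql, table_version):
        key = (sql, table_version)   # changed data means a different key
        if key not in self._cache:
            self.executions += 1
            self._cache[key] = self._run_query(sql)
        return self._cache[key]


data = [3, 5, 7]


def run_query(sql):
    # Stand-in for real execution: ignores the SQL text and sums the data.
    return sum(data)


cache = ToyResultCache(run_query)
sql = "SELECT SUM(x) FROM t"

first = cache.query(sql, table_version=1)   # executed
second = cache.query(sql, table_version=1)  # served from the cache
runs_before_change = cache.executions

data.append(10)                             # the table changes
third = cache.query(sql, table_version=2)   # executed again on the new data
print(first, second, third, cache.executions)
```

Including the data version in the key is the design choice that matters here: a cached result is served only while the data it was computed from is unchanged.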


How Snowflake stores data:

Data stored in Snowflake databases is always compressed and encrypted. Snowflake takes care of managing every aspect of how the data is stored. Snowflake automatically organizes stored data into micro-partitions, an optimized, immutable, compressed columnar format which is encrypted using AES-256 encryption.

Snowflake optimizes and compresses data to make metadata extraction and query processing easier and more efficient. As described above, whenever a user submits a Snowflake query, that query is sent to the optimizer in the cloud services layer before being sent to the compute layer for processing.
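
One way stored metadata can make query processing cheaper is partition pruning: if each micro-partition records the range of values it contains, partitions that cannot match a filter are skipped without being read. The sketch below is a toy model under that assumption. The `MicroPartition` class, its `min_date` and `max_date` fields, and the sample rows are hypothetical and do not reflect Snowflake's internal format.

```python
# Toy partition pruning using per-partition min/max metadata.
# Class, field names, and data are illustrative assumptions.
from dataclasses import dataclass


@dataclass(frozen=True)
class MicroPartition:
    min_date: str  # smallest date stored in this partition (ISO format)
    max_date: str  # largest date stored in this partition (ISO format)
    rows: tuple    # immutable (date, amount) pairs


partitions = [
    MicroPartition("2024-01-01", "2024-01-31",
                   (("2024-01-05", 10), ("2024-01-20", 20))),
    MicroPartition("2024-02-01", "2024-02-29",
                   (("2024-02-03", 30), ("2024-02-14", 40))),
    MicroPartition("2024-03-01", "2024-03-31",
                   (("2024-03-09", 50),)),
]


def scan_with_pruning(partitions, start, end):
    """Return rows with start <= date <= end and the number of partitions read."""
    scanned = 0
    matches = []
    for part in partitions:
        # The metadata alone shows that no row in this partition can match.
        if part.max_date < start or part.min_date > end:
            continue
        scanned += 1
        matches.extend(row for row in part.rows if start <= row[0] <= end)
    return matches, scanned


matches, scanned = scan_with_pruning(partitions, "2024-02-01", "2024-02-28")
print(matches, "partitions read:", scanned, "of", len(partitions))
```

ISO-formatted date strings compare correctly as plain strings, which keeps the example free of date parsing.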


Two unique features:


Zero-copy cloning allows the user to snapshot a Snowflake database, schema, or table along with its associated data. There is no additional storage charge until changes are made to the cloned object, because zero-copy data cloning is a metadata-only operation. For example, if you clone a database and then add a new table or delete some rows from a cloned table, at that point storage charges would be assessed. There are many uses for zero-copy cloning other than creating a backup. Most often, zero-copy clones will be used to support development and test environments.
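
Why a clone costs nothing until it changes can be shown with a small copy-on-write model. In Snowflake itself cloning is done in SQL (for example `CREATE TABLE ... CLONE ...`). The Python below is only a conceptual sketch, and `ToyTable`, `stored_partitions`, and the sample data are hypothetical names, not Snowflake internals.

```python
# Toy model of zero-copy cloning: a clone copies references to immutable
# partitions (metadata), not the data. All names are illustrative assumptions.

class ToyTable:
    def __init__(self, partitions=()):
        # Each partition is an immutable tuple of rows.
        self.partitions = list(partitions)

    def clone(self):
        # Copy only the list of references; the row data is shared.
        return ToyTable(self.partitions)

    def insert(self, rows):
        # New data goes into a new partition owned by this table only.
        self.partitions.append(tuple(rows))


def stored_partitions(*tables):
    """Count the distinct partition objects referenced by the given tables."""
    return len({id(p) for table in tables for p in table.partitions})


prod = ToyTable([(1, 2, 3), (4, 5, 6)])
dev = prod.clone()
before = stored_partitions(prod, dev)  # clone shares every partition

dev.insert([7, 8])                     # only now is new storage needed
after = stored_partitions(prod, dev)

print("partitions stored before change:", before)
print("partitions stored after change: ", after)
print("prod still has", len(prod.partitions), "partitions")
```

Because partitions are never modified in place, the clone and the original can safely share them, and a change to the clone leaves the original untouched.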


Time Travel allows you to restore a previous version of a database, table, or schema. This is an incredibly helpful feature that gives you an opportunity to fix previous edits that were done incorrectly or restore items deleted in error. With Time Travel, you can also back up data from different points in the past by combining the Time Travel feature with the clone feature, or you can perform a simple query of a database object that no longer exists. How far back you can go into the past depends on a few different factors, such as the Snowflake edition and the data retention period configured for the object.
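
The mechanism behind Time Travel can be pictured as keeping old table versions instead of overwriting them, and discarding them once they fall outside a retention window. The sketch below is a toy model under loud assumptions: Snowflake's retention is measured in time, while this example counts versions to stay short, and `ToyVersionedTable`, `as_of`, and `retention_versions` are hypothetical names.

```python
# Toy model of Time Travel: every write records a new immutable version,
# and old versions stay queryable until they leave the retention window.
# Retention is counted in versions here purely to keep the example small.

class ToyVersionedTable:
    def __init__(self, retention_versions=3):
        self.retention_versions = retention_versions
        self._versions = [()]  # version history, oldest first; starts empty

    def write(self, rows):
        """Record a new table version instead of overwriting the old one."""
        self._versions.append(tuple(rows))
        # Keep the current version plus `retention_versions` older ones.
        self._versions = self._versions[-(self.retention_versions + 1):]

    def current(self):
        return self._versions[-1]

    def as_of(self, versions_back):
        """Return the table as it was `versions_back` writes ago."""
        if versions_back > len(self._versions) - 1:
            raise LookupError("requested version is outside the retention window")
        return self._versions[-1 - versions_back]


table = ToyVersionedTable(retention_versions=2)
table.write(["a", "b"])
table.write(["a", "b", "c"])
table.write(["a"])                 # rows deleted in error

previous = table.as_of(1)          # the state before the bad write
table.write(previous)              # restore it as a new version
print("restored:", table.current())
```

Restoring is itself just another write, so the erroneous version remains visible in the history until it ages out of the window.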


Overall, Snowflake is a powerful platform for modern data warehousing, providing organizations with the tools and capabilities needed to manage and analyze large volumes of data in a secure and efficient manner.
