SNOWFLAKE ARCHITECTURE
Rocky Bhatia
350k+ Followers Across Social Media | Architect @ Adobe | LinkedIn Top 1% | Global Speaker | 152k+ Instagram | YouTube Content Creator
What is Snowflake?
Snowflake is a modern, cloud-native data platform built to solve the scalability problems of traditional data warehouses. Compared with traditional data platform architectures, Snowflake makes storing and processing data significantly faster, easier to use, and more affordable. In addition, Snowflake's Data Cloud gives users a distinctive experience by combining a new SQL query engine with an innovative architecture designed and built from the ground up specifically for the cloud.
Snowflake Features:
Some of the key features of Snowflake include:
Snowflake Architecture
One of the key features of Snowflake is its unique architecture, which separates compute from storage so that each can be scaled independently, allowing more flexible resource allocation. Snowflake also includes advanced security features, including encryption at rest and in transit, role-based access control, and network policies.
As the diagram above shows, the Snowflake architecture consists of three main layers:
Database Storage: When data is loaded into Snowflake, Snowflake reorganizes it into its internal optimized, compressed, columnar format and stores this optimized data in cloud storage.
Snowflake manages all aspects of how this data is stored: the organization, file size, structure, compression, metadata, statistics, and other aspects of data storage are handled by Snowflake. The data objects stored by Snowflake are neither directly visible nor accessible to customers; they can be accessed only through SQL query operations run using Snowflake.
Query Processing: Query execution is performed in the processing layer. Snowflake processes queries using "virtual warehouses". Each virtual warehouse is an MPP compute cluster composed of multiple compute nodes allocated by Snowflake from a cloud provider.
Each virtual warehouse is an independent compute cluster that does not share compute resources with other virtual warehouses. As a result, each virtual warehouse has no impact on the performance of other virtual warehouses.
Cloud Services: The cloud services layer is a collection of services that coordinate activities across Snowflake. These services tie together all of the different components of Snowflake in order to process user requests, from login to query dispatch. The cloud services layer also runs on compute instances provisioned by Snowflake from the cloud provider.
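Services managed in this layer include authentication, infrastructure management, metadata management, query parsing and optimization, and access control.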
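The three layers described above can be illustrated with a deliberately simplified Python model. This is a sketch under stated assumptions, not Snowflake's implementation: the class names `SharedStorage`, `VirtualWarehouse`, and `CloudServices` are hypothetical, an in-memory dict stands in for cloud object storage, and a warehouse's "compute" is reduced to a counter of the queries it has run. It shows two properties from the text: every warehouse reads the same single copy of the data, and work done on one warehouse does not consume another warehouse's resources.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class SharedStorage:
    """Stand-in for the storage layer: one copy of each table, shared by all warehouses."""
    tables: dict[str, list[dict]] = field(default_factory=dict)


@dataclass
class VirtualWarehouse:
    """Stand-in for an independent compute cluster; it tracks only its own workload."""
    name: str
    storage: SharedStorage
    queries_run: int = 0

    def count_rows(self, table: str, predicate) -> int:
        # Reads the shared table directly; no data is copied into the warehouse.
        self.queries_run += 1
        return sum(1 for row in self.storage.tables[table] if predicate(row))


@dataclass
class CloudServices:
    """Stand-in for the services layer: knows the warehouses and dispatches queries to them."""
    warehouses: dict[str, VirtualWarehouse] = field(default_factory=dict)

    def register(self, wh: VirtualWarehouse) -> None:
        self.warehouses[wh.name] = wh

    def dispatch(self, warehouse: str, table: str, predicate) -> int:
        return self.warehouses[warehouse].count_rows(table, predicate)


storage = SharedStorage({"orders": [{"id": i, "amount": i * 10} for i in range(1, 6)]})
services = CloudServices()
services.register(VirtualWarehouse("etl_wh", storage))
services.register(VirtualWarehouse("bi_wh", storage))

result = services.dispatch("bi_wh", "orders", lambda r: r["amount"] > 20)
print(result)                                     # 3
print(services.warehouses["etl_wh"].queries_run)  # 0: the other warehouse did no work
```

Because both warehouses hold a reference to the same `SharedStorage` object, data loaded once is visible to every warehouse, while each warehouse's workload counter moves independently. Real isolation concerns CPU, memory, and local caches, which this toy does not model.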
Why Snowflake is fast:
Snowflake is fast for several reasons:
Separation of Storage and Compute: Snowflake separates storage from compute, so each can be scaled independently. Data is stored in cloud-based object storage, such as Amazon S3 or Azure Blob Storage, while Snowflake provides compute resources on demand. Because compute can be resized or added to match the workload without moving any data, queries get the resources they need and are processed more quickly and efficiently.
Columnar Storage: Snowflake uses columnar storage, which is optimized for analytic queries. Columnar storage is more efficient than row-based storage for analytic workloads because a query reads only the columns it references, reducing the amount of data that must be scanned and processed.
Multi-Cluster Architecture: Snowflake's multi-cluster architecture allows multiple compute clusters to access the same data without replicating or moving it, so different workloads can run at the same time without competing for the same compute resources.
Automatic Optimization: Snowflake automatically manages compute resources based on workload and usage patterns. For example, a virtual warehouse can suspend itself when idle and resume when a new query arrives, and a multi-cluster warehouse can add or remove clusters as the number of concurrent queries changes. This helps maintain performance while reducing costs.
Advanced Query Processing: Snowflake uses advanced query processing techniques such as vectorized query execution, in which operators work on batches of column values at a time rather than on one row at a time. This reduces per-row processing overhead and speeds up query execution.
Caching: Snowflake caches frequently accessed data close to the compute, for example on a virtual warehouse's local storage, reducing the time it takes to access and process that data.
Caching in Snowflake can be implemented in different ways, depending on the specific use case and requirements. Some of the caching techniques used in Snowflake include:
Result caching: This involves caching the results of previously executed queries so that they can be returned immediately when the same query is run again and the underlying data has not changed. Result caching can reduce query latency and avoids spending compute on repeated queries.
Metadata caching: This involves caching the metadata of frequently accessed tables and views so that they can be retrieved faster in subsequent queries. Metadata caching can help to reduce the time it takes to compile and execute queries.
Query acceleration: Snowflake's Query Acceleration Service offloads portions of eligible queries, such as large scans with selective filters, to additional shared compute resources instead of relying solely on the warehouse's own nodes. It is not a caching technique in the strict sense, but it can likewise reduce query latency, especially for outlier queries that scan a lot of data.
Materialized views: This involves storing the precomputed (for example, pre-aggregated) results of a query over frequently accessed data so that they can be retrieved faster in subsequent queries. Snowflake keeps materialized views up to date automatically as the base table changes, and they can help reduce the time it takes to execute complex queries.
Overall, caching techniques can be an effective way to improve the performance and efficiency of Snowflake data warehousing solutions, especially for frequently accessed data and queries.
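Result caching, the first technique listed above, can be sketched as a lookup keyed on the query text plus a version number for the data it reads, so a cached result is reused only while the underlying data is unchanged. This mirrors the documented behaviour only in broad strokes: the `ResultCache` class, the per-table version counter, and the exact-text key are assumptions for illustration, and details such as how long Snowflake retains cached results are not modelled.

```python
class ResultCache:
    """Toy result cache: reuse a result only for the same query text on unchanged data."""

    def __init__(self):
        self.table_versions = {}  # table name -> version, bumped on every write
        self._cache = {}          # (query text, table version) -> cached result
        self.hits = 0
        self.misses = 0

    def write(self, table):
        # Any change to the table bumps its version, so older cached results no longer match.
        self.table_versions[table] = self.table_versions.get(table, 0) + 1

    def run(self, sql, table, execute):
        key = (sql, self.table_versions.get(table, 0))
        if key in self._cache:
            self.hits += 1
            return self._cache[key]
        self.misses += 1
        result = execute()        # compute is used only on a cache miss
        self._cache[key] = result
        return result


data = [1, 2, 3]
cache = ResultCache()
sql = "SELECT SUM(x) FROM t"
first = cache.run(sql, "t", lambda: sum(data))   # miss: the query executes
second = cache.run(sql, "t", lambda: sum(data))  # hit: served from the cache
data.append(4)
cache.write("t")                                 # the underlying data changed
third = cache.run(sql, "t", lambda: sum(data))   # miss: the old result is not reused
print(first, second, third, cache.hits, cache.misses)  # 6 6 10 1 2
```

Keying on a data version rather than on a timer is the design choice that makes reuse safe here: a stale result can never be returned, because any write changes the key.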
How does Snowflake store data?
Data stored in Snowflake databases is always compressed and encrypted. Snowflake manages every aspect of how the data is stored, automatically organizing it into micro-partitions: an optimized, immutable, compressed columnar format encrypted with AES-256.
Snowflake optimizes and compresses data in this way to make metadata extraction and query processing easier and more efficient. Whenever a user submits a query, it is sent to the optimizer in the cloud services layer before being sent to the compute layer for processing. The optimizer can use the metadata Snowflake keeps for each micro-partition, such as the range of values in each column, to skip micro-partitions that cannot contain matching rows.
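Skipping micro-partitions that cannot match a filter (often called pruning) can be sketched in a few lines. In this simplified model, which is an assumption and not Snowflake's internal format, rows are split into fixed-size partitions, each partition stores its values column by column, and the per-partition metadata is just the minimum and maximum of each column. A range filter then scans only partitions whose min/max range overlaps the predicate, and reads only the columns it needs. `PARTITION_SIZE`, `build_partitions`, and `sum_where_between` are hypothetical names; real micro-partitions hold far more rows and richer metadata.

```python
PARTITION_SIZE = 3  # hypothetical; real micro-partitions hold many more rows


def build_partitions(rows, columns):
    """Split rows into partitions, store each column separately, and record min/max per column."""
    partitions = []
    for start in range(0, len(rows), PARTITION_SIZE):
        chunk = rows[start:start + PARTITION_SIZE]
        cols = {c: [r[c] for r in chunk] for c in columns}  # columnar layout
        meta = {c: (min(v), max(v)) for c, v in cols.items()}
        partitions.append({"columns": cols, "meta": meta})
    return partitions


def sum_where_between(partitions, filter_col, lo, hi, sum_col):
    """SUM(sum_col) WHERE filter_col BETWEEN lo AND hi, skipping partitions via min/max metadata."""
    total, scanned = 0, 0
    for p in partitions:
        cmin, cmax = p["meta"][filter_col]
        if cmax < lo or cmin > hi:
            continue  # pruned: no row in this partition can match
        scanned += 1
        keys = p["columns"][filter_col]
        vals = p["columns"][sum_col]  # only the two referenced columns are read
        total += sum(v for k, v in zip(keys, vals) if lo <= k <= hi)
    return total, scanned


rows = [{"day": d, "amount": d * 100, "region": "EU"} for d in range(1, 10)]
parts = build_partitions(rows, ["day", "amount", "region"])  # days 1-3, 4-6, 7-9
result = sum_where_between(parts, "day", 4, 5, "amount")
print(result)  # (900, 1): 400 + 500, with only one of three partitions scanned
```

The `region` column is never touched by the query, and two of the three partitions are skipped using metadata alone, which is the combined effect of columnar storage and pruning.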
Two unique features:
Zero-copy cloning allows the user to snapshot a Snowflake database, schema, or table along with its associated data. There is no additional storage charge until changes are made to the cloned object, because zero-copy data cloning is a metadata-only operation. For example, if you clone a database and then add a new table or delete some rows from a cloned table, at that point storage charges would be assessed. There are many uses for zero-copy cloning other than creating a backup. Most often, zero-copy clones will be used to support development and test environments.
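Why a clone costs nothing until it changes can be illustrated with a copy-on-write model. In this sketch, which shows the general technique rather than Snowflake's code, a table is just a list of references to immutable partitions; cloning copies that list of references, and a later change writes a new partition instead of modifying a shared one. Billable storage is approximated as the number of distinct partitions referenced by any table. `Partition`, `Table`, `clone`, `delete_where`, and `stored_partitions` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Partition:
    rows: tuple  # immutable, so it can be shared safely between tables


class Table:
    def __init__(self, partitions):
        self.partitions = list(partitions)  # metadata: references to partitions

    def clone(self):
        # Zero-copy: only the list of references is copied, not the partitions.
        return Table(self.partitions)

    def delete_where(self, predicate):
        # Copy-on-write: rewrite only the partitions that contain matching rows.
        new_parts = []
        for p in self.partitions:
            if any(predicate(r) for r in p.rows):
                p = Partition(tuple(r for r in p.rows if not predicate(r)))
            new_parts.append(p)
        self.partitions = new_parts


def stored_partitions(*tables):
    """Count distinct partition objects across tables, a stand-in for billable storage."""
    return len({id(p) for t in tables for p in t.partitions})


prod = Table([Partition((1, 2)), Partition((3, 4))])
dev = prod.clone()
before = stored_partitions(prod, dev)  # 2: the clone adds no storage
dev.delete_where(lambda r: r == 4)     # a change to the clone
after = stored_partitions(prod, dev)   # 3: only the rewritten partition is new
print(before, after)                          # 2 3
print(prod.partitions[1].rows, dev.partitions[1].rows)  # (3, 4) (3,)
```

The original table is unaffected by the delete, and the untouched partition is still shared by both tables, which matches the text's point that charges arise only once the clone diverges.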
Time Travel allows you to restore a previous version of a database, schema, or table. This is an incredibly helpful feature that gives you an opportunity to fix edits that were made incorrectly or to restore objects deleted in error. With Time Travel, you can also back up data from different points in the past by combining Time Travel with cloning, or you can query past versions of data that has since been changed or deleted. How far back you can go depends on a few factors, such as your Snowflake edition and the retention period (DATA_RETENTION_TIME_IN_DAYS) configured for the object.
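Time Travel can be pictured as keeping every committed version of a table, with its timestamp, for as long as the retention period allows. The sketch below is a simplified model under stated assumptions: versions are whole snapshots kept in a list, time is an integer, and `retention` is a single number standing in for the retention settings that apply in Snowflake. `VersionedTable`, `commit`, `at`, and `retention` are hypothetical names; in Snowflake itself this is exposed through SQL, for example the `AT` and `BEFORE` clauses and `UNDROP`.

```python
import bisect


class VersionedTable:
    """Toy Time Travel: keep every committed snapshot and read the table as of a past time."""

    def __init__(self, retention):
        self.retention = retention  # how far back (in time units) history stays queryable
        self._times = []            # commit times, assumed to be increasing
        self._snapshots = []        # full copy of the rows at each commit

    def commit(self, time, rows):
        self._times.append(time)
        self._snapshots.append(tuple(rows))

    def at(self, time, now):
        """Return the rows as of `time`, similar in spirit to a query with an AT clause."""
        if now - time > self.retention:
            raise ValueError("requested time is outside the retention period")
        # Latest commit at or before the requested time.
        i = bisect.bisect_right(self._times, time) - 1
        if i < 0:
            raise ValueError("table did not exist at the requested time")
        return self._snapshots[i]


t = VersionedTable(retention=24)
t.commit(0, ["a", "b"])
t.commit(10, ["a"])           # row "b" deleted by mistake at time 10
print(t.at(5, now=12))        # ('a', 'b'): the state before the bad delete
print(t.at(12, now=12))       # ('a',)

try:
    t.at(5, now=40)           # 35 time units back, beyond the 24-unit retention
    expired = False
except ValueError:
    expired = True
print(expired)                # True
```

Storing full snapshots keeps the sketch short; a system built on immutable micro-partitions can instead keep the older partitions and the metadata that points to them, which is far cheaper for large tables.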
Overall, Snowflake is a powerful platform for modern data warehousing, providing organizations with the tools and capabilities needed to manage and analyze large volumes of data in a secure and efficient manner.