Snowflake Cloud Data Platform
Snowflake is a cloud-based data warehouse delivered as a SaaS solution and built on top of AWS, GCP, and Azure cloud infrastructure. With Snowflake, you do not need to handle administrative tasks such as installing, configuring, or managing hardware or software. The platform provides users with performance, simplicity, concurrency, and affordability. Snowflake offers a number of services and has dramatically changed the data landscape by eliminating the need for a separate system for each workload: it can serve as your data warehouse, data marts, and data lake.
Key Features and Benefits:
Automatic Clustering - Snowflake maintains the clustering of table data automatically, and independent compute clusters can read and write the same data at the same time and be resized almost instantly.
High Availability - Snowflake provides seamless cross-cloud and cross-region data sharing. Users can store data in multiple clouds with automatic replication, so that even if an entire cloud goes down, Snowflake workloads stay up and running.
Data Sharing - Snowflake lets you migrate your data from one cloud platform to another by setting up data replication, although this can be somewhat expensive once data ingress and egress charges are taken into account. Snowflake supports both ETL and ELT processes.
Performance - Snowflake inherits eleven 9’s of durability from the storage of the underlying cloud providers, so users may not even need separate backups. One of the biggest advantages of Snowflake is that there is no hard limit on the number of queries that can run simultaneously.
Security and Data Protection - Snowflake offers enhanced authentication through Multi-Factor Authentication (MFA), federated authentication with Single Sign-On (SSO), and OAuth. All communication between the client and the server is protected by TLS.
High-level Architecture:
Snowflake can scale up and down automatically, on the fly, providing the performance needed at the time it is needed. The Snowflake architecture scales storage and compute resources independently, so customers use and pay for storage and compute separately. Cloud data storage also grows automatically, without adding nodes or writing any code, and you pay only for what you use, whether your storage needs grow or shrink.
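To make the separate billing of storage and compute concrete, here is a minimal Python sketch. The rates and the names `Rates` and `monthly_cost` are hypothetical and chosen only for illustration; they are not Snowflake's actual prices or API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rates:
    # Hypothetical prices for illustration only, not Snowflake's real rates.
    storage_per_tb_month: float = 23.0
    compute_per_credit: float = 2.0


def monthly_cost(stored_tb: float, credits_used: float, rates: Rates = Rates()) -> dict:
    """Return storage and compute charges as two independent line items."""
    storage = stored_tb * rates.storage_per_tb_month
    compute = credits_used * rates.compute_per_credit
    return {"storage": storage, "compute": compute, "total": storage + compute}


if __name__ == "__main__":
    # Same stored data, different compute usage: only the compute line changes.
    print(monthly_cost(stored_tb=10, credits_used=0))
    print(monthly_cost(stored_tb=10, credits_used=100))
```

In this model, a month with no queries still incurs the storage charge, while a busy month adds compute on top without affecting what you pay for storage.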
Architecture Components:
The Snowflake architecture consists of three key layers: Database Storage, Query Processing, and Cloud Services.
Source: snowflake.com
Database Storage: Snowflake uses scalable cloud blob storage to hold structured and semi-structured data. The storage layer contains tables, schemas, databases, and diverse data. Tables can store petabytes of data, and their partitions are managed efficiently by Snowflake. Storage is made up of many micro-partitions and scales automatically when required.
Query Processing: This is the compute layer of the architecture. It contains multiple virtual warehouses, and every query runs on one virtual warehouse. Virtual warehouses can auto-suspend and auto-resume, are easily scalable, and have auto-scaling built in.
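As a rough illustration of how these warehouse settings are expressed, the Python sketch below builds a CREATE WAREHOUSE statement with auto-suspend, auto-resume, and multi-cluster bounds. The helper `create_warehouse_sql` and the warehouse name are hypothetical, the size list is only a subset, and the statement is only assembled as text, not executed; check the parameters against the Snowflake documentation for your edition before using them.

```python
# Subset of warehouse sizes, listed here only for basic input validation.
ALLOWED_SIZES = {"XSMALL", "SMALL", "MEDIUM", "LARGE", "XLARGE"}


def create_warehouse_sql(
    name: str,
    size: str = "XSMALL",
    auto_suspend_seconds: int = 60,
    auto_resume: bool = True,
    min_clusters: int = 1,
    max_clusters: int = 1,
) -> str:
    """Build (but do not run) a CREATE WAREHOUSE statement as a string."""
    if not name.isidentifier():
        raise ValueError("warehouse name must be a simple identifier")
    if size not in ALLOWED_SIZES:
        raise ValueError(f"unsupported size in this sketch: {size}")
    if min_clusters < 1 or max_clusters < min_clusters:
        raise ValueError("need 1 <= min_clusters <= max_clusters")

    parts = [
        f"CREATE WAREHOUSE IF NOT EXISTS {name}",
        f"WITH WAREHOUSE_SIZE = '{size}'",
        f"AUTO_SUSPEND = {auto_suspend_seconds}",  # idle seconds before suspending
        f"AUTO_RESUME = {'TRUE' if auto_resume else 'FALSE'}",
        f"MIN_CLUSTER_COUNT = {min_clusters}",
        f"MAX_CLUSTER_COUNT = {max_clusters}",  # upper bound for auto-scaling
    ]
    return "\n  ".join(parts) + ";"


if __name__ == "__main__":
    # "reporting_wh" is a made-up warehouse name.
    print(create_warehouse_sql("reporting_wh", size="SMALL", max_clusters=3))
```

Setting the maximum cluster count above the minimum is what allows the warehouse to add clusters under concurrent load, while the auto-suspend setting keeps an idle warehouse from consuming compute.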
Cloud Services: The cloud services layer is a stateless compute resource that runs across multiple availability zones and relies on highly available metadata. It maintains and optimizes data, manages transactions, and provides security, metadata management, and data sharing. This layer scales independently and also provides the SQL client interface for DDL and DML operations on data.
Comparing Cloud Data Warehouse Solutions:
AWS Redshift GCP BigQuery Snowflake