SNOWFLAKE: A NEW ARCHITECTURE FOR DATA WAREHOUSING
Snowflake is described as the first analytic database built for the cloud. It is delivered as a data warehouse as a service, runs on the most popular cloud providers, and is faster, more flexible, and easier to use than a traditional warehouse. It handles all aspects of authentication, configuration, resource management, data protection, availability, optimization, and more.
VIRTUAL DATA WAREHOUSE:
A virtual warehouse is simply a compute cluster; Snowflake calls these clusters warehouses rather than machines. We can create warehouses of different sizes depending on our requirements, from a single node to many nodes. On AWS these nodes are EC2 instances, but they are internal to the warehouse and we do not interact with them directly. A warehouse definition is only metadata, so it costs nothing while the warehouse is not running, and we can define more than one warehouse configuration. The point is that we define our compute resources, start them only when we need computation, and shut them down once the work is done.
A job that runs for 5 minutes on a 32-node warehouse and triggers 24 times a day is billed as 2 hours of a 32-node warehouse per day (5 minutes times 24 runs is 120 minutes). We can also use the same warehouse to execute multiple concurrent queries.
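The arithmetic above can be checked with a few lines of Python. This is only a sketch of the calculation in the text: the function name `billed_hours` is made up for illustration, and it ignores real pricing details such as minimum billing increments and per-size credit rates.

```python
def billed_hours(run_minutes: float, runs_per_day: int, nodes: int):
    """Return (warehouse_hours, node_hours) consumed per day by a scheduled job.

    Hypothetical helper: it models only "time the warehouse is running",
    not actual Snowflake credit pricing.
    """
    warehouse_hours = run_minutes * runs_per_day / 60
    node_hours = warehouse_hours * nodes
    return warehouse_hours, node_hours


hours, node_hours = billed_hours(run_minutes=5, runs_per_day=24, nodes=32)
print(hours, node_hours)  # 2.0 64.0
```

The warehouse is billed for 2 hours a day rather than 24 because it is suspended between runs.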
In that case, as queries are submitted to a warehouse, the warehouse allocates resources to each query and begins running them. If sufficient resources are not available to execute all the queries, the additional queries are queued until the necessary resources become available. We have full flexibility to plan the workload and reuse our warehouses.
Snowflake also offers multi-cluster warehouses for auto scaling.
For example, suppose we create a multi-cluster warehouse and start a query on it. Snowflake runs the query normally on that warehouse. If we keep submitting more and more queries to the same warehouse, at some point they consume all of its resources and additional queries are queued until existing ones complete. A multi-cluster warehouse detects this situation and automatically launches an additional cluster to execute the queued SQL statements.
Snowflake lets us configure this automatic scaling: it starts additional clusters as the workload grows and shuts them down again as the workload shrinks.
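The queuing and auto-scaling behaviour described above can be sketched as a toy model. This is not how Snowflake schedules queries internally; the class name, `slots_per_cluster`, and `max_clusters` are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class MultiClusterWarehouse:
    slots_per_cluster: int  # assumed: queries one cluster can run at once
    max_clusters: int       # assumed: upper bound on auto scaling
    clusters: int = 1
    running: int = 0
    queued: int = 0

    def submit(self) -> str:
        """Submit one query and report what happened to it."""
        if self.running < self.clusters * self.slots_per_cluster:
            self.running += 1
            return "running"
        if self.clusters < self.max_clusters:
            self.clusters += 1  # scale out instead of queuing
            self.running += 1
            return "running on new cluster"
        self.queued += 1        # every cluster is full and we are at the limit
        return "queued"

    def finish(self) -> None:
        """Complete one running query, then promote a queued one or scale in."""
        if self.running == 0:
            return
        self.running -= 1
        if self.queued:
            self.queued -= 1
            self.running += 1
        # Keep only as many clusters as the running queries need (at least one).
        needed = max(1, -(-self.running // self.slots_per_cluster))
        self.clusters = min(self.clusters, needed)


wh = MultiClusterWarehouse(slots_per_cluster=2, max_clusters=2)
print([wh.submit() for _ in range(5)])
# ['running', 'running', 'running on new cluster', 'running', 'queued']
```

With `max_clusters=1` the same model behaves like a single-cluster warehouse: the third query is queued instead of triggering a new cluster.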
Snowflake tries to take the best of both shared-disk and shared-nothing architectures. It keeps a single copy of the data, which means we do not have to manage shadow copies and log syncing, yet we can still run multiple independently scaled compute clusters, called virtual warehouses, over that copy. Because it is built for the cloud, we do not have to worry about the infrastructure challenges that come with a traditional architecture.
It uses the cloud as its storage platform, specifically object storage; on the AWS cloud stack, for example, the data is stored in S3. End users do not need to worry about how the data is stored: it is kept in a format specifically designed for serving high-performance analytic queries. Because S3 capacity is very large, we can start at the scale of megabytes and grow to the scale of petabytes, and even with petabytes of data in a single table we still get excellent query performance.
A key element of the architecture is that storage and compute are separated. As mentioned, S3 is the storage layer, and the compute layer is just clusters of compute nodes that hold no persistent data. The nodes use their local storage only as a volatile cache to accelerate queries, which means we can start and stop them independently of the storage and save cost. We can have multiple compute clusters (virtual warehouses) working simultaneously over a single copy of the data held centrally.
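The separation of storage and compute can be illustrated with a small toy model. The classes below are hypothetical stand-ins, not Snowflake or AWS APIs: `ObjectStore` plays the role of the single central copy in S3, and each `Warehouse` keeps only a volatile local cache.

```python
class ObjectStore:
    """Stands in for the single central copy of the data (for example S3)."""

    def __init__(self):
        self.objects = {}
        self.reads = 0  # counts how often compute had to go to central storage

    def get(self, key):
        self.reads += 1
        return self.objects[key]


class Warehouse:
    """A compute cluster that holds no persistent data, only a cache."""

    def __init__(self, store):
        self.store = store
        self.cache = {}

    def read(self, key):
        if key not in self.cache:          # cache miss: fetch from central storage
            self.cache[key] = self.store.get(key)
        return self.cache[key]

    def suspend(self):
        self.cache.clear()                 # only the cache is lost, never the data


store = ObjectStore()
store.objects["orders/part-0"] = [10, 20, 30]

etl, reporting = Warehouse(store), Warehouse(store)
etl.read("orders/part-0")
reporting.read("orders/part-0")   # second warehouse, same single copy
etl.read("orders/part-0")         # served from etl's local cache
print(store.reads)                # 2

etl.suspend()
print(etl.read("orders/part-0"))  # [10, 20, 30], re-fetched after the restart
```

Suspending a warehouse discards only cached copies, which is why compute can be stopped to save cost without touching the stored data.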
OTHER FEATURES:
We can ingest semi-structured data without applying a schema at ingestion time. Snowflake has a data type called VARIANT that can hold JSON, Avro, Parquet, or XML, so a complex data object can be stored in a single record.
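The idea of storing a whole semi-structured object in one record and applying structure only when reading can be sketched in plain Python. This is an analogy, not Snowflake code: the `payload` field and the `get_path` helper are hypothetical names standing in for a VARIANT column and path-style access.

```python
import json

# Each row keeps the raw document in a single field; no schema is declared up front.
rows = [
    {"id": 1, "payload": json.loads('{"user": {"name": "Ana"}, "tags": ["a", "b"]}')},
    {"id": 2, "payload": json.loads('{"user": {"name": "Raj", "city": "Pune"}}')},
]


def get_path(value, path):
    """Walk a dotted path through nested dicts; return None if a step is missing."""
    for key in path.split("."):
        if not isinstance(value, dict) or key not in value:
            return None
        value = value[key]
    return value


# Structure is applied at read time, so rows with different shapes coexist.
print([get_path(r["payload"], "user.name") for r in rows])  # ['Ana', 'Raj']
print([get_path(r["payload"], "user.city") for r in rows])  # [None, 'Pune']
```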
It has a capability called continuous data protection. Snowflake stores table data in S3 as a series of small micro blocks, breaking each table up and storing the individual blocks in much the same way data is stored on a disk, with S3 treated as one big disk. When we update or alter a table, it does not overwrite the old copy of a block; it writes a new copy of the block and keeps the old one around for some time.
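The "write a new block, keep the old one" behaviour described above can be sketched as follows. This is a simplified model under stated assumptions: the class `VersionedTable` and its methods are invented for illustration, and real retention periods and block formats are not modelled.

```python
class VersionedTable:
    """Toy copy-on-write table: blocks are immutable, versions are lists of block ids."""

    def __init__(self, blocks):
        self.store = {}    # block id -> immutable block contents
        self.next_id = 0
        self.versions = [[self._write(b) for b in blocks]]

    def _write(self, data):
        block_id = self.next_id
        self.store[block_id] = tuple(data)  # never modified after this point
        self.next_id += 1
        return block_id

    def update_block(self, index, data):
        current = list(self.versions[-1])
        current[index] = self._write(data)  # new block; the old one is kept
        self.versions.append(current)

    def read(self, version=-1):
        return [list(self.store[b]) for b in self.versions[version]]


t = VersionedTable([[1, 2], [3, 4]])
t.update_block(1, [3, 40])
print(t.read())     # [[1, 2], [3, 40]]
print(t.read(0))    # [[1, 2], [3, 4]], the earlier state is still readable
print(len(t.store)) # 3: only the changed block was written again
```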
It also has a capability called zero-copy cloning, which lets us replicate data very quickly. A clone is not a snapshot or a replica; it is just a copy of the pointers to the objects in S3.
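A clone that copies pointers rather than data can be sketched like this. The `Table` class and the block ids are hypothetical; the sketch assumes blocks are immutable, so a later change to the clone writes a new block instead of touching shared ones.

```python
class Table:
    """Toy table: a list of pointers into a shared store of immutable blocks."""

    def __init__(self, store, pointers):
        self.store = store
        self.pointers = pointers

    def clone(self):
        # Zero-copy: duplicate the pointer list only, never the blocks.
        return Table(self.store, list(self.pointers))

    def update_block(self, index, data):
        new_id = f"b{len(self.store)}"
        self.store[new_id] = tuple(data)  # write a new block
        self.pointers[index] = new_id     # repoint this table only

    def read(self):
        return [list(self.store[p]) for p in self.pointers]


store = {"b0": (1, 2), "b1": (3, 4)}
orders = Table(store, ["b0", "b1"])

dev = orders.clone()
print(len(store))     # 2: cloning wrote no data

dev.update_block(0, [9, 9])
print(orders.read())  # [[1, 2], [3, 4]], the original is unaffected
print(dev.read())     # [[9, 9], [3, 4]]
print(len(store))     # 3: only the changed block was added
```

The unchanged block `b1` is still shared by both tables, which is why the clone is fast and takes almost no extra storage at first.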
It can also share data across accounts in a protected and controlled way.
External stage: an external stage is a named object that points to files held outside Snowflake, for example in an S3 bucket. Once the stage is defined, later loads refer to the stage by name instead of going directly to the external storage location each time. We create an external stage with a command (CREATE STAGE), which we can run from the Snowflake console.
CONCLUSION
Snowflake has moved past constraints found in earlier generations of databases and honed a promising cloud analytic database, one that is eminently elastic because it is founded on the separation of compute and storage. Based on our findings, we conclude that Snowflake is market-leading in what you would want from a multi-purpose cloud data warehouse and analytical database.