SNOWFLAKE: A NEW ARCHITECTURE FOR THE DATA WAREHOUSE

Snowflake is the first analytic database built for the cloud, and it is delivered as a data warehouse as a service. It runs on the most popular cloud providers, and it is faster, more flexible and easier to use than a traditional warehouse. It handles all aspects of authentication, configuration, resource management, data protection, availability, optimization and more.

VIRTUAL DATA WAREHOUSE:

A virtual warehouse is nothing but a compute cluster; Snowflake simply calls these clusters warehouses instead of machines. We can create warehouses of different sizes depending on our requirements, from a single node to many nodes. On AWS these nodes are EC2 instances, but they are internal to the warehouse and we don’t interact with them directly. Creating a warehouse has no cost associated with it, because it is just metadata creation, and we can define more than one warehouse configuration. The point is that we create our compute resource definition, start it only when we need some computation, and shut it down once the work is done.

A job that runs for 5 minutes on a 32-node warehouse and is triggered 24 times a day is charged as 2 hours of a 32-node warehouse per day (5 minutes × 24 runs = 120 minutes). We can also use the same warehouse to execute multiple concurrent queries.
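
The arithmetic above can be checked with a short Python sketch. It assumes we pay only for the time the warehouse is running and ignores real-world details such as minimum billing increments and credit pricing; the function names are made up for illustration.

```python
def running_hours_per_day(minutes_per_run: float, runs_per_day: int) -> float:
    """Hours per day the warehouse is actually running."""
    return minutes_per_run * runs_per_day / 60


def billed_node_hours(nodes: int, minutes_per_run: float, runs_per_day: int) -> float:
    """Node-hours per day, assuming cost scales linearly with node count."""
    return nodes * running_hours_per_day(minutes_per_run, runs_per_day)


print(running_hours_per_day(5, 24))   # 2.0 hours of warehouse time
print(billed_node_hours(32, 5, 24))   # 64.0 node-hours
```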

In that case, as queries are submitted to a warehouse, the warehouse allocates resources to each query and begins running them. If sufficient resources are not available to execute all the queries, the additional queries are queued until the necessary resources become available. This gives us the flexibility to plan the workload and reuse our warehouses.
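
The queueing behaviour can be pictured with a toy model in Python. This is only a sketch: it assumes a warehouse has a fixed number of query "slots", which is a simplification, and the `Warehouse` class here is hypothetical, not a Snowflake API.

```python
from collections import deque


class Warehouse:
    """Toy warehouse with a fixed number of query slots (an assumption)."""

    def __init__(self, slots: int):
        self.slots = slots
        self.running = []       # queries currently executing
        self.queued = deque()   # queries waiting for resources

    def submit(self, query: str) -> str:
        # Run the query if resources are free, otherwise queue it.
        if len(self.running) < self.slots:
            self.running.append(query)
            return "running"
        self.queued.append(query)
        return "queued"

    def finish(self, query: str) -> None:
        # Free the slot and start the oldest queued query, if any.
        self.running.remove(query)
        if self.queued:
            self.running.append(self.queued.popleft())


wh = Warehouse(slots=2)
states = [wh.submit(q) for q in ("q1", "q2", "q3")]
print(states)        # ['running', 'running', 'queued']
wh.finish("q1")
print(wh.running)    # ['q2', 'q3']
```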

It also offers multi-cluster warehouses for auto scaling.

For example, suppose we create a multi-cluster warehouse and start a query on it. Snowflake runs the query normally on that warehouse. However, if we keep submitting more and more queries to the same warehouse, at some point they consume all of its resources, and additional queries are queued until the existing ones complete. A multi-cluster warehouse detects this scenario and automatically launches an additional cluster to execute the queued SQL statements.

Snowflake allows us to configure this scaling to happen automatically: it starts additional clusters as the workload grows and shuts them down again as the workload shrinks.
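
A rough Python sketch of this auto-scaling idea follows. It is a toy model under stated assumptions: each cluster has a fixed number of query slots, a new cluster is started as soon as all running clusters are full, and idle clusters above the minimum are shut down immediately. The class and parameter names are hypothetical, and the real scaling policy may differ.

```python
from collections import deque


class MultiClusterWarehouse:
    """Toy multi-cluster warehouse; each cluster is a list of running queries."""

    def __init__(self, slots_per_cluster: int, min_clusters: int = 1, max_clusters: int = 3):
        self.slots = slots_per_cluster
        self.min_clusters = min_clusters
        self.max_clusters = max_clusters
        self.clusters = [[] for _ in range(min_clusters)]
        self.queued = deque()

    def submit(self, query: str) -> str:
        # Prefer a cluster that still has a free slot.
        for cluster in self.clusters:
            if len(cluster) < self.slots:
                cluster.append(query)
                return "running"
        # All clusters are full: scale out if we are allowed to.
        if len(self.clusters) < self.max_clusters:
            self.clusters.append([query])
            return "started new cluster"
        self.queued.append(query)
        return "queued"

    def finish(self, query: str) -> None:
        for cluster in self.clusters:
            if query in cluster:
                cluster.remove(query)
                if self.queued:
                    cluster.append(self.queued.popleft())
                break
        self._scale_in()

    def _scale_in(self) -> None:
        # Keep busy clusters, plus enough idle ones to honour the minimum.
        busy = [c for c in self.clusters if c]
        idle_kept = max(self.min_clusters - len(busy), 0)
        self.clusters = busy + [[] for _ in range(idle_kept)]


mcw = MultiClusterWarehouse(slots_per_cluster=2, min_clusters=1, max_clusters=2)
results = [mcw.submit(q) for q in ("q1", "q2", "q3", "q4", "q5")]
print(results)  # ['running', 'running', 'started new cluster', 'running', 'queued']

for q in ("q1", "q3", "q4"):
    mcw.finish(q)
print(len(mcw.clusters))  # 1, the second cluster was shut down once idle
```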

Snowflake tries to take the best of both the shared-disk and shared-nothing architectures. It keeps a single copy of the data, which means we don’t have to manage shadow copies and log syncing, yet we can run multiple independent, separately scaled compute clusters against that copy; these are the virtual warehouses. Because it is built for a cloud fabric, we don’t have to worry about the infrastructure challenges we deal with in a traditional architecture.

It uses the cloud as its storage platform, specifically object storage; for example, on the AWS cloud stack the data is stored in S3. End users don’t need to worry about how the data is stored: it is held in a format specifically designed for serving high-performance analytic queries, and we never deal with that format directly. Because S3 is very large, we can start at the scale of megabytes and grow our environment to the scale of petabytes, and even with petabytes of data in a single table the queries still perform very well.

One of the key elements of the architecture is that storage and compute are separated. As mentioned, S3 is the storage element, and the compute layer is just clusters of compute nodes; no data is stored permanently on these nodes. They use local storage only as a volatile cache to help accelerate query performance, which means the nodes can be started and stopped independently of the storage, and that saves cost. We can have multiple compute clusters (virtual warehouses) working simultaneously over a single copy of the data held centrally.
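
The separation of storage and compute can be illustrated with a small Python sketch. It is a conceptual model only: `ObjectStore` stands in for S3 and `ComputeCluster` for a virtual warehouse, and both classes are hypothetical.

```python
class ObjectStore:
    """Stands in for the central object storage (for example S3)."""

    def __init__(self):
        self._objects = {}

    def put(self, key, value):
        self._objects[key] = value

    def get(self, key):
        return self._objects[key]


class ComputeCluster:
    """Stands in for a virtual warehouse: compute plus a volatile local cache."""

    def __init__(self, name: str, store: ObjectStore):
        self.name = name
        self.store = store
        self.cache = {}
        self.remote_reads = 0

    def read(self, key):
        # Fetch from central storage only on a cache miss.
        if key not in self.cache:
            self.cache[key] = self.store.get(key)
            self.remote_reads += 1
        return self.cache[key]

    def stop(self):
        # Stopping the cluster discards the cache; the stored data is untouched.
        self.cache.clear()


store = ObjectStore()
store.put("sales/part-0", [1, 2, 3])

etl = ComputeCluster("etl", store)
bi = ComputeCluster("bi", store)

etl.read("sales/part-0")
etl.read("sales/part-0")          # second read is served from the local cache
bi.read("sales/part-0")           # a different cluster reads the same single copy
print(etl.remote_reads)           # 1

etl.stop()
print(store.get("sales/part-0"))  # [1, 2, 3], still available after the cluster stops
```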

OTHER FEATURES:

We can ingest semi-structured data without applying a schema at ingestion time. Snowflake has a data type called VARIANT that can store JSON, Avro, Parquet or XML, so we can store these complex data objects in a single record.
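
The idea of storing a whole document in one record and applying structure only at query time can be sketched in plain Python. This is an analogy rather than Snowflake code: the `rows` list, the `v` column and the `get_path` helper are all made up for illustration.

```python
import json

rows = []  # each row has a single "v" column holding the whole document


def ingest(raw_json: str) -> None:
    """Store the document as-is; no schema is declared up front."""
    rows.append({"v": json.loads(raw_json)})


def get_path(value, path: str):
    """Walk a dotted path at read time; return None if a key is missing."""
    for key in path.split("."):
        if isinstance(value, dict) and key in value:
            value = value[key]
        else:
            return None
    return value


# Two documents with different shapes land in the same column.
ingest('{"user": {"name": "Ann", "city": "Pune"}, "amount": 10}')
ingest('{"user": {"name": "Raj"}, "amount": 7, "tags": ["a", "b"]}')

cities = [get_path(r["v"], "user.city") for r in rows]
print(cities)  # ['Pune', None]
```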

It has a capability called continuous data protection. Snowflake stores table data in S3 as a series of small micro blocks (Snowflake’s documentation calls them micro-partitions), in much the same way that data is stored in blocks on a disk, so S3 is treated as one big disk. When we update or alter a table, Snowflake does not overwrite the old copy of a block; it writes a new copy and keeps the old one around for some time.
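
A minimal Python sketch of this copy-on-write behaviour is shown below. It assumes a table version is just a list of block ids and that old blocks are never deleted; the retention period is not modelled. The `VersionedTable` class is hypothetical.

```python
class VersionedTable:
    """Toy copy-on-write table: blocks are immutable, versions list block ids."""

    def __init__(self):
        self.blocks = {}       # block id -> immutable block contents
        self.versions = [[]]   # version 0 is the empty table
        self._next_id = 0

    def _new_block(self, data) -> int:
        block_id = self._next_id
        self._next_id += 1
        self.blocks[block_id] = tuple(data)
        return block_id

    def append(self, data) -> None:
        self.versions.append(self.versions[-1] + [self._new_block(data)])

    def update(self, position: int, data) -> None:
        # Write a new block and point the new version at it; the old block stays.
        current = list(self.versions[-1])
        current[position] = self._new_block(data)
        self.versions.append(current)

    def read(self, version: int = -1):
        return [self.blocks[b] for b in self.versions[version]]


t = VersionedTable()
t.append([1, 2])
t.append([3, 4])
t.update(0, [9, 9])

print(t.read())            # [(9, 9), (3, 4)]
print(t.read(version=2))   # [(1, 2), (3, 4)], the state before the update
```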

It also has a capability called zero-copy cloning, which allows us to replicate data very quickly. A clone is not a snapshot or a replica; it just makes copies of the pointers to the objects in S3.
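
Cloning by copying pointers can be sketched as follows. This is a toy Python model, assuming a table is just a list of pointers into a shared block store; the `Table` class and the block ids are made up for illustration.

```python
# Shared block store standing in for objects in S3.
blocks = {"b1": ("row1", "row2"), "b2": ("row3",)}


class Table:
    """Toy table: a list of pointers (block ids) into the shared block store."""

    def __init__(self, block_ids):
        self.block_ids = list(block_ids)

    def clone(self) -> "Table":
        # Copy only the pointers; no block data is duplicated.
        return Table(self.block_ids)

    def append(self, block_id: str, data) -> None:
        blocks[block_id] = tuple(data)
        self.block_ids.append(block_id)

    def rows(self):
        return [row for b in self.block_ids for row in blocks[b]]


prod = Table(["b1", "b2"])
dev = prod.clone()
dev.append("b3", ["row4"])   # changes to the clone do not affect the source

print(prod.rows())   # ['row1', 'row2', 'row3']
print(dev.rows())    # ['row1', 'row2', 'row3', 'row4']
print(len(blocks))   # 3, the two original blocks are shared, not copied
```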

It can also provide protected and controlled sharing of data elements across accounts.

An external stage is a named reference to files held outside Snowflake, for example in an S3 bucket. We load the data from the stage into Snowflake, and the next time we need the data the query reads the copy held in Snowflake instead of going back to the external storage. We can create an external stage with a SQL command and run that command from the Snowflake console.
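
As a rough illustration, the Python sketch below only builds the SQL text for creating an external stage and loading from it; it does not connect to Snowflake. The stage, bucket, integration and table names are hypothetical, the statements assume a storage integration already exists, and the string formatting is for illustration only (it does no escaping).

```python
def create_stage_sql(stage: str, s3_url: str, integration: str) -> str:
    """SQL text for an external stage that points at an S3 location."""
    return (
        f"CREATE STAGE {stage}\n"
        f"  URL = '{s3_url}'\n"
        f"  STORAGE_INTEGRATION = {integration};"
    )


def copy_into_sql(table: str, stage: str) -> str:
    """SQL text for loading staged CSV files into a Snowflake table."""
    return (
        f"COPY INTO {table}\n"
        f"  FROM @{stage}\n"
        f"  FILE_FORMAT = (TYPE = 'CSV');"
    )


# Hypothetical names, chosen only for this example.
stage_sql = create_stage_sql("sales_stage", "s3://example-bucket/sales/", "example_s3_int")
load_sql = copy_into_sql("sales", "sales_stage")

print(stage_sql)
print(load_sql)
```

The generated statements could then be pasted into a worksheet in the Snowflake console.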

CONCLUSION

The Snowflake data warehouse has moved past the constraints found in databases of earlier generations and honed a promising cloud analytic database. It is eminently elastic, built on a foundation of separated compute and storage. From our findings, we conclude that Snowflake is market-leading in what you would want from a multi-purpose cloud data warehouse or analytical database.

Amit Sharma

Data and AI, Cloud Expert | Expert in Databricks, Snowflake, Microsoft Fabric, Azure OpenAI Service, AWS AI/ML Services, Google Cloud AI/ML Solutions, Power BI, and Power Platform,

5 years ago

Sure. But most of the customers using Snowflake are happy and give good feedback. Will do something at least with Azure DWH.
Amit Sharma

Data and AI, Cloud Expert | Expert in Databricks, Snowflake, Microsoft Fabric, Azure OpenAI Service, AWS AI/ML Services, Google Cloud AI/ML Solutions, Power BI, and Power Platform,

5 years ago

Thanks for sharing. I want to do an end-to-end demo on this and compare it with Azure data warehouse and BigQuery. Or have you already done some comparison?
