What makes the Snowflake platform so damn cool?

At this point, if you work in the data technology space and haven't heard of Snowflake, you are either living under a rock or all your technology news feeds have been going to your spam folder for the last 3 to 5 years. (In that case, you should really fix that!)

Snowflake is the talk of the entire data market for a reason, and not because it is just another piece of hype. So, what makes Snowflake the poster child of data technology? Why does every vendor in the BI & Analytics space want to partner with it, & why does everyone who deals with SQL & data want to learn & work with it?

To see what makes Snowflake really special, you first have to understand how everything else operates. When I say everything else, I mean any solution in the market that serves data using SQL, which is the relational stuff you have been working with for the past 30+ years.

Regardless of which big RDBMS vendor you have used or heard of before, they all pretty much operate the same way. Essentially, all of your workloads (ingestion, ETL, BI, reporting, data science, etc.) share a single compute cluster. Different vendors use different methods to operate & scale this cluster, such as shared-disk or shared-nothing, but in the end the resulting solution is the same: all the workloads still end up having to share the same single compute cluster. Let me show you what I mean...

This is what a traditional data warehouse looks like regardless of which vendor you choose & whether it is an on-prem or cloud-based solution: a cluster of compute resources with a fixed amount of computing power and some form of storage, designed to support all of your data workloads.

The real fun starts when all your different workloads try to connect to it & run queries (select, insert, update, etc.) simultaneously. To manage this scenario, you have to split the total computing power between these workloads depending on the importance of each one.

This can be done either with an automated workload management feature (which doesn't always prioritize things well) or by manually splitting the total compute power between the different workloads. However, this doesn't change the fact that you only have a fixed amount of computing power to work with.

Using workload management, you can always assign more compute resources to certain workloads, but this also means you have to take resources away from other workloads at the same time.
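To make this concrete, here is a rough sketch of what carving up a fixed cluster looks like in one traditional system, using SQL Server's Resource Governor as an example. The pool names & percentages are made up purely for illustration (a classifier function, not shown, would route each session to its group), and other vendors have their own equivalents:

    -- Hypothetical split of one fixed cluster between three workloads.
    -- Whatever you hand to ETL here is capacity that BI & data science can no longer use.
    CREATE RESOURCE POOL etl_pool WITH (MAX_CPU_PERCENT = 40, MAX_MEMORY_PERCENT = 40);
    CREATE RESOURCE POOL bi_pool  WITH (MAX_CPU_PERCENT = 40, MAX_MEMORY_PERCENT = 40);
    CREATE RESOURCE POOL ds_pool  WITH (MAX_CPU_PERCENT = 20, MAX_MEMORY_PERCENT = 20);

    CREATE WORKLOAD GROUP etl_group USING etl_pool;
    CREATE WORKLOAD GROUP bi_group  USING bi_pool;
    CREATE WORKLOAD GROUP ds_group  USING ds_pool;

    ALTER RESOURCE GOVERNOR RECONFIGURE;

No matter how you shuffle these percentages around, they always add up to the same fixed cluster.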

This is the reason why big, complex ETL jobs usually run once a night in batch fashion, so the business users are not kicking & screaming during the day.

Remember!!! You can manage the workloads all you want, but the total size of the cluster does not change unless you scale up the whole thing.

So what happens when one of the workloads grows just enough that the total requirements slightly exceed the total capacity?

This is where Snowflake & traditional solutions start to part ways. In traditional solutions, whether they are one of the big cloud data warehousing products or their on-prem counterparts, there is only one way to fix the problem, and that is to scale UP.

I use the term UP loosely because some cloud vendors will literally scale UP by adding more compute to the cluster and then re-distributing the data, while others will scale horizontally by adding additional clusters and then replicating the entire dataset to each one. The goal is to add more horsepower for faster parallel processing of the data. In the end, regardless of how they scale, you still end up with the same result.

A bigger cluster with more horsepower & sometimes more storage.

In most cases, each round of scaling up ends up doubling the horsepower & your total running costs $$$ at the same time. Once scaled up, it is usually a permanent change that stays that way forever.

As a result, you get stuck paying for a bigger & more expensive cluster just to handle brief periods of peak usage. Some or most of that newly acquired capacity is not needed the rest of the time and is a waste of money for the duration of your reserved instance (1 to 3 years).

Some cloud vendors try to remedy this problem by scaling up either automatically or by giving you the option to do it manually on-demand. However, most solutions are either too slow to react, won't scale back down on their own, or have a scaling process that is business-disruptive, meaning it stops all running queries so you can't scale while users are on the system.

What usually happens in the end is that most IT departments do capacity planning ahead of time, guesstimate the max peak usage, and reserve a big enough cluster to run 24x7 so that disruptive & slow scaling is not needed.

Another big problem with the traditional approach is that workload demands are never constant and at times require far more horsepower than they normally utilize. Data science & ETL are prime examples, where you may see major spikes in compute needs that are multiples of the regular usage pattern because a data scientist just decided to run a very complex query on the last 5 years of data to train a model, or you just received a request to ingest a monster-sized data source & clean it up by the end of the week for a major project.

And who can forget concurrency... Your organization acquires a new company with 200 more sales reps & 30 more analysts who all need access to your data warehouse, mostly on Monday mornings. Out of nowhere, you get three times the queries you planned for.

Now what? Do you cut down the data ingestion frequency & the resources for the data science team, or do you scale up & pay more for capacity that won't be utilized most of the time?

These are the exact challenges most organizations face whether they use on-prem or cloud-based data warehouses, and these are the exact challenges that made Snowflake the poster child of the data management space.

You need to scale up when you need it, scale down when you don't, and stop paying for things when no one is querying anything. But this alone is not enough, because no two workloads are the same and each has distinct compute requirements, so a one-size-fits-all approach is not good enough. You need to be able to adjust & scale compute power for each workload separately, so that one workload does not mess with another. No resource contention.

On top of that, each workload needs to scale up & down in an instant for faster performance while users are still on it, and it also needs to scale out horizontally on its own to handle thousands of users logging in within seconds, then scale back in right away when they all log off. If nothing is running, why should you have to pay for it? It should automatically shut itself off when idle, and when a user runs a query, it should start back up so fast that the user doesn't even notice the servers were not running when the query was triggered.
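For what it is worth, that wish list maps almost one-to-one onto the settings of a Snowflake virtual warehouse. Here is a minimal sketch in Snowflake SQL, with a made-up warehouse name:

    -- Hypothetical BI warehouse that costs nothing while idle.
    CREATE WAREHOUSE IF NOT EXISTS bi_wh
      WAREHOUSE_SIZE = 'MEDIUM'
      AUTO_SUSPEND = 60              -- suspend after 60 seconds of no activity
      AUTO_RESUME = TRUE             -- wake up automatically when the next query arrives
      INITIALLY_SUSPENDED = TRUE;    -- don't start billing until someone actually runs a query

    -- Resize on the fly: queries already running finish on the old size,
    -- new queries immediately pick up the extra horsepower.
    ALTER WAREHOUSE bi_wh SET WAREHOUSE_SIZE = 'XLARGE';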

Looking at the way Snowflake handles workloads gives us a completely different picture. We see separate compute resources dedicated to each workload, yet they all access the same single copy of the data simultaneously for both read & write operations. Can you say goodbye to data silos & data governance headaches? No more making copies of the data & trying to keep everything in sync like you would with data marts.

Being able to assign different pools of independent compute clusters simultaneously to the exact same single copy of the data, for both read & write operations, is at the heart of how Snowflake does things differently from anything else in the market.
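As a rough sketch (the warehouse, database & table names are hypothetical), each team simply points its sessions at its own warehouse while querying the very same tables:

    -- One independent compute cluster per workload; none of them keep their own copy of the data.
    CREATE WAREHOUSE IF NOT EXISTS etl_wh WAREHOUSE_SIZE = 'XLARGE' AUTO_SUSPEND = 60 AUTO_RESUME = TRUE;
    CREATE WAREHOUSE IF NOT EXISTS ds_wh  WAREHOUSE_SIZE = 'LARGE'  AUTO_SUSPEND = 60 AUTO_RESUME = TRUE;

    -- An ETL session writes to the orders table...
    USE WAREHOUSE etl_wh;
    INSERT INTO sales.public.orders SELECT * FROM sales.staging.orders_raw;

    -- ...while a data science session reads the same table at the same time,
    -- on completely separate compute, with zero contention between the two.
    USE WAREHOUSE ds_wh;
    SELECT customer_id, SUM(amount) FROM sales.public.orders GROUP BY customer_id;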

Remember this well, as it is the main reason you can do many awesome things with data, such as data sharing, which I will cover in my next article.

During the night, your BI & reporting needs may be very basic, but you need to run large, complex ETL jobs while an offshore data science group pecks away at the data.

During the daytime on weekdays, your BI users are running complex queries from their Tableau & Power BI dashboards, ETL activity is much lighter, and the data science group starts running queries against bigger datasets.

And what about the Monday morning rush? All 2,000 users rush to their dashboards at the same time, and Snowflake automatically spins up multiple sets of BI compute clusters to handle the concurrency. At the same time, since you have not processed any data on Sunday, no ETL jobs are running, so the ETL virtual warehouse pauses within minutes of being idle and stops incurring any charges. And there is only minimal usage on the data science warehouse, because that group has its weekly meeting in the morning.
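That Monday-morning behavior is what Snowflake calls a multi-cluster warehouse. A hedged sketch, again with a made-up warehouse name and cluster counts (this feature requires Enterprise edition or above):

    -- Let the BI warehouse fan out to as many as 10 identical clusters when the
    -- Monday rush hits, then shrink back to 1 (or suspend entirely) when it passes.
    ALTER WAREHOUSE bi_wh SET
      MIN_CLUSTER_COUNT = 1
      MAX_CLUSTER_COUNT = 10
      SCALING_POLICY = 'STANDARD';   -- favor starting extra clusters quickly over conserving credits

No data gets copied when the extra clusters start; they all read the same single copy of the data.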

This is basically what makes Snowflake the leading data platform and replacement for on-prem and cloud-based traditional solutions across thousands of large & small companies.

Doing all of these things in a mostly automated manner with near-zero administration is the name of the game when it comes to Snowflake.

But wait, there is more... Did I mention Snowflake also doesn't care which cloud vendor you use?

Azure, AWS, Google: we just don't care. Do you have a multi-cloud strategy where you don't want to be locked into a single cloud provider, or are you running a global operation where you need your data accessible via different cloud providers around the globe?

Snowflake is the only Cloud Data Platform where your data can be automatically synchronized around the globe & across different cloud vendors & regions, giving you the ultimate flexibility.
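Under the hood, this is Snowflake's database replication feature. A minimal sketch, assuming hypothetical account names inside an organization called myorg and that replication has already been enabled for those accounts:

    -- On the source account (say, AWS in the US):
    -- allow the sales database to be replicated to an Azure account in Europe.
    ALTER DATABASE sales ENABLE REPLICATION TO ACCOUNTS myorg.azure_eu_account;

    -- On the target Azure account: create a local replica & pull the latest changes.
    CREATE DATABASE sales AS REPLICA OF myorg.aws_us_account.sales;
    ALTER DATABASE sales REFRESH;    -- usually scheduled (e.g. via a task) to keep the replica in sync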

After all, data is global & should never be locked into a single cloud provider. With Snowflake, data can reside across all the clouds, where it can be consumed & shared with your internal users, customers & partners regardless of which cloud they might be using.

Well, this is Snowflake 101 for you. If you didn't know, now you know.

Spread the word & don't forget to give this article a LIKE if it was helpful so all your LinkedIn buddies can see it as well.
