Snowflake is not a data warehouse

I heard so many times people are saying that Snowflake is a data warehouse. That is categorically incorrect. Snowflake is not a data warehouse. Not until you build a data warehouse on Snowflake. So you need to create the data warehouse, YOUR data warehouse on Snowflake platform. Yes, Snowflake is a platform. It is a data platform.

Likewise, Databricks, Redshift, Fabric and BigQuery is not a data warehouse. They are data platform, on which you can build your data warehouse.

As a data platform Snowflake has several abilities:

1. Enable us to build a data warehouse

2. Enable us to build a data lake

3. Enable us to build a data sharing platform

4. Enable us to build data science / AI applications

5. Enable us to build data applications

6. Enable us to do data engineering

Likewise, Databricks, Redshift, Fabric and BigQuery enable us to build all those things too.

Cloud data platform

I’ve been working with databases for over 20 years. In the past, the database was installed in our in-house servers. We had a server room in the building, where racks and cabinets were full of blade servers, switches, storage, etc. A data warehouse project means that I had to setup and install database servers and storage in that server room, configuring 2 servers so that when 1 failed, the other one automatically took over within seconds. On top of that we had to setup user authentication, disaster recovery, partitioning, indexing, and so on.

Today, with cloud data platform there is a big difference. When doing a data warehouse project, we do not install anything into any servers. We don’t install a database server. We don’t install load balancer (for that automatic failover). We just login to Snowflake using our browser, and everything is available to us out of the box. Database, user authentication, time travel, high availability, disaster recovery, row level security, governance, data privacy.

Likewise, Databricks, Redshift, Fabric and BigQuery also do the same thing. There is no install. You just login and use them. You get everything out of the box.

Understandably, people from the “old school” like me often think the old way. For example:

  • Database backup. What backup? There is no backup! You don’t backup your database. And you don’t restore them. So what if you accidentally dropped a table? Just type “UNDROP TABLE”. And your table will be restored as it was. Voila! What if you wanted to query a table as it was last week? Easy. Just add “AT {DATE}” on your SELECT statement. And you’ll get the content of that table as it was last week.
  • Partitioning big tables? There is no partitioning! You don’t need to partition your tables to get good query performance like in the old days. The tables are already partitioned (called “micro partition”) and automatically managed by Snowflake. You don’t need to do anything! In the old days, I spent a lot of time managing the partitions on SQL Server. It’s called “partition switching” and “sliding window”. Not today. It’s all automatic.
  • Clustering. In the old days, to get high availability, I had to setup a “cluster”. Meaning a group of 2 database servers (or 3, or 4). One of them is active, and one is on standby. When the active one crashed, the standby one became active automatically (called “failover”). With Snowflake you don’t need to do that. You get high availability out of the box. There is no down time. There is no crash. Behind the scene, your database is replicated across multiple data centers. If one data center becomes unavailable, it automatically failover to another data center. Amazing isn’t it?
  • Compression: On SQL Server, I used to do data compression in order to save space and improve query performance. Using something called columnstore, and page compression. Meaning that the data is stored per column not per row. In Snowflake, I don’t need to do that. Out of the box, Snowflake is a columnar database. It stores data per column. And it automatically compress all the data in your database to improve query performance and reduce storage cost.
  • Indexing: I used to spend a lot of time doing indexing. As you can imagine, in data warehousing the tables are pretty big. So indexing is key to get good query performance. And load performance too. And I had to update the statistics regularly (like every day) on big tables. Well not with Snowflake. There is no index. Snowflake distributes tables into micro-partitions and for each partition it collects the statistics on the data range for each column, loading only the partitions required to satisfy the query.

Likewise, Databricks, Redshift, Fabric and BigQuery also do the same thing. NOT! They are not the same. Some of them are still using partitioning, still using indexing, clustering and database backup too! Yup. Need to be careful when choosing a “cloud data platform”. Because not everyone does everything above automatically out of the box.

But yes with all of those “cloud data platform” you don’t install anything. You don’t install any hardware. You don’t install any software. You don’t install any patches either. In the old days, every year you had to “patch” your database server. A few times a year even. It’s called “updates”. There were database updates, there were antivirus updates, there were operating system updates, etc. Lots of updates. And then there were “upgrades”, when you installed the next version of Oracle (or SQL Server, etc.) Or the next version of Windows server. And guess what? You got down time! You had to make the databases not available for a few hours while you upgrade the server. You had to work weekends to avoid impact to the users.

With the cloud data platform, there are no patches, updates, or upgrades. It is all automatic. They install all the updates and upgrades for you without any down time. NOT! Again, not all of them are like that. Some platform you still have to do patches, updates or upgrades. So be careful when choosing a data platform. Not everyone of them is maintenance free.

Conclusion

Snowflake is not a data warehouse. It is a data platform. And it is an AI platform too (but this is for another article, another day). With cloud data platforms like Snowflake you don’t need to do database backup, table partitioning, clustering, compression, indexing. No need to install anything, no need to do patches, updates and upgrades. There are many cloud data platforms, like Databricks, Redshift, Fabric and BigQuery. But not everyone of them do everything automatically.

Andy Webster

NXT Data Cloud Platform & Data Analytics Engineer, Certified DBA MSc, BSc

4 周

Was this meant to be a joke? If so I apologize profusely

回复
Andy Webster

NXT Data Cloud Platform & Data Analytics Engineer, Certified DBA MSc, BSc

4 周

I apologize if I'm teaching anyone to suck eggs...

回复
Andy Webster

NXT Data Cloud Platform & Data Analytics Engineer, Certified DBA MSc, BSc

4 周

Databricks is a similar solution also subtly different as a Data Lakehouse, its shares common ideas but takes a different approach - they compete head to head, one has better Spark, one has better data sharing, one has a DWH metaphor the other kinda different. Again ask AI if you really don't know what you are talking about here too - sorry, please ignore me. Snowflake and Databricks experts, help me out here please?

Andy Webster

NXT Data Cloud Platform & Data Analytics Engineer, Certified DBA MSc, BSc

4 周

copilot what is Snowflake? Copilot Snowflake?is?a?cloud-based?data?platform?that?enables?organizations?to?store,?manage,?and?analyze?large?amounts?of?data1.?It?offers?a?fully?managed?service?that?allows?for?data?sharing,?collaboration,?and?analytics?across?multiple?cloud?environments2.?Snowflake?supports?structured,?semi-structured,?and?unstructured?data,?making?it?versatile?for?various?use?cases Some?key?features?of?Snowflake?include: Flexible?Architecture:?Supports?data?on- ... blah blah... Copilot, not a cloud data warehouse then? It?absolutely?is?a?cloud?data?warehouse!?Snowflake?is?actually?one?of?the?the-best cloud?data?warehouse?solutions?out?there.?Its?architecture?allows?for?the?separation?of?storage?and?compute,?making?it?highly?scalable?and?cost-efficient.?It?enables?organizations?to?store?vast?amounts?of?data?and?perform?complex?queries?across?multiple?cloud?providers.?So?yes,?Snowflake?is?indeed?a?cloud?data?warehouse?with?some?impressive?capabilities. Blah b;ah blah - if you can't be authentic, be a generalist perhaps? If you don't believe me ask ChatGPT, Google Gemini yourself or better still do proper research online. and Yes I know DWH, Data Mart and data modeling.

回复
Andy Webster

NXT Data Cloud Platform & Data Analytics Engineer, Certified DBA MSc, BSc

4 周

Really? not a Cloud Data Warehouse? Illuminate me, what exactly is it then, seriously I love to learn..

回复

要查看或添加评论,请登录

社区洞察

其他会员也浏览了