Rise of the Lakehouse Architecture
Modern data platforms have come a long way in the search for a workable data architecture. The early approach was to build a Data Lake and then extract a Data Warehouse (DWH) from it for reporting, which meant maintaining two data stores. Organizations soon realized that this architecture was costly to maintain and poorly suited to modern data processing needs. It also imposes the burden of implementing data management and governance on each store separately, which adds cost and complexity and slows time to market (TTM). See fig 1 for the traditional architecture with Data Lake and DWH.
[Fig 1: Traditional architecture with Data Lake and DWH]
A data Lakehouse is a newer data storage architecture that combines the flexibility of data lakes with the data management of data warehouses. Data is stored in a single location that serves all workload types: ML, analytics, and streaming. This reduces the overhead of implementing data management aspects such as DQ, DG, DO, security, and DataOps across different stores, which lowers cost and effort and shortens TTM. Key features of the Lakehouse architecture include ACID transaction support, support for raw or unstructured data, unified batch and real-time data pipelines, and decoupled storage and compute.
Both Databricks and Snowflake have emerged as preferred platforms that provide end-to-end data management tools for building an effective Lakehouse architecture. Most third-party tools and technologies on the market for security, data quality, cataloging, knowledge management, and so on integrate with both Databricks and Snowflake. For instance, you might want to build a central semantic layer using Stardog Knowledge Graph; Stardog connects to both platforms. See fig 2 and fig 3, which depict the Lakehouse architecture using Databricks and Snowflake respectively.
[Fig 2 and Fig 3: Lakehouse architecture using Databricks and Snowflake, respectively]
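To make the ACID transaction feature listed above a little more concrete, here is a minimal sketch in Python of how atomic commits can be layered over plain files in a single storage location. It is a toy under stated assumptions, not how Databricks or Snowflake implement transactions: the class `ToyLakehouseTable`, the `_log` directory, and the file naming are all hypothetical, and the sketch ignores crash safety, deletes, and schema handling.

```python
import json
import os
import tempfile
import uuid


class ToyLakehouseTable:
    """Hypothetical toy table: data files plus an ordered commit log in one directory."""

    def __init__(self, root):
        self.root = root
        self.log_dir = os.path.join(root, "_log")
        os.makedirs(self.log_dir, exist_ok=True)

    def _versions(self):
        # Committed versions are the integer names of the log files.
        return sorted(int(name.split(".")[0]) for name in os.listdir(self.log_dir))

    def commit(self, rows):
        """Write a data file, then publish it by creating the next log entry."""
        versions = self._versions()
        version = versions[-1] + 1 if versions else 0
        # A unique data file name keeps concurrent writers from overwriting each other.
        data_name = f"part-{uuid.uuid4().hex}.json"
        with open(os.path.join(self.root, data_name), "w") as f:
            json.dump(rows, f)
        # Mode "x" raises FileExistsError if this version was already claimed,
        # so only one writer can publish a given version.
        log_path = os.path.join(self.log_dir, f"{version:05d}.json")
        with open(log_path, "x") as f:
            json.dump({"add": data_name}, f)
        return version

    def read(self, version=None):
        """Return the rows visible at a version (default: latest committed)."""
        rows = []
        for v in self._versions():
            if version is not None and v > version:
                break
            with open(os.path.join(self.log_dir, f"{v:05d}.json")) as f:
                entry = json.load(f)
            with open(os.path.join(self.root, entry["add"])) as f:
                rows.extend(json.load(f))
        return rows


# Usage: two commits against one storage location, then a read of each snapshot.
table = ToyLakehouseTable(tempfile.mkdtemp())
first = table.commit([{"id": 1, "source": "batch"}])
second = table.commit([{"id": 2, "source": "stream"}])
print(table.read(version=first))
print(table.read())
```

The design point is that readers only trust files named in the log, so a half-written data file is never visible, and both batch and streaming writers can append to the same table through the same commit path.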
The question arises: which is the better platform, Databricks or Snowflake? Both are strong platforms and run neck and neck. See fig 4 below for a comparison based on a few main architecture principles:
[Fig 4: Comparison of Databricks and Snowflake on a few main architecture principles]
Conclusion: Compared with the traditional architecture that combines a Data Lake and a DWH, the modern Lakehouse architecture is better optimized in terms of cost, development effort, agility, and the capability to meet modern data processing needs. Both Databricks and Snowflake are great platforms for implementing a Lakehouse architecture. However, Databricks is a good choice for a wide variety of use cases because it supports analytical, ML, and streaming workloads, while Snowflake is an ideal choice for analytical workloads because of its simplicity.