How does Zero ETL differ from a traditional ETL approach?
Recently, AWS announced Zero ETL at re:Invent 2022, with Microsoft, Google, and data warehouse providers Snowflake and Databricks following suit.
What is it? A set of integrations that eliminates or minimizes the need to build ETL data pipelines. Zero ETL is a misnomer. A better way of describing what it does is "Zero EL."
How does Zero ETL differ from ETL? ETL, which stands for extract, transform, and load, is a data integration process that combines data from multiple sources into a single, consistent data set and loads it into a data warehouse or other target system.
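To make the three ETL stages concrete, here is a minimal sketch in Python. Everything in it is hypothetical and for illustration only: the two in-memory "source systems" (`CRM_ROWS`, `ORDER_ROWS`), the function names, and the plain list standing in for a warehouse table. A real pipeline would read from databases or APIs and write to an actual warehouse.

```python
from typing import Dict, List

# Hypothetical source systems, represented here as in-memory rows.
CRM_ROWS = [
    {"customer_id": 1, "name": " Ada "},
    {"customer_id": 2, "name": "grace"},
]
ORDER_ROWS = [
    {"order_id": 10, "customer_id": 1, "amount": "19.90"},
    {"order_id": 11, "customer_id": 2, "amount": "5.00"},
]


def extract() -> Dict[str, List[dict]]:
    """Extract: copy raw rows out of each source system."""
    return {
        "customers": [dict(r) for r in CRM_ROWS],
        "orders": [dict(r) for r in ORDER_ROWS],
    }


def transform(raw: Dict[str, List[dict]]) -> List[dict]:
    """Transform: clean names, cast amounts, and join the two sources."""
    names = {c["customer_id"]: c["name"].strip().title() for c in raw["customers"]}
    return [
        {
            "order_id": o["order_id"],
            "customer": names[o["customer_id"]],
            "amount": float(o["amount"]),
        }
        for o in raw["orders"]
    ]


def load(rows: List[dict], warehouse: List[dict]) -> int:
    """Load: append the cleaned rows to the target table (a list here)."""
    warehouse.extend(rows)
    return len(rows)


def run_etl(warehouse: List[dict]) -> int:
    """Run the three stages in order and return the number of rows loaded."""
    return load(transform(extract()), warehouse)
```

Calling `run_etl(warehouse)` with an empty list fills it with two cleaned, joined order rows. The point of the sketch is that the pipeline owner writes and maintains all three stages.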
How is this achieved? Zero ETL has the transactional database do the data cleaning and normalization before automatically loading the data into the data warehouse. It’s important to note that the data is still relatively raw. This tight integration is possible because most zero-ETL architectures require the transactional database and data warehouse to be from the same cloud provider.
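One way to picture the automatic loading described above is as replication of a change log, sketched below in Python. This is an assumption made for illustration, not a description of how any vendor implements it: the classes `TransactionalDB` and `WarehouseReplica`, the `sync` method, and the `total_revenue` query are all hypothetical names.

```python
from typing import Dict, List, Tuple

Change = Tuple[str, int, dict]  # (operation, primary key, row)


class TransactionalDB:
    """Hypothetical source database that records every write in a change log."""

    def __init__(self) -> None:
        self.rows: Dict[int, dict] = {}
        self.change_log: List[Change] = []

    def upsert(self, key: int, row: dict) -> None:
        self.rows[key] = row
        self.change_log.append(("upsert", key, dict(row)))

    def delete(self, key: int) -> None:
        self.rows.pop(key, None)
        self.change_log.append(("delete", key, {}))


class WarehouseReplica:
    """Hypothetical warehouse table kept in step by replaying the change log."""

    def __init__(self) -> None:
        self.rows: Dict[int, dict] = {}
        self.applied = 0  # number of log entries already replayed

    def sync(self, source: TransactionalDB) -> int:
        """Apply only the changes not yet seen; no user-written pipeline runs."""
        pending = source.change_log[self.applied:]
        for op, key, row in pending:
            if op == "upsert":
                self.rows[key] = row
            else:
                self.rows.pop(key, None)
        self.applied += len(pending)
        return len(pending)


def total_revenue(replica: WarehouseReplica) -> float:
    """Later transformation: the replicated amounts are still raw strings."""
    return sum(float(r["amount"]) for r in replica.rows.values())
```

In this sketch the rows arrive in the replica unchanged, which is why the "amount" values still need casting at query time. That mirrors the point that the data is still relatively raw when it lands.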
Pros: Reduced latency. Reduced or eliminated duplicate data storage. One less source of potential failure.
Cons: Limited ability to customize how the data is treated during the ingestion phase. Some vendor lock-in. It doesn't eliminate later transformations that may be needed to make the data consumable.
I hope you enjoyed reading this brief overview as much as I enjoyed writing it. Next week, I will write about more traditional data pipeline methodologies and their pros and cons, including ETL, ELT, EtLT, and reverse ETL.
10 months ago: Great article. Can you share some more detail on how zero-ETL enables real-time data flow from transactions to the data warehouse? Since your article describes it as "EL", I am guessing it is less useful because no data-modelling transformation is applied.