The ETL to ELT to EtLT Evolution, and data pipelines
Article 2 of 2
Originally published on our blog.
Click here for the previous article: Data warehouse, data lake, and the features of ETL and ELT
For many years, data warehouses with ETL and data lakes with ELT have evolved in parallel worlds. But recently, ELT processing has become common in organizations that augment their legacy ETL tools with newer, cheaper ELT processes to populate their data warehouse schemas, laying the foundation for the EtLT methodology.
This change can play out over three phases. Can you tell which is in use at your organization today?
Phase 1: Sequence Data Lakes and Data Warehouses
A common approach to harnessing the flexibility and low cost of data lakes is to replace a costly legacy ETL tool with a data lake, and to program transformations inside the data lake. However, to reduce the impact on the business, a data warehouse remains in use. This results in a combination of the ETL and ELT patterns:
This combination of patterns is commonplace in many enterprises today. Unfortunately, it has become a source of complexity and inefficiency across data groups, and of sprawl across infrastructures.
Phase 2: Consolidate ETL and ELT
The costs of cloud data warehouses have dropped to the point where maintaining a separate data lake makes less economic sense. In addition, some cloud data warehouses like Snowflake are expanding their features to match the diverse and flexible data processing methodologies of data lakes.
As a result, transformation processing inside data warehouses is becoming more accessible, and for many workloads a separate data lake is no longer needed. This is leading many enterprises to consolidate their data operations onto the newest generation of cloud-based data warehouses.
Meanwhile, some data lakes like Databricks are adding structures that match the ease of use of data warehouses, reducing the benefits of loading data into a separate data warehouse. In some organizations, these changes are causing a tug-of-war on whether to retire data warehouses altogether.
Recognizing this convergence of data lakes and data warehouses, several vendors are now positioning their data platforms as “lakehouses,” all-in-one solutions capable of storing and processing any type of data.
The disintegration of the differences between the two patterns has led to a new balkanization of tools along the value chain instead:
As a result, distinct tooling for ETL and ELT is disappearing, and the concept of loading (the “L” in ETL and ELT) has diminished altogether. Instead, the growing scale of data operations has given rise to a new set of concerns for which the balkanized tools are ill-suited:
Phase 3: The EtLT Approach
While ETL has its roots in the rigidity of early data warehouses, ELT was born from the flexibility of data lakes. However, with the immense growth in data generation and the complexities involved in handling it, even ELT is evolving.
A new approach, EtLT (and, in its ongoing iterations, EtLTLTLT…), is surfacing in the data landscape. At its core, EtLT stands for "Extract, transform, Load, Transform," though we’ve also seen it called "Extract, tweak, Load, Transform." But this is more than just a quirky sequence of letters. It indicates a deeper shift in which data, once prepared, isn't static but is continuously refined and republished in a data catalog to respond to diverse and evolving needs. This emphasis on continuous refinement is a pivotal reason why the concept of the data mesh has gained such traction and popularity.
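To make the sequence of letters concrete, here is a minimal sketch of one EtLT pass in Python. It is an illustration under stated assumptions, not a reference implementation: an in-memory SQLite database stands in for the warehouse, and the sample records, the table names (staged_orders, revenue_by_customer), and the function names are all hypothetical. The small "t" only cleans individual rows before loading, while the big "T" does the modelling in SQL after the data has landed.

```python
import sqlite3

# Hypothetical source records; a real pipeline would read these from an
# API, a file drop, or an operational database.
RAW_ORDERS = [
    {"order_id": 1, "email": " Ana@Example.com ", "amount": "19.90", "country": "us"},
    {"order_id": 2, "email": "bo@example.com", "amount": "5.00", "country": "DE"},
    {"order_id": 3, "email": "ana@example.com", "amount": "30.10", "country": "US"},
]


def extract():
    """E: pull raw records from the source as-is."""
    return list(RAW_ORDERS)


def tweak(rows):
    """Small 't': row-level cleanup only (trim, normalise case, cast types).

    No joins or aggregation happen here; that work is left to the warehouse.
    """
    return [
        (
            r["order_id"],
            r["email"].strip().lower(),
            float(r["amount"]),
            r["country"].upper(),
        )
        for r in rows
    ]


def load(conn, rows):
    """L: land the lightly cleaned rows in a staging table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS staged_orders ("
        "order_id INTEGER PRIMARY KEY, email TEXT, amount REAL, country TEXT)"
    )
    # INSERT OR REPLACE keeps a re-run from duplicating rows.
    conn.executemany("INSERT OR REPLACE INTO staged_orders VALUES (?, ?, ?, ?)", rows)
    conn.commit()


def transform(conn):
    """Big 'T': build a consumer-facing model in SQL inside the warehouse."""
    conn.execute("DROP TABLE IF EXISTS revenue_by_customer")
    conn.execute(
        """
        CREATE TABLE revenue_by_customer AS
        SELECT email, country, COUNT(*) AS orders, ROUND(SUM(amount), 2) AS revenue
        FROM staged_orders
        GROUP BY email, country
        """
    )
    conn.commit()


def run_pipeline(conn):
    """Run E, t, L, T in order and return the published model's rows."""
    load(conn, tweak(extract()))
    transform(conn)
    return conn.execute(
        "SELECT email, country, orders, revenue "
        "FROM revenue_by_customer ORDER BY email"
    ).fetchall()


if __name__ == "__main__":
    warehouse = sqlite3.connect(":memory:")
    for row in run_pipeline(warehouse):
        print(row)
```

Because the final step rebuilds its output from the staging table, further transform steps can be appended and re-run as needs change, which is the "LTLT…" tail of the acronym.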
Harnessing the Power of Data Pipelines
The EtLT approach is emerging as the prevailing standard in today's data space. However, to implement this approach effectively, one needs the right technology in place. As James Densmore highlights in the "Data Pipelines Pocket Reference" (2021), data pipelines are integral to a successful EtLT deployment, to the extent that the efficient execution and management of the EtLT model are now synonymous with the concept of data pipelines.
While data pipelines have long been tools in the arsenal of data engineers, they are now re-emerging as the most adaptable and potent data processing architecture.
Enterprises have an opportunity to undergo a metamorphosis. By rethinking the role of traditional ETL as the mainstay of data handling, their data practice can evolve to accommodate data pipelines, and power up their use of both data lakes and data warehouses to tackle the complexities of their modern data landscapes.