Should I prefer ELT over ETL?
ETL = Extract Transform Load
ELT = Extract Load Transform
BI = Business Intelligence
Over last two decades, ETL process was core of every BI solution. Using ETL once can simply retrieve the data from source database, transform it based on how downstream system wants to consume it and finally load the data in target database.
This process worked certainly, when you know well in advance how you want to consume data or in which format/structure you would like to keep into target data warehouse. Also when you had to keep database and ETL infrastructure resources separately.
Now a days when we are moving towards data intensive applications, each and every bit of data could be useful for organization. Without determining in advance how to consume and persist the data in data warehouse, businesses are interested in dumping the data (from wherever they can get) in centralize repository (sometimes called as Data Lake). Later figuring out how to consume it in meaning full manner. In order to resonate with this thought, focus is shifting from ETL to ELT
Another catalyst is the growing popularity of cloud data warehouses like Snowflake,Redshift which can scale on demand and provide extremely cheaper storage, one need not to think twice storing(load) the data and later provisioning(Transform) to support different use cases. On other hand these cloud data warehouses provides ease of querying semi structured data like JSON/XML
When to prefer ELT over ETL
- When you want to avoid uncertainty of determining downstream use cases well in advance
- To achieve faster implementation cycle and reduce operation overhead of ETL tool vendor management and support
- Leverage offerings of cloud databases such as extremely cheaper storage, compute and storage resource segregation etc.
- Reduce change management, defect fixing or application enhancement time cycles
- No dedicated ETL Infrastructure required: Savings in Total Cost of Ownership
Senior Machine Learning Engineer (Google Cloud Platform | AZURE | AWS | Data Science | AI | ML | MLOPS | Python | Spark | Kafka | Java | Scala | Spring Boot | DBT| Cloud Data Fusion| DataForm | Azure Synapse | SnowPark)
4 年Agreed !! But for real-time streaming analytics, we have to choose ETL on top of ELT.