I recently managed a couple of transformational programs for large organizations to establish modern data platforms.
With these Snowflake implementations, I have come to appreciate the advantages of having a Data Lake along with the Data Warehouse in a modern cloud data platform architecture.
The following gains can be observed:
- Clear segregation of duties between data ingestion engineers (ingestion focused) and analytics engineers (business value focused).
- Not all the data needs to be propagated from the Snowflake Data Lake layer to the Snowflake Data Warehouse layer.
- Ingest data from potentially useful tables from source applications into the Data Lake. Further investments can be postponed as the need/demand for curation crystallizes.
- Storage cost is low
- 1:1 replication. Same as source.
- Data availability lends itself to ad hoc analytics and special projects. Once the value is proven, further curation can be done to create a governed layer in the Data Warehouse.
- Enhances agility – shortens lead time to demonstrate/establish value.
- Promotes a data-driven culture (coupled with data discovery supported by a data catalogue).
- Exploration, evaluation focused.
- Targeted at power users, business analysts, special projects
- Source for the Data warehouse, data science use cases
- Main cost driver: Storage volume
- Propagation of data downstream from the Data Lake layer to the Data Warehouse layer can be demand driven.
- Investments only for proven use cases where there is higher confidence in potential value extraction.
- Highly governed layer – Single source of truth
- Clear ownership of the artifacts, preferably by the business stakeholders.
- Exposed for consumption by the larger group of business consumers and established downstream applications (advanced analytics, applications, etc.).
- Data can be shared in a controlled and secure way with partners (customers, vendors or third parties).
- Promotes a data-driven culture (coupled with data discovery supported by a data catalogue).
- Productionization of high-potential or established use cases.
- Foundational for establishing Self Service capabilities.
- Main cost driver: Compute associated with extracting value from the data (transformation and consumption)
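The demand-driven propagation described above can be sketched as a simple promotion rule. This is a minimal illustration in Python, not an actual Snowflake implementation: the `LakeTable` class, its fields, and the table names are all hypothetical, and the rule (proven value plus a business owner) is just one reading of the points in the list.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LakeTable:
    """Hypothetical catalogue entry for a table replicated 1:1 into the Data Lake."""
    name: str
    proven_use_case: bool      # has ad hoc or exploratory work demonstrated value?
    has_business_owner: bool   # is a business stakeholder accountable for it?


def select_for_promotion(tables):
    """Return the names of lake tables that qualify for curation into the warehouse.

    Assumed rule: invest only where value is proven and ownership is clear.
    """
    return [t.name for t in tables if t.proven_use_case and t.has_business_owner]


# Illustrative data only; these tables are made up.
lake = [
    LakeTable("sales_orders", proven_use_case=True, has_business_owner=True),
    LakeTable("plant_sensor_log", proven_use_case=False, has_business_owner=False),
    LakeTable("vendor_master", proven_use_case=True, has_business_owner=False),
]

promoted = select_for_promotion(lake)
print(promoted)
```

Everything not selected simply stays in the Data Lake at storage cost, which is the point of postponing the curation investment.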
Design and development guidelines are extremely important, all the more so when you are working with a large variety of source applications (various SAP, non-SAP, custom applications, etc.) that feed data for analytics. Laying a solid foundation, along with templatization, at the earliest stage helps create opportunities to reduce cost and improve speed for subsequent deliveries.
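As one possible illustration of templatization, the sketch below renders a lake-layer object definition from a single shared template, so that every new source table follows the same naming convention. The schema names (`lake`, `raw`), the double-underscore naming convention, and the source and table names are assumptions made for the example; the SQL shape is illustrative and not taken from any of the programs mentioned.

```python
from string import Template

# Hypothetical template: one definition reused for every source table.
LAKE_VIEW_TEMPLATE = Template(
    "CREATE OR REPLACE VIEW lake.${source}__${table} AS\n"
    "SELECT * FROM raw.${source}.${table};"
)


def render_lake_view(source: str, table: str) -> str:
    """Render the 1:1 lake view statement for a given source system and table."""
    # Lowercase both parts so names stay consistent across source applications.
    return LAKE_VIEW_TEMPLATE.substitute(source=source.lower(), table=table.lower())


# Illustrative names only.
statement = render_lake_view("SAP_ECC", "VBAK")
print(statement)
```

Onboarding a further source then becomes a matter of supplying new parameters rather than writing new code, which is where the cost and speed gains for subsequent deliveries would come from.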
In case you are considering or embarking upon such an endeavor, feel free to message me.
I will be happy to connect and share experience with your leadership team.