Building multi-purpose data warehouses

Statistically, more data warehouse projects fail than succeed. They rarely fail technically; more often they fall short of realizing business value and generating enough ROI to keep a CFO happy.

One of the reasons (among many others) is a limited vision of the use cases for these projects. Most of them start with, and stop at, building custom or on-demand dashboards and data visualizations.

How many times have we seen the number of dashboards explode while their usage dwindles quickly? More often than some of us in the data world like to admit.

Building a multi-purpose data warehouse that serves diverse business use cases can be a good way to deliver better ROI on these capital-intensive data warehouse, data lake, and data lakehouse projects.

Here are 10 possible use cases to consider while budgeting, architecting, building, and rolling out to users.

1. BI Dashboards

Business intelligence dashboards are the classic use case for a data warehouse, and they continue to be so. This use case does not need much explanation.

2. Customer data shares

Many customers would like to get their data back into their own systems. Usually this is serviced through slow, inflexible file transfers or expensive, limited APIs. In both scenarios, any change in customer needs requires development effort, so the response is slow.

Modern data cloud platforms such as Snowflake offer zero-copy data shares, a secure, fast, and flexible way for customers to get the data they need.

3. Internal on-demand read access

Many non-IT internal users depend on IT resources to pull data for their ad hoc questions. This introduces delays, puts an additional burden on IT, and frustrates users. Providing read access to curated datasets on the data warehouse platform paves the way for self-service of internal data needs.

4. Transactional reports with large data volumes and aggregations

Transactional reports that crunch large volumes of data, such as multi-year financial data, are better run on a data warehouse than on an OLTP database. They are cheaper to run there and do not block user transactions.

5. Data science requirements

Most data science AI/ML projects need large volumes of curated data in one place for the models to be effective. In some cases, AI teams need quick access to raw data to experiment during model development. Data lakes and lakehouses are well positioned to serve those needs.

6. Central/consolidated data store

When multiple OLTP databases are deployed in production, especially with single-tenant models, it is often a challenge to produce reports and insights that need the data together. A data warehouse can serve as the central point for such a consolidated view.
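As a rough illustration of that consolidation, here is a minimal Python sketch that uses in-memory SQLite databases as stand-ins for both the per-tenant OLTP databases and the warehouse. The table names (`orders`, `all_orders`), the `tenant_id` column, the tenant names, and the helper functions are all hypothetical, chosen only for this example; a real pipeline would normally use an ETL tool or the platform's own loading features.

```python
import sqlite3


def make_tenant_db(orders):
    """Create a stand-in single-tenant OLTP database with an orders table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (order_id INTEGER, amount REAL)")
    conn.executemany("INSERT INTO orders VALUES (?, ?)", orders)
    return conn


def consolidate_orders(tenant_dbs, warehouse):
    """Copy every tenant's orders into one warehouse table, tagged by tenant."""
    warehouse.execute(
        "CREATE TABLE IF NOT EXISTS all_orders ("
        "tenant_id TEXT, order_id INTEGER, amount REAL)"
    )
    for tenant_id, conn in tenant_dbs.items():
        rows = conn.execute("SELECT order_id, amount FROM orders").fetchall()
        warehouse.executemany(
            "INSERT INTO all_orders VALUES (?, ?, ?)",
            [(tenant_id, order_id, amount) for order_id, amount in rows],
        )
    warehouse.commit()


# Two hypothetical tenants, each with its own database.
tenants = {
    "acme": make_tenant_db([(1, 100.0), (2, 250.0)]),
    "globex": make_tenant_db([(1, 75.0)]),
}
warehouse = sqlite3.connect(":memory:")
consolidate_orders(tenants, warehouse)

# A cross-tenant report that no single OLTP database could answer alone.
totals = warehouse.execute(
    "SELECT tenant_id, SUM(amount) FROM all_orders "
    "GROUP BY tenant_id ORDER BY tenant_id"
).fetchall()
print(totals)
```

The key design choice is the added `tenant_id` column: order IDs can repeat across tenants, so the consolidated table needs it to keep rows distinguishable.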

7. Internal data analytics

Departmental data for finance, HR, marketing, and sales often sits in different source systems such as Salesforce and Workday. Internal analytics that need these datasets together can make use of a centralized data warehouse. Modern ETL systems such as Fivetran come with hundreds of native connectors to these commercial systems to pull data without programming effort.

8. Building data apps/products

Building monetizable data products is an emerging use case, made easier by modern data cloud platforms such as Snowflake.

9. Staging for data onboarding

This is a somewhat unconventional use case. Many data onboarding processes, especially those that deal with large files, need to look up existing data for data quality and referential integrity checks, such as whether a customer ID or product ID already exists. When such lookups run on production OLTP systems, they can slow down online users.

All such checks and lookups can happen on the data warehouse, and only the final update/insert operations need to happen on the production OLTP system. One prerequisite is that the data warehouse is up to date with the transactional system.
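The lookup step could be sketched roughly as follows. This is a minimal Python example that uses an in-memory SQLite database as a stand-in for the warehouse; the `customers` and `staging` tables, their columns, and the `classify_incoming` function are hypothetical names for illustration, and the sketch assumes `customer_id` is unique in the warehouse copy.

```python
import sqlite3


def classify_incoming(warehouse, incoming_rows):
    """Split incoming (customer_id, name) rows into inserts and updates.

    The existence check runs entirely on the warehouse connection, so the
    production OLTP system is not touched during validation.
    """
    warehouse.execute(
        "CREATE TEMP TABLE staging (customer_id INTEGER, name TEXT)"
    )
    warehouse.executemany("INSERT INTO staging VALUES (?, ?)", incoming_rows)
    rows = warehouse.execute(
        "SELECT s.customer_id, s.name, c.customer_id IS NOT NULL "
        "FROM staging s LEFT JOIN customers c "
        "ON c.customer_id = s.customer_id "
        "ORDER BY s.customer_id"
    ).fetchall()
    warehouse.execute("DROP TABLE staging")
    inserts = [(cid, name) for cid, name, found in rows if not found]
    updates = [(cid, name) for cid, name, found in rows if found]
    return inserts, updates


# Stand-in warehouse holding a replicated copy of the customers table.
warehouse = sqlite3.connect(":memory:")
warehouse.execute("CREATE TABLE customers (customer_id INTEGER, name TEXT)")
warehouse.executemany(
    "INSERT INTO customers VALUES (?, ?)", [(1, "Ann"), (2, "Bob")]
)

inserts, updates = classify_incoming(warehouse, [(2, "Bob B."), (3, "Cy")])
print(inserts)  # rows to INSERT on the production OLTP system
print(updates)  # rows to UPDATE on the production OLTP system
```

Only the two resulting lists would then be applied to the OLTP database. If the warehouse lags behind the transactional system, a row could be misclassified, which is why the freshness prerequisite above matters.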

10. Data archive store

Some industries have tiered data availability needs, such as x years in OLTP and x+y years of data readily available for audit requirements. Storing the additional y years of data in a data warehouse or data lake keeps large data volumes from accumulating in OLTP databases, where they can make them slow and expensive.
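To make the x and x+y tiers concrete, here is a small Python sketch of the routing rule. The function name, the tier labels, and the treatment of data older than x+y years as "beyond_retention" are assumptions made for this example, not part of any specific platform or regulation.

```python
from datetime import date


def storage_tier(record_date, today, oltp_years, extra_archive_years):
    """Return where a record should live based on its age.

    oltp_years is "x" and extra_archive_years is "y" in the text above.
    """
    age_years = (today - record_date).days / 365.25
    if age_years <= oltp_years:
        return "oltp"
    if age_years <= oltp_years + extra_archive_years:
        return "warehouse"
    # Assumption: anything older is outside the audit window.
    return "beyond_retention"


today = date(2024, 1, 1)
# Hypothetical policy: 2 years in OLTP, 5 more years in the warehouse.
print(storage_tier(date(2023, 1, 1), today, 2, 5))
print(storage_tier(date(2020, 1, 1), today, 2, 5))
print(storage_tier(date(2010, 1, 1), today, 2, 5))
```

A scheduled job could apply a rule like this to decide which rows to purge from OLTP once they are safely available in the warehouse.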

Conclusion

Modern cloud-native data platforms store data on inexpensive storage systems such as AWS S3 and provide a SQL interface to retrieve it.

Powerful, scalable data cloud platforms can handle a large centralized data store with compute distributed according to workload or need. Different types of workloads (batch vs. transactional) clashing is not a major concern with these new data platforms.

The concern of a large number of small customers impacting VIP customers, or vice versa, can also be avoided by serving each group from separate compute clusters.

With such liberating data cloud platforms, we can think outside the box to unleash the data, get better ROI, and reduce time to decision.

How else are you empowering your teams and customers with your data?


#data #datawarehouse #datalake #datalakehouse #analytics #snowflake #datacloud
