Modern Data, AI and Analytics Platforms: Shining a light on major cost considerations
Alan Grogan
Executive & advisory board member | Avanade, Accenture, KBS | Data & AI business leader
We just released a whitepaper that discusses how a modern data, analytics and AI platform needs to be far more effective and efficient, reaching well beyond the boundaries of basic data warehousing. It should not only enable users to deliver high-value analytics within a governed structure, but also serve as the critical foundation for all AI applications as enterprise demands on AI grow. Every CTO, CIO and CDO is being asked for more insights and faster time to value, and data and AI applications built on a strong foundation are the future.
Currently, data transformation, not query serving or ingestion, accounts for the vast majority of demand on a data platform. When you want a new forecasting report, additional transformation pipelines are triggered to deliver the desired insight. So, as demand on a data platform grows, organizations are waking up to significant (vendor) cost explosions that can easily be traced back to the transformation (ELT) workloads of the data estate.
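To see where spend concentrates, it helps to attribute billed compute to workload types. The following is a minimal sketch that assumes your platform can export per-job compute hours tagged as ingestion, transformation or query serving; the `JobRun` record, the tags and all figures are hypothetical, not measured data.

```python
from __future__ import annotations

from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class JobRun:
    """One billed job run, e.g. a row from a hypothetical billing export."""
    job_name: str
    workload: str         # assumed tag: "ingestion", "transformation" or "query serving"
    compute_hours: float
    rate_per_hour: float  # effective price per compute hour

    @property
    def cost(self) -> float:
        return self.compute_hours * self.rate_per_hour


def cost_share_by_workload(runs: list[JobRun]) -> dict[str, float]:
    """Fraction of total spend attributable to each workload tag."""
    totals: dict[str, float] = defaultdict(float)
    for run in runs:
        totals[run.workload] += run.cost
    grand_total = sum(totals.values())
    if grand_total == 0:
        # Avoid dividing by zero when every run is free (or there are no costs).
        return {workload: 0.0 for workload in totals}
    return {workload: cost / grand_total for workload, cost in totals.items()}


if __name__ == "__main__":
    # Hypothetical runs for illustration only; not measured data.
    runs = [
        JobRun("load_orders", "ingestion", 2.0, 4.0),
        JobRun("build_sales_mart", "transformation", 10.0, 4.0),
        JobRun("build_forecast_features", "transformation", 6.0, 4.0),
        JobRun("dashboard_queries", "query serving", 2.0, 4.0),
    ]
    shares = cost_share_by_workload(runs)
    for workload, share in sorted(shares.items(), key=lambda kv: -kv[1]):
        print(f"{workload:>15}: {share:.0%}")
```

Grouping by workload tag, rather than by team or warehouse, is what lets a transformation-driven cost explosion show up as a single dominant share.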
Architectures that have used a legacy warehouse design to perform both ELT and BI query serving will face exponential costs as AI becomes more prevalent.
I consider myself fortunate that my career spanned the ‘Big Data’ era, when the Big Data V’s featured in almost every article on the topic. Big Data was huge; it was everywhere. Although the term has lost its momentum and is, in reality, dead, I feel fortunate because I would not appreciate the sheer scale of the interest in AI right now if I could not index it against the peak of Big Data (c. 2014).
The level of business interest in Artificial Intelligence is approximately 6X that of 'Analytics', and 25X that of 'Big Data'. So how do we ensure this interest is not wasted?
As the old saying went, ‘nobody gets fired for buying IBM’. In today’s world, one might playfully replace it with ‘nobody will get fired for building a foundation using a data, analytics and AI platform technology that is used by Amazon, Adobe, Microsoft and Apple’ (note 1). I admit this does not roll off the tongue, so let me rephrase and hope that it lightheartedly sticks…
Delta, or Delta Lake, is an open-source storage framework that enables building a Lakehouse architecture. Databricks announced its Lakehouse architecture in 2020 and has been a pioneer in this space. Though the technical concept of a Lakehouse surfaced earlier, Databricks was the first data platform vendor to announce that it had overcome many of the technological barriers to a production-ready, generally available Lakehouse architecture. This was made possible by Databricks innovating with a selection of open-source technologies alongside the Apache Spark framework, which was created by Matei Zaharia, Co-Founder and Chief Technologist at Databricks. Delta Lake emerged from this work as the storage layer that underpins the Lakehouse architecture.
A Lakehouse platform enables companies to rapidly deliver data, analytics and AI solutions at up to 6x lower cost than non-lakehouse services, whilst still providing better or equivalent performance. These savings have been demonstrated in third-party tests using industry-standard benchmarks, TPC-DS (decision support queries) and TPC-DI (data integration), and in real-world customer comparisons across a range of workloads. These results were corroborated by research from the Barcelona Supercomputing Center (BSC), which frequently runs TPC-DS on popular data warehouses. BSC’s latest research benchmarked Databricks (a lakehouse-based platform) and Snowflake (a hybrid data lake and warehouse; see note 2) and found that Databricks was 2.7x faster and 12x better in terms of price-performance. This result validated our thesis that data warehouses become prohibitively expensive as data size increases in production.
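The two BSC ratios are related. Under the simplifying assumption that price-performance means the total cost of completing the same benchmark run (runtime multiplied by effective hourly price), a 2.7x speedup combined with a 12x price-performance advantage implies roughly a 4.4x difference in effective hourly price. The sketch below works through that arithmetic; `BenchmarkRun` and the example figures are hypothetical, and BSC’s own methodology may define price-performance differently.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class BenchmarkRun:
    """One platform completing the same benchmark workload (hypothetical inputs)."""
    name: str
    runtime_hours: float   # wall-clock time to finish the workload
    price_per_hour: float  # effective price while the workload runs

    @property
    def cost(self) -> float:
        # Simplified price-performance: total cost to finish the identical workload.
        return self.runtime_hours * self.price_per_hour


def speedup(a: BenchmarkRun, b: BenchmarkRun) -> float:
    """How many times faster `a` finishes than `b`."""
    return b.runtime_hours / a.runtime_hours


def price_performance_advantage(a: BenchmarkRun, b: BenchmarkRun) -> float:
    """How many times cheaper `a` is than `b` for the same completed workload."""
    return b.cost / a.cost


def implied_hourly_price_ratio(speedup_x: float, price_perf_x: float) -> float:
    """Hourly price of the slower platform relative to the faster one.

    cost_b / cost_a = (price_b / price_a) * (time_b / time_a)
    so price_b / price_a = price_perf_x / speedup_x.
    """
    return price_perf_x / speedup_x


if __name__ == "__main__":
    # Headline ratios quoted in the text.
    ratio = implied_hourly_price_ratio(speedup_x=2.7, price_perf_x=12.0)
    print(f"Implied hourly price ratio: {ratio:.2f}x")  # 4.44x

    # Hypothetical figures chosen only to reproduce both ratios; not benchmark data.
    a = BenchmarkRun("platform A", runtime_hours=1.0, price_per_hour=10.0)
    b = BenchmarkRun("platform B", runtime_hours=2.7, price_per_hour=10.0 * ratio)
    print(f"{speedup(a, b):.1f}x faster, "
          f"{price_performance_advantage(a, b):.1f}x better price-performance")
```

In other words, under this simplified definition, most of the price-performance gap would come from a lower effective hourly price rather than from speed alone.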
Think beyond TCO when considering a modern data, analytics and AI platform. Whether you use an on-premises or a cloud data warehouse, it can fall short of the needs of a modern platform in many critical areas:
I believe that a modern integrated data, analytics and AI platform is key to enabling the pace of transformation that enterprises now require to thrive, and not just survive.
Going one step beyond ELT and TCO to fragmentation and closed systems, it is vitally important that CDOs, CTOs and CIOs review the potentially large number of components needed to support the capabilities of their data, analytics and AI platform architecture. For example, in a modern data platform, we expect support for Delta Sharing, security, integration, query processing and storage. In addition, for analytics use cases, it should support data visualization, MLOps, a data product marketplace and near real-time streaming.
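One lightweight way to start such a review is to map each required capability to the component that provides it and count the distinct components. This is a hypothetical sketch: the capability list is taken from the paragraph above, while the `review_platform` helper and the example estates are illustrative assumptions, not real products.

```python
from __future__ import annotations

# Capabilities named in the paragraph above (platform core plus analytics use cases).
REQUIRED_CAPABILITIES = (
    "Delta Sharing",
    "security",
    "integration",
    "query processing",
    "storage",
    "data visualization",
    "MLOps",
    "data product marketplace",
    "near real-time streaming",
)


def review_platform(coverage: dict[str, str]) -> tuple[int, list[str]]:
    """Return (number of distinct components in use, uncovered capabilities).

    `coverage` maps each capability to the component (product or tool) that provides it.
    """
    components = {coverage[cap] for cap in REQUIRED_CAPABILITIES if cap in coverage}
    missing = [cap for cap in REQUIRED_CAPABILITIES if cap not in coverage]
    return len(components), missing


if __name__ == "__main__":
    # Hypothetical estates for illustration only.
    unified = {cap: "unified platform" for cap in REQUIRED_CAPABILITIES}
    fragmented = {
        "storage": "cloud warehouse",
        "query processing": "cloud warehouse",
        "security": "cloud warehouse",
        "integration": "ETL tool",
        "data visualization": "BI tool",
        "MLOps": "ML tool",
        "near real-time streaming": "streaming service",
    }
    print(review_platform(unified))     # (1, [])
    print(review_platform(fragmented))  # (5, ['Delta Sharing', 'data product marketplace'])
```

Each additional distinct component is another contract, integration and skill set to maintain, which is the fragmentation cost the review is meant to surface.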
Moreover, organizations need to pay attention to data governance, which comprises four areas: data access control, data access audit, data lineage and data discovery. This is where Databricks Unity Catalog plays an important role. It helps unify data and AI assets and existing catalogs, and provides governance across clouds. Some salient features of Unity Catalog are:
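The four governance areas can be illustrated with a deliberately tiny, in-memory model. This is a hypothetical sketch of the concepts only: `ToyCatalog` and its methods are invented for illustration, are not the Unity Catalog API, and do not reflect how Unity Catalog is actually configured or queried.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ToyCatalog:
    """Hypothetical in-memory catalog illustrating the four governance areas.

    This is NOT the Unity Catalog API; every name here is an assumption.
    """

    descriptions: dict[str, str] = field(default_factory=dict)      # table -> description
    upstream: dict[str, set[str]] = field(default_factory=dict)     # table -> direct sources
    grants: set[tuple[str, str, str]] = field(default_factory=set)  # (principal, privilege, table)
    audit_log: list[tuple[str, str, str, bool]] = field(default_factory=list)

    def register(self, table: str, description: str, sources: tuple[str, ...] = ()) -> None:
        self.descriptions[table] = description
        self.upstream[table] = set(sources)

    # 1. Data access control: explicit grants per principal, privilege and table.
    def grant(self, principal: str, privilege: str, table: str) -> None:
        self.grants.add((principal, privilege, table))

    def check(self, principal: str, privilege: str, table: str) -> bool:
        allowed = (principal, privilege, table) in self.grants
        # 2. Data access audit: every check is logged, whether allowed or denied.
        self.audit_log.append((principal, privilege, table, allowed))
        return allowed

    # 3. Data lineage: all transitive upstream sources of a table.
    def lineage(self, table: str) -> set[str]:
        seen: set[str] = set()
        stack = list(self.upstream.get(table, ()))
        while stack:
            source = stack.pop()
            if source not in seen:
                seen.add(source)
                stack.extend(self.upstream.get(source, ()))
        return seen

    # 4. Data discovery: case-insensitive keyword search over names and descriptions.
    def search(self, keyword: str) -> list[str]:
        kw = keyword.lower()
        return sorted(
            table for table, desc in self.descriptions.items()
            if kw in table.lower() or kw in desc.lower()
        )


if __name__ == "__main__":
    catalog = ToyCatalog()
    catalog.register("raw.orders", "Raw order events")
    catalog.register("silver.orders", "Cleaned orders", sources=("raw.orders",))
    catalog.register("gold.revenue_forecast", "Revenue forecast", sources=("silver.orders",))
    catalog.grant("analysts", "SELECT", "gold.revenue_forecast")

    print(catalog.check("analysts", "SELECT", "gold.revenue_forecast"))  # True
    print(catalog.check("analysts", "SELECT", "raw.orders"))             # False
    print(sorted(catalog.lineage("gold.revenue_forecast")))  # ['raw.orders', 'silver.orders']
    print(catalog.search("orders"))                          # ['raw.orders', 'silver.orders']
```

The point of the sketch is that the four areas share one set of metadata: the same table registry drives grants, audit entries, lineage and search, which is what a unified catalog aims to provide.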
Enterprises that choose a closed data platform also frequently require further investment in a complex set of third-party tools [where that platform does not natively support future changes in demand, such as AI], which brings added complexity, longer development lead times, lower ROI and, dare I say, greater vendor lock-in.
Greater vendor lock-in, I hear you ask? Well, it’s a bit like Brexit. If your platform is multi-vendor because its closed, central cloud data warehouse core is not a standalone, unified data, analytics and AI platform, then removing or restructuring it is more complex. Just as the UK’s withdrawal from the EU required lengthy negotiation and formal ratification, unwinding a multi-vendor platform is, in data architecture and procurement terms, a lengthy process, and more complex than the alternative: replacing a single standalone data, analytics and AI platform. Or as Richard Branson eloquently said:
“Complexity is your enemy. Any fool can make something complicated. It is hard to keep things simple.”
I hope you enjoyed this blog. Feel free to comment and start a discussion.
Acknowledgements and notes:
I want to thank the many people at Avanade who have guided the whitepaper and my thoughts on this article, with special thanks to Daniel Materowski, Akhil Vangala, Alex Barbeau, Chintan Shah, PhD, Eric Hausken, Timur Bulutcu and Thomas Kim. I am so proud to have you in our world-class, talented team.
3. Source of graphic 1: https://trends.google.com/trends/