Modern Data, AI and Analytics Platforms: Shining a light on major cost considerations
Alan Grogan, Global Microsoft Data Platform business leader for the world's #1 Microsoft delivery partner


We just released a whitepaper discussing how a modern data, analytics and AI platform needs to be far more effective and efficient than a platform confined to basic data warehousing boundaries. It should enable users not only to deliver high-value analytics within a governed structure, but also to function as the critical foundation for all AI applications as enterprise demand for AI grows. Every CTO, CIO and CDO is being asked for more insights and faster time to value from data, and AI applications built on a strong foundation are the future.


Currently, the vast majority of demand on a data platform comes from data transformation, not query serving or ingestion. When you want a new forecasting report, additional transformation pipelines are triggered to deliver the desired insight. So, as demand on a data platform grows, we see organizations waking up to significant vendor cost explosions that can easily be traced back to the transformation (ELT) workloads of the data estate.


Architectures that use a legacy warehouse design to perform both ELT and BI query serving will face exponentially rising costs as AI becomes more prevalent.


I consider myself fortunate that my career evolved over the 'Big Data' era, when the Big Data V's were presented in almost every article on the topic. Big Data was a huge thing; it was everywhere. Though the term has lost its momentum and is in reality dead, my feeling of fortune comes from this: I would not appreciate the sheer scale of the interest in AI right now if I could not index it against what happened at the peak of Big Data (c. 2014).


Graphic 1: Google Trends comparison of search interest in 'Artificial Intelligence', 'Analytics' and 'Big Data' (source: note 3).


The level of business interest in Artificial Intelligence is approximately 6X that of 'Analytics', and 25X that of 'Big Data'. So how do we ensure this interest is not wasted?

As the old saying goes, 'nobody gets fired for buying IBM'. In today's world one might playfully replace it with 'nobody will get fired for building a foundation using a data, analytics and AI platform technology that is used by Amazon, Adobe, Microsoft and Apple' (note 1). I admit this does not roll off the tongue, so let me rephrase and hope that it lightheartedly sticks…


Delta, or Delta Lake, is an open-source storage framework that enables building a Lakehouse architecture. Databricks announced its lakehouse architecture in 2020 and has been a pioneer in this space. Though the technical concept of a Lakehouse surfaced earlier, Databricks was the first data platform vendor to announce it had overcome many of the technological barriers to a production-ready, generally available Lakehouse architecture. It was made possible by Databricks combining a selection of open-source technologies with the Apache Spark framework, founded by Matei Zaharia, Co-Founder and Chief Technologist at Databricks. Delta Lake was born as the open-source storage layer that underpins the Lakehouse.
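For readers who have not touched the format yet, here is a minimal sketch (mine, not from the whitepaper) of writing and reading a Delta table with PySpark. The path and schema are illustrative placeholders; on Databricks the session configuration shown is already in place, and elsewhere it assumes the delta-spark package is installed.

    from pyspark.sql import SparkSession

    # Configure a Spark session with the Delta Lake extensions
    # (preconfigured on Databricks; needed when running elsewhere).
    spark = (
        SparkSession.builder
        .appName("delta-quickstart")
        .config("spark.sql.extensions",
                "io.delta.sql.DeltaSparkSessionExtension")
        .config("spark.sql.catalog.spark_catalog",
                "org.apache.spark.sql.delta.catalog.DeltaCatalog")
        .getOrCreate()
    )

    # Write a small DataFrame as a Delta table (illustrative path/schema).
    df = spark.createDataFrame(
        [(1, "forecast"), (2, "actuals")], ["id", "dataset"]
    )
    df.write.format("delta").mode("overwrite").save("/tmp/demo/events")

    # Read it back; ACID transactions and time travel come with the format.
    spark.read.format("delta").load("/tmp/demo/events").show()

The point of the design is that this is just Parquet files plus a transaction log on ordinary object storage, so any engine that speaks the open protocol can read the same table.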


A Lakehouse platform enables companies to rapidly deliver data, analytics, and AI solutions at up to 6x lower cost than non-lakehouse services while still providing equivalent or better performance. These savings have been demonstrated in third-party tests using the industry-standard TPC-DS (query) and TPC-DI (data integration) benchmarks, and in real-world customer comparisons across a range of workloads. These results were corroborated by research from the Barcelona Supercomputing Center (BSC), which regularly runs TPC-DS on popular data warehouses. BSC's latest research benchmarked Databricks (a lakehouse-based platform) against Snowflake (a hybrid data lake and warehouse; note 2) and found that Databricks was 2.7x faster and 12x better in price performance. This result validated our thesis that data warehouses become prohibitively expensive as data size increases in production.


It is also worth thinking beyond TCO when considering a modern data, analytics and AI platform. Whether on-premises or in the cloud, a data warehouse can fall short of the needs of a modern platform in several critical areas:


  1. Inefficient data engineering and ETL: Clients experience greater than 6X the cost for ETL workloads. A Forrester study found that business users were up to 25% more effective after switching to Databricks.
  2. Limited real-time event processing: With some cloud data vendors, clients are unable to perform transformations such as grouping, sinking, windowing, or ML on streaming data. Databricks' streaming capability is built on Structured Streaming, which exposes the same APIs for batch and stream processing and does not require copies of data (see the first sketch after this list).
  3. Data locked into closed, proprietary formats: With some vendors, our clients experience lock-in that requires third-party integrations to enable ML and AI, and unstructured data is not supported. This slows down or stops innovation and the sharing of data insights within or outside the enterprise. Databricks' open Delta format works on any compute platform, and there are no additional compute charges for sharing data.
  4. Data sharing: Databricks Delta Sharing enables easy cross-platform sharing of data in open Parquet format into any modern endpoint. The host Databricks platform (Azure, AWS and/or GCP) can connect and share with any modern data endpoint, on-premises or cloud, via REST API (see the second sketch after this list). This is not available in closed systems, which restricts a fluid and agile data product operating model.
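For point 2 above, a minimal Structured Streaming sketch: the same DataFrame API used in batch, applied to a stream with grouping, windowing, and a Delta sink. The paths, schema and window sizes are illustrative assumptions, not anything prescribed by the article.

    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("streaming-sketch").getOrCreate()

    # Read a stream of JSON events from storage (illustrative path/schema).
    events = (
        spark.readStream
        .format("json")
        .schema("device STRING, temp DOUBLE, event_time TIMESTAMP")
        .load("/tmp/demo/incoming")
    )

    # Same API as batch: average temperature per device in 10-minute
    # event-time windows, tolerating 15 minutes of late data.
    agg = (
        events
        .withWatermark("event_time", "15 minutes")
        .groupBy(F.window("event_time", "10 minutes"), "device")
        .agg(F.avg("temp").alias("avg_temp"))
    )

    # Sink the rolling aggregates to a Delta table with checkpointing.
    query = (
        agg.writeStream
        .format("delta")
        .outputMode("append")
        .option("checkpointLocation", "/tmp/demo/checkpoints/agg")
        .start("/tmp/demo/device_aggregates")
    )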
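And for point 4, a sketch of the consuming side of Delta Sharing using the open-source delta-sharing Python connector. The profile path and the share/schema/table names are placeholders for whatever a provider actually exposes.

    import delta_sharing

    # A Delta Sharing "profile" file, issued by the data provider, holds
    # the endpoint URL and bearer token (path below is illustrative).
    profile = "/tmp/demo/config.share"

    # Table coordinates follow the pattern <share>.<schema>.<table>;
    # these names are hypothetical.
    table_url = profile + "#retail_share.sales.orders"

    # Load the shared table straight into pandas over the open REST
    # protocol; no Databricks account is needed on the consuming side.
    df = delta_sharing.load_as_pandas(table_url)
    print(df.head())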


I believe that a modern integrated data, analytics and AI platform is key to enabling the pace of transformation that enterprises now require to thrive, and not just survive.


Going one step beyond ETL and TCO to fragmentation and closed systems, it is vitally important that CDOs, CTOs and CIOs review the potentially large number of components needed to support the capabilities of their data, analytics and AI platform architecture. For example, we expect a modern data platform to support delta sharing, security, integration, query processing, and storage. In addition, for analytics use cases it should support data visualization, MLOps, a data product marketplace, and near real-time streaming.


Moreover, organizations need to pay attention to data governance, which comprises four areas: data access control, data access audit, data lineage and data discovery. This is where Databricks Unity Catalog plays an important role. It helps unify data and AI assets and existing catalogs, and provides governance across clouds. Some salient features of Unity Catalog (illustrated in the sketch after this list) are:

  • Storage credentials with access granted at the user or group level, giving organizations more control.
  • Three levels of hierarchy (catalog, schema and object), allowing organizations to grant access at each level separately.
  • For multiple workspaces (e.g., Dev, Test, UAT, Prod), permissions can be controlled at the account level.
  • Detailed permissions can be chosen for particular users or groups.
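As a minimal sketch of what those hierarchy-level grants look like in practice, the Unity Catalog SQL below (run here via spark.sql from a Databricks notebook) grants a group access at the catalog, schema and table levels. The catalog, schema, table and group names are illustrative, not real objects.

    # Illustrative Unity Catalog grants issued from a Databricks notebook.
    # Catalog/schema/table/group names below are placeholders.

    # Allow the analysts group to see and use the catalog and one schema.
    spark.sql("GRANT USE CATALOG ON CATALOG sales_catalog TO `analysts`")
    spark.sql("GRANT USE SCHEMA ON SCHEMA sales_catalog.emea TO `analysts`")

    # Grant read access on a single table within that schema.
    spark.sql("GRANT SELECT ON TABLE sales_catalog.emea.orders TO `analysts`")

    # Because grants are applied per level, revoking the schema grant later
    # removes access to everything beneath it without touching other schemas.

The design choice here is that permissions compose down the catalog > schema > object hierarchy, so a single revocation at a higher level can cleanly retire access for a whole domain.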

Enterprises that choose a closed data platform also frequently require further investment in a complex set of third-party tools (where that platform does not natively support future changes in demand, such as AI), which brings added complexity, longer development lead times, lower RoI and, dare I say, greater vendor lock-in.


Greater vendor lock-in(?), I hear you ask. Well, it's a bit like Brexit. If you need to remove or restructure a multi-vendor platform, one that exists because its closed central cloud data warehouse core is not a standalone unified data, analytics and AI platform, the complexity is far greater. Just as every EU member state had to ratify the terms of the UK's withdrawal, in data architecture and procurement terms it is a lengthy process, more complex than the alternative of replacing a single standalone data, analytics and AI platform. Or as Richard Branson eloquently said:


“Complexity is your enemy. Any fool can make something complicated. It is hard to keep things simple.”


I hope you enjoyed this blog. Feel free to comment and start a discussion.


Acknowledgements and notes:

I want to thank the many people in Avanade who have guided the whitepaper and my thoughts in this article, with special thanks to Daniel Materowski, Akhil Vangala, Alex Barbeau, Chintan Shah, PhD, Eric Hausken, Timur Bulutcu and Thomas Kim. I am so proud to have you in our world-class, talented team.


1. https://www.delta.io

2. https://www.snowflake.com/blog/data-cloud-hybrid-data-warehouse-data-lake/

3. Source of graphic 1: https://trends.google.com/trends/


