The Microsoft Digital Data Transformation Journey
Happy New Year, everyone! We hope you all had a wonderful holiday filled with memorable moments and are feeling recharged for a grand 2022! The pandemic, which we have navigated over the past two years and continue to navigate, has challenged, shaped, and evolved us in ways we never imagined. It has taught us to lead and live every dimension of our lives with purpose and grit. Here's to a fantastic year ahead filled with positivity and progress!
Sharing learnings and fostering conversations that spark new ones is a great way to start the new year. For our team in Microsoft Digital, there is no better way to do that than with this article, which shares a balcony view of our Enterprise Data Transformation journey. Microsoft Digital is Microsoft's IT organization. We build and operate the systems that power Microsoft's global business and operations, with an emphasis on digitally transforming our customer experiences, our employee experiences, and our internal operations. Our team is the Enterprise Data team in Microsoft Digital. We build and operate the Enterprise Data Foundations that responsibly democratize data for enterprise-wide data applications, and we are charting and navigating a journey to transform the Enterprise Data Estate that powers Microsoft's global business and operations. We are 2.5 years into what has been an incredible journey of applied learnings, with many more ahead. The most rewarding aspect of the journey has been our conversations with many of our global customers, marquee brands investing in their own data transformation journeys. In those conversations we have shared our applied learnings and learned from theirs to shape our own evolution. The investments and progress described in this article reflect the leadership of an amazing team, with material influence and inspiration from data leaders across the industry.
Enterprise Data Transformation is a very dense topic. Thought-leading architectural approaches such as Data Mesh, Data Fabric, and Data Hub are widely discussed and debated in the data-verse. Each has sound credibility, and all of them have influenced our data transformation journey in Microsoft Digital. The goal of this article is to share a balcony view of our data transformation considerations, macro architectural choices, and applied learnings. Because the focus is on breadth, the article does not go deep on each of the topics it introduces; we will publish further, deeper articles on those topics. Note also that this article does not advocate a single architectural approach or specific implementation technologies. Our motivation is to share applied practitioner learnings that are vendor and technology agnostic, and to foster dialog with anyone interested in learning more and sharing their own learnings, as we each navigate our journeys of continuous, adaptive learning in powering Data Driven Innovation for our purposes and organizations. With this framing set, let's dive right in!
The content that follows is structured in two sections. The first introduces the challenges that existed in our prior data estate and the opportunities that we identified for our data transformation journey. The second provides a balcony view of our data transformation blueprint, our investments, and the outcomes that we have achieved to date.
The Enterprise Data Opportunity
That data is an invaluable enterprise asset is broadly stated and generally well understood. Enterprises generate large volumes of data, want to generate and acquire much more, and want their people and systems to be able to access and use the data they need for transformative impact. Stated simply, the Enterprise Data Opportunity is to scale responsible and transformative data applications. Transformative data applications integrate insights and intelligence from connected enterprise-wide data to create delightful customer experiences, enhance employee productivity, and generate operational efficiencies. Enabling teams across an enterprise to build them responsibly and at scale is both a significant opportunity and a significant challenge.
Data is both an invaluable and a sensitive asset. It is generally not directly usable in the raw, atomic form in which it is produced. The following are top-of-mind considerations that impact the use of data:
Relaxing any of the above considerations can make the difference between the use of data being beneficial or detrimental. Scaling controls that address these considerations across the volume, variety, and growth velocity of an enterprise's data estate is fundamental to responsibly democratizing data and scaling enterprise-wide applications. This, as we well know, is easier said than done :-)
Like most enterprises, Microsoft Digital has a globally distributed data estate that has evolved organically and continues to evolve. The illustration below is a macro, zoomed-out view of our enterprise data estate when we started our data transformation journey.
Readers on data teams, or anyone familiar with their own enterprise data estate, will recognize the complexity concealed in this simplified view. When we started our journey, that complexity lay in the volume and variety of each component in the view and of its constituent elements. We had hundreds of data sources, data infrastructures, and data-consuming teams generating, processing, and using several petabytes of data daily.
To scale responsible and transformative data applications, we had to unpack and determine how we could responsibly scale each of the fundamentals: data access and use, data storage and management, and data compute.
1. Scaling Responsible Data Access and Use
The following were the mechanisms for data access and use in our pre-transformation state:
Individually and collectively, these mechanisms are non-scalable anti-patterns for an enterprise aspiring to responsibly scale transformative enterprise data applications.
The following are the opportunities that we identified to scale responsible data access and use:
2. Scaling Responsible Data Storage and Management
Data in an enterprise is distributed because the systems and operations that generate, process, and serve it are distributed. The core challenge in scaling responsible data storage and management is rooted in the organic evolution of these systems and operations. Proliferating data copies and data pipelines are prevalent in most enterprises, making it hard not only to maintain the integrity of data ownership but also to manage data responsibly.
The following were the top-line data storage and management challenges in our pre-transformation state:
The following are the opportunities that we identified to scale responsible data storage and management:
3. Scaling Responsible Compute
Data is generated, processed, and used by compute. Compute is owned by teams across an enterprise, which makes it both the most critical and the most challenging dimension to scale responsibly.
The following were the top-line compute challenges in our pre-transformation state:
The following are the opportunities that we identified to scale responsible compute:
The challenges and opportunities introduced in this section define the purpose of our data transformation journey and of the investments introduced in the next section.
Transforming the Microsoft Digital Enterprise Data Estate
This section presents a balcony view of our data transformation blueprint to address the challenges and opportunities in scaling responsible and transformative data applications.
Our data transformation journey has been, and will continue to be, one of continuous, applied, and adaptive learning. Transforming data in an enterprise is a multi-dimensional adaptive leadership opportunity and challenge. It is not just a technology transformation journey; at its core, it is an evolution of people and practices, with technology as an enabler. While there are incredible technology opportunities and challenges in transforming an enterprise data estate, even the best technical solutions will fall short without organizational alignment, people advocacy, and the evolution of data practices. Bringing these dimensions together and making progress is an incremental journey of adaptive leadership and applied learning. We are 2.5 years into our journey in Microsoft Digital, and in that time we have evolved 80% of our data estate to the blueprint introduced in this section. We are on track to reach 100% and expect to complete the transition by the end of this calendar year.
Each of the topics introduced here could fill a focused book. We plan to publish further, deeper articles on these topics, on the transformative outcomes we have achieved, and on the softer dimension of navigating organizational change management as we evolve to the blueprint.
Our data transformation blueprint is based on the overarching principle of scaling federated and transformative enterprise data applications on responsible enterprise data foundations. There are vibrant discussions and debates in the data-verse on federated versus centralized approaches to data transformation. Our point of view is to strike the essential balance between the two, and our blueprint is based on the following related principles:
Presented below, based on these principles, is a balcony view of our data transformation blueprint. Its components are introduced in the following sections, in the sequence numbered in the view:
1. The Enterprise Data Governance Platform
"Do we know where all of our data exists, who is using which data, and for what purpose?" - a real question posed a few years ago by a senior leader at Microsoft, for which there was no all-encompassing answer at that point in time. With our Enterprise Data Governance Platform investment, we have been able to make good progress towards addressing the question, with an all-encompassing lens.
Forming a comprehensive view of an enterprise data estate is the first step in identifying the opportunities to responsibly scale data management and use. Our Enterprise Data Governance Platform is our shared data foundation for managing our Enterprise Data Estate. It is the single destination for enterprise-wide data owners and data consumers. Data owners use the platform to configure policies and manage the health of their data assets. Data consumers use the platform to discover, subscribe to, and access data in compliance with data owner, regulatory, and enterprise policies.
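To make the owner/consumer interaction concrete, here is a minimal, hypothetical Python sketch of the flow described above: a data owner registers an asset with a policy listing the purposes it may be used for, a consumer subscribes by stating a purpose, and every decision is recorded so that "who is using which data, and for what purpose?" becomes a simple query. The class and field names (`DataAsset`, `GovernanceCatalog`, `allowed_purposes`) are illustrative assumptions, not the APIs of our platform or of any product. A real platform would also evaluate regulatory and enterprise policies, data classifications, and identity.

```python
from dataclasses import dataclass, field


@dataclass
class DataAsset:
    """A registered data asset (hypothetical model, not a real product API)."""
    name: str
    owner: str
    location: str               # where the data physically lives
    allowed_purposes: set[str]  # owner-configured usage policy


@dataclass
class GovernanceCatalog:
    """Toy single destination for data owners and data consumers."""
    assets: dict[str, DataAsset] = field(default_factory=dict)
    # Audit trail of (consumer, asset, purpose, granted) decisions.
    audit_log: list[tuple[str, str, str, bool]] = field(default_factory=list)

    def register(self, asset: DataAsset) -> None:
        """Owner registers an asset together with its usage policy."""
        self.assets[asset.name] = asset

    def discover(self, keyword: str) -> list[str]:
        """Consumer searches registered assets by name."""
        return sorted(n for n in self.assets if keyword in n)

    def subscribe(self, consumer: str, asset_name: str, purpose: str) -> bool:
        """Grant access only if the stated purpose satisfies the owner policy."""
        asset = self.assets[asset_name]
        granted = purpose in asset.allowed_purposes
        self.audit_log.append((consumer, asset_name, purpose, granted))
        return granted

    def usage_of(self, asset_name: str) -> list[tuple[str, str]]:
        """Answer: who is using this data, and for what purpose?"""
        return [(c, p) for c, a, p, g in self.audit_log if a == asset_name and g]


catalog = GovernanceCatalog()
catalog.register(DataAsset("sales_orders", "sales-team", "lake/sales/orders",
                           {"revenue_reporting", "forecasting"}))
print(catalog.discover("sales"))                                             # ['sales_orders']
print(catalog.subscribe("finance-bi", "sales_orders", "revenue_reporting"))  # True
print(catalog.subscribe("ads-team", "sales_orders", "ad_targeting"))         # False
print(catalog.usage_of("sales_orders"))  # [('finance-bi', 'revenue_reporting')]
```

Recording denied requests alongside granted ones is a deliberate choice in this sketch: the same log then serves both the consumer-access question and audits of attempted use.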
The capabilities of our Enterprise Data Governance Platform include:
Central to all of these capabilities is generating, capturing, and fully using metadata to scale data management through intelligent automation, complemented by essential human data steward controls.
Some examples of metadata driven intelligent automation to scale data management include:
Investing in an enterprise data governance platform and program is the top data transformation investment. If there is one investment you are looking to get started with, consider this the first.
2. The Enterprise Connected Data Platform
Transformative applications of data commonly entail connecting data at big data scale from across enterprise domains to generate connected insights and intelligence as data products. Connected datasets, metrics, conformed dimensions, reports, data marts, analytics models, and ML/AI models are all examples of such data products. The data products are applied in cross-domain applications to enable transformative experiences and efficiencies.
The following are common norms for such data products and their applications:
Teams took to addressing these challenges by investing in infrastructure optimized for big data compute and serving for their data products. Such infrastructure can be based on physical storage, in-memory, or hybrid. All modalities require data to be queried from the data sources, and the physical-storage and hybrid modalities also require data to be moved and stored.
The following are the top-line challenges and inefficiencies of teams each investing separately in such infrastructure:
Our approach to addressing the requirements and challenges of scaling enterprise-wide data products is our Enterprise Connected Data Platform, with the following top-line capabilities:
Since commencing our data transformation journey, we have migrated about 80% of our previous point solutions for big data compute and serving (distributed data lakes, data warehouses, operational data stores) to our connected data platform. This has generated material infrastructure efficiencies, reduced data copies, and freed engineering capacity to pivot from duplicative data infrastructure investments to differentiated data applications. We are on track to have 100% of our big data compute and serving on the platform by the end of this calendar year.
Our enterprise data governance platform and enterprise connected data platform are our core shared enterprise data foundations to scale responsible and transformative data applications.
3. Enterprise Data Compute and Serving
Our goal with compute and serving is to enable teams to use their preferred modalities while complying with enterprise data management and governance requirements.
Our compute modalities are Spark- and SQL-based, with manifestations in development environments, analytics tools, and ML and AI services. The common practice is for teams to own and manage their preferred compute. All compute is registered, visible, and audited in our enterprise data governance platform.
For data product serving, we are incrementally moving toward serving directly from our connected data platform wherever possible. There will continue to be good cases for moving data products to, and serving them from, edge systems, for requirements such as mission-critical latencies and offline scenarios. In such cases, our data management standard is to register the edge systems in our enterprise data governance platform for auditing and lineage tracing.
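As a hypothetical illustration of why registering edge systems matters for lineage tracing, the sketch below models registered systems as nodes in a directed graph, with one edge per data flow, and refuses to record flows that involve unregistered systems. A breadth-first traversal then finds every downstream system a dataset reaches, including edge copies. The class, method, and system names are made up for this example. Real lineage is typically captured at the dataset or column level from pipeline and compute metadata rather than entered by hand.

```python
from collections import defaultdict, deque


class LineageGraph:
    """Directed graph of data flows between registered systems (hypothetical)."""

    def __init__(self) -> None:
        self.registered: set[str] = set()
        self.downstream: dict[str, set[str]] = defaultdict(set)

    def register(self, system: str) -> None:
        self.registered.add(system)

    def add_flow(self, source: str, target: str) -> None:
        # Only flows between registered systems are recorded; an unregistered
        # edge system would be a blind spot for auditing and lineage.
        for s in (source, target):
            if s not in self.registered:
                raise ValueError(f"{s} is not registered")
        self.downstream[source].add(target)

    def trace_downstream(self, start: str) -> list[str]:
        """Breadth-first search for every system that receives data from start."""
        seen: set[str] = set()
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for nxt in self.downstream[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        return sorted(seen)


g = LineageGraph()
for name in ("crm_source", "connected_platform", "sales_mart", "edge_cache"):
    g.register(name)
g.add_flow("crm_source", "connected_platform")
g.add_flow("connected_platform", "sales_mart")
g.add_flow("sales_mart", "edge_cache")  # data product served from an edge system
print(g.trace_downstream("crm_source"))
# ['connected_platform', 'edge_cache', 'sales_mart']
```

Rejecting flows to unregistered systems, rather than silently recording them, mirrors the standard described above: an edge system becomes traceable only once it is registered.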
4. Enterprise Engineering Platforms
We have mature engineering excellence systems that we have honed and applied over several years in developing and operating our software products. With our core data foundations in place, we are now focused on applying these systems to bring the same rigor to our data products.
Our top-line engineering excellence priorities for our data products include:
5. Enterprise Data Programs
As mentioned at the outset, the best technical solutions for data transformation will fall short without organizational alignment, people advocacy, and the evolution of applied data practices. In addition to technology, we are also investing in the following enterprise data programs to scale company-wide data awareness and best practices in the responsible use of data:
Outcomes Progressed and The Journey Forward
The following are the top-line outcomes that we have achieved to date in our data transformation journey:
As for the journey ahead, we have a lot more to get done. Our top-line focus areas for 2022 are:
Sharing, Learning, and Evolving
We hope you found this article a useful read. We would greatly appreciate and value hearing your thoughts on our journey, and we would love to connect to share deeper learnings and to learn from your journeys and experiences. Drop us a line in the comments and we will get in touch. Let us also know if there are any topics introduced here that you would like a deeper dive on, to help us prioritize our next articles.
Data transformation is a vast topic and there is so much that we can learn from each other. Here's to a fantastic 2022 and to sharing, learning, and evolving our data transformation journeys for our causes and organizations!