What does it take to be a data-driven organization?

According to PwC, AI will contribute up to $15.7 trillion to the global economy by 2030. The potential of AI to transform how we live and work is enormous, but realizing it requires effective strategies for managing the complexity that comes with it: governance challenges, a solid data strategy foundation, scaling, and model management, all while considering social and environmental impacts. Failing to establish strong foundations for these challenges can lead to setbacks such as delayed business value and increased costs.

Let's delve into strategies that can help companies become genuinely data-driven by leveraging AI technologies to drive scalable and well-governed businesses.

Promote a data-first culture

Imagine you’re trying to turn a boat around. Simply focusing on the engine won’t work. You need to consider all the factors, like wind, current, and direction of travel. Similarly, investing in the right tools or technologies is just the beginning. The key to success is an organization-wide shift in culture, with change management playing a critical role in becoming a truly data-driven company.

In companies with data-driven cultures, decisions are expected to be anchored in data as a matter of course, not as something novel or exceptional. This requires employees to be data-literate and to have access to the data they need to do their jobs effectively, along with a commitment to continuous learning and experimentation.

Fostering a data-first culture requires the following steps:

  • Secure leadership buy-in. A data-first culture succeeds only when leaders visibly commit to data-driven decision-making and value the insights data can provide.
  • Make data accessible and easy to use for everyone.
  • Invest in data literacy training to ensure everyone in the organization has the skills and knowledge needed to work with data.
  • Encourage collaboration and sharing among team members to facilitate the exchange of ideas and knowledge.
  • Adopt a product-oriented approach, which involves putting business stakeholders, data practitioners, and developers on the same team.
  • Be patient. Changing the culture takes time, and it is important to remain persistent in efforts to foster a data-first culture.
  • Cultivate a culture of experimentation where new ideas can be tested and refined.
  • Recognize and celebrate successes along the way to maintain momentum and motivation.

It's important to know how to measure the success of an organization-wide data-driven culture initiative. One way is to track how teams' approaches to driving business value mature over time. The figure below illustrates this concept.

Differing levels of analytical maturity in action [1].

At a minimum, organizations should provide their employees with the tools and support needed to operate at the insight level.

Platform architecture

Data is the core of digital transformation and should be treated as a valuable strategic asset that can help to improve processes, identify seasonal trends, enhance customer experience, detect unexpected spikes in sales, and much more.

Data-driven companies innovate faster and gain a competitive advantage, and data platform architecture plays an important role in enabling this. With the exponential growth of data, new platform architectures have been introduced, as shown below [2].

Evolution of data platform architectures to today’s two-tier model (a-b) and the Lakehouse model (c)

A lakehouse is a unified data architecture that combines the flexibility and scalability of a data lake with the performance and ACID transactions of a data warehouse. This architecture enables organizations to store, manage, and analyze all data types in a single location, including structured, semi-structured, and unstructured data. It's important to note that there is no one-size-fits-all solution when choosing a data platform architecture. Experts must assess an organization's data landscape to identify the best solution. However, it's crucial that the architecture design selected can effectively store, manage, and accelerate advanced analytics, as this remains a critical aspect of any successful AI strategy.
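
To make this concrete, here is a minimal PySpark sketch using the open-source Delta Lake format, one common way to implement a lakehouse. The session setup, table path, and sample data are assumptions for illustration only; a managed platform would typically handle this configuration for you.

```python
# Minimal lakehouse-style sketch using PySpark and the open-source Delta Lake format.
# Assumes the delta-spark package is installed; the path and data are illustrative.
from pyspark.sql import SparkSession
from delta import configure_spark_with_delta_pip

builder = (
    SparkSession.builder.appName("lakehouse-sketch")
    .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension")
    .config("spark.sql.catalog.spark_catalog",
            "org.apache.spark.sql.delta.catalog.DeltaCatalog")
)
spark = configure_spark_with_delta_pip(builder).getOrCreate()

# Write structured data to data-lake-style storage with ACID guarantees.
events = spark.createDataFrame(
    [(1, "signup"), (2, "purchase")], ["user_id", "event"]
)
events.write.format("delta").mode("overwrite").save("/tmp/lakehouse/events")

# The same table can be read back for BI queries or ML feature engineering.
spark.read.format("delta").load("/tmp/lakehouse/events").show()
```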

Data and AI governance framework

Generative AI and Large Language Models (LLMs) are changing how organizations create content, simulate scenarios, and make decisions. However, these advanced technologies raise concerns about data privacy, bias mitigation, and ethical considerations. Having solid data and AI governance is crucial when using these technologies. Gartner predicts that by 2025, 80% of organizations expanding digitally will face obstacles due to outdated data and analytics governance practices. The figure below displays the key areas for tackling data and AI governance.

Key Areas of Data and AI Governance Challenges [3].

Data is the key to digital transformation, so data governance is essential for any organization that wants to protect and use data responsibly. Organizations are increasingly adopting unified platform architectures such as the lakehouse to simplify governance [3]. This shift moves away from segregated environments with separate governance controls and towards unified platforms that make it easier to understand and protect data and AI models. An effective data governance framework should be able to answer questions about the following (a small metadata sketch follows the list):

  • Data ownership and access: Who owns the data? Who has access to it? How is access controlled?
  • Data security: How is the data protected from unauthorized access, use, disclosure, disruption, modification, or destruction?
  • Data privacy: How is the data protected from unauthorized disclosure? How are individuals’ privacy rights respected?
  • Data quality: Is the data accurate, complete, consistent, and timely?
  • Data lineage: How is the data generated, processed, and stored? How can we trace the provenance of the data?
  • Data usage: How is the data used? Who is using it? For what purposes?
  • Data retention: How long is the data retained? When should it be deleted?
  • Data compliance: Does the data comply with all applicable laws and regulations?
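
To make these questions concrete, the sketch below captures a dataset's governance metadata as a plain Python record. All field names, values, and the retention rule are hypothetical; real catalogs and governance tools define their own schemas and APIs.

```python
# Hypothetical sketch of governance metadata for a single dataset.
# Field names and values are illustrative, not tied to any specific governance tool.
from dataclasses import dataclass, field

@dataclass
class DatasetGovernanceRecord:
    name: str
    owner_team: str                      # data ownership
    allowed_roles: list[str]             # who may access it, and how access is controlled
    contains_pii: bool                   # drives privacy and security handling
    upstream_sources: list[str]          # lineage: where the data comes from
    retention_days: int                  # when the data should be deleted
    quality_checks: list[str]            # accuracy / completeness / freshness checks
    applicable_regulations: list[str] = field(default_factory=list)  # compliance

record = DatasetGovernanceRecord(
    name="sales.daily_orders",
    owner_team="commerce-analytics",
    allowed_roles=["analyst", "finance"],
    contains_pii=True,
    upstream_sources=["erp.orders", "web.checkout_events"],
    retention_days=730,
    quality_checks=["no_null_order_id", "freshness_under_24h"],
    applicable_regulations=["GDPR"],
)

# A governance process can then audit records like this one,
# e.g. flagging PII datasets whose retention exceeds policy.
if record.contains_pii and record.retention_days > 365:
    print(f"Review retention policy for {record.name}")
```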

Organizations must apply the guiding principles of accountability, standardization, compliance, quality, and transparency to govern AI effectively. AI governance frameworks should address all the potential risks and allow an organization’s teams to leverage AI’s full potential to drive innovation and achieve their business goals. An effective AI governance framework should be able to answer questions about:

  • Transparency and accountability: Who is responsible for developing, deploying, and using AI systems? How are AI decisions made? How are AI systems monitored and audited?
  • Fairness and equity: Do AI systems make fair and unbiased decisions? Are AI systems designed to be inclusive and accessible to all users? (A simple fairness check is sketched after this list.)
  • Privacy and security: How is the privacy of user data protected? How are AI systems secured from cyberattacks and other threats?
  • Safety and reliability: Are AI systems safe and reliable? How are AI systems tested and validated before they are deployed?
  • Societal and ethical considerations: What are AI systems’ potential social and ethical implications? How can AI systems be used to benefit society and avoid harm?
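
As one small, hedged illustration of the fairness question, a monitoring job might compare positive-outcome rates across groups. The records and the four-fifths threshold below are illustrative assumptions, and real fairness reviews involve far more than a single ratio.

```python
# Hypothetical demographic-parity check: compare approval rates per group.
# The records and the 0.8 ("four-fifths") threshold are illustrative.
from collections import defaultdict

decisions = [
    {"group": "A", "approved": True},
    {"group": "A", "approved": True},
    {"group": "A", "approved": False},
    {"group": "B", "approved": True},
    {"group": "B", "approved": False},
    {"group": "B", "approved": False},
]

totals, positives = defaultdict(int), defaultdict(int)
for d in decisions:
    totals[d["group"]] += 1
    positives[d["group"]] += int(d["approved"])

rates = {g: positives[g] / totals[g] for g in totals}
ratio = min(rates.values()) / max(rates.values())
print(rates, f"disparate impact ratio = {ratio:.2f}")

# Flag for human review if outcome rates differ too much across groups.
if ratio < 0.8:
    print("Potential fairness issue: escalate to the AI governance board")
```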

Creating a robust data and AI governance framework is crucial to achieving success in advanced analytics while mitigating future safety, privacy, and compliance risks.

Treating data as a product

Since data is consumed by several teams across an organization, it is beneficial to treat datasets as products. This involves having a product owner who understands the customers and users, identifies the problems they are trying to solve, decides how to market the product, and ensures that it is reliable and valuable. The data product owner aims to deliver data that meets the highest standards and has the following characteristics (a small validation sketch follows the lists below):

  • Accuracy: The data must be accurate and free from errors. This is essential for producing reliable analytics results.
  • Completeness: The data should be complete, with no missing values. This will ensure that your analytics models are trained on a comprehensive dataset and can produce accurate predictions.
  • Consistency: The data should be consistent across all sources and systems. This will ensure that your analytics results are consistent.
  • Timeliness: The data should be timely and up-to-date. This is important for ensuring that your analytics results are relevant and actionable.
  • Relevancy: The data should be relevant to your business decisions. This will help you to identify the most critical insights and patterns in your data.

In addition to these characteristics, data should also be:

  • Trustworthy: The data should be from trusted sources and verified for accuracy and completeness.
  • Secure: The data should be properly secured to protect it from unauthorized access and modification.
  • Governed: The data should be governed by policies and procedures ensuring proper use and management.
  • Owned: Data should be owned by the team with expertise in the relevant domain.
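
Here is the validation sketch referenced above: a few pandas checks a data product owner might run before publishing a dataset. The column names, thresholds, and freshness window are assumptions for illustration.

```python
# Hypothetical data-product validation sketch using pandas.
# Column names, thresholds, and the 24-hour freshness window are assumptions.
import pandas as pd

def validate_orders(df: pd.DataFrame) -> dict:
    """Run simple accuracy, completeness, consistency, and timeliness checks."""
    now = pd.Timestamp.now(tz="UTC")
    return {
        # Accuracy: order amounts should never be negative.
        "no_negative_amounts": bool((df["amount"] >= 0).all()),
        # Completeness: key fields must not contain missing values.
        "no_missing_order_ids": bool(df["order_id"].notna().all()),
        # Consistency: order IDs should be unique across the dataset.
        "unique_order_ids": bool(df["order_id"].is_unique),
        # Timeliness: the newest record should be less than 24 hours old.
        "fresh_within_24h": bool((now - df["created_at"].max()) < pd.Timedelta(hours=24)),
    }

orders = pd.DataFrame({
    "order_id": [101, 102, 103],
    "amount": [25.0, 40.5, 12.0],
    "created_at": pd.to_datetime(["2024-05-01", "2024-05-02", "2024-05-02"], utc=True),
})
print(validate_orders(orders))
```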

One way to achieve this is to implement a data mesh on the lakehouse, which removes the need to copy data into multiple analytical systems and allows multiple analytical workloads to be integrated [4].

Data mesh is a decentralized approach to data product development.

Data mesh is a decentralized architecture approach that produces trusted, reusable data products. The four fundamental principles of data mesh are:

  • Domain-oriented decentralized data ownership and architecture
  • Data as a product
  • Self-serve data infrastructure as a platform
  • Federated computational data governance

Data mesh helps organizations get the most out of their data and move toward a data economy. It allows teams to improve and enrich existing data, train machine learning models on more diverse datasets, and combine data from across the organization to find new insights, build better products, and drive innovation.

Categorization of data usage maturity and its corresponding organizational impact [1].

Decouple platform and data ownership

The traditional approach of a single centralized team owning all of an organization's data analytics has several points of failure. For instance, this team can face resource constraints, making it challenging to deliver data requests on time. That, in turn, may lead other teams to build their own infrastructure, which can compromise data governance and collaboration. The role of data platform teams should therefore change: instead of being tightly coupled with product teams, they should operate more independently and focus on providing central support for data tools, templates, support systems, and governance practices.

The value-driving work happening across the organization is supported by centrally-driven initiatives that make work more accessible [1].

Product teams should focus on their core competencies to achieve effective and efficient data product development while letting data platform teams handle the platform requirements and user journeys. Data platform teams can focus on creating an environment that supports data product teams by providing a portfolio of tools and reusable assets to improve efficiency and effectiveness.

Establish a Center of Excellence (CoE) for AI

Organizations adopting AI can benefit significantly from establishing an AI Center of Excellence (CoE). By bringing together AI experts and creating a collaborative environment, an AI CoE can help develop and implement best practices and provide training and support to accelerate the adoption of AI.

AI CoE key focus areas

An AI CoE can be a central team that provides a clear overview of the organization's AI landscape. It offers recommendations on tools, technologies, required skills, and strategies. However, every investment in AI requires key performance indicators (KPIs) to measure success. In this case, the KPIs could include use case identification, delivery time, and the impact of AI solutions. An AI CoE should focus on the key areas and KPIs below to ensure the success of the organization's AI initiatives; a small sketch of computing a few of these KPIs follows the lists.

Key areas:

  • AI strategy: The AI CoE should develop and implement an AI strategy that aligns with the organization’s overall business goals. The strategy should identify the organization’s AI priorities, the types of AI solutions that will be developed and adopted, and the resources that will be needed.
  • AI governance: The AI CoE should develop and implement AI governance policies and procedures to ensure that AI is developed and used responsibly and ethically. These policies and procedures should cover data privacy, security, and bias mitigation.
  • AI tools and technologies: The AI CoE should evaluate and select the AI tools and technologies that are needed to develop and deploy AI solutions. The CoE should also provide training and support to employees on how to use these tools and technologies.
  • AI talent and skills: The AI CoE should develop and implement a plan to attract and retain AI talent. The CoE should also provide training and support to employees to help them develop the skills they need to work with AI.
  • AI adoption and impact: The AI CoE should track and measure the adoption and impact of AI solutions across the organization. This information can be used to identify areas where AI has the greatest impact and make necessary adjustments to the AI strategy.

KPIs:

  • Number of AI use cases identified and developed
  • Time to deploy AI solutions
  • Adoption rate of AI solutions
  • Impact of AI solutions on business outcomes
  • Number of AI projects completed on time and on budget
  • Employee satisfaction with AI training and support
  • Level of AI expertise in the organization
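
The sketch below, referenced above, computes a few of these KPIs from project records. The records, field names, and the definition of "deployed" are illustrative assumptions.

```python
# Hypothetical sketch of computing AI CoE KPIs from project records.
# Records, field names, and targets are illustrative assumptions.
projects = [
    {"name": "churn-model", "deployed": True, "on_time": True, "on_budget": True},
    {"name": "doc-summarizer", "deployed": True, "on_time": False, "on_budget": True},
    {"name": "demand-forecast", "deployed": False, "on_time": False, "on_budget": False},
]

total = len(projects)
adoption_rate = sum(p["deployed"] for p in projects) / total
on_time_and_budget = sum(p["on_time"] and p["on_budget"] for p in projects) / total

print(f"AI use cases identified and developed: {total}")
print(f"Adoption rate of AI solutions: {adoption_rate:.0%}")
print(f"Projects completed on time and on budget: {on_time_and_budget:.0%}")
```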

It is essential that the AI CoE operates collaboratively. It should work closely with other departments and teams within the organization to create and execute AI solutions that meet the business's requirements. The AI CoE should also maintain transparency and keep the organization informed about its activities.

Promote low-cost and sustainable development

Organizations that depend on data are increasingly turning to artificial intelligence (AI) to enhance their decision-making abilities and streamline operations. However, creating and deploying AI solutions requires significant computational power and energy, which can have a substantial environmental impact.

The total amount of compute used in AI training, measured in petaflop/s-days [5].

Here are some ways that data-driven organizations can think more sustainably and be cost-effective:

  • Use efficient and sustainable data centers: Data centers are responsible for a significant portion of the world’s energy consumption. Data-driven organizations can reduce their environmental impact by choosing data centers powered by renewable energy and committed to energy efficiency.
  • Optimize AI workloads: AI models can be made more efficient using quantization and pruning techniques (see the sketch after this list). Organizations should work with AI developers to optimize their models to reduce resource requirements.
  • Choose the right AI tools: Various AI tools are available, each with its strengths and weaknesses. Organizations should choose AI tools that are well-suited to their specific needs and optimized for efficiency.
  • Train AI developers on sustainable practices: AI developers should be trained in sustainable practices, such as designing and developing efficient AI models.
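
As a hedged illustration of the second point, the PyTorch sketch below applies magnitude pruning and dynamic quantization to a small, made-up model. The layer sizes, pruning amount, and choice of quantized layer types are assumptions, and real models need accuracy checks after each step.

```python
# Illustrative sketch: shrink a toy PyTorch model with pruning and dynamic quantization.
# Model architecture, pruning amount, and quantized layer types are assumptions.
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

model = nn.Sequential(
    nn.Linear(128, 64),
    nn.ReLU(),
    nn.Linear(64, 2),
)

# 1) Magnitude pruning: zero out the 30% smallest weights in each Linear layer.
for module in model.modules():
    if isinstance(module, nn.Linear):
        prune.l1_unstructured(module, name="weight", amount=0.3)
        prune.remove(module, "weight")  # make the pruning permanent

# 2) Dynamic quantization: store Linear weights as int8 to cut memory
#    and speed up CPU inference.
quantized = torch.quantization.quantize_dynamic(model, {nn.Linear}, dtype=torch.qint8)

# The smaller model is used exactly like the original one.
example = torch.randn(1, 128)
print(quantized(example))
```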

Here are some fundamental strategies to promote sustainable AI in a large organization:

Promoting sustainable development [6].

#Data #DataMesh #DataDriven #AI #COE


References

[1] Tekiner, F., & Bak, J. (2023, March 23). New whitepaper explores three pillars of a modern data strategy. Google Cloud Blog. https://cloud.google.com/blog/products/data-analytics/new-whitepaper-explores-three-pillars-of-a-modern-data-strategy/

[2] Armbrust, M., Ghodsi, A., Xin, R., & Zaharia, M. (2021). Lakehouse: A new generation of open platforms that unify data warehousing and advanced analytics. CIDR 2021.

[3] A comprehensive guide to data and AI governance. (n.d.). Databricks. https://www.databricks.com/resources/ebook/data-analytics-and-ai-governance

[4] Best practices for implementing data mesh on the lakehouse. (n.d.). Databricks. https://www.databricks.com/resources/whitepapers/best-practices-implementing-data-mesh-lakehouse

[5] AI and compute. (n.d.). OpenAI. https://openai.com/research/ai-and-compute

[6] Gupta, A. (2021, December 6). The imperative for sustainable AI systems. The Gradient. https://thegradient.pub/sustainable-ai/
