Reducing Organizational Data Costs

We speak to many organizations that are building and deploying data infrastructure and analytical processes. They face a number of challenges that prevent them from meeting their analytical business objectives. This note shares our thoughts on one specific challenge: high cost. Specifically:

  1. Cost model - Deconstruction of cost
  2. Drivers - Drivers of each cost dimension
  3. Recommendations - Actions to address each driver

Simple Cost Scaling Model

From our experience, when we abstract out the cost structure, it has two parts. For each usecase that a business is considering, there is a transaction or threshold cost due to existing infrastructure and process, plus a usecase-dependent effective cost of delivering on the usecase.

The various dimensions of the cost model are:

  1. Base: Threshold transaction cost that always has to be paid because of the way the systems are organized
  2. Process: Task-specific additional coordination cost
  3. Depth: Cost scaling based on the complexity of modeling and delivery
  4. Confidence: Cost scaling due to the degree of trust needed in the output
  5. Reuse: Cost amortization from the ability to reuse assets, process, and outputs
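
The five dimensions above can be combined into a small sketch. The functional form used here is an assumption made for illustration, not the authors' equation: Base is treated as an additive threshold, Process as the usecase-specific cost that Depth and Confidence scale up, and Reuse as a fraction of that effective cost that is saved. The class name, field names, and numbers are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class UsecaseCost:
    """Illustrative cost model for one usecase (assumed form, not the source's equation)."""

    base: float        # threshold transaction cost, always paid
    process: float     # task-specific coordination cost
    depth: float       # multiplier >= 1 for modeling and delivery complexity
    confidence: float  # multiplier >= 1 for the degree of trust needed
    reuse: float       # fraction in [0, 1) of effective cost saved through reuse

    def total(self) -> float:
        # Effective cost: coordination cost scaled by depth and confidence.
        effective = self.process * self.depth * self.confidence
        # Reuse amortizes only the effective cost; the base is always paid.
        return self.base + effective * (1.0 - self.reuse)


# Hypothetical numbers in arbitrary cost units.
first = UsecaseCost(base=10.0, process=5.0, depth=2.0, confidence=3.0, reuse=0.0)
repeat = UsecaseCost(base=10.0, process=5.0, depth=2.0, confidence=3.0, reuse=0.5)
print(first.total(), repeat.total())  # prints 40.0 25.0
```

Under these assumptions, reuse lowers only the usecase-dependent part of the cost. The base term stays fixed until the underlying systems themselves are simplified.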

Drivers

We expand on the underlying drivers for each of the cost dimensions:

  1. Usecase Selection: A usecase with limited buy-in and unclear business benefit loses interest over time. Projects are sometimes shut down midway, resulting in lost time and wasted effort.
  2. Base Complexity: The complexity of the IT systems increases the transaction cost for every data project at every step. Discovering what data exists in the system, how to access it, and whether it is relevant and usable can be time-consuming. Further, implementing the data project may require workarounds and additional modules.
  3. Process: Each usecase may involve a different degree of coordination between people and organizations, and of integration between systems. Friction in this activity, due to economic or other issues, increases the overhead for a given data project.
  4. Depth: Questions in an organization can be framed and addressed at varying levels of scale, accuracy, and relevance. The cost increases with scale (e.g., every customer instead of a cohort), accuracy (e.g., fundamental drivers instead of proximate causes), and relevance (e.g., integrated into the workflow at the exact time and detail instead of loosely enabling a decision).
  5. Confidence: Analytical processes tend to be error-prone. Testing an answer for robustness over dimensions such as time, space, and user groups often costs several times as much as the initial analysis. One reason is that systems are usually not designed for controlled experimentation. In addition, more infrastructure has to be built to repeat the experiments.
  6. Reuse: Questions in an organization tend to build upon previous questions, so systems need to support reuse of the process, technology, and data assets already created. Mechanisms for managing the artifacts created and sharing the results are often missing in enterprises, so analysis attempts often start from scratch.

Common across all these drivers is the people cost. As the cost of technology drops, cost is shifting to people, and the people cost is growing rapidly.

Recommendations

The cost model immediately shows how the cost can be reduced over time:

  1. Find Good Quality Usecases: It is worth spending time to find a good quality usecase to drive the development. Such usecases have clear and positive economics that align people (motive), meet the data and people preconditions (means), and have the organization's commitment along with resources (opportunity).
  2. Incentivize IT for Data Consumption: Data engineering can account for up to 80% of a data project. Architecting systems for data discovery and consumption will reduce the transaction cost for all projects. It is not simple, though. IT organizations are overburdened, and are incentivized for functionality and robustness. They are not incentivized to make technology and other choices that enable the organization’s data journey. There is often technical debt that has built up over time and needs to be addressed first.
  3. Build Well-Oiled Team: Most data projects involve coordination across IT, business, and data teams for a number of reasons, including formulation of the problem, selection of methods, and uncovering tacit knowledge. Ensure low friction and a high degree of collaboration.
  4. Build Balanced Team: Data projects have three main areas - modeling/statistics (20-40%), engineering (40-80%), and domain (10-30%). Strong and proportionate representation from each of these areas will enable a defensible result, an efficient implementation, and a business-relevant output.
  5. Sharing Culture: Data analysis generates a significant amount of knowledge about an organization’s business, including customers, product, and data assets. Providing a mechanism to share work products, and incentivizing its use, will save the organization significant resources by reducing errors and reusing the work done.

Takeaway

The high cost of data projects can be understood and reduced, but doing so requires discipline in thinking and execution. Data science, more than anything, is a test of the character of the organization.

Dr. Venkata Pingali is Co-Founder and CEO of Scribble Data, a data engineering company. Their flagship product, Enrich, is a robust data enrichment platform.

Satheesh Babu Vattekkat

CTO | SaaS, Platforms, Fintech | Bootstrap to Scale Startups

6 years ago

Very nicely put. I would have imagined reuse is a given, but you are right - I'm finding that it takes a very good discipline to ensure reuse; while designing for one time tasks are more popular in data sciences, it is keeping up even in regular engineering projects.
