Let's talk about the profitability of your data investments
Let's face it: we don't know how much our investments in data are going to pay off. There may even be a significant gap between benefits and costs that we are not aware of. Our Snowflake and Databricks bills keep growing. We usually don't talk much about the compensation of our precious data engineers and data scientists. And on the other hand, it is not easy to assess the business value of data projects, either.
Let's try it nevertheless.
In modern data organization approaches, like data mesh, there is a new way of thinking about data: We think of data as a product. We incorporate product thinking. We no longer use data just for ourselves, but process it in a way that other users (our internal customers) are pleased to access and use our data for their use cases. We do everything to make their lives easier: We are happy to share our data. We even clean data for them. We provide documentation that explains the syntax, semantics, provenance, and quality of shared data sets. We even have observability and monitoring to ensure the promised quality and availability.
From a technical and organizational perspective, data is modularized into logical units, called data products (the term data product is quite similar to the principle of data as a product, yet it has a different focus). One team is clearly accountable for each data product, with a dedicated product owner.
One of the primary tasks in product management is actually to ensure profitability and a positive return on investment. The business value should be higher than the costs.
Costs
Let's first talk about the costs of building and running a data product.
When looking at the total costs of ownership, we have a number of expenses:
The most important step is to make these costs transparent at the data product level. This is especially true for the direct data platform costs. Measuring costs and making them visible is an incentive for many teams to noticeably reduce them through optimizations, such as more efficient data models and cleanup jobs.
Business Value
OK, we know the costs, but how do we calculate the business value of a data product?
We love to talk about data-driven businesses and innovative AI projects.
Data becomes valuable only when it is used. Data that is not used is just a cost. We can assess the usefulness of data in our own domain quite well, but what about other teams in the company? The most important KPI is the number of users of your data product. One way to track users is to introduce data contracts and data usage agreements with your consumers. These contain a description of the data usage purpose, i.e. the business use case. Together with business or management, the data product owner can define a nominal value for what the data usage is worth, such as $5000 per month.
In a more advanced step, data product consumers can also be charged through internal cost accounting for using a data product. The billing is part of a data contract, and a consumer can decide if the cost is worth it for their use case. Data contracts can also be cancelled or renegotiated by either party, e.g., by the consumer when data quality is not good enough, or by the data product provider when the expenses grow too high.
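As a minimal sketch of this idea, the business value of a data product could be approximated as the sum of the nominal values agreed in its data contracts. The `DataContract` class, its fields, and the example consumers and amounts are hypothetical and only serve as an illustration; a real data contract carries much more information.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataContract:
    """Hypothetical, simplified data contract for one consumer."""
    consumer: str              # consuming team
    purpose: str               # business use case the data is used for
    nominal_value_usd: float   # agreed nominal value per month


def monthly_business_value(contracts):
    """Sum the agreed nominal monthly values of all active contracts."""
    return sum(c.nominal_value_usd for c in contracts)


# Illustrative values only, not real figures.
contracts = [
    DataContract("marketing", "campaign attribution", 5000.0),
    DataContract("finance", "revenue forecasting", 3000.0),
]

print(f"consumers: {len(contracts)}")
print(f"monthly value: ${monthly_business_value(contracts):,.2f}")
```

The number of contracts gives the user count, and the sum gives a first, admittedly rough, estimate of the monthly value.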
Data Product Management is Product Management
With transparent costs, a data product owner has a clear incentive to attract additional consumers to their data product. This also includes advertising internally and increasing usability, e.g. by improving the documentation. This creates a positive feedback loop that enhances the quality of data products.
If the value (usage or revenue) does not cover the costs in the long term, then a data product may also have to be retired to save data platform costs for unused data.
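One possible way to make "in the long term" concrete is to compare value and costs over a fixed window of recent months. The following sketch assumes monthly figures are available as simple lists; the function names and the six-month default window are assumptions for illustration, not a recommended threshold.

```python
def return_on_investment(monthly_values, monthly_costs):
    """ROI over the whole period: (total value - total costs) / total costs."""
    total_costs = sum(monthly_costs)
    if total_costs == 0:
        raise ValueError("total costs must be greater than zero")
    return (sum(monthly_values) - total_costs) / total_costs


def is_retirement_candidate(monthly_values, monthly_costs, window_months=6):
    """Flag a data product whose value did not cover its costs
    over the last `window_months` months (assumed window)."""
    if len(monthly_values) != len(monthly_costs):
        raise ValueError("value and cost series must have the same length")
    if len(monthly_values) < window_months:
        return False  # not enough history to decide
    recent_value = sum(monthly_values[-window_months:])
    recent_costs = sum(monthly_costs[-window_months:])
    return recent_value < recent_costs


# Illustrative numbers only.
values = [5000.0] * 6
costs = [7000.0] * 6
print(is_retirement_candidate(values, costs))
print(round(return_on_investment(values, costs), 2))
```

A flag like this is only a trigger for a conversation with consumers and management; the decision to retire a data product remains a product management call.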
In the end, data product management is just product management: We offer products, and if we find enough consumers who are willing to pay for them, they are profitable. Data products are a good unit to track costs along the business use cases. The challenge for organizations is to make costs transparent.
Tooling
Most cloud providers have tools to break down costs, usually by tags. So every data platform service should have a defined way to mark the associated data product with a tag. This should be a global governance policy. It also helps when each team has a separate cloud account, which makes team costs very transparent.
Advertisement: At INNOQ, we implemented the Data Mesh Manager, a data management tool that supports data product cost tracking and data contract management.
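To illustrate the tagging idea, here is a minimal sketch that groups billing line items by a data product tag. The record layout (`service`, `cost_usd`, `tags`) and the tag key `data-product` are assumptions for this example; the actual export format and tag conventions depend on your cloud provider and your governance policy.

```python
from collections import defaultdict

# Hypothetical tag key that a governance policy might prescribe.
TAG_KEY = "data-product"


def costs_by_data_product(line_items):
    """Sum costs per data product tag; untagged items are grouped separately."""
    totals = defaultdict(float)
    for item in line_items:
        product = item.get("tags", {}).get(TAG_KEY, "untagged")
        totals[product] += item["cost_usd"]
    return dict(totals)


# Illustrative billing records, not a real provider export.
line_items = [
    {"service": "warehouse", "cost_usd": 120.0, "tags": {"data-product": "orders"}},
    {"service": "storage", "cost_usd": 80.0, "tags": {"data-product": "orders"}},
    {"service": "warehouse", "cost_usd": 50.5, "tags": {"data-product": "customers"}},
    {"service": "compute", "cost_usd": 10.0, "tags": {}},
]

for product, total in sorted(costs_by_data_product(line_items).items()):
    print(f"{product}: ${total:.2f}")
```

Keeping an explicit `untagged` bucket makes gaps in the tagging policy visible instead of silently dropping those costs.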