Data Principles for your organization
The Business Map, self-created with Miro


A few years ago I created the first data strategy for my then employer. It addressed two major topics, Data Quality and how to design an organization that maximizes the value of its data assets (or is that really just one topic?). Here is a list of the main ideas, thoughts, and principles that went into this strategy.

If you see value in these ideas, you will find inspiration in posts, articles, or books written by Google’s former Chief Decision Scientist Cassie Kozyrkov, Mr. Data Contract Chad Sanderson, and the Data Doc Tom Redman.


Here is the full list in its original ordering.


  1. Data Quality limits Decision Quality
  2. Data is a Product
  3. Data Quality is Product Quality
  4. Data should accurately reflect the real world and be usable for various use cases
  5. Standardization enables Automation enables Scalability
  6. Decomposition of Data Integration and Data Processing
  7. Infrastructure as Code
  8. Resilient, self-healing, and self-improving processes (antifragility, learning system)
  9. Data Lifecycle Management and regular clean-ups
  10. Compliance as integral part of your brand
  11. Horizontal communication and collaboration along the Data Value Chain


Let’s dive into the top 3 for now.

Data Quality limits Decision Quality

This touches on the Why of Data Quality. In essence, you want to use existing data points as part of decision processes, whether these are internal or external ones that support your users and customers. Additionally, you often want to use the same data points, collected at a later point in time, to measure the success of the decisions made, or to evaluate the options via some form of testing. So Data Quality matters both when you are looking for indications and when you are looking for proof.


OK, understood, but this sounds fuzzy, high-level, bla-bla. Let’s make it more concrete. Your decisions are, or should be, based on assumptions, ideally on a causal understanding of what is going to happen if you decide for A versus B: if A, then X; if B, then Y. While data is relevant when measuring outcomes or result KPIs like Conversion Rates or Revenue-related metrics, it is often more useful to observe the changes in the real (digital) world that led to these outcomes. Good decisions require you to have a useful model of the world, and good data allows you to derive such a model. Please note the word “useful” as opposed to words like “correct” or “right”. All models are wrong, but some are useful (George Box, 1976).
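To make the A-versus-B point tangible, here is a small Python sketch with entirely made-up numbers. It assumes a hypothetical tracking system that logs every conversion in variant B three times; the `Event` type, the variant labels, and the duplication factor are illustrative assumptions, not taken from any real system. The only point is that the same decision rule can pick a different winner depending on the quality of the data it is fed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    """One tracked user outcome (hypothetical schema)."""
    user_id: str
    variant: str      # "A" or "B"
    converted: bool


def conversion_rate(events, variant):
    """Share of logged events for `variant` that are conversions."""
    rows = [e for e in events if e.variant == variant]
    if not rows:
        raise ValueError(f"no events for variant {variant}")
    return sum(e.converted for e in rows) / len(rows)


def deduplicate(events):
    """Drop exact duplicate events, keeping the first occurrence."""
    return list(dict.fromkeys(events))


def better_variant(events):
    """Decision rule: pick the variant with the higher conversion rate."""
    return max(("A", "B"), key=lambda v: conversion_rate(events, v))


def build_events():
    """Made-up data: 10 users per variant, 3 convert in A, 2 convert in B."""
    events = []
    for i in range(10):
        events.append(Event(f"a{i}", "A", converted=i < 3))
    for i in range(10):
        event = Event(f"b{i}", "B", converted=i < 2)
        # Assumed tracking bug: conversions in B are logged three times.
        copies = 3 if event.converted else 1
        events.extend([event] * copies)
    return events


raw_events = build_events()
clean_events = deduplicate(raw_events)

print("decision on raw data:  ", better_variant(raw_events))
print("decision on clean data:", better_variant(clean_events))
```

On the raw data the rule prefers B, on the deduplicated data it prefers A. Nothing about the decision logic changed, only the quality of the input.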

You might also like the LinkedIn course on Decision Intelligence by Cassie Kozyrkov.

Data is a Product

Data is not a fact about the world, or about your users and customers. Data is produced by technical systems. At best it provides an approximation of a fact; at worst it only provides information about the system itself, and not about the entity of interest or its relevant properties. Anyhow, the message behind this statement is that you should focus on the systems that produce your data and ensure that they do it in a way that supports your business needs. In some sense, this is a variation (or a complement?) of the “If all you have is a hammer” theme: Your systems are designed and optimized for a specific data output, not for supporting your specific business processes. Make sure this output is what you need.

Data Quality is Product Quality

If Data is a Product, then this seems kind of obvious. Yet it took quite a few years until the idea of Data Contracts became widespread (thanks, Chad Sanderson!). Data Testing came up a bit earlier, supported by tools like Great Expectations and others.

Let’s still take a closer look. While there are many specific dimensions of Data Quality, I would like to highlight the two aspects that are relevant here: Accuracy and Usability. We want data to represent some relevant real-world entity reasonably accurately, be it your users, your balance sheet, you name it. At the same time, just having an accurate representation does not help us make decisions, so we also need the aspect of actionability, or usability.

Most frameworks focus on “accuracy”, which is already an improvement over the big data age, where it was only about “accessibility”: what data is accessible to you, and how can you make this data accessible to others, calling this a “product”. My personal opinion is that Data Contracts are a nice way of closing the gap here, allowing us to define usability. It might not yet be the focal point of Data Contracts, but I see the potential.
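As a rough illustration of what a contract covering both aspects could look like, here is a minimal sketch in plain Python. It does not follow any particular data contract standard or tool; the `DataContract` class, its field names, and the thresholds are assumptions made up for this example. Type checks stand in as a crude proxy for accuracy, while completeness and freshness stand in for usability from a consumer's point of view.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class DataContract:
    """Hypothetical contract between a data producer and its consumers."""
    fields: dict                  # field name -> expected Python type
    max_missing_share: float      # usability: tolerated share of missing values
    max_age: timedelta            # usability: how fresh the newest record must be
    freshness_field: str = "updated_at"


def check_contract(records, contract, now):
    """Return a list of human-readable contract violations."""
    violations = []

    # Accuracy proxy: every present value must have the agreed type.
    for index, record in enumerate(records):
        for name, expected_type in contract.fields.items():
            value = record.get(name)
            if value is not None and not isinstance(value, expected_type):
                violations.append(
                    f"record {index}: {name} is not a {expected_type.__name__}"
                )

    # Usability: each field must be complete enough to be worked with.
    for name in contract.fields:
        missing = sum(record.get(name) is None for record in records)
        if records and missing / len(records) > contract.max_missing_share:
            violations.append(f"{name}: {missing} of {len(records)} values missing")

    # Usability: the data must be fresh enough to act on.
    timestamps = [
        record[contract.freshness_field]
        for record in records
        if record.get(contract.freshness_field) is not None
    ]
    if not timestamps or now - max(timestamps) > contract.max_age:
        violations.append("newest record is older than the agreed maximum age")

    return violations


# Made-up example data.
now = datetime(2024, 1, 10, 12, 0, tzinfo=timezone.utc)
records = [
    {"user_id": "u1", "country": "DE", "updated_at": now - timedelta(days=1)},
    {"user_id": "u2", "country": None, "updated_at": now - timedelta(days=1)},
    {"user_id": 3, "country": None, "updated_at": now - timedelta(days=2)},
]
contract = DataContract(
    fields={"user_id": str, "country": str, "updated_at": datetime},
    max_missing_share=0.5,
    max_age=timedelta(days=2),
)

violations = check_contract(records, contract, now)
for violation in violations:
    print(violation)
```

With this made-up data the check reports one type violation and one completeness violation, while the freshness requirement is met. The thresholds are where the consumer's idea of usability becomes explicit and testable.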


So, that’s it for today. Please share your thoughts, ideas, opinions – take care, best, Stefan

Eric Sobolewski

Data & Technology @ TLGG Group "I would rather have questions that can't be answered than answers that can't be questioned." - Richard Feynman

1 year ago

Don’t know what I did to deserve this mention, but thank you anyways ;-)

Delighted to be an inspiration Stefan Kühn. Keep pushing on all things data, especially quality which is not getting the attention it deserves, as hard as you can!
