The most important data question
Data is an imperfect reflection of the real world. Keep that in mind when using data to solve business problems.
Data can be used to describe the world around us, typically the part of the world that is relevant to the problem at hand. For example, we may want to understand who a person applying for a loan is, to assess their probability of default. Or what songs a person prefers, to keep them on a music platform. Or what is around a self-driving car, so that it doesn't kill anyone!
In order to leverage analytics to solve problems like these, we need to describe the problems to a computer. And we do that using data — picking the factors we believe are most relevant for each problem.
When assessing the probability of default, we care about a borrower’s credit score and debt-to-income ratio, but not her favourite song. Music preferences, on the other hand, are gauged from previously played songs, not from one’s credit score. And a self-driving car treats people cold-heartedly as obstacles, not caring about their music taste or credit history at all!
These are just some examples of how data is used to describe the world. Each data description is built with a goal in mind and a problem at hand, because context matters. Data scientists and whole companies are doing their best to build a perfect data representation of the problems they are working on.
The data representation is a critical input for a computer. But whoever we provide the data to, be it a computer or a person, we need to keep in mind (despite all our best efforts) that:
Data is an imperfect reflection of the real world.
Why imperfect? Well, in essence, data is a mirror of the world. Now, look at the image above. While the tree is crisp, with all the details sharp and clear, the reflection in the water is blurred, imperfect, simplified. It’s just a reflection in the water!
Similarly, data is a blurred, imperfect and simplified version of the world. Our world is simply too complex to be described by however many features we fit into a data storage. Not to mention the quality of the features we do store, and how well they reflect what they represent.
Should we worry? Well, we could. But it’s not going to change anything. Perhaps we should rather embrace this inherent ambiguity of data.
Once we start thinking about data as a reflection of the world, we realise that what we want to do with a complex, real business problem is to capture its essence in data. We do that because we can then use technology and analytics to help find a solution. We can play with scenarios and manipulate the problem in ways that are impossible in the real, physical world. And if our data reflection is correct, representative, and unbiased, we can then apply the solution we found in the data world back to our real-world problem.
And that’s the crux of it. It doesn’t matter if data is good, or bad. Right, or wrong. Accurate, or not. What matters is: how representative of the problem is the data? Will the (data-powered or data-informed) solution work in the real world?
That’s why we always need to start with the business problem! We need to make sure we have captured the essence of the problem well. We need to think carefully about all the important factors. What are the main objects that play a role in our problem? What are their properties and attributes relevant to the problem? How do the objects relate to and influence one another? Are there any external factors? What data can be used to reflect these? Is the data reliable? Is it representative enough?
And if we don’t have well-representative data, we go and get it, whether it’s needed for one important use case or for the success of the whole data strategy.
The concept of data reflecting reality is incredibly useful for data scientists. But it is arguably even more important for non-data professionals. Let me give two reasons:
We can’t rely on data to automatically give us all the answers. It’s always important to stop and think. Whenever we face a problem we aim to solve with data, we should be asking:
How representative of the real world is the data?
Because we need to make sure we have a simple (not simplistic!), unbiased, and accurate data representation of the real-world problem if we want to apply a data solution back to the real world and have a reasonable chance that it will work well.
Value-driven Data & AI Strategy | Data & AI Products Management | AI Governance
3 years ago: Great post (once again), Adam Votava! Fully in sync with your point of view, and well articulated. This should also help with addressing data quality the right way: not as some kind of intrinsic perfection, but also as something to be assessed and managed in the context of the business problem in which a given data is used.
Data & Analytics Strategist | Director, Data & Analytics | Advisory Board Member, Emory Quantitative Theory Department
3 years ago: I LOVE THIS METAPHOR! It truly sums up the notion that data is the best model we're going to get of the world: it's never going to be perfect, only getting better ad infinitum.
A data literate commercial leader and consultant. Ex Yahoo! and Associated Newspapers
3 years ago: Melanie (or Mel) Ross, are you hearing an echo?
A data literate commercial leader and consultant. Ex Yahoo! and Associated Newspapers
3 years ago: I simply love that you've written this piece, Adam Votava, and I really couldn't agree more. Well said.
Inventory optimisation | AI sales forecasting | order automation
3 years ago: Hi Adam, good article with a clear and relevant point. Nevertheless, I feel there is a slight inconsistency here: "It doesn’t matter if data is good, or bad. Right, or wrong. Accurate, or not." AND "...unbiased, and accurate data representation". Overall the whole idea is absolutely right, but having good, right and high-quality data is essential for any business usage. Otherwise, all efforts to utilise it end up flushing budgets down the water closet.