Data without context. What's the point?
As a research and insight agency, ZED look at data all day. Buckets and buckets of data, all honestly and meticulously collected by its owners, all saved and cared for as if it were the elixir of life, the magic formula for success and growth.
Obtaining data, in today’s connected world, simply isn’t the problem. We are awash with it, so much so that “data drowning” and “analysis paralysis” are rife in organisations across the globe.
It got me thinking about our role at ZED and what we are really doing to solve the problem of "Data Deluge."
As Gary V said, "If Data is King, then Context is God" (well, he said Content, but Data works just as well). Are ZED helping to navigate the universe?
It became clear to me that ZED are not just a research and insight agency but also a context agency. We give meaning to data and explore not just the what and when but also the why and how.
We don’t just see the numbers; we see the behavioural, systemic, social and economic reasons behind the data. We kind of give the Matrix emotions and the human element.
I read a great book, Data Points: Visualisation That Means Something by Nathan Yau, which tackles the issue brilliantly and gives a deep understanding of the relationship between data and context.
Yau explains that data is an abstraction of real life, and real life can be complicated. However, if you gather enough context, you can at least put forth a solid effort to make sense of it.
For instance, if you look up at the night sky, the stars look like dots on a flat surface. The lack of visual depth makes the translation from sky to paper straightforward, which makes it easier to imagine constellations. Just connect the dots. However, although you perceive stars to be the same distance away from you, they are varying light years away.
If you could fly out beyond the stars, what would the constellations look like?
The initial view places the stars in a global layout, the way you see them. You look at Earth beyond the stars, but as if they were an equal distance away from the planet.
Zoom in, and you can see constellations how you would from the ground, bundled in a sleeping bag in the mountains, staring up at a clear sky.
The perceived view is fun to see, but flip the switch to show actual distance, and it gets interesting. Stars transition, and the easy-to-distinguish constellations are practically unrecognizable. The data looks different from this new angle.
This is what context can do. It can completely change your perspective on a dataset, and it can help you decide what the numbers represent and how to interpret them. After you do know what the data is about, your understanding helps you find the fascinating bits, which leads to worthwhile visualisation.
Without context, data is useless, and any visualisation you create with it will also be useless. Using data without knowing anything about it, other than the values themselves, is like hearing an abridged quote second hand and then citing it as a main discussion point in an essay. It might be okay, but you risk finding out later that the speaker meant the opposite of what you thought. (Nathan Yau)
You must know the who, what, when, where, why, and how -- the metadata, or the data about the data -- before you can know what the numbers are about.
Who: A quote in a major newspaper carries more weight than one from a celebrity gossip site that has a reputation for stretching the truth (although the power of social media influencers can pack a substantial punch these days, with one tweet from a Kardashian wiping $1bn off Snapchat's market value). Similarly, data from a reputable source typically implies better accuracy than a random online poll.
For example, Gallup, which has measured public opinion since the 1930s, is more reliable than, say, someone (for example, me) experimenting with a small, one-off Twitter sample late at night during a short period of time. Whereas the former works to create samples representative of a region, there are unknowns with the latter.
Speaking of which, in addition to who collected the data, who the data is about is also important. It’s often not financially feasible to collect data about everyone or everything in a population. Most people don't have time to count and categorize a thousand gumballs, much less a million, so they sample. The key is to sample evenly across the population so that it is representative of the whole. Do data collectors do that?
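To make the gumball point concrete, here is a minimal sketch of sampling evenly. The population, its colour proportions and the sample size are all made up for illustration, and simple random sampling is only one way to aim for a representative sample.

```python
import random
from collections import Counter


def colour_shares(gumballs):
    """Return each colour's share of the given gumballs."""
    counts = Counter(gumballs)
    total = len(gumballs)
    return {colour: count / total for colour, count in counts.items()}


# Hypothetical population: one million gumballs in made-up proportions.
population = ["red"] * 500_000 + ["blue"] * 300_000 + ["green"] * 200_000

rng = random.Random(42)  # fixed seed so the sketch is repeatable
# Simple random sample: every gumball is equally likely to be picked.
sample = rng.sample(population, 1_000)

print(colour_shares(population))
print(colour_shares(sample))
```

The sample shares should land close to the population shares without counting a million gumballs. If the sample were instead scooped only from the top of the jar, there would be no such guarantee.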
How: People often skip methodology because it tends to be complex and for a technical audience, but it's worth getting to know the gist of how the data of interest was collected.
If you're the one who collected the data, then you're good to go, but when you grab a dataset online, provided by someone you've never met, how will you know if it's any good? Do you trust it right away, or do you investigate? You don't have to know the exact statistical model behind every dataset, but look out for small samples, high margins of error, and unfit assumptions about the subjects, such as indices or rankings that incorporate spotty or unrelated information.
Sometimes people generate indices to measure the quality of life in countries, and a metric like literacy is used as a factor. However, a country might not have up-to-date information on literacy, so the data gatherer simply uses an estimate from a decade earlier. That's going to cause problems because then the index works only under the assumption that the literacy rate one decade earlier is comparable to the present, which might not be (and probably isn't) the case.
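On the point about small samples and high margins of error, a rough sketch may help. It uses the standard normal-approximation formula for a proportion from a simple random sample; the function name and the two sample sizes are illustrative assumptions, and real surveys with weighting or clustering need more care.

```python
import math


def margin_of_error(p, n, z=1.96):
    """Approximate margin of error for a proportion p from a simple
    random sample of size n (z=1.96 corresponds to roughly 95% confidence)."""
    return z * math.sqrt(p * (1 - p) / n)


# Compare a small one-off poll with a larger sample, using p = 0.5
# (the value that gives the widest margin).
for n in (50, 1_000):
    print(n, round(margin_of_error(0.5, n), 3))
```

With 50 respondents the margin is about 14 percentage points either way; with 1,000 it is about 3. That gap is worth knowing before quoting a headline figure.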
What: Ultimately, you want to know what your data is about, but before you can do that, you should know what surrounds the numbers. Talk to subject experts, read papers, and study accompanying documentation. (Or talk to ZED.)
When learning statistics, you typically learn about analysis methods, such as hypothesis testing, regression, and modelling, in a vacuum, because the goal is to learn the math and concepts. But with real-world data, the goal shifts to information gathering.
You shift from, "What is in the numbers?" to "What does the data represent in the world; does it make sense; and how does this relate to other data?"
A major mistake is to treat every dataset the same and use the same canned methods and tools. Don't do it!
When: Most data is linked to time in some way in that it might be a time series, or it's a snapshot from a specific period. In both cases, you must know when the data was collected. An estimate made decades ago does not equate to one in the present. This seems obvious, but it's a common mistake to take old data and pass it off as new because it's what's available. Things change, people change, and places change, and so naturally, data changes.
Where: Things can change across cities, states, and countries just as they do over time. For example, it's best to avoid global generalisations when the data comes from only a few countries. The same logic applies to digital locations. Data from websites, such as Twitter or Facebook, encapsulates the behaviour of its users and doesn't necessarily translate to the physical world.
Why: Finally, you must know the reason data was collected, mostly as a sanity check for bias. Sometimes data is collected, or even fabricated, to serve an agenda, and you should be wary of these cases. Government and elections might be the first things that come to mind, but so-called information graphics around the web, filled with keywords and published by sites trying to grab Google juice, have also grown to be a common culprit.
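The six questions above can be kept as a simple checklist alongside a dataset. The sketch below is one hypothetical way to do that; the class, field and function names are invented for illustration and are not from Yau's book.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class DatasetContext:
    """Hypothetical record of the metadata questions for one dataset."""
    who: Optional[str] = None    # who collected it, and who it is about
    how: Optional[str] = None    # collection method
    what: Optional[str] = None   # what the numbers actually measure
    when: Optional[str] = None   # collection date or period
    where: Optional[str] = None  # physical or digital location
    why: Optional[str] = None    # reason the data was collected


def missing_context(ctx):
    """List the questions that have not been answered yet."""
    return [f.name for f in fields(ctx) if not getattr(ctx, f.name)]


# Example: a one-off social media poll with only part of its context recorded.
poll = DatasetContext(
    who="one-off Twitter poll run by the author",
    when="a single late evening",
    where="Twitter",
)
print(missing_context(poll))
```

Anything the function lists is a question still to answer before analysing or charting the data.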
Learn all you can about your data before anything else, and your analysis and visualisation will be better for it. You can then pass what you know to your team, board and customer, resulting in a more nuanced, tailored and impactful interpretation of a data set, which truly delivers on multiple levels (marketing, commercial, HR).
However, just because you have data doesn't mean you should make a graphic and share it with the world. Context can help you add a dimension -- a layer of information -- to your data graphics, but sometimes it means it's better to hold back because it's the right thing to do.
Yau’s main observation is worth repeating: in the end, it comes back to what data represents. Data is an abstraction of real life, and real life can be complicated, but if you gather enough context, you can at least put forth a solid effort to make sense of it.
I am reassured, after reading Yau’s book, that as a data context agency, ZED is walking the walk. When it comes to Gen Z and youth engagement, we don’t just understand the numbers; we truly understand how the numbers got there in the first place.
If you would like to make sense of your data, please get in touch. We would love to talk Gen Z data, context and engagement with you.