Making Sense of Data: The Ten Lenses
Anurag Harsh
Founder & CEO: Creating Dental Excellence, Marvel Smiles and AlignPerfect Groups
Data gets manipulated, misused, misquoted and misrepresented in ways that make us believe things that may not be factually correct. We need to understand the lenses through which we can make better sense of data, so that we become smarter consumers and citizens.
Lens 1: Quest for What is Missing?
Psychologist Solomon Asch conducted a series of experiments in the 1950s suggesting that popular thought rules even when obviously incorrect. The Asch conformity experiments, or the Asch paradigm, studied if and how individuals yielded to or defied a majority group and the effect of such influences on beliefs and opinions. The methodology remains in use by many researchers even today.
Asch theorized that humans always felt the pressure to conform to a group, and in practice we might even believe this to be true based on our personal experiences. However, if one looks deeper into Asch's methodology, one finds that he overwhelmingly used male American college students as his subjects, a group that is clearly not representative of the human population.
The point I am making is to read the fine print.
Data is never neutral. It's a function of the people collecting it. Always inquire what or who is missing from the data.
Lens 2: The Pursuit for Clarity
Did you know about the epic failure of Google Flu Trends? GFT was a Google-operated web service that provided estimates of flu activity for more than 25 countries. By aggregating Google search queries, it attempted to make accurate predictions about flu activity. The project was launched in 2008 to help predict flu outbreaks ahead of time. The idea was that, by monitoring millions of users' health-related search behavior online, the large volume of Google search queries could be analyzed to reveal the presence of flu-like illness in a population. GFT was a massive failure, and Google stopped publishing estimates in 2015.
GFT failed. It was unable to predict the summer flu outbreak of 2009 because it looked only at statistical patterns and did not take real-world causality into account. The algorithm was quite vulnerable to overfitting to seasonal terms unrelated to the flu, like "high school basketball." With millions of search terms being fit to the CDC's data, there were bound to be searches that were strongly correlated by pure chance, and these terms were unlikely to be driven by actual flu cases or to be predictive of future trends. Google also did not take into account changes in search behavior over time.
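The "correlated by pure chance" problem is easy to reproduce. The sketch below is a toy illustration, not Google's method: the target series, the 5,000 candidate "search terms" and the 12-week window are all made-up random numbers chosen for the example. None of the candidates has anything to do with the target, yet the best of them usually looks like a strong predictor.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


rng = random.Random(0)  # fixed seed so the run is repeatable

WEEKS = 12      # hypothetical: number of weekly observations
N_TERMS = 5000  # hypothetical: number of candidate search terms

# A stand-in for the flu series: pure noise.
target = [rng.gauss(0, 1) for _ in range(WEEKS)]

# Candidate "search terms": also pure noise, generated independently of the target.
terms = [[rng.gauss(0, 1) for _ in range(WEEKS)] for _ in range(N_TERMS)]

correlations = [pearson(term, target) for term in terms]
best = max(abs(r) for r in correlations)

print(f"Strongest correlation among {N_TERMS} unrelated series: {best:.2f}")
```

The more candidate terms are screened against a short history, the stronger the best accidental match tends to be, which is why a pattern found this way says little about the future.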
In more than a quarter century of working with data and advanced analytics, I have seen issues like these surface occasionally with computer-generated algorithms, although things are much more sophisticated now than they were ten years ago. Algorithms and predictive analytics
Lens 3: The Liability of Emotion
Know that how you feel about a subject, based on your upbringing, environment, reading or worldview, can often supersede what the data might tell you. The more visceral or extreme your emotions, the harder it will be for you to believe the data and the more you'll try to find an alternate truth to soothe your emotions. The lesson here is to be aware that we are human and that we often cloud our decisions and beliefs.
Lens 4: The Cloud of Experience
It often happens that the insights you get from data are different from, or even contrary to, the real-world experiences you might have had. Let me give you an example. A friend takes the New York subway to work every day and regularly complains about how overcrowded it is. However, if you look carefully, every subway car has a maximum capacity engraved on the outside.
Per the Port Authority of New York, the crush capacity of a subway train is more than 1000 people, with average occupancy no greater than 130. However, my friend always tells me it feels like the subway is at crush capacity all the time. At first glance this is an emotional response to his personal experience that has no bearing on the reality of the statistic. But that does not mean we should discard the lens of experience altogether. Depending on the time of day, some trains run with few riders, while others are always at capacity, especially during the peak business hours of morning and evening. The off-peak trains that run relatively empty obviously bring the average number of riders per train down. And because most riders are, by definition, on the crowded trains, measuring from the point of view of the average rider instead of the average subway train gives a number much closer to my friend's, or any other rider's, personal experience.
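The gap between the average train and the average rider can be worked through with a few lines of arithmetic. The ridership figures below are hypothetical, picked only so that the per-train average comes out at the 130 quoted above. They are not real subway counts.

```python
# Hypothetical ridership for ten trains over a day:
# nine near-empty off-peak trains and one packed rush-hour train.
riders_per_train = [40] * 9 + [940]

total_riders = sum(riders_per_train)

# View 1: the average train. Every train counts once.
avg_per_train = total_riders / len(riders_per_train)

# View 2: the average rider. Every rider counts once, so a train
# carrying n riders is "experienced" n times.
avg_experienced = sum(n * n for n in riders_per_train) / total_riders

print(f"Average riders per train:          {avg_per_train:.0f}")    # 130
print(f"Crowd size the average rider sees: {avg_experienced:.0f}")  # 691
```

Both numbers are correct. They answer different questions, and only the second describes what a typical commuter actually feels.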
Lens 5: The Need to Count Ahead
Covid-19 death statistics in the early months of 2020 were largely inaccurate, at least in the US and most of western Europe, because they counted only deaths that took place in hospitals and did not reflect deaths at home. Besides, many hospitals were overstretched and bursting at the seams, and would report numbers only every 72 hours. Since deaths were growing exponentially within each 72-hour window, the reported figures painted a picture better than the reality at the time.
So it is vital to understand what a data point or insight represents and how the claim was calculated. For example, when looking at the number of doctors in a hospital, would you count two part-time doctors as one or two? The lesson here is to find out how an insight was measured before believing it blindly. Do not count prematurely and arrive at what could very well be a false conclusion.
Lens 6: The Perspective from Two Steps Back
In the fall of 2019 I was in London and picked up a local tabloid newspaper at Paddington Station as soon as I got off the Heathrow Express. The headline was eye-catching and intriguing: it claimed that the murder rate of London was higher than that of New York for the first time. I had flown in from JFK, and as a native New Yorker I found this news a bit unsettling. How is that even possible, I wondered.
I started reading further and learnt that in August of 2019 there were 15 murders reported in London but only 14 in New York. Neither statistic revealed anything about the risk of getting murdered. If one were to take two steps back and do some web searches, one would learn rather quickly that murders in New York were almost 10x those of London in the early 90s and had been gradually declining in both cities, with the decline in New York faster than that in London. At the time I was reading the tabloid, New York still had far more murders on an annual basis than London did; the monthly numbers simply fluctuated. So by looking at the numbers over a wider period of time, I quickly learnt that the headline was in essence clickbait or, as they call it now, fake news. The reality was that London was still safer than New York and in fact safer than it had ever been before.
Lens 7: The Search for the Background Story
I remember while at Wharton, my marketing professor talked about this experiment where two groups of supermarket shoppers were offered two sets of jams. One group was offered a choice of six varieties while the other group had to choose from over 24 different brands of jam. The professor revealed the group faced with fewer choices made more purchases than the one dealing with 24 jars of jam. I learnt in that class that less is more.
However, I stepped out of the Huntsman building into the local Starbucks across the street to find an endless variety of coffees to choose from. I wondered if the guys at Starbucks had ever sat through one of these marketing classes. With time and experience I learnt the hard way that less choice does not always imply a greater chance of a sale, and that the story I had learnt at school was just another biased publisher's point of view: a publisher that had decided this story, and not another, was the one to make the case. The "less choice is more" concept is littered across academic marketing books, despite the fact that decades of real-life observations have suggested otherwise, or at least shown the opposite point of view to be true as well.
So when you see such a claim, any claim really, do check the background before accepting it at face value. It could be just another biased media outlet's, publisher's or author's point of view. Question everything and research.
Lens 8: Question the Common Statistical Standard
Every country and research institute has its own methodology and process for producing results based on data. Even the three leading publishers of college rankings within the US (Niche, US News and WSJ) use remarkably different benchmarks and methodologies for their rankings. There is no one standard. Then there is corruption. Some governments, for instance, put pressure on their agencies to publish sanitized reports rather than reports based on correct data. Argentina, for example, has routinely addressed its inflation problems by making "minor adjustments" to its inflation-related data, such as rounding down the decimals in monthly inflation numbers (e.g. 1% instead of 1.90%), thus fundamentally affecting the annual numbers (12.7% instead of 25.3%).
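The annual figures in that example follow from compounding the monthly rate over twelve months. Here is a minimal sketch of the arithmetic, assuming the same monthly rate applies every month. The function name `annualize` is my own label for illustration, not an official statistical method.

```python
def annualize(monthly_rate):
    """Compound a constant monthly inflation rate over 12 months."""
    return (1 + monthly_rate) ** 12 - 1


reported = annualize(0.01)   # monthly figure rounded down to 1%
actual = annualize(0.019)    # unrounded monthly figure of 1.90%

print(f"Annual inflation at 1.0% a month: {reported:.1%}")  # 12.7%
print(f"Annual inflation at 1.9% a month: {actual:.1%}")    # 25.3%
```

Dropping less than one percentage point from each monthly figure roughly halves the annual number, which is why small "adjustments" at the monthly level matter so much.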
Always question how the data is being collected and reported before believing a report.
Lens 9: Beautiful Charts Based on Fake News
Nowadays with the advent of sophisticated graphical tools such as Tableau, Qlik and others, any entity such as agencies, activists, publishers, media companies or Governments can release remarkably beautiful and elegant charts or infographics that tend to have a lasting impact on the mass population. Often the underlying data supporting the image is either incorrect or in question. Then there is clever photoshopping that for example can focus one's attention on a series of "facts" shown in bold type font and keep one away from parts of the chart that may have been shown in lighter colors so as to camouflage them. This is more common than we realize. Humans tend to focus on what's glaring at them on a chart and surgically ignore the rest, even if the data is wrong.
As an example, consider Debtris, a popular YouTube video (the name is a play on the game Tetris) that shows falling bricks inscribed with figures such as US credit card debt or the total cost of the Iraq war. Watch it for a minute and it's entertaining, to say the least. It is also reckless, because it compares apples with oranges: it shows stocks and flows side by side, which is analogous to comparing the annual rent of a house with its total purchase price.
Lens 10: Expand Your Mind
Our brains are programmed to automatically fill in the gaps in our knowledge or perception rather quickly in order to make sense of things. Our brain is able to construct an incredibly complex jigsaw puzzle using any pieces it can get access to. These are provided by the context in which we see them, our memories and our other senses. This is why we can continue on even if the Zoom call is choppy or the phone line is broken.
However, our brain is also programmed to avoid discomfort or, to borrow a phrase from Mr. Al Gore, an inconvenient truth. We all have a worldview that is further fueled by the bubble we live in and supported by the social and online media sites we live on, where we see nothing other than what we are shown based on what we like. Remember that a case can be made for any theory by using partial data in support of a point of view that you happen to like but that may not be true or rooted in actual facts. It is therefore important to explore and expand one's mind and to reach out beyond one's own bubble, across the aisles, because there are several other points of view on the other sides (it's not just one other side but several), equally well supported by data, with beautiful charts and celebrities propping them up.
In Sum
Be aware of the context and of your own emotions and limitations, and try, if possible, to view the story through the lenses discussed above to see if you can frame a sharper and more accurate picture from the data and charts that have been shown to you.
Be curious. Expand your mind. Burst your bubble.