Data, Visualizing it, and getting down to 200lbs while we're at it!
So, I thought I'd take a little break from something else I've been working on and write a little post on data. #COVID has brought us a lot closer to charts, graphs, flattening curves, trend lines, and all other sorts of data visualization jargon. But, this also means we are seeing data presented in radically different ways to tell totally different stories, and it cracks me up when I see someone confused as to how I could possibly disagree when such-and-such news site had so-and-so chart to prove their point. So, I thought I'd take a page out of my life and talk about data a bit, and just how easy it is to make that data dance!
Cause, why not? Data so sexy!
What is that you say? Well, it's my bottom line. Literally, and figuratively... if ya know what I mean. ;-P All kidding aside, this is a plot of my weight loss in 2020. Yes the text is tiny. If you really care what I weighed in February you can open the bloody image and zoom in. The point isn't the values, the point is... who's losing weight like a boss? Me! Ok, that's not the point either. LOL. I am proud of this though, but, this isn't a celebratory article. What I want to talk about is information manipulation. What? Yes:
INFORMATION MANIPULATION
It's a real thing. So why are we looking at my weight plot? Because this is a lovely example of using truthful data to lie through my teeth, sort of! Before we find out how I'm lying exactly (or am I?), we must address the elephant in the room...
Why would you lie about data?
Ummm. Because you have an agenda. In this case, my agenda is pretty clearly shown in the lovely blue line in the graph above. I want to lose weight! A lot of it! I peaked at just under 280 lbs in 2019 at the end of the international conference season, and that is NOT a healthy or attractive weight on my 6'-2" frame. No sir-ee bob! So, knowing 2020 was going to be an excellent year to lose weight (LOL) I started trying to lose weight by applying years of experience of failing miserably at it! And it was working! I dropped 25 lbs in two months, and entered 2020 at 255 on a downward trend; and clearly I was successful every day up to now! End of article, nothing to see here... wait, no, don't copy that other chart in...NO!!!!!!
Well, fudge. Cat's out of the bag. There were some good times, and there were some bad times... WTF happened in October??? And January? And... well, all over the place? But before we talk about specifics, let's cut to the chase and talk about the power of this little lie I knowingly tell myself. Why does my Fitbit (yes, I'm lying to my Fitbit) think I've only lost weight over the year? Well, because while I write down all my weight measurements on paper, I only record a weight in my Fitbit app if it is lower than the previous measurement.
Wait, what? Why?
I do this because I need encouragement to lose weight. I like to eat, and drink, and eat more, and drink more, and... you get the picture. I'm also a stress eater (cookie dough ice cream + wine make everything better). So, I need to "feel good" about my progress, and seeing an orange line going upwards does not make me "feel good" - which makes me want to eat more, not less.
So, I have worked out this sweet arrangement with my Fitbit where I only record data that is positive (negative mathematically, but positive emotionally... signs get you if you're not careful!). When I get hungry, I look at my lovely blue plot of weight loss and hopefully only eat some of the food on my plate. Line continues to go down. Kelly happy. Food in belly. Data mad. End of story. (or is it?)
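For the code-minded, here is a minimal sketch of that "only record the good news" rule. It assumes "lower than the previous measurement" means lower than the last value that actually made it into the app, which is the reading that keeps the line from ever going up. The function name `fitbit_filter` and the sample readings are made up for illustration; they are not my real numbers.

```python
def fitbit_filter(paper_log):
    """Keep only readings that are lower than the last recorded one.

    Assumption: each reading is compared against the last value that was
    actually recorded, so the recorded series can only go down.
    """
    recorded = []
    for day, weight in paper_log:
        if not recorded or weight < recorded[-1][1]:
            recorded.append((day, weight))
    return recorded


# Hypothetical (day, lbs) readings, NOT real data from the article.
paper_log = [(1, 255.0), (2, 254.2), (3, 256.1), (4, 255.0), (5, 253.8), (6, 254.5)]

recorded = fitbit_filter(paper_log)
print(recorded)  # [(1, 255.0), (2, 254.2), (5, 253.8)]
```

Six honest readings go in, three come out, and every one of the three is true. The bad days simply never happened as far as the app is concerned.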
Charts always tell a story
I think people like to believe that charts and graphs (the same thing basically) are somehow these "unbias-able" snapshots of raw facts that can't be anything but true. I don't understand why people like to believe this, but I see evidence of it in conversation after conversation. In reality, charts are fundamentally visual presentations of information gleaned from selected data points. Here, let me re-write that with some emphasis:
Charts are VISUAL PRESENTATIONS of INFORMATION gleaned from SELECTED DATA points.
Before the "raw data" ever sees the light of day in a chart it gets cleaned, filtered, normalized, and all sorts of other things. Then, once you've selected what data you're actually going to chart, you make adjustments to all sorts of visual parameters so that the things you care about (as the maker of the chart!) are highlighted so that you can present that information clearly, concisely, and convincingly. What I just described involves a whole lot of personal judgement and therefore introduction of bias into every single chart ever made, EVER. So, let's drop the myth that charts are unbiased please! Now, let's take a look at my charts as a specific example...
What story does my chart tell?
So, let's talk about what and how these two charts differ from each other. And for that, I want you to first look at the blue chart above and then "tell yourself the story" of what that chart shows you. Here it is again:
What do you see in this chart? To me, it looks like someone training methodically, chipping away at their weight gain month by month making steady progress throughout the year. I look like a freaking pro at it. LOL. Now, look at the second one:
Does this tell you a different story? It does to me. Of course, I know the WHOLE story. So, perhaps I'm biased here. Regardless, to me, this looks like someone who's had a couple of successful months (Feb - March, June) but who's otherwise struggled keeping the weight off and fell right off the wagon in October. Oops!
What's actually different?
So, bear in mind not a single bit of the "blue line" data is made up. It's all true. It's just selectively sampled. There is no "lie" in the data, just "errors of omission" so to speak. (My mother is glaring at me in my mind right now telling me omission IS a lie.) I told you my "omission algorithm" earlier - only record a weight if it is less than the previous measurement. You can clearly see the effect of that algorithm in the story told by the two charts. So, the good news is that my algorithm was highly successful in manipulating the story as intended. The bad news is that I only had to change one single thing to make two very different stories. Now, let's have some more fun...
What story does this chart tell you? A lot less impressive looking, but the fluctuations look smaller. October no longer looks like someone fell off a wagon into a puddle of lard. What do you think I changed here? Well, "nothing" really. No data selection or manipulation at all. All I did was expand the vertical axis to run from 0lbs to 300lbs instead of just 200 - 270. The visual story is radically different, but the data is the same. Fun!
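A rough way to put a number on that axis trick is to ask how much of the chart's height a given change takes up. The sketch below uses the two axis ranges mentioned above; the 55 lb change is simply my 255 lb starting point minus the 200 lb goal, used here as an illustrative figure rather than a measured result, and `share_of_axis` is a made-up helper name.

```python
def share_of_axis(change, axis_min, axis_max):
    """Fraction of the chart's height that a change of `change` units spans."""
    return abs(change) / (axis_max - axis_min)


# Illustrative change: 255 lbs (start of 2020) down to the 200 lb goal.
drop = 255 - 200

zoomed = share_of_axis(drop, 200, 270)  # the tight 200 - 270 axis
full = share_of_axis(drop, 0, 300)      # the expanded 0 - 300 axis

print(f"200-270 axis: {zoomed:.0%} of the chart height")  # 79%
print(f"0-300 axis:   {full:.0%} of the chart height")    # 18%
```

Same 55 lbs, but on one chart it fills most of the picture and on the other it is a gentle slope near the top.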
Now, what about this one:
What does this chart tell you about my weight loss journey? To me, this looks way more realistic than the "blue line" but less, shall we say, self-destructive than the "real data". But, something looks off. What happened to January and how exactly have I recorded through the end of the year??? Well, I used the magic of a moving average! What, you ask, is that? Well, in this case, I used a trailing 21-day average. So, the graph isn't showing the real data. Instead, on Feb 1st it is showing the average value of that day and the prior 20 days. So, all those little spikes up between low points: smoothed out. All those abrupt drops to recover from the spikes: smoothed out. October still sucks. This is "Data Manipulation" at its best.
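Here is a minimal sketch of a trailing 21-day average in plain Python. Two assumptions of mine: there is exactly one reading per day, and days that do not yet have a full 21 readings behind them get no value at all (`None`). My spreadsheet may well have handled the edges differently. The zig-zag data is invented for illustration.

```python
def trailing_average(values, window=21):
    """Average of each value and the (window - 1) values before it.

    Assumption: positions without a full window behind them yield None.
    """
    out = []
    running = 0.0
    for i, v in enumerate(values):
        running += v
        if i >= window:
            running -= values[i - window]  # drop the reading that left the window
        out.append(running / window if i >= window - 1 else None)
    return out


# Hypothetical daily readings: a slow decline with a +/- 1.5 lb daily zig-zag.
daily = [250.0 - 0.1 * d + (1.5 if d % 2 else -1.5) for d in range(60)]
smooth = trailing_average(daily)

smooth_vals = [s for s in smooth if s is not None]
raw_swing = max(abs(b - a) for a, b in zip(daily, daily[1:]))
smooth_swing = max(abs(b - a) for a, b in zip(smooth_vals, smooth_vals[1:]))

print(f"largest day-to-day change, raw data:       {raw_swing:.2f} lbs")
print(f"largest day-to-day change, 21-day average: {smooth_swing:.2f} lbs")
```

The daily swings shrink by more than a factor of ten in this toy series, even though no reading was thrown away. Every value is still in there, just diluted by its 20 neighbors.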
So, let's re-hash some of the simple ways of manipulating the "story" that data tells us without ever doing something unethical like "cooking the books".
- Data Selection - This was my original sin. Using a very simple selection algorithm I have managed to make my journey look consistent, competent, and darn near robotic in nature!
- Data Manipulation - In my last modification, I took my real data and made it look more like selected data by using a trailing average to smooth out the fluctuations in the daily measurements.
- Chart Axes Manipulation - Simply by plotting the data on a smaller vertical scale I easily exaggerated the weight loss to look really significant over the course of the year while, relative to my total weight, the changes are far less dramatic.
And it was all completely honest...
That was simple data, something we can all understand and relate to. Yet, by making extremely minor changes in how that information was presented, I can tell some wildly different stories about my weight loss journey. Do I want it to be a motivating success story to post on Instagram to make me look awesome? Do I want it to be a heart-felt "it's hard to lose weight" story to post on Facebook to rally my friends to help me hit my goal? I can do it all!
Muah hahahahahahahahaha...
So... WTF happened in October?
So, all this made me personally wonder what was happening during these times when I was doing well, and when I wasn't. For those of you who know me, but don't know me, 2020 has been one hum-dinger of a year. My wife and I separated in 2019 and that meant 2020 was the year of selling our house, buying my house, negotiating a fair divorce, and the likely passing of one of our dogs... and that was all before I knew there was going to be this whole "pandemic" thing. So, I thought I'd overlay what was going on in my life on top of this chart of how I was (or wasn't) losing weight. The general things that might be triggering my stress eating habit should be pretty clear from the description above, but there were some events that had big impacts on my progress...
- Good news - exercise actually works apparently. Who'd have thunk it? And I need new clothes. None of my pants fit anymore. It is making winter a little #awkward to be totally honest, but in a good way. Right?
- Bad news - when exercise leads to getting poison ivy, which for me always leads to steroids... watch out weight goals, steroids make you hungry and bulk up. So, that's what happened in October.
- Obvious news - the only thing that makes you gain weight faster than steroids is when you mix your Thanksgiving holiday (yum) with the finalization of your divorce paperwork (not yum).
- Silver Lining news - while getting COVID (Yikes!) is still a mystery to me, and having Covid SUCKS, like any flu or other sickness that knocks your GI system out of whack... man did I lose some weight! Woo hoo!
So, I'm back on track with my 2020 weight goal of getting to 200lbs! Thanks #COVID!
The REAL Bottom Line:
Now, let's get out of my pants and into the real world for a second. What does this mean about the broader discussion about data visualization and the various manipulations that we see on the screen every day? Well, what it means is that you should look at charts and graphs even more carefully than you look at the written or photographic information in the news, on government websites, whatever. Not because there might be a conspiracy or something silly like that. Just, because, people make choices when they make charts, and those choices impart bias. I'll give a #COVID related example:
Did you know that most of the charts you see with Covid infections are on a logarithmic vertical scale? If you know what that means, great. If you don't, that scale is useful for how it gradates changes at small and large scales. In other words, it makes small changes in small numbers noticeable while still being able to show large numbers on the same chart. However, it does that at the expense of showing small changes of large numbers (duh) but also at the expense of showing increases over time in a simple and intuitive way. Kind of a big caveat. Let me demonstrate:
The left is a linear distribution of the data vertically, and the right is logarithmic. The lines represent various rates of increasing numbers. In the linear distribution, you can't see ANY difference between a logarithmic increase and a linear increase. Even doubling is indistinguishable until the 14th cycle! Meanwhile, in the log plot we can see lots of detail down in the lower numbers, but the HUGE difference between doubling, tripling, and quadrupling at the higher numbers gets lost in the wash. Quadrupling looks only marginally worse than tripling, even though (get ready for this), at the 11th iteration that's the difference between...
...59,049 and 1,048,576 people having covid
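Those two numbers are easy to check. The sketch below assumes each line starts at a single case, so the 11th point on the line is the result of ten rounds of growth. It also shows how far apart the two values sit on a log axis, measured in "decades" (factors of ten).

```python
import math

# Assumption: each line starts at 1, so the 11th point is 10 multiplications in.
steps = 10

tripling = 3 ** steps
quadrupling = 4 ** steps
print(tripling, quadrupling)  # 59049 1048576

ratio = quadrupling / tripling
log_gap = math.log10(quadrupling) - math.log10(tripling)

print(f"quadrupling is {ratio:.1f}x tripling")         # 17.8x
print(f"gap on a log10 axis: {log_gap:.2f} decades")   # 1.25 decades
```

So nearly eighteen times as many people shows up as a gap of about one and a quarter gridlines on the log chart.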
Think about that for a second. Go look at those #COVID charts like the one below. Look at the scale on the left. The gap between 100 and 200 is the same vertical size as the gap between 1000 and 2000. So, this is clearly a logarithmic vertical distribution. Now, extrapolate out what we saw above and apply it to this chart...
In this chart it doesn't look like there is a huge difference between Spain, Italy, or China. But, remember, in the upper quadrants of this chart even a small separation between lines is HUGE in terms of the numerical impact. Meanwhile look at the bottom where the US is trending away from Japan and South Korea but still pretty close. This was early days baby...
This one is a little further along, and here we can see a dramatic change in fate for our country's handling of the pandemic. That aside, again remember that differences at the top of the chart are shrunken while differences at the bottom are exaggerated. If you plotted this linearly, then Japan and Singapore and even South Korea would be indistinguishable from each other and the X axis line. Meanwhile, the US would look MUCH MUCH MUCH worse than it does to the naked eye in this plot. Something to consider!
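To make the "shrunken at the top, exaggerated at the bottom" point concrete, here is a small sketch that computes where a value lands on the vertical axis (0 = bottom, 1 = top) under each scale. The case counts and the axis limits are invented placeholders, deliberately not tied to any real country, and `linear_pos` and `log_pos` are names I made up for this example.

```python
import math


def linear_pos(value, top):
    """Height on a linear axis running from 0 to `top` (0 = bottom, 1 = top)."""
    return value / top


def log_pos(value, bottom, top):
    """Height on a log10 axis running from `bottom` to `top`."""
    span = math.log10(top) - math.log10(bottom)
    return (math.log10(value) - math.log10(bottom)) / span


# Hypothetical case counts, for illustration only.
counts = {"Country A": 300, "Country B": 3_000, "Country C": 30_000, "Country D": 300_000}
bottom, top = 100, 1_000_000

for name, n in counts.items():
    print(f"{name}: linear {linear_pos(n, top):.4f}   log {log_pos(n, bottom, top):.3f}")

# The tell-tale sign of a log axis: equal ratios take up equal height.
gap_small = log_pos(200, bottom, top) - log_pos(100, bottom, top)
gap_large = log_pos(2000, bottom, top) - log_pos(1000, bottom, top)
print(math.isclose(gap_small, gap_large))  # True
```

On the linear axis the three smaller countries all sit within the bottom 3% of the chart, hugging the X axis, while on the log axis all four are evenly spaced a quarter of the chart apart.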
If someone has a picture-to-data tool handy, I'd love to have someone screen-shot this and plot the same data linearly. It would be fascinating to see with real data. :-) Promise I'll edit the article and give credit!
#Conspiracy (not really)
In the political spectrum I'm a centrist, a moderate. I'm extremely skeptical of anything anyone with an agenda (everyone) shows me or tells me when it comes to data in a graph. The examples above are a PERFECT show-case of why you should be too. So, is this a "bad" thing? Are people cackling away with their evil laughs while they make charts with the specific intent to mislead and confuse us? Is this all a massive conspiracy theory?
Ummm, No. Sorry.
Like it or not, no one sees themselves as a villain, as the "bad guy". People do this because graphs are meant to communicate a story, and depending on the information the maker of the chart wants to convey you'll get different charts of the same data. It's just how things work. People are people. They care about what they care about. Someone who wants everyone to wear a mask is going to present the data differently than someone who wants to justify not wearing one. That's just how it works. They're not trying to mislead you, they're trying to lead you to what they think is important to know. In many cases, that is their JOB.
Complaining about charts presenting "their" story to you is like complaining about a book presenting "their" story to you. It's a book. It's supposed to be a story. The same is true of a chart. NOT expecting it to be a story is the only mistake.
So, what we need to do as consumers of graphs and the information they contain (both revealed and concealed) is to be aware of the choices the graph maker has made and how those might influence the story the graph is telling. We also need to be willing to put a little time into looking past the presentation to understand not only what the presenter wants us to see, but also what their choices might be understating or even concealing as a result. They're doing their job. We need to do ours! And remember, people who write articles fall victim to things like Chart Envy too, as is beautifully illustrated by this artful meme...