Lies, lies, damned lies and statistics
When numbers tell you the opposite of the truth (which happens more than you might think)...
Here is a quick illustration of 3 all-too-common mistakes in data analysis, courtesy of my weekend running exploits: 3 classic mistakes to make sure you avoid, in one simple illustration.
So, like all good weekend warriors, I spent most of yesterday crawling through my Strava data from this Sunday’s Napa Marathon (that, and shuffling like an old man, and dropping to the floor with unexpected bouts of comedically painful cramp).
Taking a look at the overall field stats gives a really important lesson for all of us who use data professionally.
The Napa Marathon hosted 3 events: a full marathon (26.2 miles), a half marathon (13.1 miles) and a 5k (3.1 miles). In total, across the 3 events, there were 4,179 finishers.
Let’s take a look at the average field pace for each race. At first glance the result is counter-intuitive: the longer the race, the faster the average runner went.
Now, if like me you’ve run at all, you can probably work this one out. Marathons tend to attract a highly fit field; once you enter one, you train extensively. By contrast, a 5k at an event like this is a casual fun-run.
If I’d concluded that longer distances make you run faster, I would of course have been wrong. If I’d gone further and extrapolated that over 50 miles the average runner would travel at 6:46 / mile, I’d be foolish (even though that’s what a linear regression of the results would tell me… with an 85% R² to boot).
So what is going on?
Selection Bias: We have a classic case here. The runners in the marathon are going faster… but not because of the extra distance. The event has attracted a different group of participants. If I try to generalize from these results, I get into trouble.
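To see selection bias in isolation, here's a toy simulation (a minimal sketch in Python, with made-up population parameters, not the real race data). In this model, race distance has no causal effect on pace at all; the marathon field still looks faster, purely because a fitter population chose to enter it.

```python
import random

random.seed(42)

# Toy model: a runner's pace depends ONLY on which population they come from,
# never on the race distance. All numbers below are hypothetical.
def sample_field(mean_pace, sd, n):
    """Draw n runners' paces (min/mile) from a normally distributed population."""
    return [random.gauss(mean_pace, sd) for _ in range(n)]

def avg(xs):
    return sum(xs) / len(xs)

# Dedicated, trained runners self-select into the marathon;
# casual fun-runners self-select into the 5k.
marathon_field = sample_field(mean_pace=10.0, sd=1.0, n=2000)
fivek_field = sample_field(mean_pace=12.0, sd=1.5, n=2000)

print(f"Marathon avg pace: {avg(marathon_field):.2f} min/mile")
print(f"5k avg pace:       {avg(fivek_field):.2f} min/mile")
# The gap is entirely about who showed up, not the distance they ran.
```

If you conditioned on population fitness instead of race entered, the apparent "distance effect" would vanish.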
Segment Mix: That gets compounded because I have ignored mix effects. Men comprised 58% of the marathon field but only 35% of the 5k, and (generally!) men run faster than women. The mix effect is compounding the selection bias here.
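The mix effect can be isolated with a back-of-envelope weighted average. In the sketch below (Python), the 58% and 35% male shares come from the post; the segment paces are hypothetical and deliberately identical across the two races, so any gap in the headline averages comes from mix alone.

```python
# Gender mix is from the post; segment paces are hypothetical and, crucially,
# IDENTICAL in both races, so distance contributes nothing to the gap.
male_pace = 9.5     # min/mile, assumed the same in marathon and 5k
female_pace = 10.5  # min/mile, assumed the same in marathon and 5k

def field_avg(male_share):
    """Headline average pace for a field with the given male share."""
    return male_share * male_pace + (1 - male_share) * female_pace

marathon_avg = field_avg(0.58)  # 58% male (from the post)
fivek_avg = field_avg(0.35)     # 35% male (from the post)

print(f"Marathon field avg: {marathon_avg:.2f} min/mile")
print(f"5k field avg:       {fivek_avg:.2f} min/mile")
# The marathon "looks" faster even though no individual's pace changed.
```

Comparing within segments (men vs men, women vs women) is the quick defense against this one.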
Over-Fitting: Finally, I have over-fitted a correlation from just 3 data points; while it is easy to make the stats look compelling, it’s a quick route to some nonsense real-world conclusions.
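This trap is easy to reproduce. Here's a minimal sketch (Python, with hypothetical pace numbers, not the actual Napa field averages) of how an ordinary least-squares fit through just three points can yield a flattering R² and a nonsense extrapolation:

```python
# Hypothetical average field paces (minutes per mile) for the three races.
# Illustrative numbers only, not the actual Napa results.
distances = [3.1, 13.1, 26.2]   # 5k, half marathon, full marathon (miles)
paces = [11.5, 10.5, 10.0]      # avg field pace: longer race, faster field

def ols(xs, ys):
    """Ordinary least squares: returns (slope, intercept, r_squared)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - y_bar) ** 2 for y in ys)
    return slope, intercept, 1 - ss_res / ss_tot

slope, intercept, r2 = ols(distances, paces)
print(f"R^2 = {r2:.2f}")  # looks compelling, but it's only 3 points

pace_50 = slope * 50 + intercept
print(f"'Predicted' avg pace at 50 miles: {pace_50:.2f} min/mile")
# The fit claims ultra runners get even faster; in reality they slow down.
```

With three points almost any monotone relationship will fit a line well, which is exactly why the R² here proves nothing.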
So what? What has all this got to do with selling wine?
Whilst the example above might sound silly, this is exactly the type of data issue I’ve seen time and again in my career.
It is especially a risk in product development and design of user funnels and journeys.
To take a real example from Naked Wines: we nearly made a major design error because team members were looking at behavior from customers who had opted in to membership after buying something other than our standard “introductory offers”.
A great real-world example of selection bias. Beware generalizing from the 5-10% of your consumers who avoid the standard “happy path”. Luckily some sharp people in our data team spotted the issue in time, but we could easily have invested material time and energy in error.
How can you arm yourself to avoid these mistakes?
Here are 3 of my top tips to avoid these types of mistakes in your business.
PS - Obviously, what you really want to know: how did the marathon go? Somehow I managed to stay dry and ran a 3hr 34, for a 15-minute PR. I think it was all down to the cheer squad!
Founder at Northwest Strategy Associates: Consulting | Strategy | Business Planning | Transformation | Insight | Pricing | Proposition | Customer | Commercial | Marketing
12 months ago: Good post Nick, it's definitely not just wine! As customer data proliferates all across retail / D2C there is enormous temptation for data-semi-literate managers to assert that "the answer is in the data" and equip huge analytics teams with resources and power to execute substantial strategic changes on this basis. Often the decisions are wrong and it's commonly because they fall short on 2 or even 3 of the simple but powerful rules you have posted. And it's happening again and again, in bigger and better equipped companies than everybody thinks...
Co-Founder at Beer52.com & Wine52.com | B-Corp
12 months ago: Congrats Nick. Yes, bias is such a huge factor. In smaller teams it's very hard to wind back things once they've been implemented and don't work, especially with confirmation bias in the mix from those who designed it. Why it's important to test fast and cheap first, then build things after the data has been properly analysed. We've all been guilty of it.
Impressive marathon achievement, Nick! Your insights on data pitfalls are a great reminder of the importance of critical analysis in DtC strategies.