How to avoid a Hippo, urban myths and bad questions with Data Science
On the M62, on a bleak hillside between Huddersfield and Rochdale, lies an isolated farmhouse. On some days the mist is so dense you'd be hard pressed to pick out its stone frame from the swirling hill fog.
Stott Hall farm is one of West Yorkshire's most familiar landmarks. It stands in the middle of a transport juggernaut not because of stubborn owners, but because of clever design optimised for the local conditions.
An urban myth suggests that the owners refused to sell their home to the advancing road builders, forcing the M62 to split in two close to the highest point on the UK motorway network. The truth is somewhat different, as this BBC article explains. Faced with landslips in inhospitable terrain, the builders chose to split the carriageways intentionally, saving the 18th-century farmhouse and creating the distinctive vista we see today.
In a world full of communication, where facts easily become disconnected from the narrative, it's all too easy for stories to change and evolve.
As Data Scientists, we are the arbiters of fact. It is our job to uncover the facts and fight for a narrative that remains true to them, whilst also telling a compelling story. As we navigate the commercial world, uncovering patterns that help businesses understand their past, their present and their future, we must guard against creating urban myths within our businesses.
The stories we tell and the case studies we write must be grounded in truth and based on scientific principles. We don’t always need to communicate these to the wider audience but, as Data Science professionals, we must work together to lay foundations that are immovable against the fog of urban myth.
How to avoid a hippo?
For my keynote at the Chief Data Scientist Forum, Europe, I was asked to discuss the "HiPPO": the Highest Paid Person's Opinion. It is the HiPPO we must study carefully if we are to create a lasting legacy based on fact rather than fiction, and avoid creating these urban myths in our businesses.
When I agreed to do the talk back in April, I scribbled some notes that sketched out what I wanted to say. Four months and bucketloads of data later, I picked up my rough notes and read "talk about how to avoid a hippo". The "avoid" was even underlined. I'd helpfully drawn a picture of a Hippopotamus amphibius to reinforce my point, which only steered me towards the animal rather than the sometimes mythical boardroom figure.
How do you go about tackling such a question? It seems trivial to answer. "You avoid a hippo by just living a normal life," I thought, desperately trying to remember the context in which I'd set myself the question. How could I possibly talk for twenty minutes about this, other than to recount a humorous anecdote that might border more on urban myth than on fact?
To make progress, I needed to focus on "how to find a hippo" and then use what I learnt to understand how to avoid one.
Mathematicians call this the "dual problem": we restate the problem from a different perspective, then use what we learn there to shed light on the problem from our original viewpoint.
I spent almost an entire Sunday afternoon pondering the conditional probability of finding a hippo, and then of not finding one. I drew diagrams, wrote equations and tried to craft a compelling visualisation. After four hours of effort, I stepped away from the problem to refocus on the bigger picture.
If I were to mark my own work, I'd certainly give myself a few method marks for effort. But in pursuit of my goal, I'd become lost in the detail, heading towards an increasingly complex game of searching for animals in space. I'd left the boardroom caricature behind a long time ago.
The quality and efficacy of my work were not in question; its relevance and context were.
We must balance creativity and innovation with a clear focus on our end goal. The key is to keep asking the right questions at the right time. As leaders in data science, we must help our clients and stakeholders ask us questions in language we understand.
As a mathematician working in a marketing agency, one of the most valuable lessons I've learnt is that scientists use language differently from marketeers and creatives. There is real skill and tradecraft in communicating across these domains in a way that both teams can understand.
"Data Science ready" questions are hard to come by.
When Sky Sports came to us last year, we spent time working with their teams to help pose the right question to our data scientists. The Sky team didn't turn up to the first meeting with a fully formed, "data science ready" question.
The essence of their brief was to help them build a socially engaging campaign that reinforced the Sky and Premier League partnership in the minds of football fans. It took us a month or so to convert the marketing brief into a set of questions that we could answer using data science.
We did this by studying the behaviour of football fans and clustering it by club and topic. Our behavioural study used 70 million tweets to focus on how 6.4 million UK football fans behaved.
Using this knowledge, we understood how fans would interact with Sky, which let us create an engaging digital narrative that added value for Sky, football fans and the Premier League. You can read more about our award-winning work, which put insight at the heart of a Thierry Henry advert, here. In total, the campaign drove 18 million Facebook views and was branded the "greatest thing in the history of the Universe" by the Daily Mail.
The very core of data science is hard: finding unknown unknowns in a way that upholds scientific principles and is valuable to the organisations that seek our help. However, as leaders in the field, we must focus our thinking on asking the right questions, so that we create compelling narratives that remove the need for our leadership colleagues to create their own urban myths.
Urban myths propagate because the story they tell is so compelling; our job as data scientists is to make the true story so compelling with our "data science facts" that it needs no more embellishment.
This article was written to support my talk at the Chief Data Scientist Forum conference, held in London on 13th and 14th September 2016.