
How to Think Differently

Harry Powell

Data science leader with track record of innovation and value creation

Published: 22 January 2022

Last year, I wrote a series of micro-blogs about how to think differently about data science and analytics questions. My point is that, when you find a new way to approach a problem, you can sometimes get orders of magnitude improvements in your results, something you are unlikely to achieve by trying out different methodologies or tuning hyperparameters.

“Can you put them all into one easy-to-read blog?” is a question that literally no one asked, but I have done it anyway. Here you go.

1: Using existing data sources in unusual ways

Sometimes you can learn a lot from data that isn’t meant to have anything to do with the problem space you are working in. In fact we have made good use of data that is just plain wrong.

This is because while the function that records the data may care about the numbers themselves, you might only need to care about a pattern in the data, or about how it co-varies with some other factor. The fact that it is inaccurate for its original purpose may not matter.

For example, we wanted to show how delaying raising issues during the new vehicle engineering programme was causing quality and production issues. To do this we had to normalise the issues data by how much work was being put into the programme. But the labour data was notoriously bad - most people didn’t bother to fill in their timesheets. We were told that there was no point in even looking at it.

It turns out that while the labour data was completely wrong, it was wrong in an independent and unbiased way. So it was fine to use for our purpose, and we were able to show that engineers were raising issues too late and that this drove quality problems.
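Here is a minimal sketch of why this can work. It rests on assumptions that are not from the original project: each engineer fills in a timesheet with the same probability in every phase (fill_rate, a hypothetical figure), and all the numbers are invented. Reported hours are far too low everywhere, but by roughly the same factor, so the ranking of issues per reported hour matches the ranking per true hour.

```python
import random


def issues_per_hour(issues, hours):
    """Normalise issue counts by labour hours, phase by phase."""
    return [i / h for i, h in zip(issues, hours)]


def simulate_reported_hours(true_hours_per_engineer, n_engineers, fill_rate, rng):
    """Total hours that reach the timesheet system when each engineer
    independently reports with probability `fill_rate`."""
    reporting = sum(1 for _ in range(n_engineers) if rng.random() < fill_rate)
    return reporting * true_hours_per_engineer


def ranking(values):
    """Indices of `values` ordered from smallest to largest."""
    return sorted(range(len(values)), key=lambda k: values[k])


rng = random.Random(0)

# Hypothetical programme with four phases (illustrative numbers only).
n_engineers = 2000
hours_each = [10, 20, 40, 40]      # true hours per engineer in each phase
issues = [200, 800, 3200, 6400]    # issues raised in each phase

true_hours = [n_engineers * h for h in hours_each]
reported_hours = [
    simulate_reported_hours(h, n_engineers, fill_rate=0.3, rng=rng)
    for h in hours_each
]

true_rate = issues_per_hour(issues, true_hours)
noisy_rate = issues_per_hour(issues, reported_hours)

print("true ranking: ", ranking(true_rate))
print("noisy ranking:", ranking(noisy_rate))
```

The absolute values of noisy_rate are wrong (they overstate the issue rate), but the pattern across phases survives. If the under-reporting depended on the phase, this argument would no longer hold.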

2: Working back to front

Sometimes a problem can only be solved by working back-to-front, starting at the end and going back to the beginning.

When we wanted to calculate the daily sales in each country, we found that, during the month, each market used different rules to determine when a “sale” is recorded (although they did reconcile at the end of the month). There was no easy way to contact the markets to ask them about their logic. So we had to start with the historic output data and infer the sales logic from that. We were then able to apply that logic to the intra-month vehicle sales information and calculate daily sales easily for the first time.

Even if it’s not 100% right, it gets you to within a couple of percent, and that is a lot better than trading blind. It certainly helped us when lockdown hit and we needed to reduce inventory fast.
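The back-to-front inference can be sketched roughly as follows. This is an illustration, not the logic we actually recovered: the three candidate rules, the field names and the data are all hypothetical. The idea is to try each plausible definition of a "sale" against a market's reported history and keep the one that reproduces it best.

```python
from collections import Counter
from datetime import date

# Hypothetical candidate rules: which milestone date a market treats as the "sale".
CANDIDATE_RULES = ("order_date", "invoice_date", "handover_date")


def daily_counts(vehicles, rule):
    """Count vehicles per day under one candidate rule."""
    return Counter(v[rule] for v in vehicles)


def mismatch(predicted, reported):
    """Total absolute difference between two day -> count mappings."""
    days = set(predicted) | set(reported)
    return sum(abs(predicted.get(d, 0) - reported.get(d, 0)) for d in days)


def infer_rule(vehicles, reported_daily_sales):
    """Pick the candidate rule that best reproduces the reported history."""
    return min(
        CANDIDATE_RULES,
        key=lambda rule: mismatch(daily_counts(vehicles, rule), reported_daily_sales),
    )


# Invented history for one market.
vehicles = [
    {"order_date": date(2020, 3, 2), "invoice_date": date(2020, 3, 9),
     "handover_date": date(2020, 3, 12)},
    {"order_date": date(2020, 3, 2), "invoice_date": date(2020, 3, 10),
     "handover_date": date(2020, 3, 12)},
    {"order_date": date(2020, 3, 3), "invoice_date": date(2020, 3, 10),
     "handover_date": date(2020, 3, 16)},
]
reported = {date(2020, 3, 9): 1, date(2020, 3, 10): 2}

best_rule = infer_rule(vehicles, reported)
print(best_rule)
```

Once a rule has been inferred per market, the same daily_counts function can be applied to the current month's vehicle data to estimate daily sales.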

3: Supersimplification

It is always tempting to think that complex problems can best be solved by complicated solutions, but you often find that supersimplification gets you even better results. By this, I don’t just mean building parsimonious models by eliminating insignificant variables (you should always do this). I mean radically changing the approach, answering the question in a completely different, perhaps even naive way.

We needed to build a model to select a set of cars to build and sell. Now there is a very complex logic which determines what cars can be built, and a similarly involved logic of what cars will sell in what markets. People had tried to code it up a number of times, and tied themselves in knots. Then one of the team suggested just sampling from the vehicles that we had built in the last 2 years. It is a shockingly simple approach and although it doesn’t sample the full distribution, it gets you 99% of the way there in 1% of the time. And then you can use the time you save overcomplicating something else! :)
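In code, the whole idea fits in a few lines. The sketch below assumes a build history held as (build date, specification) pairs; the function name, the 730-day window and the data are illustrative, not our production code.

```python
import random
from datetime import date, timedelta


def sample_build_plan(history, n, today, rng, lookback_days=730):
    """Draw `n` vehicle specifications, with replacement, from those
    actually built in the lookback window (roughly two years)."""
    cutoff = today - timedelta(days=lookback_days)
    recent = [spec for built_on, spec in history if built_on >= cutoff]
    if not recent:
        raise ValueError("no vehicles built inside the lookback window")
    return rng.choices(recent, k=n)


# Invented build history: (build date, specification).
history = [
    (date(2019, 1, 15), "SUV / diesel / red"),       # too old, ignored
    (date(2021, 6, 1), "SUV / petrol / black"),
    (date(2021, 6, 2), "SUV / petrol / black"),
    (date(2021, 9, 20), "saloon / hybrid / white"),
]

plan = sample_build_plan(history, n=5, today=date(2022, 1, 22), rng=random.Random(1))
print(plan)
```

Every sampled specification is buildable by construction, because it has already been built, and popular specifications are drawn more often because they appear more often in the history. The price is that a combination never built in the window can never be sampled.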

4: Abstract problem solving

You can create a lot of value by thinking about your problem in abstract terms. Computer scientists are often better at this than data scientists. The first question they ask is “what data structure is this”, or “is this process equivalent to an algorithm I already understand” or “what design pattern should I use to represent this”. Abstracting the problem often enables you to see common patterns, apply well understood analytical frameworks and to simplify radically, giving profound and generally applicable results.

When my team at Barclays were asked to build a general engine to extract Insights from customer transactions, we were able to show that there were 5 common archetypes of insight, that relevance could be thought of as a form of local ranking, and that a couple of mathematical abstractions (called monoids and monads - very different things even if spelled similarly!) could be used to simplify and streamline distributed calculations. That left us with a very generalised Insight Engine which would run very fast and could be configured to address lots of different types of problems.
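To give a flavour of the monoid idea only (this is not the Insight Engine itself, and the SpendStats class is a hypothetical example): a monoid is a type with an associative combine operation and an identity element. If your summary has that shape, each partition of the data can be summarised independently and the partial results merged in any grouping. The sketch assumes non-negative amounts held as integer pence.

```python
from dataclasses import dataclass
from functools import reduce


@dataclass(frozen=True)
class SpendStats:
    """Running summary of transactions: count, total and largest amount (pence)."""
    count: int = 0
    total: int = 0
    largest: int = 0

    @staticmethod
    def of(amount):
        """Lift a single transaction amount into a summary."""
        return SpendStats(1, amount, amount)

    def combine(self, other):
        """Associative merge; SpendStats() is the identity element."""
        return SpendStats(
            self.count + other.count,
            self.total + other.total,
            max(self.largest, other.largest),
        )


def summarise(amounts):
    """Fold one partition of amounts into a single summary."""
    return reduce(SpendStats.combine, map(SpendStats.of, amounts), SpendStats())


# The same transactions, summarised in one pass and partition by partition.
partitions = [[1200, 350], [], [9900, 100, 450]]
whole = summarise([amount for part in partitions for amount in part])
merged = reduce(SpendStats.combine, map(summarise, partitions), SpendStats())
print(whole == merged)
```

Because combine is associative and the empty summary is an identity, it does not matter how the data is split across machines or in what grouping the partial results are merged.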

Abstracting a problem back to its bare essentials is hard, but it is probably the most valuable thing that a problem-solver can do.

5: Don't be constrained by other people's limitations

Perhaps the most obvious tool in the original thinker’s toolbox is ignoring people who tell you what can and can't be done. At university I once asked a maths professor for a hint as to how to solve a problem set. He said, “Here’s my hint… It can be done.” Things are much easier when you believe they can be done.

A big win for the team at JLR was when we found a way of working out what parts go into a car. You’d have thought that this would be straightforward, but it isn’t. We could always do this at the point the car was actually built (obviously), but doing it in advance wasn’t possible because the logic lives in a proprietary system whose ruleset we couldn’t access. To get at it, we had to parse a 56-million-line XML file in which the rules are hidden. And we were told there was no point trying because it couldn’t be done.

Turns out it can (thanks to my brilliant team!).

We can now simulate millions of cars a day, when the base system could only do a few thousand for use by the factory alone. And this allows us to optimise and simulate all sorts of scenarios around how to build cars better.
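For a file of that size, one generic approach is to stream it rather than load it whole. The sketch below uses Python's standard-library iterparse; the element name (rule) and the attributes are invented, since the real file's schema is proprietary, and this is not a description of how our team actually did it.

```python
import io
import xml.etree.ElementTree as ET


def iter_rules(source, tag="rule"):
    """Yield the attributes of each matching element as it is parsed,
    instead of building the full document tree first."""
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == tag:
            yield dict(elem.attrib)  # copy before clearing
            elem.clear()             # drop the element's attributes, text and children


# Tiny stand-in for the real file; element and attribute names are hypothetical.
sample = io.BytesIO(
    b"<ruleset>"
    b"<rule part='P100' requires='engine=diesel'/>"
    b"<rule part='P200' requires='trim=sport'/>"
    b"</ruleset>"
)

rules = list(iter_rules(sample))
print(rules)
```

Note that the emptied elements remain attached to the root, so for a very large file you would also want to detach them from their parent as you go.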

It is an error to assume that, just because you haven’t observed something in your sample, it doesn’t exist in the population. There’s a risk in pursuing something that “can’t be done”, but if the prize is big enough, it's worth having a go. And don’t expect people to believe in the truth just because you prove them wrong. It takes a while for the impossible to sink in, so stick with it.

6: Rearrange the equation

Often business analytics problems are framed as “manage x in order to increase y”. It's very easy to get fixated on one particular formulation, and to stick with it even when the world around you has changed. Doing so could leave you trying to solve yesterday's problem, and no matter how clever your approach, you’ll get the wrong answer. You’ll need to find a different way to get the result you want.

We spent a lot of time and effort building an amazing hierarchical dynamic Bayesian forecasting engine. It worked really well, was widely accepted and was integrated into the BAU process. When Covid hit, everyone wanted us to adapt it to help us forecast the way out of the global crisis. And we tried, but it didn’t work because the world had become inherently unforecastable.

The answer was not to refine and improve what we already had, but to rearrange the equation to discover a better lever.

We realised that instead of trying to improve our ability to forecast the future, we would be better to improve our ability to adapt to the present. So we built models to detect short-term demand signals and to respond to supply shocks by improving order intake and supply chain decision making tools.

Ask yourself: Does what worked yesterday still hold? Is there a better way to achieve the same result?

7: Knowing when to stop

It's not unusual for thinking patterns to get fixated on doing things rather than not doing things. But often the quickest and easiest way to make an impact is just to stop.

If a product makes a marginal loss, you might be better just to stop selling it than to try to make it profitable, especially if getting unit profit to breakeven is going to take time.

Don’t assume that a problem needs to be solved. It could be quicker and easier simply to eliminate it.

8: Asking the experts

Obviously experience counts for a lot when dealing with daily issues. But expertise can get in the way when you are dealing with unusual events, or if you want to find a new answer to an old problem.

Some problems have existed unsolved for so long that everyone assumes that they aren’t a problem anymore. And sometimes questions remain unanswered for so long that people forget they ever wanted to ask them in the first place. But the most pervasive is the Expert that Knows the Answer to a problem (but somehow the problem still exists).

When I joined JLR, the team had been working on a big project for a while. The KPI they were optimising was Stock Turn, and the whole business believed that this was a proxy for Demand. No one questioned it. It was Gospel. The experts would have no other measure. But of course Stock Turn is a function of both sales rates and the level of stock. So if you have different levels of inventory in different retailers, you have different stock turn. It was all rubbish, but the experts never questioned their own certainty, and no one in my team dared question them, because they weren’t Experts.
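A two-line calculation makes the point. It assumes the common definition of stock turn as units sold in a period divided by average units in stock; the retailers and their numbers are invented.

```python
def stock_turn(annual_sales, average_stock):
    """Stock turn: units sold per year divided by average units held."""
    return annual_sales / average_stock


# Two hypothetical retailers with identical demand but different inventory.
retailer_a = stock_turn(annual_sales=120, average_stock=20)
retailer_b = stock_turn(annual_sales=120, average_stock=60)
print(retailer_a, retailer_b)
```

Both retailers sell 120 cars a year, yet one turns its stock three times as fast as the other. Read as a proxy for demand, the metric would say their markets differ when only their inventory does.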

One of the most powerful questions you can ask (after listening politely to what the Experts have to say) is some version of “what if the truth is completely opposite to what you currently think?”.

9: Time is money

Everyone in business knows that time is money, in theory, but it is amazing how often people ignore it when solving pressing problems.

Whether you are trying to stem losses, or take advantage of opportunities, every day that you spend working on the problem is a day of lost profit. So there is an important balance to be struck between improving the outcome and getting it implemented quickly.

If you are building a product which is expected to make, say, £52 million per annum, then before you spend an extra week improving it you should ask yourself “will this improvement make me £1 million?”.

Equally, if you are negotiating to buy a tool that will make you the same £52 million, you need to shave £1 million off the contract for every week of contract negotiation. And that might be hard if the cost of the tool is, say, £5 million. Can you really save 20% every 7 days?

A good idea is to estimate the value that could possibly be added by delay (e.g. we might be able to negotiate 10% off the cost of a tool) and then work out how many days of value creation this represents. In one recent case this came out as 2 hours: if we spent more than 2 hours negotiating, we would be wasting money.
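The calculation is simple enough to sketch. The example combines the illustrative figures used above (a tool worth £52 million a year, costing £5 million, with a hoped-for 10% discount); the function name is made up, and the 2-hour case came from different numbers that are not reproduced here.

```python
def breakeven_delay_days(annual_value, saving_from_delay):
    """Days of delay after which the forgone value exceeds the saving."""
    value_per_day = annual_value / 365
    return saving_from_delay / value_per_day


# A tool worth GBP 52m a year that costs GBP 5m, with a hoped-for 10% discount.
days = breakeven_delay_days(
    annual_value=52_000_000,
    saving_from_delay=0.10 * 5_000_000,
)
print(round(days, 1))
```

On those figures the discount is worth about three and a half days of the tool's value, so a negotiation that drags on for longer than that costs more than it saves.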

10: Jumping across domains

Often there are structural similarities between rather different data and seeing this can really help find a better solution. I am not talking about what the data represents, but the way that it is collected, stored or structured.

One example is the similarity between sequences of events (for example, customer interactions) and sequences of words (otherwise known as natural language).

We used this “grammar of interactions” to inform when mortgage borrowers were likely to repay, based on the channel, content and the order of their communications with their bank. We used NLP techniques to incorporate this event data into our model and significantly increased performance as a result.
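As a simple illustration of treating events like words, the sketch below counts bigrams (adjacent pairs) of interactions, one of the most basic NLP features. The event names and histories are invented, and this stands in for the general idea rather than for the techniques used in our actual model.

```python
from collections import Counter


def event_bigrams(sequence):
    """Adjacent pairs of events, the analogue of word bigrams in a sentence."""
    return list(zip(sequence, sequence[1:]))


def bigram_counts(sequences):
    """Count bigrams across many customers' interaction histories."""
    counts = Counter()
    for seq in sequences:
        counts.update(event_bigrams(seq))
    return counts


# Invented interaction histories, one "sentence" per customer.
histories = [
    ["app_login", "balance_check", "call_centre", "redemption_quote"],
    ["branch_visit", "call_centre", "redemption_quote"],
    ["app_login", "balance_check"],
]

counts = bigram_counts(histories)
print(counts.most_common(2))
```

Counts like these capture the order of interactions, not just their frequency, and can be fed to a model as features in the same way as word n-grams.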

You can do something similar in reverse. When presented with a list of banking transactions (kind of abstract), we realised that this represented real behaviour, a real person walking down the high street, going into shops and buying stuff. Any set of transactions bounded by a short time period could be thought of as a shopping trip. This allowed us to infer all sorts of interesting things about the location of transactions, and therefore the locations of shops.
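One way to turn a transaction list into shopping trips is to start a new trip whenever the gap between consecutive transactions gets too long. The 45-minute threshold, the function name and the data are assumptions for illustration, not the rule we actually used.

```python
from datetime import datetime, timedelta


def shopping_trips(transactions, max_gap=timedelta(minutes=45)):
    """Split (timestamp, merchant) pairs into trips: a new trip starts
    whenever the gap since the previous transaction exceeds `max_gap`."""
    trips = []
    for stamp, merchant in sorted(transactions):
        if trips and stamp - trips[-1][-1][0] <= max_gap:
            trips[-1].append((stamp, merchant))
        else:
            trips.append([(stamp, merchant)])
    return trips


# Invented card transactions for one customer.
transactions = [
    (datetime(2021, 5, 1, 10, 5), "Bakery"),
    (datetime(2021, 5, 1, 10, 20), "Pharmacy"),
    (datetime(2021, 5, 1, 10, 50), "Bookshop"),
    (datetime(2021, 5, 1, 17, 30), "Supermarket"),
]

trips = shopping_trips(transactions)
for trip in trips:
    print([merchant for _stamp, merchant in trip])
```

Merchants that keep appearing in the same trip are probably near each other, which is the kind of inference about shop locations described above.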

Whenever you look at abstract data, always think about the form of the data as well as the content. You’d be surprised by the value of what can be inferred.
