How to Make Your Analytics Project a Success (Hint: It's Not Tech!)
Andy McMahon
Principal AI & MLOps Engineer @ Barclays | Author | Visiting Lecturer @ Oxford, Warsaw
As a data scientist working in industry, I've been involved in and led several projects where the aim is to use an organisation's data to gain valuable insight or to augment decision making. This is different from many other kinds of technical project, since you will almost always have an R (research) that has to accompany your D (development), and you are very often attempting something that has not been done before. Keeping things on track and executing effectively can be tough, so I thought I'd share some of the things along the way that have gone well (woop) and not so well (uh oh), from the point of view of a data scientist serving the needs of other colleagues throughout a business.
The (Analytics) 5 Whys
The 5 whys is a technique for getting to the root of a problem, originally developed by Sakichi Toyoda and implemented at Toyota Motor Corporation, where it became very popular in the 1970s as they developed their manufacturing processes. Although it was designed (and is still used today) to find out why things have not gone to plan, I like to turn it on its head slightly and use it as a tool to find out the reasons for pursuing a project, and to determine its prioritisation relative to other demands from the organisation.
I don't use it in any set way (and sometimes I don't need all '5' of those whys), but I use it as a mnemonic to help force me to question assumptions we may take for granted, especially at the beginning of a project. For example, I may ask myself, my team, or the customer or subject matter expert involved in the project (more on this later):
Why do you think you need the help of analytics?
Why do you think that data or data science could help solve your problem?
Why have other solutions you've attempted in the past not solved this problem?
Why do you think this problem is worth the investment of time and money that may be required to solve it?
Why now, and not next quarter, next year, etc.?
Again, remember that these are just suggestions; the key thing for me is always to get to the bottom of the following (and you can definitely stray from just asking 'why' questions!):
- Is there a real use case or need for analytics here (of whatever complexity)? If we can solve the problem in a simple and scalable way without analytics (perhaps through a change in business process or better training), why don't we just do that, solve the problem and get value from the solution ASAP?
- Are you (customer, SME, analytics team) sure that there is real value to be had here and that this is not just a vanity project (more on this later)?
- Are we (the collaborative team of analytics + customer/SME) happy that this problem is even soluble given time and resource constraints?
As long as you are thinking critically about the pros and cons of tackling the problem, its solubility and its potential ROI, you really can't go wrong.
It's a Team Thing
We all understand that complex projects require a lot of resource and expertise, but the single most important thing to recognise when running or participating in an analytics project is that you absolutely must bring people across very different domains together. If you don't do this, you will fall into the trap of thinking that you have all the bases covered, and then when you present your work to an important stakeholder, the foundations will give way and you'll watch the value of your work crumble in front of you - not a nice thing to happen!
Specifically you need to create a collaborative team with at least the following components:
- Business/Organisational/Domain expertise: Inside the company there will be at least one person (most likely several) who has both a high-level view of the strategic importance of solving the problem and a good knowledge of the surrounding domain and business processes. You need someone like this on your team. You don't need them full time (they will have a day job) but you must keep them as involved as possible. Going with 'scrum' terminology, you can consider them your product owner, who gives direction and feedback on your prototypes and interim results. I like the approach of short weekly or bi-weekly calls, where you update this person on progress, discuss the issues and questions you are facing and, perhaps most importantly, use them as a resource to sanity check your results and inferences and to point you to other relevant subject matter experts (SMEs).
- Analytics Project Manager: This is the role I'll often take in a project; you basically need someone who will coordinate between the business SMEs and the analytics team members doing the work on the ground. This role requires the ability to translate business requirements (which may be nebulous or ill-defined) into concrete analytics tasks and then manage their execution. For example, if you feel comfortable taking the statement 'We need to improve our sales figures in Africa' and, after discussing with your SMEs, can translate this into 'We need to provide statistics on historical sales rep performance, broken down by region, sector and quarter, and we need to isolate correlations between these and external market data such as oil and gas prices', then you're probably well equipped to perform the 'translation' piece of this role (there's a small sketch of what this looks like in code after this list).
- Analytics and Engineering Resource: The previous two role types are very focused on articulating what you should be doing; this last set of team members is all about the how. In collaboration with the analytics project manager, these team members will take the analytics requirements and implement them in code using your stack and chosen technology platform(s). This part of the team must make sure to document their experiments, results and solutions and to present their progress as frequently as possible to the other team members for feedback. What they find or build will influence the analytics project manager's decisions and what the SME can expect to see as a final product or solution, so constant communication with other stakeholders is vital.
- The customer/user: Most of the analytics and data science projects I work on now result in an application or tool for someone in the business to use frequently to help them make decisions. It is therefore absolutely vital that you find out: a) what they need to see or know in order to solve their problem, b) how they want to interact with this information or tool, c) the tolerances that are acceptable (accuracy, latency, update frequency, etc.). This becomes more and more important once you have already determined that your problem is soluble and therefore that it is worth your time building such a solution, but having the feedback and insight of the user is going to mean you build something that will actually be used and not just sit on your systems gathering cyber-dust.
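To make the 'translation' idea from the project manager role concrete, here is a minimal sketch in Python/pandas. Everything in it is a hypothetical illustration - the file names, column names and the oil price feed are assumptions for the sake of example, not a real schema:

```python
# A minimal sketch of 'translation': turning "improve our sales figures
# in Africa" into concrete, computable analytics tasks.
# All file paths and column names below are hypothetical.
import pandas as pd

# Hypothetical sales records: one row per closed deal.
sales = pd.read_csv("sales.csv", parse_dates=["close_date"])
sales["quarter"] = sales["close_date"].dt.to_period("Q")

# Task 1: historical sales rep performance by region, sector and quarter.
rep_performance = (
    sales.groupby(["region", "sector", "quarter", "rep_id"])["deal_value"]
    .agg(total="sum", n_deals="count")
    .reset_index()
)

# Task 2: correlation between regional sales and external market data,
# here a hypothetical oil price series aggregated to quarters.
oil = pd.read_csv("oil_prices.csv", parse_dates=["date"])
oil["quarter"] = oil["date"].dt.to_period("Q")
quarterly_oil = oil.groupby("quarter")["price"].mean()

africa_sales = (
    sales.loc[sales["region"] == "Africa"]
    .groupby("quarter")["deal_value"]
    .sum()
)
aligned = pd.concat([africa_sales, quarterly_oil], axis=1, join="inner")
print(aligned.corr())
```

The point is not the code itself but that each business phrase now maps to a specific, testable computation the engineering side can pick up and run with.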
Once you have these people working together, it is really important to constantly check in with one another and work in a cycle of 'build-measure-learn' as espoused by Eric Ries in his excellent book 'The Lean Startup'. Basically, you should work in an agile manner where you build an MVP, measure or sanity check its results, and then learn from that and iterate. This applies both to your analyses and to any software or tools you build.
If you have these pieces pulled together into your team (and I should state here that one person can fulfil more than one of these roles, especially on the analytics management and execution side) then you will ensure that your work is built on a solid foundation and can start moving quickly towards realising value.
Data, Data, Data
You've answered the questions around why this project is important and you've defined the problem from both a business and analytics point of view; now it's time to get started. Obviously, in a data science or analytics project, your results and solution are contingent on the data you have. If it is of poor quality, or there just isn't enough of it to cover all the scenarios you want to investigate, then the project is a non-starter.
To ensure that you have the data you need for your project, keep the following in mind:
- The business experts should know what 'looks right': Always start your analytics project with a quick sprint where you analyse the data at a high level and look for key patterns or trends that should act as 'sanity checks'. For example, if you are analysing data for the sales problem we alluded to earlier, can you retrieve the trends of sales figures in Africa and other regions and show them to your SME partner? Do they roughly agree with what they are seeing, does it tie in with other data sources they may have, is it consistent with what the company reports on P&L or to shareholders? (A minimal version of this kind of check is sketched after this list.)
- The data experts should know what is 'enough': The analytics experts in your team will have a good grasp (I hope!) of what volume of data is needed for different algorithmic and technical approaches. For example, if you can only realistically see the problem being solved by a neural network (perhaps you have phrased the business problem as an image recognition task) but you have only 20 examples, then it has to be fed back quickly to the rest of the team that 'this isn't going to work, a different approach is needed'.
- The team has to find out the lifespan of the data sources: I feel this is both important and often overlooked. If you are going to spend 2 months on a project but your main data source will be decommissioned or replaced with a new application in 3 months, then you really have to stop or pause now. You can come back to the problem when the new data resource is up and running, and deliver value elsewhere in the meantime.
- "80% trustworthy is better than 0% utilised": I came up with this a few months ago for a meeting, and now I just repeat it every time I get the chance (sorry for the obvious self-promotion!) because I think it is so important. It is vital that the business understand that no data is perfect, no algorithm is 100% accurate, no solution will cover all possible cases. Building up a comfort with uncertainty, especially in regulatory or commercial environment which is necessarily risk averse can be difficult but is among the most important tasks for the analytics function to perform if they (and the organisation) is to succeed.
Develop, Deploy, Dissect
You create your analyses, you develop your solution and you deploy it on your tech stack. Your analytics and non-analytics team members are all proud of the work they've done and you've convinced the business that it will realise value from the blood, sweat and tears. But this is now the hard part - you have to maintain your product and try to rationalise its value-add, while most likely moving on to new projects.
A big part of doing this will be agreeing on the metrics you want to use to determine success. I always make sure this is done at the start, with both the analytics and business expert sides of the team. You need KPIs and metrics that are attainable but also demonstrate good ROI on your investment, and this can be difficult. As long as you are all open to changing these metrics as things progress, you'll avoid analysis paralysis and just get something down on paper, which is key if you are to move forward.
One thing I also think is important is to determine when you expect to see the impact (and this should go into your initial rationalisation of the worth of the project). For example, if you think that "this solution will definitely save at least $1 million .... over the next decade", then is it really worth pursuing? It's also important to try to automate the tracking of the value your solution is creating. Can you infer this from usage metrics? "Our solution has been used every day for the past 6 months by the financial team to make decisions on budgeting, so we've had an influence on over $6 million worth of budget allocation in that time." Can you infer your impact from data related to the value you want to see? "We've noticed a drop in material consumption of wires and cables on projects of around 5% since we deployed our solution a year ago, and statistical analysis suggests that this is sizeable enough that we can attribute it to the tool we developed." Finally, can you just get at the raw dollar values? "After deploying our tool we've reduced the acquisition cost of new customers by $4,000 on average and have increased average revenue per sale by $1,500." Remember, money talks, so anything you can do to tie your work to dollar value is good for justifying the investment in the product, and more widely the investment in analytics at your organisation.
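To illustrate the 'attribute impact with statistics' idea, here is a minimal sketch using a Welch's t-test on a hypothetical monthly material-consumption dataset (the file, column names and deployment date are all assumptions). This is a deliberate simplification - a real analysis would need to handle seasonality, trend and confounders with the help of your SMEs:

```python
# A minimal sketch of statistically checking a pre/post-deployment drop.
# Dataset, columns and deployment date below are hypothetical.
import pandas as pd
from scipy import stats

usage = pd.read_csv("material_consumption.csv", parse_dates=["month"])
deployment_date = pd.Timestamp("2020-01-01")  # hypothetical go-live date

before = usage.loc[usage["month"] < deployment_date, "consumption"]
after = usage.loc[usage["month"] >= deployment_date, "consumption"]

# Welch's t-test: does mean consumption differ before vs after?
t_stat, p_value = stats.ttest_ind(before, after, equal_var=False)
drop_pct = 100 * (before.mean() - after.mean()) / before.mean()
print(f"mean drop: {drop_pct:.1f}%, p-value: {p_value:.3f}")

# A small p-value supports (but does not prove) attributing the drop to
# the tool; confounders still need to be ruled out with the SMEs.
```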
Go Forth and Analyse!
Hopefully the examples and ideas I've written down here can help you with your future analytics projects. I've learned a lot of these things the hard way, so I hope that they can both help you avoid some of the mistakes I've made as well as to innovate on them and create awesome new ways to realise value from your analytics and data science projects. Until next time, keep up the good work!
Principal AI & MLOps Engineer @ Barclays | Author | Visiting Lecturer @ Oxford, Warsaw
5 年Found some spelling mistakes I’ll fix ASAP ....