KPMG Google Cloud Business Alliance - Innovation Series #4: Data Science
Our teams work on some amazing projects with Google Cloud – I often hear our friends (customers, colleagues, Google staff etc) say:
“Wow, fascinating! I didn’t realise KPMG did projects like that. That’s got me thinking…I’d like to solve problem x - how could you help….?”
This Innovation Series is designed to reveal more about these projects, share ideas we hope will inspire you, and give my friends who are delivering these innovations more of the recognition they deserve! This is the fourth instalment in the series, and it has a slightly different complexion from the first three, which were each based on one or two client projects with Google Cloud. This one is broader – more projects, not all with Google – and addresses a common problem: why data-oriented/AI/machine learning projects tend to create a lot of excitement in the PoC/prototype phase, and then struggle to scale into production and embed into BAU.
At the root of the challenge to scale into production is the compounding complexity of new/new/new. The first 'new' is the novel application of data science techniques, where the outcome is not clear because a particular algorithm has not been applied before to the specific data set being interrogated. The second is the use of cloud-based products/services that have only recently gone GA (generally available), which also come with uncertainty of outcome, coupled with unfamiliar elements around skills availability, security & governance. The third 'new' is the change to security & operational processes needed to embed the developed product. This intersection of new challenges, tackled simultaneously by multiple parts of the enterprise in the context of a small and lightly funded proof of concept, tends to produce a PoC with very promising indications of success, followed by a very long journey for the enterprise to scale it.
This third 'new' – process change – is the most acute, because adopting the product of a data science project in a mature enterprise will change long-established processes and job roles, and will challenge security approaches that were typically built for an on-prem world. If the client is in a regulated industry, that exacerbates the challenge. On job roles, a digital native has been brought up on 'creative destruction' and is probably growing fast enough to re-absorb any people displaced by automation; in a company where roles have been stable for years there is less organisational muscle memory for welcoming and absorbing change. From the security perspective, the scale and intensity of compute demanded by machine learning is best served in the cloud, and that means overcoming new questions about how to host PII and other sensitive data in a third-party system and achieve regulatory approval. In our experience, unless the project has a large, low-risk business return that is material enough to attract C-suite attention and resource allocation, the process of scaling into production and business implementation will be slow.
Rebecca and I have tried to synthesise current thinking, combined with insights gained across a wide team and many years of working with multiple technologies, to give a thoughtful and reasonably concise answer to the question we are asked regularly: “Google Cloud are innovators in data science, and KPMG has a big group of functional experts and data scientists, so what are your ‘top tips’ for accelerating value from our data projects quickly?”
The first part of this article makes the point that data projects need to be approached differently to other types of IT project. The second part is a summary of the projects that the team has drawn on for inspiration which provides some context and is also a shameless plug for the amazing data science & analytics practitioners we work with. The third part is a collection of observations & recommendations structured around the six components of the operating model. If you are short of time and already convinced that data projects need a specific approach, skip straight to the final section!
It’s generally accepted that a company’s data is a source of significant potential value, and that companies which manage data well will outperform their peers. This theory has given birth to the term ‘infonomics’ to indicate a new economy related to data. The phrase ‘data is the new oil’ has been coined to give a sense that the old commodities are losing value to data as the engine of developed economies, and that political struggles now are more likely to be over access to data than to physical geography. It’s not a perfect analogy; oil is a finite resource so will run out, can’t be used more than once and is challenging to extract, none of which is true of data. However, let’s run with it for a moment. To give a sense of how valuations compare between the largest oil major and a data giant, compare Exxon’s $188bn book value and ~$200bn market cap with Alphabet’s $203bn book value and ~$1tn market cap – a market-to-book ratio of roughly 1 versus roughly 5, a difference of around x5. These are completely different business models, so there are obviously lots of reasons why the ratios differ. However, let’s assume that just one of the contributing factors is Alphabet’s ownership of, and ability to monetise, data. HBR reported in its Jan 2020 edition that Ant Financial serves more than 10 times as many customers as the largest US banks with 1/10th of the employees – roughly a hundredfold difference in customers served per employee. HBR went on to say that the combination of data-enabled learning and network effects was the key condition for creating competitive advantage from data. So, let’s continue with the topic of exactly how we cash in some of that value.
Driving value from data is not a new idea. More than a decade ago I was part of a team attempting to increase sales by making it possible for a fixed-line telco to quote online. A telco’s ability to deliver services rests upon a good geographic understanding of where its assets are and how much capacity they have. In this case we discovered that even the uncontroversial ‘data object’ of an address was subject to much interpretation by various engineering teams, and as a consequence the inventory of customer site addresses, fibre optic cable and hardware had to be manually interpreted in around half of all cases. The internal supply chain of data being input and refined in various databases by multiple sales, customer service and engineering teams was creating unreliable information, and without an accurate database of address and asset information, online quoting was not feasible. That was a while ago, but data objects have not got any simpler. Imagine that issue multiplied by all the complexities of various data taxonomies and schemas for less well-defined objects – online shopping carts for companies with millions of SKUs that are permanently changing, or one of my recent favourites, balance sheet data for capital calculations submitted to financial regulators where different jurisdictions use different terminology and definitions – and it’s clear why the issue of data quality is crucial.
Our experience is that the technology itself is rarely the ‘longest pole in the tent’, even if that is sometimes a convenient shorthand for a delay in a project report. Yes, it’s true that decisions on which database vendor to pick, and whether to use open source software, can smooth the path and reduce integration and ETL effort. However, it’s the policy, people & process elements that are the crucial ones to focus on to move the needle in your quest for data quality and extracting value from data. Google Cloud’s own recent whitepaper, the AI Adoption Framework, acknowledges the same challenges across four domains of the operating model – technology, people, data & governance processes – and proposes six maturity themes that link the four domains. Four of the six themes are about the operating model, not about technology, which illustrates the balance required to implement successfully.
Operating model is KPMG heartland, and there are a number of thought-provoking reports available which deal with the topic. In ‘The Future of IT’ report published in 2019, one of the six chapters is ‘Data is an asset’, where we declared that “Organisations must fundamentally change how they view and use data, to capitalise on data’s full potential”. The four recommendations are: describe the value of data, clarify roles & ownership, develop data literacy across functions, and manage culture change. Echoing the sentiment above, the report confirms that the technology itself is not the most difficult issue to crack. It’s a common experience for a machine learning PoC to fly along until it meets the enterprise functions for security, risk & compliance – which is where those operating model points of roles & culture come into play. We’ve seen organisations’ processes for security and external data hosting approval take as long as seven months, which can freeze a lightly funded agile PoC project solid. If the CISO and CRO teams are not consulted early and often, expect your innovative PoC to stay on the shelf. Finally, and very topically given the debates on facial recognition and the massive changes in consumer behaviour due to the Covid-19 lockdown, data science products are only as good as their training inputs. As the world changes, the training data needs to be constantly refreshed and hyperparameters tuned to keep the model’s outputs accurate. In addition, ML models also need to be regularly audited for discrimination and bias to inform equitable decision-making. Funding this maintenance effort is critical if the decisions made by the model are to remain relevant over time; when companies don’t have this commitment in place, the reliability of the outputs degrades quickly, which also has an obvious bearing on people’s enthusiasm for adoption. As the Economist put it in the June 2020 Technology Quarterly, “AI can do a lot. But it works best when humans are there to hold its hand”.
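To make that maintenance point concrete, here is a minimal, hypothetical sketch (not taken from any of the projects described in this article) of one way a run team might check whether a deployed model’s input data has drifted away from the data it was trained on – the kind of signal that would prompt retraining. The feature name, threshold and data are illustrative assumptions only.

```python
# Illustrative sketch only: compare live feature distributions to the training
# distributions with a two-sample Kolmogorov-Smirnov test, and flag drifted
# features so the run team knows retraining may be due.
import numpy as np
from scipy.stats import ks_2samp

def drift_report(train_features, live_features, p_threshold=0.01):
    """train_features / live_features: dicts of feature name -> 1-D numpy array.
    Returns features whose live distribution differs significantly from training."""
    drifted = []
    for name, train_values in train_features.items():
        stat, p_value = ks_2samp(train_values, live_features[name])
        if p_value < p_threshold:   # small p-value => distributions likely differ
            drifted.append((name, round(stat, 3), p_value))
    return drifted

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    train = {"basket_value": rng.normal(50, 10, 5000)}   # hypothetical training feature
    live = {"basket_value": rng.normal(58, 12, 5000)}    # simulated shift in behaviour
    print(drift_report(train, live))                     # non-empty result => consider retraining
```

In practice a run team would schedule checks like this alongside accuracy tracking and the bias and fairness audits mentioned above, rather than running them ad hoc.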
At this point you are probably thinking, ‘OK, getting value from data is no different from other forms of technology – you need the people and process too’. Well, there is one significant difference: data needs to be owned by the business in ways that applications and infrastructure don’t. Whilst some parts of software development, and most of software run and hardware/infrastructure build & run, can be outsourced as reasonably commoditised and lower value-added activities, data is the fuel that runs the business, and the people who understand it best are in the business. In recognition of this, we are seeing the rise of the ‘Chief Data Officer’ at Board level. This means that consulting providers have to take a different approach – co-build, co-run & transfer is the dominant successful model, meaning the consulting partner needs to be highly collaborative in sharing their ‘secret sauce’ while co-building the artefact. They also need to be comfortable handing it over to the client to own, as opposed to being protective in order to secure long-term annuity contracts.
The second section of this article provides a short summary of a collection of representative projects, led & delivered by KPMG and fuelled by data science, that have given rise to the observations & recommendations in the final section. All of these projects started with a successful PoC/prototype, but not all of them evolved into a highly adopted BAU system, so they form useful observation points on the patterns of which did and which didn’t. These projects have not all been delivered using Google technology, so the points in the final section read across to all technology platforms & services.
- Global Bank – policy compliance chatbot. Built conversational AI by integrating document ingestion and machine learning for entity extraction into a chatbot, giving the client an easy entry point for policy information retrieval. Forecast to save ~40% of the effort of the policy team in answering basic questions, so the team can focus on managing compliance rather than administration
- Oil & Gas – prevention of supplier over-charging. Built a document ingestion and probabilistic matching product to assist in reconciling invoice payments, reducing leakage and automating verification processes. Identified >4% overcharges between invoices & payments, revealing a saving opportunity worth tens of millions of dollars. (This concept is today delivered for many clients via KPMG’s Ignite toolset)
- Police Force – crime prediction model to support decisions on staffing. Created a prediction and causal inference model that forecast crime categories by geography by month based on historic crime patterns and other socio-economic indicators. The model is used for scenario planning to augment decisions on staffing and preventative measures
- Oil & Gas – shipping process optimisation tools. Developed a model using statistical techniques and FRBS to forecast optimum parameters (dates, volumes, durations) for moving fuel from floating platforms to tankers. The model improved on existing forecasting, leading to increased margin from more efficient shipping, and is expected to reduce spillage events and the associated environmental and regulatory harm
- UK Retail Bank – classification and root cause analysis on customer complaint cases. Built a tool to ingest multi-channel customer communications (email, calls, social media etc) and, using NLP and prediction models, identify priority cases to resolve and trends to address proactively. The bank was able to resolve cases faster, automating ~70% of the triage process and accurately routing case categories to the right team
- Consumer Goods – pricing optimisation. Developed a ridge regression-based model to predict sales volume scenarios for a portfolio of products as prices for individual SKUs and external events changed (e.g. weather, sports events). This allowed the client to identify revenue-maximising price points for their mixed product base (a simplified sketch of this kind of model appears after this list)
- Global Bank – causal pattern detection on IT incident data. Created a distributed causal pattern detection model using various correlation techniques and Monte Carlo simulation to ingest billions of IT log files and identify relationships between incidents to support root cause analysis. This identified previously unknown dependencies that caused simultaneous database failures, which meant the client could accurately predict and therefore prevent IT outages
- NHS Trust – prediction of A&E 4-hour breach. Developed a model for predicting the likelihood that a patient presenting at A&E would receive care within the 4-hour target time. It successfully identified ~80% of patients with a high likelihood of breaching the target, allowing re-prioritisation to support the most critical cases.
- UK Retail Bank – automating PPI claims operations. Built an engine using OCR and NLP techniques to ingest documents, compare key field data and automate the workflow of valid claims into handling teams. This reduced manual effort by accurately extracting 99.6% of data from key fields and removing duplicate applications across channels, enabling the bank to hit the response deadline.
- Consumer Goods – recycling monitor for an ESG (environmental, social, governance) initiative. Developed a model for identifying product packaging entering a recycling facility using TensorFlow, pre-trained on COCO, and running various machine learning algorithms (YOLO, SSD, Faster R-CNN). The model reached >90% prediction accuracy for damaged & obscured packaging on a moving conveyor, validating the client’s business case for use of the technology as part of their drive to reduce waste
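As referenced in the pricing optimisation example above, here is a minimal sketch of the general technique: a ridge regression relating sales volume to price and external-event flags, then used to compare candidate price points. All data, features and coefficients below are synthetic assumptions for illustration; the client model was considerably richer.

```python
# Illustrative sketch only: ridge regression for price/volume scenario planning.
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)
n = 1_000
price = rng.uniform(1.0, 5.0, n)                 # unit price for one hypothetical SKU
hot_weather = rng.integers(0, 2, n)              # external event flag
sports_event = rng.integers(0, 2, n)             # external event flag
# Simulated demand: volume falls as price rises, lifts with events, plus noise
volume = 500 - 60 * price + 40 * hot_weather + 25 * sports_event + rng.normal(0, 20, n)

X = np.column_stack([price, hot_weather, sports_event])
X_train, X_test, y_train, y_test = train_test_split(X, volume, random_state=0)

model = Ridge(alpha=1.0).fit(X_train, y_train)   # L2-regularised linear model
print("Hold-out R^2:", round(model.score(X_test, y_test), 3))

# Scenario planning: predicted volume and revenue at candidate price points
for p in (1.5, 2.5, 3.5):
    predicted_volume = model.predict([[p, 1, 0]])[0]   # scenario: hot weather, no sports event
    print(f"price {p:.2f} -> volume ~{predicted_volume:.0f}, revenue ~{p * predicted_volume:.0f}")
```

In a real engagement the hard work sits in the feature engineering across thousands of SKUs, cross-price effects and validation against hold-out periods, rather than the model fit itself.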
Whilst these projects are across multiple sectors, different use cases and use a variety of techniques and technologies, they are unified by three themes:
- Large enterprise organisations with significant budgets and internal data science teams found they could better achieve their goals by inviting an external party (KPMG!) to join their project team
- Typically, the client had found that despite many successful prototypes or PoCs, achieving the ‘path to live’ and going into production with a system embedded in BAU use was a challenge, and enlisted KPMG to help develop the product as well as achieve business implementation
- The success of the outcome of the project was strongly correlated to the availability of three discrete sets of skills; data scientists, data engineers and business analysts with very strong understanding of data sets and functional process – the latter being the most important
This third and final section is a summary of the KPMG team’s observations & recommendations, hung off the six components of the operating model: process, people, delivery model, technology, data and governance. Whilst some of these are familiar best practices that have applied to technology projects since the dawn of time, you will notice that some are very specific to the discipline of data science, which we hope you find novel.
Functional Process
1) Do the process impact analysis before you start. If your product is going to need to be embedded in the process of an operational team of 5,000 people, will materially change their tasks, and will bring regulatory scrutiny of the decisions it prompts, do the analysis on what it would take for the organisation to be ready for that. Maybe the PoC is required as input to a readiness decision, which is great R&D, but that also means it is not a realistic expectation to achieve business implementation immediately after the PoC is declared a success
People
2) Proactively manage the culture change. If there are changes required to the way data is captured and manipulated during its travels along the production line, and changes to the tasks that people need to perform for business implementation, have a plan that appeals to users’ emotions as well as all the necessary but less motivating discipline of swim lane diagrams, KPIs and Gantt charts
3) Assign clear roles and incentives. ‘Product Owner’ is a relatively common concept now in agile software delivery, and works well when applied to machine learning projects too. There is an artefact/model/product produced by the project that needs someone accountable for its build and business adoption. The role of ‘Data Owner’ is also important, to manage the constantly changing demands for process change, technical integration, taxonomy maintenance etc
Service Delivery Model
4) Plan the path to business implementation as part of the project. Clear & determined conversations about security approval, regulatory approval and the operational process change management that follows the technical event of promotion to production will all pay dividends in achieving business implementation/adoption
5) Design & budget for a run team. Design the process & team to continually retrain & refine the model to ensure the outputs continue to reflect reality and maintain the confidence of the users as the world changes
Data
6) Understand the data architecture and how the data is generated. Investigate data management through the internal supply chain of policy, process and procedures. If the quality of the data is low and the ability to maintain it is also low, this use case will be a challenge. Lean and Six Sigma are less fashionable than they used to be, but this kind of disciplined process analysis and measurement of consumed time & elapsed time – both to generate the business case and to validate that the organisation has the right capability to influence data quality at source – is critical to forecasting whether business implementation will succeed (a minimal illustrative sketch follows)
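As a concrete starting point for recommendation 6, below is a hypothetical first-pass data quality profile of the kind of address/asset extract described in the telco example earlier. The field names and validity rules are assumptions for illustration, not taken from any client project.

```python
# Illustrative sketch only: completeness and rule-validity per field of a data extract.
import pandas as pd

RULES = {
    "postcode":   lambda s: s.str.match(r"^[A-Z]{1,2}\d[A-Z\d]? ?\d[A-Z]{2}$", na=False),  # approximate UK format
    "site_id":    lambda s: s.notna() & ~s.duplicated(keep=False),                          # present and unique
    "install_dt": lambda s: pd.to_datetime(s, errors="coerce").notna(),                     # parseable as a date
}

def quality_profile(df: pd.DataFrame) -> pd.DataFrame:
    """Return completeness and rule-validity per field, as percentages."""
    rows = []
    for field, rule in RULES.items():
        series = df[field]
        rows.append({
            "field": field,
            "complete_%": round(100 * series.notna().mean(), 1),
            "valid_%": round(100 * rule(series).mean(), 1),
        })
    return pd.DataFrame(rows)

# Usage (hypothetical file name):
# print(quality_profile(pd.read_csv("customer_sites.csv")))
```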
Governance
7) Have a business case. This sounds embarrassingly like common sense, but we’ve seen many more PoCs without a business case than with one. With a clear value proposition such as “We will win x% more customers for our flagship product” or “We will increase productivity for process y by z%”, the work of winning internal support & resources gets easier. We’ve found that targeting a medium-complexity project that addresses a specific and recognised issue works best if you are starting the journey. Solving an easy problem is not inspirational enough, and being too far in the future with a vague or undefined issue does not attract enough credibility. KPMG offers an approach called ‘searchlight’ which is just a few days’ work and is designed specifically for evaluating candidate use cases for data science and roughing out a business case
8) Board-level visibility. Get a senior sponsor – the same obvious old recommendation as for every technology project. Remember, though, that this one is different: switching an ERP system also needs senior support, but if the new one fails the old one will carry on, and it probably won’t get the enterprise on the front page of the newspaper. Embedding machine learning at scale into your processes, particularly where it touches customers, will attract attention from investors on how that affects your enterprise value, and from regulators and customers on how decisions are taken and whether they are fair
Technology
9) Choose the technology platform carefully. This entire article has tried to convince you that the technology is not the dependency, and that is predominantly our experience. However, there are a few choices you can make to reduce the load on the project team. Choose platforms designed to manage the type & volume of data you will want to process on them in 3-5 years’ time (e.g. scalability, the vendor’s commitment to investment etc). Consider interoperability and ease of integration (e.g. maturity of ETL tooling, availability of APIs, open source etc). Think about skills availability and operational maintenance (e.g. open source, fungibility of your existing experience between technology vendors)
So those are nine recommendations to ensure your data science project moves quickly from PoC to Production and into valuable business implementation, and if you’d like to compare notes on our experience, we’d be pleased to hear from you.
KPMG ranked #2 among leading global Google Cloud AI service providers
KPMG ranked #3 among Enterprise AI Service providers
*Whilst the article includes accurate descriptions of the work that took place and the broad contours of the business cases, all the examples have been anonymised to protect client confidentiality
Hi - a few people have asked about details for some projects - this might be of general interest: https://home.kpmg/xx/en/home/media/press-releases/2019/10/kpmg-ranked-2-among-google-cloud-ai-service-providers.html