The Data Science Maturity Model
This post was originally published on The Sampler, our in-house blog at Applied AI. Subscribe to our RSS or email feed for regular posts on machine learning, statistics, insurance and fintech.
Over the past year we've refined this simple model to help map, evaluate and improve our clients' data science capabilities. It might work for you too.
At Applied AI, most of our client projects lean very technical - we're a very technical team with experience in insurance, machine learning, quant finance, software development and more - and data science is still a fairly new field with high demands on mathematical and engineering capability.
That said, we always encourage our clients to undertake projects as part of a larger, more holistic approach to improving their data science maturity: using a statistical approach when deciding strategy and embedding 'data products' within their day-to-day operations.
So we cooked up the following Data Science Maturity Model.
It's purposefully very simple - a familiar 2x2 matrix - and describes a clear path for organisations to improve their capability in terms of the Analytical Complexity and Operational Implementation of new data sources, statistical modelling, products, processes and teams.
We've found it very useful when talking with clients, so it's worth sharing here: I'll step through it in more detail in the following paragraphs.
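To make the 2x2 concrete, here is a minimal sketch of how the two axes might map onto the four stages. The 0-to-1 scores, the 0.5 threshold and the names `Stage` and `classify` are all hypothetical: the model itself is qualitative, and we assume here that Business Operations sits at the lower end of Analytical Complexity.

```python
from enum import Enum


class Stage(Enum):
    GETTING_STARTED = "Getting Started"
    SOPHISTICATED_ANALYSES = "Sophisticated Analyses"
    BUSINESS_OPERATIONS = "Business Operations"
    FULL_CAPABILITY = "Full Capability Data Science"


def classify(analytical_complexity: float,
             operational_implementation: float,
             threshold: float = 0.5) -> Stage:
    """Map two scores in [0, 1] onto a quadrant of the 2x2 matrix.

    The scoring scale and threshold are illustrative assumptions,
    not part of the model as described in the post.
    """
    for score in (analytical_complexity, operational_implementation):
        if not 0.0 <= score <= 1.0:
            raise ValueError("scores must lie in [0, 1]")

    high_analytical = analytical_complexity >= threshold
    high_operational = operational_implementation >= threshold

    if high_analytical and high_operational:
        return Stage.FULL_CAPABILITY
    if high_analytical:
        return Stage.SOPHISTICATED_ANALYSES
    if high_operational:
        return Stage.BUSINESS_OPERATIONS
    return Stage.GETTING_STARTED
```

In practice the scores would come from a structured conversation with the business rather than a formula; the point is only that each axis is assessed separately.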
Getting Started
Start to build a high-quality data science capability
This is the first step, and it's surprising how many organisations want to launch into the blue or yellow squares (Sophisticated Analyses and Business Operations respectively) without spending time here to correctly lay down the bedrock.
At this stage, the major task is to identify & properly qualify the business opportunities, benefits and risks associated with starting a new data science capability. This requires strong support from senior leaders in the business and, depending on the size of the business, you will likely want to appoint someone to a full-time role.
Build the Team
- Start small, with 2-4 experienced generalist data scientists: require strong creativity, communication and domain knowledge and identify opportunities to use external specialists
- Look for strong technical ability, although not necessarily in narrow fields: this stage is far more about general analytical capability and senior-level communication
- Assign and support a strong corporate sponsor and ensure effective communications with the business
Define High-Quality Projects
- Identify and qualify sticking points in existing business processes: e.g. customer churn, fraud, reserving, credit risk, marketing impact
- Identify potential data sources, data owners and data processing requirements
- Undertake very initial analyses to demonstrate and communicate the potential improvements to these identified processes
Above all, a data science function is a highly technical, collaborative business effort involving research, development, operations and all parts of the organisation. It should be built carefully in order to enable the best expertise and technologies.
Sophisticated Analyses
Use small, agile projects to explore business opportunities
The wider impact of a data science capability will be felt in the green and possibly in the yellow squares, but the sheer momentum of a large business usually means that getting to those stages is hard without first proving some clear benefits. We do this by increasing the Analytical Complexity, and this second step can be one of the most fun from the technical angle.
The core purpose of this stage is to move nimbly and deliver clear value, either through immediate operational improvements or longer-term strategy.
Deliver Small, Highly Analytical Projects
- Many high-level costs can be quickly reduced through time-to-event and/or unsupervised modelling, including customer claims, churn, anomalous and fraudulent behaviour etc., leading to a higher-quality customer base
- Top-line growth is possible through high-dimensional understanding of the external marketplace, and tailoring your products and marketing accordingly
- In the insurance arena, it's pertinent to start helping your risk and compliance teams make better use of data science too: get them involved.
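As one illustration of the time-to-event modelling mentioned above, here is a minimal sketch of a Kaplan-Meier survival estimate applied to customer churn. The function name `kaplan_meier` and the toy data are made up for illustration; a real project would more likely use an established survival analysis library.

```python
from collections import Counter


def kaplan_meier(durations, observed):
    """Return [(time, survival_probability)] at each time with an event.

    durations: how long each customer was followed (e.g. in months)
    observed:  True if the customer churned at that time,
               False if they were still a customer (censored)
    """
    if len(durations) != len(observed):
        raise ValueError("durations and observed must have the same length")

    events = Counter(t for t, e in zip(durations, observed) if e)
    exits = Counter(durations)  # everyone leaves the risk set at their own time

    at_risk = len(durations)
    survival = 1.0
    curve = []
    for t in sorted(exits):
        d = events.get(t, 0)
        if d:
            # Multiply by the fraction of at-risk customers who did not churn at t
            survival *= 1.0 - d / at_risk
            curve.append((t, survival))
        at_risk -= exits[t]
    return curve


# Toy data, invented purely for illustration: six customers, two censored
toy_durations = [1, 2, 2, 3, 4, 5]
toy_observed = [True, True, False, True, False, True]
toy_curve = kaplan_meier(toy_durations, toy_observed)
```

The censoring flag is what distinguishes this from a simple churn rate: customers who have not yet churned still contribute information for as long as they were observed.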
Systematise Learning from Data
- Conduct analyses as well-defined agile software development projects with plans, goals, meetings, progress tracking, knowledge sharing
- Use modern, flexible technologies and require source code control, repeatability and auditability
- Continue to use the core team with external help if required, and provide opportunities to experiment: accept & expect failure as part of the learning process.
Many business issues can be modelled using sufficiently advanced statistics, and at a high level of abstraction we may not even need a huge amount of data. This stage certainly does not involve 'big data', but rather, Good Data: well understood, well sourced, under control, quick and easy to use.
Business Operations
Automate & productionise successful analytical work
Now it's time to embed wider impacts into daily business operations. We suggest this stage follows Sophisticated Analyses because organisational momentum tends to dictate that we prove the benefits first; Operational Implementation then follows.
The core purpose of this stage is to operationalise data sourcing, processing, analyses, and communication within the business to step towards making data science a long-lasting capability.
Make Data Sources & Analyses Robust
- Map, document and assign responsibility over the data pipeline: acquisition, processing, storage, availability and archiving
- Likewise assign responsibility for analyses: ensure high-quality documentation, repeatability, code management and error handling
- Ensure high-quality knowledge transfer within the team: sharing sessions, presentations, wikis, whitepapers etc.
Begin to Augment or Replace Internal Systems with Data Products
- Identify existing systems which could be augmented or even replaced by 'data products' which have machine learning at their core
- Qualify the level of improvements and engage business owners and senior management to help plan integrations etc.
- Review existing software and service providers and consider renegotiating new arrangements based on what you can now do internally.
Whether it's marketing-led communications analysis, operations-led image recognition, risk-led customer classification, or compliance-led stochastic portfolio simulation, bringing data products into daily operations is tricky, but immensely beneficial to the business, and worth approaching in a rigorous manner.
Full Capability Data Science
Data science and high-impact projects are deeply embedded into products, services, business administration and the senior decision making process.
This is the end goal: a high-quality data science team running at full-steam, capable of sophisticated analyses, implementing new tools, services and data products and aiding decision making throughout the business.
You can't get here simply by undertaking advanced predictive modelling, nor by buying the latest 'big data lake': it requires both mathematical expertise and operational implementation.
Embedded Organisational Change
- A highly-skilled cross-functional team of data scientists and data engineers, capable of exploring and modelling new business opportunities and operational improvements
- Well-specified career paths and support for continuous learning, including tie-ins to industry forums & conferences, and part-time or sabbatical-based higher education
- Strong representation at board-level (e.g. a Chief Data Officer) and an associated organisational culture of learning from data.
Full-Stack Data Products
- The business is able to make full use of analytical insights through effective data products and services, improving top-line growth, reducing operational costs, reducing risk and meeting compliance
- The team is equipped with well-understood data, software and systems to allow data science projects to run from exploration to exploitation
- The business will likely create new products and processes based on these insights, and potentially sell back into the market or new markets, based on their newfound, qualified knowledge of the sector.
The full-capability data science department is a place where an idea can be tested, decisions made and actions swiftly taken. New projects may include: macro market segmentation, modelling the customer lifecycle, discovering the voice of the customer, marketing campaign experiments, triaging claims and prioritising their handling based on risk and cost/benefits, fraud and anomaly detection, modelling credit risk, modelling reserves and supporting compliance.
Where does your organisation fit?
Our Data Science Maturity Model seems to work a lot of the time: most organisations, particularly in the insurance sector, can map their current capabilities and future aspirations, and we know many companies in each section.
Where does your organisation fit? Let us know.
We are domain & subject-matter experts with experience helping clients succeed in all areas of the Maturity Model, and we specialise in exploratory projects with high analytical complexity to define the true need and meet it in a pragmatic way using the most appropriate technology and scientific approaches.
We also have strategic partnerships with complementary organisations to ensure projects can become full production systems and clients become self-sufficient.
Finally, if you liked this post you may also like our extended thoughts on Delivering Value throughout the Analytical Process, Building a Data Science Capability and the Role of Data Science in Insurance. That said, of course, it's only a model.