Are the robots really coming? Pt. 3 (ML Operations)

Audience: Non-technical | Reading time: 8 mins

Pt. 1 gave us a brief introduction to Machine Learning (ML) and covered the 4 types of analytics along with some enterprise business use cases. In Pt. 2 we explored the Data Scientist’s dependency on reliable and consistent data - discoverable, well-governed and well-managed big data exchanges. In this last article, we’re going to explore the challenges and relevant trends surrounding the emerging discipline of ML Operations (MLOps). This means moving beyond a one-off PoC (proof of concept) by a single Data Scientist, to multiple ‘production’ ML models delivering ongoing business value via statistical inference at scale, orchestrated by a broadly skilled Data Science team.

A Notebook is the Data Scientist’s preferred application or workbench tool for creating model prototypes. Running locally on a decent laptop or PC, a notebook combines datasets (for training and testing) and ML libraries (algorithms and pre-written code) with bespoke code to produce visualised output in chart or plot form. This allows different ML algorithms to be compared and the best performer selected, as well as the resulting model (data+algorithm+code) to be rapidly tuned.
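To make this concrete, the comparison step might look like the following minimal sketch (the dataset and the two candidate algorithms are illustrative assumptions, not a prescription - any notebook workflow of this shape would do):

```python
# Minimal sketch of the notebook workflow: train two candidate
# algorithms on the same data, score each on held-out test data,
# and keep the best performer.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier

# Split the data into training and testing sets (see Pt. 2 on data).
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Two candidate algorithms drawn from a pre-written ML library.
candidates = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "decision_tree": DecisionTreeClassifier(random_state=42),
}

# Fit each candidate on the training set, score it on the test set.
scores = {name: model.fit(X_train, y_train).score(X_test, y_test)
          for name, model in candidates.items()}
best = max(scores, key=scores.get)
print(f"Best model: {best} (test accuracy {scores[best]:.2f})")
```

In a real notebook these scores would typically be plotted rather than printed, and the winning model’s parameters tuned over further iterations.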

To understand more about the process of deploying a prototype into production (a live environment which the enterprise depends on), a useful comparison is the discipline of modern Software Engineering. Using modern Agile ways of working, software development can now be measured in terms of functionality and release frequency, which allows progress to be tracked. This in turn enables better resource planning and scheduling to sync with an enterprise’s budgetary cycles, as well as with other BUs (business units). In order for individual developers to move beyond writing code which accesses local (software) libraries, certain modern practices, technologies and roles have emerged to make teams working at scale possible;

  1. Code repositories (repos) centrally store individual code contributions from a whole team, organised into projects, with a full audit trail enabling version control to govern constant change.
  2. Continuous Integration allows individual developers to take existing functional and approved code (a branch) and work in parallel to other developers’ branches to write new code and functionality. Test Automation ensures any new code is only allowed entry (merged) into the repo if it functions as expected and doesn’t break anything else (ie introduce bugs).
  3. Continuous Deployment enables new code functionality (and business value) to be frequently released into production in a highly automated ‘push-button’ fashion (it explains LinkedIn and Facebook’s endless app updates!). Release versions can thus be rolled back (backed out) should any problems be detected by supporting Monitoring services.
  4. Containers (highly portable computing units) ensure environment consistency (ie developer libraries) from developer laptops through to production. They also provide a highly scalable and orchestrated fleet to cope with the ebbs and flows of user demand.
  5. Current Cloud Services and supporting Infrastructure enable items 1-4 to be leveraged within hours, as opposed to lengthy build-outs of on-premise Infrastructure from scratch.
  6. Lastly, DevOps or SRE (Site Reliability Engineer) folk run and maintain said supporting Infrastructure in Agile teams, working closely with developers to ensure code is frequently deployed without any critical outages (recent online banking outages spring to mind). Regarded as much a culture as a role, DevOps engineers are multi-skilled and play a critical part in ensuring the timely but safe progression of developers’ code into a live state.
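The merge gate at the heart of Continuous Integration (item 2) can be sketched in miniature. This is a toy illustration of the principle, not a real CI system - the branch names and pass/fail “tests” here are invented for the example:

```python
# Toy illustration of a Continuous Integration merge gate: a change
# is merged into the main branch only if all of its automated tests pass.

def run_tests(change):
    # In a real pipeline this would execute the project's test suite;
    # here each "test" is just a callable returning True or False.
    return all(test(change) for test in change["tests"])

def merge(main_branch, change):
    # Reject the change outright if testing fails, protecting main.
    if not run_tests(change):
        return main_branch, "rejected: tests failed"
    return main_branch + [change["name"]], "merged"

main = ["initial commit"]
good = {"name": "feature-a", "tests": [lambda c: True]}   # passes its tests
bad = {"name": "feature-b", "tests": [lambda c: False]}   # breaks something

main, status_good = merge(main, good)
main, status_bad = merge(main, bad)
print(main, status_good, status_bad)
```

The point of the gate is that `main` only ever accumulates changes that passed testing - which is what makes the frequent, automated releases of Continuous Deployment (item 3) safe.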

In comparison with Software Development, developing, let alone deploying, ML models is inherently more complex. Models have a high degree of entanglement because they originate from training algorithms on dynamic data (as opposed to just software code). To improve model accuracy for predictive outcomes, prototyping involves repeated measurement, evaluation and tuning. In other words, the CACE principle applies to ML - Changing Anything Changes Everything. Another dimension of complexity exists because Data Science advancements manifest in new tooling and ML libraries. A Data Scientist will therefore likely need access to - and indeed demand - a wider range of libraries and frameworks than Software Developers. Practising MLOps - leveraging modern DevOps practices and effectively governing model deployment - becomes an explicit necessity.

If the methodologies and practices touched on above cover process, what might the core Technology or Platform look like for an enterprise aiming to enable ML outcomes throughout the business? Current solutions can be summarised as;

  1. Proprietary. As well as developing many Big Data and Software Engineering technologies and methodologies which many enterprises use and aspire to follow, the Digital Natives have also developed their own ML platforms to work with their data ecosystems. Each has its own workflow and leverages open-source components as well as bespoke elements. Whilst their architectures have been shared in talks and whitepapers, they remain largely unavailable to the outside world.
  2. Cloud Service Providers (CSPs). Each of the big 3 (Amazon Web Services, Google Cloud Platform and Microsoft Azure) provides fully-functioning ML services which help to deliver advanced analytics outcomes. These services represent a low barrier to entry, allowing rapid prototyping assuming the data is available. They also serve as an excellent training ground with built-in workflows for new teams to cut their teeth on. However, CSPs tend to charge per model version, so in a production setting costs may soon prove unpalatable to a procurement-conscious enterprise.
  3. Commercial AI/ML Platforms & Services. Whether written from the ground-up or evolving from automated big data analytics platforms, these ‘AutoML’ solutions are frequently acquisition targets for big GSIs (Global Systems Integrators) or the CSPs themselves. As is often the case with marketing hyperbole, the ‘one-stop shop’ AI Platform promise often underplays the deliberately restricted featureset. Commercial MLaaS and DSaaS (Machine Learning/Data Science as a Service) outsourced offerings with a simple workflow and framework have been quick to emerge, some consisting of a service wrapper around existing CSP Managed Services. 
  4. Open-Source. In terms of publicly available software, libraries and MLOps frameworks (at least 2 exist at the time of writing), open-source probably represents the frontier of ML innovation. Contributions are rapidly evaluated by an expert community and improved on using real-world problems. For an enterprise business, this route presents an opportunity to build a flexible, modular ML platform to suit the organisation’s needs and provide the Data Scientist access to the widest array of algorithms, languages and evolving ML libraries. With choice, however, comes complexity, and the associated specialist support overhead will understandably put off some enterprises.

From an ML trends observation viewpoint, I should point out that many contributions to Open-Source (4) come from the Digital Natives (1) themselves. CSPs (2) continue to overwhelm (this author at least!) with new and improved ML-related services. Often announced at CSPs’ annual showcase events, these services are becoming more streamlined and better integrated with each provider’s other offerings. It’s clear that ML platforms, tooling and frameworks are evolving at breakneck speed. The race by ISVs (Independent Software Vendors) and CSPs to commoditise and create highly scalable and compelling products is well underway, and just how quickly this most complex business problem becomes distilled and bottled remains to be seen. As things stand, all of the above are completely dependent on reliable and consistent big data.

Let’s move onto the most unpredictable topic at hand, People. What might a Data Science team look like, and how best could it be deployed within the enterprise? As an emerging discipline, Digital Natives such as Google, Facebook and Uber have been pioneers of ML Operations development. Driven by their ML explorations over recent years, figuring out how to ‘run’ models in production was a mountain they were forced to summit. In terms of skillsets, such individuals are described as smart creatives: people who can code proficiently in multiple languages, understand data science, utilise Cloud services and have an aptitude for problem solving (not to mention reeling off TED talks at will!). From a talent pool perspective, such individuals are in short supply and probably already working for a Tech giant with stock incentives. With this in mind, it’s unrealistic for enterprises to expect to land a few unicorns to transform their organisation’s fortunes. Instead, realising ML outcomes at scale will require assembling a diverse and collaborative team to bring together Data Science, Data Engineering, DevOps, Cloud Infrastructure and Product Management. Job titles vary but the roles listed below should be fairly universal;

Skills Realm: Roles

Data Science: Data Scientist, Data Analyst

Data Engineering: Data Architect, Data Engineer

DevOps: DevOps Engineer, Test Automation Engineer

Cloud Infrastructure: Cloud Architect, Cloud Engineer

(Product: Product Manager, Business Analyst)

Some organisations may favour a centralised Data Science department to build an MLOps-proficient team, which should include some form of Product Management to interact with business stakeholders. With more ML experience in-house, others may favour deploying smaller devolved squads within BUs to gain expertise in that particular data domain and foster closer collaboration. Either way, for business leaders to trust predictive ML enough to augment or replace existing decision making, the closer Data Scientists work with the business, the better. One route to be treated with scepticism would be deploying a Data Science team within an Operational IT department, given the conflicting objectives of innovation vs risk avoidance.

Below is a simplified but holistic visual representation of ML Operations, consisting of Model lifecycle stages (Technology), the associated activities (Process) and the skills domains (People) underneath.

[Figure: MLOps - model lifecycle stages (Technology), associated activities (Process) and skills domains (People)]

Hopefully this helps convey that just hiring a Data Science team, or leveraging outsourced service offerings alone, is likely to be ineffective. Enterprises should aim to build a multi-disciplined Data Science capability, nurture a fail-fast experimentation culture, enable the team to select appropriate tooling and frameworks, empower collaboration with the business, and provide the necessary governance rigour. Data and ML leaders will need to drive the transformation from the top down as an explicit mandate. Without leaders being adept translators between the business and technical realms of ML, it is unlikely that senior business stakeholders will trust and adopt the recommendations of these new-fangled predictive outcomes. When devising an ML strategy at enterprise level, I would suggest the following summary points merit careful consideration;

  1. Data Scientists are entirely dependent on access to big data to train and test models (see Pt. 2). More specifically, accessible, well-organised (catalogued) data of consistent quality requires a coherent Data Strategy and well-managed DataOps (Data Operations). Worst-case scenario? Data Scientists will spend 80% of their time repetitively preparing data.
  2. MLOps at production scale is uncharted territory outside of the Digital Natives (ie Amazon, Facebook, Google, Uber etc), limiting the talent pool available to enterprise businesses.
  3. Compared to Software Development, MLOps is more complex and unpredictable due to the entanglement of algorithms, code and dynamic data in models. 
  4. Compared to Software Developers, Data Scientists will likely need access to a wider array of tools and libraries in a rapidly evolving space.
  5. Supporting Infrastructure and modern Software Engineering practices need to be leveraged to achieve a high level of automation and model deployment velocity.
  6. A single enterprise-ready, off-the-shelf ML solution (be it platform or outsourced service) simply doesn’t exist presently.
  7. MLOps is the convergence of several engineering realms. In-house enterprise teams require a range of skills from Data Engineering, DevOps and Business Analysis, as well as Data Science. 

So then, that provocative Robot article title and image! What is the likelihood of enterprise jobs being lost to AI? It’s clear from technology leader surveys that the promise of AI and ML is a hot topic no digital marketeer has missed! Enterprise businesses’ appetite for ‘quick wins’ and track record for adopting emerging technology is questionable (take for instance Agile methodologies). Given the complexity of ML and the challenge of MLOps at scale, there is little to suggest this particular trajectory of adoption will differ! From this author’s experience at least, it’s often the people elements (ie skills + culture) which present far more of a barrier to change than the technology or processes. A holistic and well-orchestrated transformation is more likely to yield business value from ML than any element in isolation. Leveraging ML will require new skills and understanding in addition to existing domain experience within the current workforce. I would suggest, therefore, that as well as presenting consultancies and technology vendors with a huge opportunity, ML currently presents people (not robots!) with an incredible opportunity to skills-shift. Is the upside ahead of us, therefore, less repetitive manual tasks, replaced instead by more mentally stimulating ways of working?
