AI/ML at Scale in Production: Common pitfalls and how best to avoid them
Executive Summary:
Just about a year ago, I was presenting at the Global Big Data and AI conference. There were about 100 technologists attending my talk.
I started by asking, "Who in this virtual room has developed an ML or AI model for their business?" Being a technology conference, almost 90% of the hands shot up.
Then I asked, "How many of you have that code in production at scale?" Nearly every hand went down.
The obvious question is: why are these ML models not in production? What is the root cause? And if the models are not in production, what value are they providing to the business, beyond completing a proof-of-concept or a "science project"?
Simply put, for AI and ML models to have measurable business impact, they must be brought to production at scale to solve business problems. A completed proof-of-concept is not enough to cut through the AI hype.
AI/ML models must work in production at scale.
There are, of course, good reasons behind these dismal statistics.
The AI and ML model lifecycle is complex and cyclical, with a great deal of experimentation involved. Model creation is a small piece of the overall lifecycle and complexity, particularly when we consider what it takes to bring models to production at scale to solve real-world business problems. The complexity lies in everything else beyond having a good, well-trained model.
Here, "everything else" means the complexities associated with automation, production operations, real-time monitoring, rapid and automated deployment of AI/ML models, model drift, the quality of served data, and model governance, to name a few.
In fact, the real complexities of AI and ML only begin when models are about to be deployed to production with scale, performance, security, and auditability in mind.
Until we tackle these complexities head-on, models will continue to gather dust in the proof-of-concept stage in labs and will never see the light of day.
Five practical thoughts from the trenches of innovation on how to tackle and overcome these complexities
1. AI/ML strategy must align with enterprise business strategy and data strategy.
Like any enterprise-scale initiative, an AI/ML initiative must start with understanding the Why (strategy) and the What (business requirements) before embarking on the How (an ML solution, an NLP solution, or a combination including heuristics) and the Where (infrastructure and tooling; cloud, on-premise, or hybrid; AI on edge servers; and so on). This means establishing full alignment between the business and the cross-functional data science teams, and agreeing on business use cases and how to measure success. The initiative must be tied to business KPIs to garner executive support, not just from a budgetary perspective, but also through the usual thick and thin of the initiative itself. In all likelihood, the AI strategy also needs to be tied to your product strategy.
Our experience suggests that most AI and ML projects don't make it to production because expectations are not well communicated with the business and there is no agreed-upon definition of "what is good enough for the AI to be successful?"
Combining data science expertise with stakeholders' business understanding is essential to achieving tailored, actionable outcomes and creating real value for the business.
2. Focus on data understanding, quality, security, privacy, and governance
Data is the lifeblood of AI. There is no AI or ML without good quality data, and often you also need a lot of it, depending on your use case. The AI use case must be tied to your enterprise data strategy: do you understand the data, do you have the data you need to train your model, is the data already integrated and easily accessible, what is its quality, and is it governed? Technology and architecture also play a big role in data processing, enrichment, and integration, as well as in scale, performance, and reliability. Data governance plays a big role in ensuring sustained data quality.
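To make this concrete, here is a minimal sketch of an automated data quality gate, assuming a pandas DataFrame with hypothetical columns such as customer_id and age; the null-rate threshold and range rules are illustrative:

```python
# A minimal sketch of automated data quality checks; column names and
# thresholds are illustrative assumptions, not from the article.
import pandas as pd

def validate_training_data(df: pd.DataFrame) -> list[str]:
    """Return a list of data quality violations; an empty list means the batch passes."""
    issues = []
    # Completeness: no column should exceed a 5% null rate.
    null_rates = df.isna().mean()
    for col, rate in null_rates[null_rates > 0.05].items():
        issues.append(f"{col}: null rate {rate:.1%} exceeds 5% threshold")
    # Uniqueness: primary keys must not be duplicated.
    if df["customer_id"].duplicated().any():
        issues.append("customer_id contains duplicates")
    # Range checks: simple domain rules for individual fields.
    if not df["age"].between(0, 120).all():
        issues.append("age contains out-of-range values")
    return issues

# Usage: block the pipeline if the incoming batch fails validation.
batch = pd.DataFrame({"customer_id": [1, 2, 3], "age": [34, 45, 29]})
problems = validate_training_data(batch)
if problems:
    raise ValueError("Data quality gate failed: " + "; ".join(problems))
```

Gates like this are most valuable when they run automatically on every new batch of training or serving data, so quality regressions surface before they reach the model.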
Data security and privacy represent a very broad but important domain for any software application, and AI/ML-driven applications are no exception. The ML applications you are productizing will often involve personally identifiable information (PII) or, in health contexts, protected health information (PHI). In addition to making sure your data and environments are well protected, there are specific considerations for your deployed model. First, consider the risks of your model behaving badly. What would happen if your model produced the most erratic output you could imagine? What would be the impact on consumers of such predictions? What financial, reputational, security, or safety risks could result? Depending on the severity of those risks, you may want to implement extra guardrails against erroneous output.
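As one example, a simple guardrail can reject predictions that fall outside a plausible business range before they ever reach a consumer. The sketch below assumes a generic model with a predict method; the bounds and fallback behavior are illustrative assumptions:

```python
# A minimal sketch of a post-prediction guardrail; the bounds and fallback
# are illustrative assumptions, not from the article.
import logging

logger = logging.getLogger(__name__)

def guarded_predict(model, features, lower=0.0, upper=10_000.0, fallback=None):
    """Return the model's prediction only if it falls in a plausible range."""
    raw = float(model.predict([features])[0])
    if lower <= raw <= upper:
        return raw
    # Erratic output: log for investigation and return a safe fallback,
    # e.g., a rules-based estimate or a flag for human review.
    logger.warning("Prediction %s outside [%s, %s]; using fallback", raw, lower, upper)
    return fallback
```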
3. Focus on building skill sets in AI, data engineering, cloud engineering, and modern data architectures and platforms.
Data science is a team sport. It requires a strong cross-functional team with members from business, software engineering, data engineering, software quality engineering, ML engineering, operations, regulatory, compliance, and infrastructure. The skill sets required to pull off data science projects are also many, including knowledge and expertise in statistics, algorithms, data pipeline engineering, the business domain, cloud engineering, test automation, modern data integration techniques, modern SQL, NoSQL and big data technologies, APIs, and data visualization, among others. Knowledge of scaling software at enterprise scale plays a critical role, as does end-to-end automation of the ML lifecycle.
4. Focus on AI/ML model operations at scale - MLOps
Businesses don't realize the full benefits of AI and ML primarily because models are not deployed, or, if they are, they are not deployed at the speed or scale the business needs.
The level of automation of the Data, ML Model, and Code pipelines determines the maturity of the ML process.
Establish Continuous Integration (CI) - Remember that ML is about data, model, and code combined, unlike typical software development where you only need to worry about code. CI therefore needs to be extended to test and validate data and models, as in the sketch below.
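A minimal sketch of such CI checks, written as pytest-style tests; the my_pipeline module, column names, and AUC threshold are hypothetical:

```python
# A minimal sketch of CI checks covering data and model, not just code.
# my_pipeline and its helpers are hypothetical; run these under pytest.
from my_pipeline import load_training_data, train_model, evaluate  # hypothetical module

def test_training_data_schema():
    df = load_training_data()
    expected = {"customer_id", "age", "label"}
    assert expected.issubset(df.columns), "training data is missing required columns"

def test_model_quality_gate():
    df = load_training_data()
    model = train_model(df)
    metrics = evaluate(model, df)
    # Fail the build if the candidate model regresses below the agreed baseline.
    assert metrics["auc"] >= 0.80, f"AUC {metrics['auc']:.3f} below 0.80 quality gate"
```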
Establish Continuous Delivery (CD) - this delivers the ML pipeline that automatically deploys the model to production at scale.
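One simple safeguard in this step is a pre-promotion smoke test against the staged model. The endpoint URL, payload, and expected output range below are illustrative assumptions:

```python
# A minimal sketch of a pre-promotion smoke test in a CD step; the endpoint,
# payload, and output range are hypothetical.
import requests

STAGING_URL = "https://ml-staging.example.com/predict"  # hypothetical endpoint

def smoke_test_staged_model() -> bool:
    """Send a known-good payload to the staged model and sanity-check the response."""
    payload = {"customer_id": 123, "age": 42}
    resp = requests.post(STAGING_URL, json=payload, timeout=5)
    if resp.status_code != 200:
        return False
    prediction = resp.json().get("prediction")
    # The staged model must return a numeric, in-range prediction before promotion.
    return isinstance(prediction, (int, float)) and 0 <= prediction <= 1

if __name__ == "__main__":
    assert smoke_test_staged_model(), "staged model failed smoke test; aborting rollout"
```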
Establish Continuous Training (CT) - this is a property unique to ML systems: the pipeline automatically retrains ML models for re-deployment. This is complex and requires A/B testing, among other things.
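A minimal sketch of a retraining trigger, with a hypothetical pipeline helper and an illustrative tolerance:

```python
# A minimal sketch of a continuous training trigger; the helper and the
# tolerance value are illustrative assumptions.
def start_training_pipeline() -> None:
    # Placeholder: in practice this would submit the training job to an
    # orchestrator and register the new candidate model for A/B testing.
    print("Retraining triggered; new candidate will enter A/B testing")

def maybe_retrain(live_accuracy: float, baseline_accuracy: float,
                  tolerance: float = 0.05) -> bool:
    """Trigger retraining when live accuracy degrades beyond the tolerance."""
    if baseline_accuracy - live_accuracy > tolerance:
        start_training_pipeline()
        return True
    return False

# Usage: compare yesterday's live accuracy against the offline baseline.
maybe_retrain(live_accuracy=0.78, baseline_accuracy=0.86)
```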
Establish Continuous Monitoring (CM) - this establishes monitoring of production data and model performance metrics, which are bound to business metrics and performance.
Establish Model Versioning - In AI/ML projects, you need to version data, code, and model, which is far more complex than versioning code alone in typical software development. On the data side, you need to version data preparation pipelines, the feature store, datasets, and metadata. As part of the modeling phase, you need to version the ML model training pipeline, model objects, hyperparameters, and experiment tracking. On the code side, you need to version code and configurations.
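One way to tie these pieces together, sketched below with an illustrative hashing and metadata scheme, is to record a content fingerprint of the dataset and model artifact alongside the code commit and hyperparameters:

```python
# A minimal sketch of linking data, code, and model into one version record;
# the fields and file layout are illustrative assumptions.
import hashlib
import json

def fingerprint(path: str) -> str:
    """Content hash of a dataset or artifact file, for reproducible lineage."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(8192), b""):
            h.update(chunk)
    return h.hexdigest()

def record_model_version(dataset_path: str, model_path: str,
                         git_commit: str, hyperparams: dict) -> dict:
    """Bundle everything needed to reproduce the model into one metadata record."""
    record = {
        "dataset_sha256": fingerprint(dataset_path),
        "model_sha256": fingerprint(model_path),
        "code_commit": git_commit,
        "hyperparameters": hyperparams,
    }
    with open("model_version.json", "w") as f:
        json.dump(record, f, indent=2)
    return record
```

Dedicated experiment-tracking and model-registry tools serve the same purpose at scale; the point is that every deployed model should be traceable to the exact data, code, and parameters that produced it.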
Establish Model Monitoring - Once the AI/ML model has been deployed, it needs to be monitored to assure that it performs as expected. This is complex because model behavior depends on many factors, including data changes, changes in source systems, and upgrades in other dependencies. For example, monitoring model drift, that is, the degradation of the model's predictive quality on served data, fits into this category. Both dramatic and slow-leak regressions in prediction quality must be detected and rectified. It is important to identify the elements to monitor and create an actionable model monitoring strategy before deploying the model to production. Continuous monitoring alleviates common production concerns, including model drift, data quality issues, and unexpected changes in upstream systems.
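A common drift check is the population stability index (PSI), which compares a feature's distribution at training time against what the model sees in production. The sketch below uses an illustrative 0.2 alert threshold, a widely used rule of thumb:

```python
# A minimal sketch of drift detection via the population stability index (PSI);
# the synthetic data and 0.2 threshold are illustrative.
import numpy as np

def psi(expected: np.ndarray, actual: np.ndarray, bins: int = 10) -> float:
    """PSI between training-time (expected) and production (actual) values."""
    edges = np.histogram_bin_edges(expected, bins=bins)
    e_pct = np.histogram(expected, bins=edges)[0] / len(expected)
    a_pct = np.histogram(actual, bins=edges)[0] / len(actual)
    # Avoid division by zero and log(0) in sparse bins.
    e_pct = np.clip(e_pct, 1e-6, None)
    a_pct = np.clip(a_pct, 1e-6, None)
    return float(np.sum((a_pct - e_pct) * np.log(a_pct / e_pct)))

train_ages = np.random.normal(40, 10, 10_000)
live_ages = np.random.normal(48, 12, 10_000)   # the served distribution has shifted
if psi(train_ages, live_ages) > 0.2:
    print("Significant drift detected; investigate and consider retraining")
```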
5. Focus on measuring model fairness, trust, interpretability and explainability
Because of varied deployment processes, multiple modeling languages, and the lack of a centralized view of AI in production across the organization, businesses need time-consuming and costly audit processes to ensure compliance. Instituting model governance helps with production access control and traceable model results.
Model fairness, interpretability, and explainability are critical, not only for the business but for data scientists, researchers, and developers as well. This way, we can explain the models and understand the value and accuracy of their findings. Interpretability is also important for debugging machine learning models and making informed decisions about how to improve them.
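As a minimal sketch of one interpretability technique, permutation importance measures how much a model's performance drops when each feature is shuffled; large drops indicate features the model genuinely relies on. The toy dataset and feature names below are illustrative:

```python
# A minimal sketch of interpretability via scikit-learn's permutation
# importance on a synthetic dataset; feature names are illustrative.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, n_features=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = RandomForestClassifier(random_state=0).fit(X_train, y_train)

# Shuffle each feature in turn and measure how much held-out accuracy drops.
result = permutation_importance(model, X_test, y_test, n_repeats=10, random_state=0)
for i in result.importances_mean.argsort()[::-1]:
    print(f"feature_{i}: importance {result.importances_mean[i]:.3f}")
```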
Looking Ahead
AI/ML is not hard. What is painfully hard is implementing and deploying it at scale in production in a methodical, disciplined manner, following sound practices of software engineering, data engineering and data integration, AI/ML model SDLC, DevOps, and MLOps, while ensuring data privacy, security, compliance, and model explainability.
This requires skill sets and expertise across a variety of functional and technical areas, beyond just having AI/ML engineers, as described in this article.
Data Science is a true team sport.
While cloud computing, distributed computing, and the availability of open source technologies, libraries, and pre-trained models have certainly lowered the barrier to entry for AI/ML, businesses have painfully realized over the years that AI/ML is not just another technology they can buy off the shelf, install, and start using for immediate value creation. Instead, they have realized that AI/ML requires continued executive-level support and the necessary funding and resources, tied to business, product, and data strategy.
Businesses are also starting to realize that they need to tolerate failure and encourage data science teams to be bold, using failures as learning opportunities so that the team can do better and succeed the next time. This is a different mindset from what is normally required for developing enterprise applications and platforms.
The practical tips laid out in this article as best practices will benefit you and your team as you start thinking about instituting AI/ML for value creation at scale by bringing your models from the lab to the wild.
As AI practitioners, that should be our goal, not just completing proof-of-concepts.