Taking AI/ML Ideas to Production
The integration of AI and ML into products has become a major trend in recent years. Companies are incorporating these technologies to improve the efficiency and performance of their offerings, and this year in particular, with the boom of ChatGPT, almost every company is trying to ship a feature in this domain. One of the main benefits of AI and ML is the ability to learn and adapt: models analyze data and use it to improve over time, which means products that incorporate them can become smarter and more efficient as they run.
Let's now look at how companies take their ideas to production. Usually, they start by hiring a few data scientists who figure out which models to build to solve the problem, fine-tune them, and hand them over to MLOps or DevOps engineers to deploy. Your DevOps engineers may or may not know how to take these models to production efficiently. That's where you need specialized skills, such as machine learning engineers and MLOps engineers who understand how to manage the whole CI/CD/CT pipeline efficiently.
Maturity of Deployment Strategy
Many engineers start by packaging the model and its API in a popular Python framework like Flask or FastAPI, wrapping it in a container, and deploying it with Docker or Kubernetes. This works well for lab-type environments but is not really meant for production use cases.
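To make the pattern concrete, here is a minimal, dependency-free sketch of that first maturity stage: a dummy model wrapped behind an HTTP prediction endpoint. It uses only the Python standard library so it stays self-contained; in practice you would reach for Flask or FastAPI, and the `predict` function and request shape here are illustrative assumptions, not a real model.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def predict(features):
    # Stand-in "model": a hard-coded linear scorer. In a real service this
    # would load a trained artifact (e.g. from a model registry) at startup.
    weights = [0.4, 0.6]
    return sum(w * x for w, x in zip(weights, features))


class PredictHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Read the JSON body, e.g. {"features": [1.0, 2.0]}
        length = int(self.headers.get("Content-Length", 0))
        payload = json.loads(self.rfile.read(length))
        body = json.dumps({"score": predict(payload["features"])}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the demo quiet


def serve(port=8000):
    # Entry point you would invoke from the container's CMD.
    HTTPServer(("0.0.0.0", port), PredictHandler).serve_forever()
```

The appeal of this stage is obvious: it ships in an afternoon. What it lacks is everything around the endpoint, such as versioning, rollout strategy, autoscaling, and monitoring, which is exactly where the later maturity stages come in.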
More mature companies build their own tooling to orchestrate and deploy the service. They are well set up, but their system is not aligned with the broader ecosystem and requires a lot of effort to manage and maintain over time.
Lastly, at the rightmost end of the maturity spectrum, you deploy a specialized machine learning platform such as Kubeflow, Ray, or ClearML, which provides end-to-end tools to manage the lifecycle of an ML service.
Where to Start?
If you are new to MLOps, it is not easy to understand what exactly you need in your stack. To simplify this, the MLOps community has shared a template that can help you do some self-assessment and navigate the large AI/ML platform ecosystem.
Not all the components are required in every stack, but you can write down your requirements for each component and identify the tool that works for you.
Essentially, you need a way to bring data to the platform with version control, run and record experiments, an ML pipeline to automatically run the code that your data scientists framed in notebooks, and a model registry to store models and their lineage, followed by model serving and monitoring of inference performance.
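The model registry is the piece that ties these components together, since it links each model version back to the data version and experiment that produced it. The sketch below shows the idea as a toy in-memory registry; the class name, fields, and methods are illustrative assumptions, not the API of any real tool (in practice you would use something like MLflow or ClearML).

```python
import hashlib
from datetime import datetime, timezone


class ModelRegistry:
    """Toy in-memory model registry that records lineage per version."""

    def __init__(self):
        self._models = {}  # model name -> list of version records

    def register(self, name, artifact_bytes, data_version, experiment_id):
        # Each version records *where it came from*: the dataset snapshot
        # it was trained on and the tracked experiment run that produced it.
        record = {
            "version": len(self._models.get(name, [])) + 1,
            "checksum": hashlib.sha256(artifact_bytes).hexdigest(),
            "data_version": data_version,
            "experiment_id": experiment_id,
            "registered_at": datetime.now(timezone.utc).isoformat(),
        }
        self._models.setdefault(name, []).append(record)
        return record["version"]

    def latest(self, name):
        # The serving layer would fetch this record to know which
        # artifact to load and which lineage to report in monitoring.
        return self._models[name][-1]
```

For example, registering a "churn" model twice yields versions 1 and 2, and `latest("churn")` returns the second record with its data version and experiment ID intact, which is exactly the lineage you need when an inference metric degrades and you have to trace a model back to its training run.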
Finding the Right Tool for the Job
As I stated earlier, the ecosystem is thriving, and hundreds of tools and frameworks are emerging to manage a subset of the lifecycle or all of it.
Neptune.ai has compiled the stack shown above, which I find quite useful for understanding how these offerings fit together.
Here are some choices that you can explore.
There are many more tools and products available, but rather than cover them all, I want to give a rough sketch of the stack.
Whatever you select, the de facto industry standard for hosting these MLOps stacks is Kubernetes; sometimes the cloud provider manages it for you to simplify operations, and sometimes it is left to you to run on your own clusters.
Challenges or Mistakes
Need Expert Help?
I am sure you might have been overwhelmed by the CNCF landscape and how complex it has become over time. You may want to hire a consultant like us to figure out what is better for you. Here is another one, the AI landscape by the Linux Foundation, to confuse you further. Don't worry, we will help you.
If you are stuck finding a proper MLOps stack, book some time to chat about your problem; we will be more than happy to help.
As with DevOps, there is no single best solution available off the shelf; you need to work with your team and build an MLOps practice that works for your company.
In the subsequent issues, we will share more about our MLOps journey and our opinions about the popular tools. Subscribe to the newsletter to keep up to date.
We at CloudRaft help businesses grow and solve complex problems by leveraging cloud-native technologies and modern platform engineering practices. We are building the MLOps stack for our clients and learning about the evolving ecosystem. Do ping us ([email protected]) if you need help in these areas.
If you like this article, don't forget to share it with your friends and colleagues.