Overview of ML Ops: A must-have

Yash Karwa

Data Analytics & Science @ HP | Driving Customer Insight

Published: April 10, 2020

Alright, let me turn from AI/ML in business, the topic of one of my previous blogs, to the value of ML Ops and to running ML in production at scale. I will also cover the skill set required for Data Science in operations, which I mentioned in an earlier post.

You may have heard the new term ML Ops: Machine Learning Operations. It merges three distinct areas: DevOps, software engineering, and machine learning.

Again, I don't want to throw in another buzzword just to spice things up.

In my experience, it is easy to build models, but they take a long time to move to production, or never get there at all.

Here are some of my old notes:

One of the models had not been working for more than six months, and no one knows how much value was lost!

Our previous data scientist has left the team, and we don't know what has changed in the data or infrastructure. The predictions are wrong!

You might say: really? Are you kidding!? I am afraid I am not. Here is an old one from Twitter.

Software engineering vs ML Life cycle

Now, the first question you might have: how is this related to the software engineering lifecycle? Let us compare the software engineering and data science lifecycles at a high level. I am not going in depth, but it will give a decent overview.

The software engineering lifecycle is a well-known, well-defined process, and teams across industries have been following it for a long time.

On the other hand, Data Science is still evolving outside of heavy tech industries, and as shown, there are a lot of moving pieces.

The image above could be prettier, but I hope you got my point :)

So, to summarize, MLOps helps to:

Make the model in production reproducible.
Make the model explainable, accountable, and auditable.
Collaborate easily with team members.
Streamline the process with continuous deployment.
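
To make the first point a bit more concrete, here is a minimal sketch of what "reproducible" can mean in practice: hash everything that determines a training run (data, parameters, seed) and store that id with the model. The `fingerprint` and `train` functions and the toy "model" are my own illustration, not a specific tool's API.

```python
import hashlib
import json
import random


def fingerprint(data_rows, params, seed):
    """Return a stable hash of everything that determines a training run."""
    payload = json.dumps(
        {"data": data_rows, "params": params, "seed": seed},
        sort_keys=True,  # stable key order so the hash does not depend on dict order
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def train(data_rows, params, seed):
    """Toy 'training' (hypothetical): mean of the data plus seeded noise."""
    rng = random.Random(seed)  # local RNG, so global random state cannot leak in
    noise = rng.random() * params["noise_scale"]
    mean = sum(data_rows) / len(data_rows)
    return {
        "weight": mean + noise,
        "run_id": fingerprint(data_rows, params, seed),
    }


rows = [1.0, 2.0, 3.0, 4.0]
params = {"noise_scale": 0.01}
first = train(rows, params, seed=42)
second = train(rows, params, seed=42)
print(first == second)  # same data, params, and seed give the same model and run id
```

If the data, the parameters, or the seed change, the run id changes too, which is exactly the signal you want when somebody asks "what has changed?" months later.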

Data Science / ML != Data Science/ML + Software Engineering + DevOps

Now, the second question you may have: to be a productive Data Scientist, do I need to learn all the skills of DevOps and Software Engineering?

The answer is no, not all of them, in both cases:

a. If your organization or team has a well-established Data Science team, those skills may already be covered. Look around!

b. If your organization is new to Data Science or still adapting, then yes, you need to learn some of the skills. Show the value and become case a. (A hard job, I know.)

I will say that again: you need to know enough to show the value of Data Science / ML. You do not need to be perfect in DevOps or software engineering.

You should possess enough knowledge to work with Engineering to move your code to production. That is the sweet spot, and believe me, you need to upskill every other day.

Operation Skills for Data Scientist:

To conclude this article, here is the set of skills you need to be effective:

Development (Data Scientist)

Git: Version control.
Docker: Containers; think of an image as a frozen copy of the environment.
Airflow/Luigi - Or any ETL/workflow tool your Data Engineering team is using.
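
The core idea behind tools like Airflow and Luigi is that a pipeline is a graph of tasks with dependencies, and the tool works out a valid run order. Here is a rough sketch of that idea using only the Python standard library (`graphlib`, Python 3.9+). This is not the Airflow or Luigi API, and the task names are made up for illustration.

```python
from graphlib import TopologicalSorter

# Hypothetical pipeline: each task maps to the set of tasks
# that must finish before it can start.
pipeline = {
    "extract": set(),
    "clean": {"extract"},
    "build_features": {"clean"},
    "train_model": {"build_features"},
    "score": {"train_model", "build_features"},
}


def run_order(dag):
    """Return the tasks in an order that respects every dependency."""
    return list(TopologicalSorter(dag).static_order())


order = run_order(pipeline)
print(order)
```

A real orchestrator adds scheduling, retries, and logging on top, but if you can read a dependency graph like this one, you can read most pipeline definitions your Data Engineering team writes.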

Continuous Integration (Data Scientist)

Jenkins / CircleCI / GitLab: Whichever tool Engineering is using. It turns code changes in Git into a CI job that runs against versioned data, builds a versioned model, computes metrics, and pushes the model to a repository.
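
The kind of check such a CI job runs can be sketched in a few lines. This is a simplified illustration, assuming the model is already serialized to bytes and that accuracy is the metric you care about; `ci_gate`, `evaluate`, and the threshold are hypothetical names and values, not part of any CI tool.

```python
import hashlib
import json


def evaluate(predictions, labels):
    """Fraction of predictions that match the labels."""
    correct = sum(1 for p, y in zip(predictions, labels) if p == y)
    return correct / len(labels)


def ci_gate(model_bytes, predictions, labels, min_accuracy=0.7):
    """Build a report a CI job could use to accept or reject a model."""
    accuracy = evaluate(predictions, labels)
    # Version the model by its content, so identical bytes get the same version.
    version = hashlib.sha256(model_bytes).hexdigest()[:12]
    return {
        "model_version": version,
        "accuracy": accuracy,
        "passed": accuracy >= min_accuracy,
    }


report = ci_gate(b"serialized-model", [1, 0, 1, 1], [1, 0, 0, 1])
print(json.dumps(report, sort_keys=True))
```

In a real job, the script would exit with a non-zero status when `passed` is false, so the CI tool marks the build as failed and the model never reaches the repository.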

Continuous Delivery (Data Scientist partner with DevOps)

Kubernetes: ML models are typically deployed by building container images. The Docker images are pushed to a Docker registry, and Kubernetes pulls them from the registry and runs them, so you get the benefits Kubernetes provides.

Monitoring in Production (Data Scientist partner with DevOps)

Prometheus - Monitoring tool
Grafana - Dashboard for monitoring, coupled with Prometheus.
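
Remember the model that was silently broken for six months? Here is a rough sketch of two health numbers that could have flagged it: how far live predictions have shifted from a baseline, and how long it has been since the model last produced a prediction. The metric names and functions are my own illustration. The output mimics the plain-text gauge format Prometheus scrapes, but in practice you would use an official Prometheus client library rather than formatting this by hand.

```python
def mean(values):
    return sum(values) / len(values)


def health_metrics(baseline_preds, live_preds, last_prediction_ts, now_ts):
    """Compute two simple model health signals (hypothetical metric names)."""
    return {
        "model_prediction_mean_shift": abs(mean(live_preds) - mean(baseline_preds)),
        "model_seconds_since_last_prediction": float(now_ts - last_prediction_ts),
    }


def render_gauges(metrics):
    """Render metrics as text lines in the style of Prometheus gauges."""
    lines = []
    for name, value in sorted(metrics.items()):
        lines.append(f"# TYPE {name} gauge")
        lines.append(f"{name} {value}")
    return "\n".join(lines) + "\n"


metrics = health_metrics(
    baseline_preds=[0.25, 0.75],
    live_preds=[0.5, 1.0],
    last_prediction_ts=1000,
    now_ts=1600,
)
text = render_gauges(metrics)
print(text)
```

Once numbers like these are collected, a Grafana dashboard or an alert rule on top of them is what turns "nobody noticed" into "somebody got paged".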

Good-to-have

MLflow / Kubeflow - Machine Learning pipeline

In my upcoming articles, I will dig deeper into some of the skills mentioned above or provide links. Stay tuned!