Overview of ML Ops: A must-have
Alright, let me turn from AI/ML in business, the topic of one of my previous blogs, to the value of ML Ops and running ML in production at scale. I will also cover the skill set required for Data Science in Operations, which I mentioned in my earlier post.
You may have heard the new term ML Ops: Machine Learning Operations. It merges three distinct areas: DevOps, software engineering, and machine learning.
Again, I don't want to throw in another buzzword to spice things up.
In my experience, it is easy to build models, but it takes a long time to move them to production, and some are never moved at all.
Here are some of my old notes:
One of the models was not working for more than six months, and no one knows how much value was lost!
Our previous data scientist has left the team, and we don't know what has changed in the data/infrastructure - predictions are wrong!
You might say: really, are you kidding!? I am afraid I am not. Here is an old one from Twitter.
Software engineering vs ML Life cycle
Now the first question you might have: how is this related to the software engineering lifecycle? Let us compare the software engineering and data science lifecycles at a high level. I am not going in depth, but it will give a decent overview.
The software engineering lifecycle is a well-known, well-defined process, and teams across industries have been following it for a long time.
On the other hand, Data Science is still evolving outside of heavy tech industries, and as shown, there are a lot of moving pieces.
The image above could be prettier, but I hope you got my point :)
So, to summarize, MLOps helps you:
- Keep the model in production reproducible.
- Make the model explainable, accountable, and auditable.
- Collaborate easily with team members.
- Streamline the process with continuous deployments.
Data Science / ML != Data Science/ML + Software Engineering + DevOps
Now, the second question you may have: to be a productive data scientist, do I need to learn all the skills of DevOps and software engineering?
The answer is no, not all of them, in both cases:
a. If your organization/team has a well-established Data Science team, these skills may already be there. Look around!?
b. If your organization is new to Data Science or still adapting, then yes, you need to learn some of the skills. Show the value and become case a. (Hard job to do, I know.)
I will say that again: you need to know enough to show the value of Data Science / ML. You do not need to be perfect in DevOps or software engineering.
You should possess enough knowledge to work with Engineering to move your code to production. That is the sweet spot and, believe me, you need to upskill every other day.
Operations Skills for Data Scientists:
To conclude this article, here is the set of skills you need to be effective:
Development (Data Scientist)
- Git: Version Control
- Docker: Containers. Think of a container as a frozen copy of the environment.
- Airflow/Luigi: Any ETL tool your Data Engineering team is using.
Continuous Integration (Data Scientist)
- Jenkins / CircleCI / GitLab: Any tool Engineering is using. It connects code changes in Git to a CI job that runs against versioned data, builds a versioned model, creates metrics, and pushes the model to a repository.
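To make the CI step a bit more concrete, here is a minimal sketch of what such a job could run. It is only an illustration under my own assumptions: the function names (`data_version`, `train`, `evaluate`, `run_ci_job`), the toy least-squares "model", and the plain folder standing in for a model repository are all hypothetical and not something Jenkins, CircleCI, or GitLab provides.

```python
import hashlib
import json
import tempfile
from pathlib import Path


def data_version(rows):
    """Content hash of the training data, used as a data version id."""
    payload = json.dumps(rows, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()[:12]


def train(rows):
    """Stand-in for real training: fit y = slope * x by least squares."""
    num = sum(x * y for x, y in rows)
    den = sum(x * x for x, _ in rows)
    return {"slope": num / den}


def evaluate(model, rows):
    """Mean absolute error of the fitted line on the given rows."""
    errors = [abs(y - model["slope"] * x) for x, y in rows]
    return {"mae": sum(errors) / len(errors)}


def run_ci_job(rows, registry_dir):
    """Version the data, build the model, create metrics, push the artifact."""
    version = data_version(rows)
    model = train(rows)
    metrics = evaluate(model, rows)
    artifact = {"data_version": version, "model": model, "metrics": metrics}
    # The "repository" here is just a folder (an assumption for the sketch).
    out_path = Path(registry_dir) / f"model-{version}.json"
    out_path.write_text(json.dumps(artifact, indent=2))
    return out_path


if __name__ == "__main__":
    rows = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0]]
    with tempfile.TemporaryDirectory() as registry:
        print(run_ci_job(rows, registry).name)
```

The point of the sketch is that the model artifact carries the data version and the metrics with it, so anyone on the team can later tell which data produced which model.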
Continuous Delivery (Data Scientist partner with DevOps)
- Kubernetes: Deploying ML models is typically done by building container images. The Docker images are pushed to a Docker registry, and from there they can be pulled to run in Kubernetes and take advantage of what Kubernetes offers.
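As a rough sketch of the last step, the snippet below builds a Kubernetes Deployment manifest for a model-serving image and prints it as JSON. The helper `deployment_manifest`, the app name `churn-model`, the port, and the registry path are hypothetical placeholders of mine; your DevOps team will have its own conventions.

```python
import json


def deployment_manifest(name, image, replicas=2, port=8080):
    """Build a Kubernetes apps/v1 Deployment for a model-serving container."""
    labels = {"app": name}
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name, "labels": labels},
        "spec": {
            "replicas": replicas,
            # The selector must match the labels on the pod template.
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    "containers": [
                        {
                            "name": name,
                            "image": image,
                            "ports": [{"containerPort": port}],
                        }
                    ]
                },
            },
        },
    }


if __name__ == "__main__":
    # Hypothetical registry path and tag; replace with your own image.
    manifest = deployment_manifest(
        "churn-model", "registry.example.com/churn-model:1.0.0"
    )
    print(json.dumps(manifest, indent=2))
```

Kubernetes accepts manifests in JSON as well as YAML, so the printed output could be saved to a file and applied with `kubectl apply -f`.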
Monitoring in Production (Data Scientist partner with DevOps)
- Prometheus - Monitoring tool
- Grafana - Dashboard for monitoring, coupled with Prometheus.
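Remember the model that was broken for six months without anyone noticing? Here is a small sketch of the kind of signal that would have caught it: a prediction counter rendered in the Prometheus text exposition format. The class `PredictionMetrics`, the metric name `model_predictions_total`, and the labels are my own hypothetical choices, and in a real service you would most likely use the official Prometheus Python client rather than hand-rolling this.

```python
from collections import Counter


class PredictionMetrics:
    """Tiny in-memory counter that renders Prometheus text exposition format."""

    def __init__(self, name="model_predictions_total"):
        self.name = name
        self.counts = Counter()

    def record(self, model_version, outcome):
        """Count one prediction for the given model version and outcome."""
        self.counts[(model_version, outcome)] += 1

    def render(self):
        """Return the metrics as text that a Prometheus server could scrape."""
        lines = [
            f"# HELP {self.name} Predictions served, by model version and outcome.",
            f"# TYPE {self.name} counter",
        ]
        for (version, outcome), value in sorted(self.counts.items()):
            labels = f'model_version="{version}",outcome="{outcome}"'
            lines.append(f"{self.name}{{{labels}}} {value}")
        return "\n".join(lines) + "\n"


if __name__ == "__main__":
    metrics = PredictionMetrics()
    metrics.record("v1", "ok")
    metrics.record("v1", "ok")
    metrics.record("v1", "error")
    print(metrics.render(), end="")
```

Once a counter like this is scraped by Prometheus, a Grafana panel or an alert on the error count (or on the counter going flat) makes a silent failure visible within minutes instead of months.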
Good-to-have
- MLflow / Kubeflow - Machine Learning pipeline
In my upcoming articles, I will dig deeper into some of the skills mentioned above or provide links to them. Stay tuned!