Machine learning operations (MLOps) is the development and use of machine learning models by development operations (DevOps) teams. MLOps adds discipline to the development and deployment of machine learning models, making the development process more reliable and productive.
MLOps encompasses a set of processes that machine learning developers use to build, deploy, and continuously monitor and train their models. It's at the heart of machine learning engineering, and it blends artificial intelligence (AI) and machine learning techniques with DevOps and data engineering practices.
There are many steps needed before an ML model is ready for production, and several players are involved. The MLOps development philosophy is relevant to IT pros who develop ML models, deploy the models and manage the infrastructure that supports them. Producing iterations of ML models requires collaboration and skill sets from multiple IT groups, such as data science teams, software engineers and ML engineers.
Development of deep learning and other ML models is considered inherently experimental, and failures are often part of the process in real-world use cases. The discipline is still evolving, and it's understood that sometimes even a successful ML model might not function the same way from one day to the next.
MLOps provides a range of benefits, such as the following:
- Speed and efficiency. MLOps automates many of the repetitive tasks in ML development and within the ML pipeline, such as the initial data preparation procedures. This approach reduces development time and cuts down on human-induced errors in the models.
- Scalability. ML models often must be scaled to handle increased workloads, larger data sets and new features. To provide scalability, MLOps uses technology such as containerized software and data pipelines that can handle large amounts of data efficiently.
- Reliability. MLOps model testing and validation fix problems in the development phase, increasing reliability early on. Operations processes also ensure models comply with policies that an organization has in place. This reduces risks such as data drift, in which the accuracy of a model deteriorates over time because the data it was trained on has changed significantly.
MLOps might be far more streamlined and efficient than traditional approaches, but it's not without its challenges. They include the following:
- Staffing. The same data scientists responsible for developing ML algorithms might not be the most effective at deploying them. They also might not be best equipped to explain how to use the algorithms to software developers. Some of the best MLOps teams embrace the idea of cognitive diversity -- the inclusion of people who have different approaches to problem-solving and offer unique perspectives because they think differently.
- Costliness. MLOps can be costly, given the need to build an infrastructure that encompasses many new tools and the resources required for data analysis as well as model and employee training. This is especially true of large-scale machine learning projects with lots of dependencies and feedback loops. It is important for an organization interested in these projects to assess whether MLOps is the best approach.
- Imperfect processes. While MLOps processes are designed to reduce errors, some mistakes still occur and require human intervention.
- Cyber attacks. Malicious actors are a threat given the large amount of data that MLOps infrastructures store and process. Cybersecurity is required to minimize the risk of data breaches or leaks.