I am currently crafting a Medium blog post on FMOps/LLMOps, and the more I delve into research and engage with customers, the more apparent the challenge becomes of finding the right name for this emerging field within artificial intelligence.
While an empirical perspective might suggest that MLOps is a subset of DevOps and, consequently, that FMOps is part of MLOps, I find myself in disagreement. The distinctive nature of Foundation Models, especially LLMs, introduces novel practices and requirements such as prompt engineering, RAG, and guardrails. Strictly speaking, MLOps may not be an absolute necessity for scaling the production of GenAI applications; leveraging Foundation Models via APIs and innovative services like Amazon Bedrock is a viable route. Nevertheless, organisations with solid MLOps foundations will have a more enjoyable journey.
As I continue to ponder the most fitting name for this domain, I'm still not sold on either LLMOps or FMOps. I believe the crux lies in identifying the motivations and drivers behind the need for this set of best practices, particularly when dealing with foundation models without an established MLOps culture.
In my view, understanding these drivers is crucial, as they should address some of the most common challenges encountered when working with foundation models. I aim to introduce what I consider the most significant drivers and gather feedback and consensus, at least on the foundational principles. I'd appreciate your thoughts on this journey.
- Exponential growth of LLMs and rapid pace of innovation: when buying a new car, you usually research different manufacturers, models, and features based on your needs. Then you get pre-approved for a loan and test-drive the car. If you are happy with it, you check the sale price and warranties before closing the deal. The same applies to LLMs: you don't simply pick the most popular one. You need to test and evaluate different metrics and establish trade-offs for your use cases. FMOps will help you design a framework that guides you to the best FM for your specific use case and, most importantly, tells you what to do when the model becomes outdated.
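The trade-off evaluation described above can be made explicit rather than intuitive. Here is a minimal sketch of ranking candidate foundation models by a weighted score; the model names, metric values, and weights are all hypothetical placeholders, and the cost/latency metrics are assumed to be normalised so that higher means cheaper/faster.

```python
# Minimal sketch: ranking candidate foundation models for a use case by
# weighted trade-offs. All names, scores, and weights are hypothetical.

def rank_models(candidates, weights):
    """Return candidates sorted by weighted score (higher is better)."""
    def score(metrics):
        return sum(weights[k] * metrics[k] for k in weights)
    return sorted(candidates, key=lambda c: score(c["metrics"]), reverse=True)

candidates = [
    {"name": "model-a", "metrics": {"quality": 0.9, "cost": 0.3, "latency": 0.5}},
    {"name": "model-b", "metrics": {"quality": 0.7, "cost": 0.8, "latency": 0.9}},
]
# The weights encode what matters for this particular use case.
weights = {"quality": 0.5, "cost": 0.3, "latency": 0.2}

ranking = rank_models(candidates, weights)
print([c["name"] for c in ranking])
```

When a model gets outdated, re-running the same scoring against new candidates gives you a repeatable replacement procedure instead of a one-off judgment call.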
- Reliability & Performance: the primary function of an LLM is to generate informative content for users in a controlled environment. It is therefore crucial to align the model so that it produces reliable outputs, which means that LLMs and their underlying infrastructure need to be tested before being promoted into production. Reliability is a foundational requirement because unreliable outputs negatively impact almost all LLM applications, especially those used in direct customer-facing scenarios; this matters both to avoid spreading misinformation and to build user trust. Some of the most important categories for evaluating and aligning LLM reliability are misinformation, miscalibration, and inconsistency, which for the sake of simplicity we will group under the concept of hallucinations. A hallucination is when an LLM generates content that is nonsensical or unfaithful to the provided source content, yet with apparent confidence. This remains an ongoing research area: the exact cause is still unclear, and only a limited number of mitigation methods have been proposed. FMOps will also continuously improve model performance, leading to more accurate, contextually relevant, and higher-quality responses.
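One simple, commonly discussed screening technique for the inconsistency category above is self-consistency: sample the same prompt several times and flag low agreement between the answers. The sketch below illustrates the idea only; `generate` is a stand-in for any real LLM call, and the deterministic fake below exists purely so the example runs.

```python
# Sketch of a self-consistency check for hallucination screening:
# sample the same prompt several times and measure agreement.
from collections import Counter

def consistency_score(generate, prompt, n=5):
    """Return the most common answer and the fraction of samples agreeing with it."""
    answers = [generate(prompt) for _ in range(n)]
    most_common, count = Counter(answers).most_common(1)[0]
    return most_common, count / n

# Deterministic stand-in for a real LLM client, for demonstration only.
fake_llm = lambda prompt: "Paris"
answer, score = consistency_score(fake_llm, "Capital of France?")
print(answer, score)
```

In practice you would sample with non-zero temperature and treat a low agreement score as a signal to route the request to a fallback, retrieval step, or human review.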
- Cost: it is essential to note that cost analysis for LLMs can quickly become outdated due to the rapid evolution of the field. Nevertheless, one important role of FMOps is to ensure that your GenAI proof of concept is still valid when deployed and, even more importantly, affordable at scale in a real production environment where thousands of users will call your service. Contrary to popular belief, bigger doesn't always mean better when it comes to LLMs. Smaller models can be just as effective, if not more so, for specific tasks. First and foremost, smaller models are often more cost-effective to train and deploy. In addition, by leveraging pre-trained models as a starting point and fine-tuning them on task-specific data, you can accelerate the training process and achieve good performance with fewer resources, and therefore at lower cost. Finally, while managed APIs can be a convenient option for rapid prototyping or small-scale projects, it is crucial to consider the long-term costs and evaluate whether it makes financial sense to rely on them for large-scale production deployments. These services often have usage-based pricing models, meaning that the more you rely on them, the higher your expenses will be. In some cases, building or fine-tuning your own LLM may be a more cost-effective alternative.
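The "valid as a PoC but unaffordable at scale" risk is easy to quantify with a back-of-the-envelope calculation. In this sketch the per-token price, token counts, and request volumes are hypothetical placeholders, not real rates from any provider.

```python
# Back-of-the-envelope sketch for usage-based API cost at scale.
# All prices and volumes are hypothetical placeholders.

def monthly_cost(requests_per_day, tokens_per_request, price_per_1k_tokens):
    tokens_per_month = requests_per_day * 30 * tokens_per_request
    return tokens_per_month / 1000 * price_per_1k_tokens

# The same workload as a PoC (100 req/day) vs. production (100k req/day).
poc = monthly_cost(100, 1500, 0.002)
prod = monthly_cost(100_000, 1500, 0.002)
print(f"PoC: ${poc:.2f}/month, production: ${prod:.2f}/month")
```

Because the cost scales linearly with volume, a 1000x jump in traffic means a 1000x jump in the bill, which is exactly the point at which a smaller self-hosted or fine-tuned model may become the cheaper option.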
- Real-time interaction (Latency): in today's fast-paced world, latency plays a crucial role in delivering a seamless user experience. Whether it's a chatbot, a language translation service, or a recommendation system, users expect real-time or near-real-time responses, so optimising latency becomes paramount when deploying foundation models in production. Several factors come into play in achieving low latency, including the choice of LLM API (or of hardware infrastructure, in the case of self-hosted open-source LLMs), input and output length, efficient memory usage, and optimised algorithms.
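Before optimising any of those factors, you need a baseline measurement. Here is a minimal sketch of collecting end-to-end latency percentiles for an endpoint; `call_model` is a stand-in (here a trivial local function) that you would replace with a real client call.

```python
# Sketch: measuring end-to-end latency percentiles for an LLM endpoint.
# `call_model` is a stand-in; replace it with a real API/client call.
import statistics
import time

def measure_latency(call_model, prompt, samples=20):
    latencies = []
    for _ in range(samples):
        start = time.perf_counter()
        call_model(prompt)
        latencies.append(time.perf_counter() - start)
    latencies.sort()
    return {
        "p50": statistics.median(latencies),
        "p95": latencies[int(0.95 * (len(latencies) - 1))],
    }

stats = measure_latency(lambda p: p.upper(), "hello")
print(stats)
```

Tracking p95 (or p99) rather than the average matters because users experience the slow tail, not the mean, and prompt/output length changes often show up there first.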
- Automation & Scalability: organisations will seek efficient ways to fine-tune, deploy, monitor, and manage a growing number of "small" LLMs. Automated workflows will be needed to fine-tune and deploy these models at scale, especially when chaining multiple foundation models. FMOps will also provide tools and frameworks that allow multiple data science teams to collaborate and productionise tens or hundreds of LLMs. Additionally, automating the instantiation of a secure FMOps environment in which multiple teams can operate their models will reduce the dependency on, and overhead for, IT.
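At its simplest, such an automated workflow is a sequence of gated steps applied per model, with results recorded in a registry. The sketch below is purely illustrative: the step names, statuses, and the in-memory registry are assumptions standing in for a real orchestration tool and model registry.

```python
# Sketch of an automated fine-tune / evaluate / deploy workflow as chained
# steps with a model registry. All names and states are illustrative.

def run_pipeline(model_id, steps, registry):
    """Apply each step in order and record the final state for the model."""
    state = {"model_id": model_id, "status": "registered"}
    for step in steps:
        state = step(state)
    registry[model_id] = state
    return state

fine_tune = lambda s: {**s, "status": "fine-tuned"}
evaluate  = lambda s: {**s, "status": "evaluated", "passed": True}
# Deployment is gated on the evaluation result.
deploy    = lambda s: {**s, "status": "deployed"} if s.get("passed") else s

registry = {}
result = run_pipeline("small-llm-001", [fine_tune, evaluate, deploy], registry)
print(result["status"])
```

The key design point is the evaluation gate: a model that fails evaluation never reaches the deploy step, which is what makes the workflow safe to run unattended across many models.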
- Responsible AI: the development of generative AI applications represents a significant technological advancement. However, along with this progress comes the responsibility to ensure that the use of such technology is ethically sound and respects the rights and values of all stakeholders. By understanding and addressing the ethical considerations involved, we can harness the full potential of AI while minimizing the risks and harm associated with it.
- Legal aspects: when training, fine-tuning, or deploying LLMs, you need to make sure you do not run into any copyright or data protection issues surrounding your data and model. From a legal perspective, you need to be aware of where your model and your data are hosted in order to comply with data and privacy protection regulations. Moreover, as LLMs become more sophisticated and capable, they are increasingly being used to create creative works such as articles, books, and even music and art; however, the intellectual property rights (IPR) over LLM-generated works remain a complex and evolving legal question.
- Standardisation: we can summarise all of the above as FMOps helping users standardise the lifecycle of foundation models. Which foundation model to use for a specific task, when to implement RAG techniques to improve reliability and reduce hallucinations, and whether to fine-tune to maximise accuracy for specific tasks or rely on prompting: these are some of the most common questions users are asking today when working with generative AI. Most of the time the approach is purely intuitive and based on trial and error. That may work for a small number of use cases, but industrialising GenAI applications requires a standard, well-defined LLM lifecycle.
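One way to start standardising is to turn the intuitive "prompting vs. RAG vs. fine-tuning" choice into an explicit, reviewable rule. The criteria and ordering in this sketch are assumptions, placeholders for whatever policy a team actually agrees on, not an established decision procedure.

```python
# Illustrative sketch: codifying the prompting / RAG / fine-tuning decision
# as an explicit rule instead of pure intuition. Criteria are placeholders.

def choose_adaptation(needs_private_knowledge, task_is_narrow, prompt_quality_ok):
    if prompt_quality_ok and not needs_private_knowledge:
        return "prompt engineering"      # cheapest option already works
    if needs_private_knowledge:
        return "RAG"                     # ground answers in your own data
    if task_is_narrow:
        return "fine-tuning"             # maximise accuracy for one task
    return "re-evaluate requirements"

print(choose_adaptation(needs_private_knowledge=True,
                        task_is_narrow=False,
                        prompt_quality_ok=False))
```

Even a crude rule like this beats trial and error at scale, because the decision is documented, testable, and can be revised as the team learns.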
Head - Data Engineering, Quality, Operations, and Knowledge
Hype driven by even more hype. Once all these companies are asked to pay up to prove they are not violating copyrights by taking publicly available content, reality will strike. This is what the Spanish and British did once upon a time: loot and kill for profit. Eventually people realised that taking things without permission, just because you can, is wrong and unethical.
Exciting journey ahead! Looking forward to your insights.
CTO | IT Consultant | Co-Founder at Gart Solutions | DevOps, Cloud & Digital Transformation
I'm intrigued by your research on FMOps/LLMOps and the challenges of finding the right name for this emerging field. Looking forward to reading your blog post!
Principal AI & MLOps Engineer @ Barclays | Author | Visiting Lecturer @ Oxford, Warsaw
I call it all "XOps" and just crack on!
AI Strategy / AI Maturity / Enterprise AI Adoption /Amazonian / Ex-IBM Executive / Ex-Microsoft
Thought provoking. Agree that FMOps is not a subset of MLOps. Technically speaking, leveraging an LLM through an API does not have all the specific requirements of training and deploying an ML model, but there will be common elements at a higher level in terms of integration into business processes, security, risk management, etc., and as you say this will be done more effectively by organizations with solid MLOps practices. I am also curious as to how the MLOps field gets updated for the purpose of training and deploying FMs, instead of just using them. Any thoughts there?