Building AI models with open source LLMs
Justo Hidalgo
Chief AI Officer at Adigital. Highly interested in Responsible AI and Behavioral Psychology. PhD in Computer Science. Book author, working on my fourth one!
How Companies Can Leverage Existing Base Models to Build Improved AI Systems
At Adigital, we recently had the privilege of hosting an enlightening Adigital Academy session led by Daniel Vila Suero, the CEO of Argilla. The session was an eye-opener, providing deep insights into the world of LLM model development. Recognizing the value of this knowledge, I felt compelled to share these learnings in a relatively accessible format, especially for those who are not deeply technical but are keen to understand the nuances of building AI models in this new world of Generative AI. I hope I didn't miss anything critical ;)
Daniel Vila, with his expertise, guided us through the intricacies of leveraging existing AI models to create more refined, tailored solutions. This post aims to distill Daniel’s knowledge.
Companies are starting to seek ways to improve their business and competitiveness via new AI capabilities. One effective approach is leveraging existing base models, such as OpenAI's GPT-4 or Zephyr, to develop tailored solutions that meet specific business needs. Here's a detailed guide on how a company can use these models to build their own improved AI systems.
Step 1: Data Collection
The initial and crucial step involves gathering a dataset. A dataset for generative AI is basically a set of questions and answers. This could range from a thousand examples to whatever is available. For customer service applications, companies often already have an extensive collection of user queries, providing a solid foundation. If the aim is to develop an assistant in a field where user questions have not yet been collected, synthetic generation techniques can be employed. Tools like Notus (Argilla’s own LLM, just launched) or GPT-4 can be used to create questions from the relevant texts you want to start from, breaking them into sentences or paragraphs and generating potential queries for each.
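To make the synthetic generation step a bit more concrete, here is a minimal sketch using the OpenAI Python client: it splits a source text into paragraphs and asks an LLM to draft a few candidate questions per paragraph. The model name, prompt wording, and paragraph-based chunking are illustrative assumptions, not the exact setup discussed in the session.

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment


def generate_questions(document: str, questions_per_chunk: int = 3) -> list[str]:
    """Split a source text into paragraphs and ask an LLM to draft questions for each."""
    chunks = [p.strip() for p in document.split("\n\n") if p.strip()]
    questions: list[str] = []
    for chunk in chunks:
        response = client.chat.completions.create(
            model="gpt-4",  # or an open model such as Notus served behind an OpenAI-compatible API
            messages=[
                {"role": "system", "content": "You write questions a user might ask about a text."},
                {"role": "user", "content": f"Write {questions_per_chunk} questions about:\n\n{chunk}"},
            ],
        )
        # one question per line in the reply; strip bullets and blank lines
        questions.extend(
            line.strip("-• ").strip()
            for line in response.choices[0].message.content.splitlines()
            if line.strip()
        )
    return questions
```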
Step 2: Initial Validation
Once a dataset is created, it should be validated. Human experts can use tools like Argilla to ensure that the generated questions are appropriate and contextually relevant. For instance, in a regulatory use case, questions might include, "What does section 2 of the regulation state?" or "How are application criticality levels defined?" Experts can rapidly validate hundreds of such questions.
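As a rough illustration of this review step, the sketch below pushes the generated questions to Argilla so experts can label them. It assumes Argilla's 1.x FeedbackDataset API; the instance URL, API key, field names, labels, and workspace are placeholders for the example.

```python
import argilla as rg

# connect to a running Argilla instance (URL and key are placeholders)
rg.init(api_url="http://localhost:6900", api_key="argilla.apikey")

dataset = rg.FeedbackDataset(
    fields=[rg.TextField(name="question")],
    questions=[
        rg.LabelQuestion(
            name="valid",
            title="Is this question appropriate and contextually relevant?",
            labels=["yes", "no"],
        )
    ],
)

# a couple of example questions from the regulatory use case above;
# in practice these would come from the synthetic generation step
generated_questions = [
    "What does section 2 of the regulation state?",
    "How are application criticality levels defined?",
]

records = [rg.FeedbackRecord(fields={"question": q}) for q in generated_questions]
dataset.add_records(records)
dataset.push_to_argilla(name="regulation-questions", workspace="experts")
```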
Step 3: Model Selection and Testing
After validation, select one or two available models (like Zephyr or OpenAI) and pass the questions to see if the responses are accurate, incorrect, or irrelevant. Another round of human validation is crucial here, where experts evaluate the responses, providing a preliminary assessment of the model's effectiveness.
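To illustrate the testing loop, here is a small sketch that queries one of the open models mentioned above (Zephyr, via the Hugging Face transformers library) with each validated question, so the answers can then be sent back to Argilla for expert review. The system prompt and generation settings are assumptions for the example.

```python
import torch
from transformers import pipeline

# Zephyr 7B, loaded as a standard text-generation pipeline
pipe = pipeline(
    "text-generation",
    model="HuggingFaceH4/zephyr-7b-beta",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)


def answer(question: str) -> str:
    messages = [
        {"role": "system", "content": "You are a helpful assistant for regulatory questions."},
        {"role": "user", "content": question},
    ]
    prompt = pipe.tokenizer.apply_chat_template(
        messages, tokenize=False, add_generation_prompt=True
    )
    output = pipe(prompt, max_new_tokens=256, do_sample=False)
    # the pipeline returns prompt + completion, so keep only the new text
    return output[0]["generated_text"][len(prompt):].strip()


validated_questions = ["What does section 2 of the regulation state?"]
# candidate answers, ready for another round of human evaluation
candidate_answers = {q: answer(q) for q in validated_questions}
```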
Step 4: Supervised Fine Tuning (SFT)
If the base model doesn't meet expectations, or if there's a preference for open-source models, a larger dataset (around 5,000-10,000 question-answer pairs) is necessary for Supervised Fine Tuning.
Supervised Fine Tuning (SFT) is a technique used in refining AI models, particularly Large Language Models (LLMs). It involves training the AI model on a specific, labeled dataset to adapt it to particular tasks or improve its performance in certain areas. This process is 'supervised' because it uses a dataset where the desired outputs (like correct answers to questions) are already known and labeled. By training the model on this dataset, the model learns to generate more accurate and relevant responses based on the specific requirements of the task. As Daniel explained, SFT is a crucial step in customizing base AI models to suit specific applications, ensuring that they respond more accurately to the kind of inputs they will encounter in their designated use case.
In this example, the process involves adjusting a model like Zephyr to better suit specific needs, without starting from scratch with a base model like Mistral.
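For readers who want to see what this step can look like in code, below is a minimal sketch of supervised fine-tuning with the TRL library's SFTTrainer, roughly the kind of tooling used to go from a base model like Mistral to a chat-tuned model. The dataset file, the pre-formatted "text" field, and all hyperparameters are simplifying assumptions (and the exact argument names depend on your TRL version), so treat this as a sketch rather than a recipe.

```python
from datasets import load_dataset
from transformers import TrainingArguments
from trl import SFTTrainer

# each JSONL row holds a "text" field: the question and its reference answer,
# already rendered with the model's chat template
dataset = load_dataset("json", data_files="qa_pairs.jsonl", split="train")

trainer = SFTTrainer(
    model="mistralai/Mistral-7B-v0.1",   # the base model to adapt
    train_dataset=dataset,
    dataset_text_field="text",
    max_seq_length=1024,
    args=TrainingArguments(
        output_dir="./sft-model",
        num_train_epochs=3,
        per_device_train_batch_size=2,
        learning_rate=2e-5,
    ),
)
trainer.train()
```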
Step 5: Post-SFT Evaluation
After Supervised Fine Tuning (SFT), the next crucial phase is evaluating the accuracy of the model's responses. If the model's performance aligns with expectations, no further adjustments are needed: we have our model!!! Cheers.
However, if there's potential for improvement, this is where Direct Preference Optimization (DPO) becomes pivotal.
DPO, a novel technique akin to Reinforcement Learning from Human Feedback (RLHF), marks a significant leap in Large Language Model (LLM) refinement. Unlike RLHF, which had a multifaceted objective and faced challenges integrating Reinforcement Learning (RL) intricacies into Natural Language Processing (NLP), DPO offers a more direct approach. Its primary optimization relies on binary cross-entropy loss, simplifying the LLM refinement process significantly.
Wait, what??????
Sorry, my geeky face showed up unexpectedly. Let me start again.
Direct Preference Optimization (DPO) simplifies the improvement of AI language models: it uses a straightforward measure of how close the model's answers are to the responses humans prefer, which makes the whole process of making the AI smarter much easier to manage.
In practice, DPO involves creating a dataset of preferences with questions and multiple answers. Each answer is then evaluated, not just for correctness, but also for its alignment with direct human preferences. This process not only refines the model's accuracy but also ensures it aligns more closely with human judgment and nuances in responses.
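Continuing the hypothetical code from the SFT step, here is a rough sketch of the DPO stage using TRL's DPOTrainer. The preference dataset layout (prompt / chosen / rejected columns) is the standard one for this trainer, but the file name, beta value, and other hyperparameters are placeholders, and argument names vary across TRL versions.

```python
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from trl import DPOTrainer

# preference data: each row has a prompt, a preferred ("chosen") and a dispreferred ("rejected") answer
preferences = load_dataset("json", data_files="preferences.jsonl", split="train")

model = AutoModelForCausalLM.from_pretrained("./sft-model")      # model produced by the SFT step
ref_model = AutoModelForCausalLM.from_pretrained("./sft-model")  # frozen reference copy
tokenizer = AutoTokenizer.from_pretrained("./sft-model")

trainer = DPOTrainer(
    model=model,
    ref_model=ref_model,
    beta=0.1,  # how strongly the tuned model must stay close to the reference
    train_dataset=preferences,
    tokenizer=tokenizer,
    args=TrainingArguments(
        output_dir="./dpo-model",
        per_device_train_batch_size=2,
        learning_rate=5e-7,
    ),
)
trainer.train()
```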
Expert note 1: Dealing with GPU Costs and Availability
A significant consideration in AI development is the cost and availability of GPUs, especially for smaller ventures. For context, transitioning from a base model like Mistral to an SFT and then a DPO model (like Zephyr) can cost around 500€ with up to 8 hours of training time, significantly less than training a base model from scratch. However, availability of GPUs can be a challenge, with shortages often reported in major cloud services like Google Cloud or Amazon.
Expert note 2: Incremental Improvement and Internal Resources
Start with smaller datasets and scale gradually. Many large companies begin with their in-house AI development teams running these tests and trials. This approach reduces the need for external expertise and the cost of creating a perfect dataset up front. Moreover, the end-user teams, such as those operating customer support chatbots, can play a pivotal role in continually refining the dataset.
Expert note 3: Retrieval Augmented Generation
During the session there were some questions about how this approach compares with another technique called Retrieval Augmented Generation (RAG). RAG is a distinct approach compared to methods like Supervised Fine Tuning (SFT) and Direct Preference Optimization (DPO) used in refining AI models. While SFT and DPO focus on improving a model's responses based on a fixed dataset and human preferences, RAG introduces an additional step where the AI model dynamically retrieves information from a large database or document set to enhance its answers. Essentially, RAG combines the generation capabilities of models like GPT-4 with a retrieval component, allowing the AI to pull in external information to provide more informed, accurate, and contextually relevant responses. This makes RAG particularly useful for tasks requiring up-to-date or specific information that may not be contained within the model's training data. Please check this previous article of mine where I showed some code to build a simple RAG application.
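For readers who want a feel for the retrieval part without jumping to that article, here is a tiny illustrative sketch: it embeds document chunks with a sentence-transformers model, retrieves the chunks most similar to a question, and prepends them to the prompt. The embedding model, the in-memory similarity search, and the prompt wording are all assumptions chosen to keep the example short.

```python
import numpy as np
from sentence_transformers import SentenceTransformer

embedder = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")

# replace with real passages from the company's documents
chunks = [
    "Section 2 of the regulation defines the scope of application...",
    "Criticality levels are assigned based on potential user impact...",
]
chunk_vectors = embedder.encode(chunks, normalize_embeddings=True)


def retrieve(question: str, top_k: int = 3) -> list[str]:
    """Return the chunks most similar to the question (cosine similarity on normalized vectors)."""
    q_vec = embedder.encode([question], normalize_embeddings=True)[0]
    scores = chunk_vectors @ q_vec
    best = np.argsort(scores)[::-1][:top_k]
    return [chunks[i] for i in best]


def build_prompt(question: str) -> str:
    context = "\n\n".join(retrieve(question))
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )

# build_prompt(...) is then sent to any generative model, e.g. the fine-tuned one from the previous steps
```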
So this is all! Building a custom AI model on top of existing base models involves a systematic approach starting from data collection, through validation and model testing, to fine-tuning and final evaluations. The process is cost-effective, especially when leveraging open-source models, and allows for gradual improvement, harnessing internal resources.
I'd like to extend a heartfelt thank you to Daniel Vila for his contribution to our Adigital Academy. For those interested in delving deeper, particularly into how Argilla built their Large Language Model, Notus, I highly recommend visiting their detailed blog post.
We at Adigital will continue with introductory and detailed talks about artificial intelligence and other emerging trends in 2024. Consider joining us to obtain access to this knowledge!!!
ML & DevRel @ Hugging Face ?? || ?????? Cooking, ?????? Coding, ?? Committing
1 年Interesting article! Tom Aarsen also wrote an interesting tutorial about supervised fine-tuning Mistral AI for chat completion, which might be interesting for some readers. https://docs.argilla.io/en/latest/tutorials_and_integrations/tutorials/feedback/training-llm-mistral-sft.html