How to Train AI on Your Own Data Using Fine-Tuning and Retrieval-Augmented Generation (RAG)
AI chatbots like ChatGPT are great at answering general questions but may lack expertise in a specific area. In this article, I will share two main ways to extend standard AI models with data supplied by your organization: Retrieval-Augmented Generation (RAG) and fine-tuning.
Why add your own data?
Most models today allow you to submit additional data along with your prompt. For example, you can submit a contract to an AI chatbot and ask it to summarize the key terms. However, there are many situations where you don't want to send files one at a time.
To address these issues, there are two common options for importing your data. The first involves parsing your data so the relevant pieces can be included along with your prompts; this is called Retrieval-Augmented Generation (RAG) and is the simpler of the two processes. The second option is taking an existing model and partially retraining it on your data, a process called fine-tuning.
Retrieval-Augmented Generation (RAG)
Imagine you have a large library of books and want to share a particular passage with a friend. Instead of telling your friend to go to your library and read through each book looking for the passage, you find the book yourself, turn to the specific page, and share it. If the information spans two different books, you may keep both open at once while you discuss the subject.
RAG accomplishes this in several steps. First, your source data is split into smaller, manageable chunks. These chunks are then converted into a numerical format the AI can work with, called vectors (embeddings), and the vectors are loaded into a vector database. When you write a prompt related to your data, the system first searches the database for the vectors most relevant to the prompt, a step called retrieval. The retrieved chunks are then sent along with your prompt to the AI model, which reads them and generates a response. Below is an example of how your internal data could be used with an AI model using RAG.
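The steps above can be sketched in a few lines of Python. This is a toy illustration only: the bag-of-words "embedding" and in-memory list stand in for a real embedding model and vector database, and the sample contract sentences are made up.

```python
import math
import re
from collections import Counter

def embed(text):
    """Toy embedding: a bag-of-words count vector. A real system would
    call an embedding model here instead."""
    return Counter(re.findall(r"[a-z]+", text.lower()))

def cosine(a, b):
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * \
           math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

# 1. Chunk the source data (here, one sentence per chunk).
chunks = [
    "The warranty period is twelve months from delivery.",
    "Payments are due within thirty days of invoice.",
    "Either party may terminate with ninety days written notice.",
]

# 2. Embed each chunk and store it (a real system uses a vector database).
index = [(chunk, embed(chunk)) for chunk in chunks]

# 3. Retrieval: embed the prompt and pull the most similar chunks.
def retrieve(prompt, k=1):
    q = embed(prompt)
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [chunk for chunk, _ in ranked[:k]]

# 4. The retrieved chunks would be prepended to the prompt sent to the LLM.
print(retrieve("When are payments due?")[0])
```

In a production pipeline the only structural change is swapping `embed` for a real model and `index` for a vector database; the retrieve-then-prompt flow stays the same.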
When you submit a prompt, only a subset of your data is sent to the AI model along with it. This minimizes the data the AI has to analyze and reduces costs, because fewer tokens need to be processed.
If security is a concern, you can set up this entire system within your own infrastructure using open-source models such as Llama 2.
To see a simple example of this approach, I created an app that allows you to query a book from the Gutenberg free library by entering the book's URL. The vectors for the book are stored locally, and only the relevant content is sent to ChatGPT when you ask a question.
One of the biggest advantages of RAG is that it is relatively easy to set up. You can run tests with sample data on your desktop using API calls to an LLM and then scale up after a proof of concept. Another advantage is that you are not tied to a particular LLM to produce the output, which keeps you vendor-neutral and lets you upgrade when newer models are released. When you switch models, you may need to update your vector database or the embedding method you use. These are simple transformations that take only a few lines of code in your pipeline; the main cost is the time to re-embed, which depends on how much information you are pulling in.
One disadvantage of RAG is that the response depends entirely on the information sent with each query. If your database misses particular nuances or connections, they will not be reflected in the response. This leads us to the next method: fine-tuning.
Fine Tuning
Unlike RAG, which provides additional context along with a query, fine-tuning actually changes the model being used. You start with a fully trained model and then update only a few of its layers to incorporate the additional information. To learn more about layers in a neural network, see my introductory article on neural networks.
A good analogy for this process is changing the tires on a car to suit the weather. For winter driving, you may swap the standard tires for studded winter tires to grip the ice. You are not changing the car's core functionality, just adapting it to particular conditions. Below is a diagram of how the model is trained.
One common use case for fine-tuning is image recognition. Many models can tell the difference between common fruits like oranges and apples, but this classification is at a high level. Suppose you wanted to automatically determine the ripeness of tomatoes based on their color and shape. Below is a tomato ripeness scale from Biometric Central.
To fine-tune an image recognition model, begin by collecting a large set of tomato images at different stages of ripeness. A human then classifies the images against the scale above, using their best judgment. Next, you take a standard image recognition model, remove some of its final layers, and train it on your new training set. Once the model is trained, you can show it a picture of a tomato and it will determine the ripeness automatically.
Fine-tuning can also be done with LLMs, but it is a little more complicated. In general, you cannot just train the model on a large dataset; you have to train the model the way you intend to use it. If you want a model that answers users' questions, you need to train it on a set of example questions paired with acceptable answers.
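As an illustration, question/answer pairs are often prepared as JSONL, with one chat-style record per line, similar to what several hosted fine-tuning services expect; the exact schema varies by provider, and the company name and policies below are invented for the example.

```python
import json

# Hypothetical question/answer pairs drawn from your organization's data.
examples = [
    {"question": "What is our standard warranty period?",
     "answer": "Our standard warranty period is twelve months from delivery."},
    {"question": "How do I request time off?",
     "answer": "Submit a request in the HR portal at least two weeks in advance."},
]

# Write one JSON object per line (JSONL), in a chat-style training format.
with open("train.jsonl", "w") as f:
    for ex in examples:
        record = {"messages": [
            {"role": "system", "content": "You answer questions about Acme Corp policies."},
            {"role": "user", "content": ex["question"]},
            {"role": "assistant", "content": ex["answer"]},
        ]}
        f.write(json.dumps(record) + "\n")
```

The key point is that each training example mirrors an actual conversation turn, which is what teaches the model to respond the way you want.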
Adding and removing layers is one form of fine-tuning. Another is Low-Rank Adaptation (LoRA), where you train small additional matrices that run alongside the frozen main model. LoRA belongs to a family of Parameter-Efficient Fine-Tuning (PEFT) techniques that update only a small fraction of a model's parameters rather than retraining the whole base model.
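The LoRA idea can be shown with a minimal NumPy sketch: the frozen weight matrix W is left untouched, and two small trainable matrices B and A add a low-rank update on top. The dimensions and rank below are arbitrary choices for illustration.

```python
import numpy as np

d_out, d_in, rank = 64, 64, 4

rng = np.random.default_rng(0)
W = rng.standard_normal((d_out, d_in))   # frozen pretrained weight

# LoRA: two small trainable matrices whose product is a low-rank update.
A = rng.standard_normal((rank, d_in)) * 0.01  # trained
B = np.zeros((d_out, rank))                   # trained, initialized to zero

def forward(x):
    # Original path plus the low-rank adapter path: (W + B A) x
    return W @ x + B @ (A @ x)

x = rng.standard_normal(d_in)
# Because B starts at zero, the adapted model initially matches the original.
assert np.allclose(forward(x), W @ x)
```

Training only A and B means updating rank * (d_in + d_out) = 512 numbers here instead of the 4,096 in W, which is why LoRA makes fine-tuning large models so much cheaper.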
The advantage of fine-tuning is that you get more control over your model and can make it do things it was not originally intended for. Additionally, once a model is fine-tuned, you can deploy it for a use case without shipping the original training data. The disadvantages are that it requires specialized knowledge, carefully formatted datasets, and infrastructure for training. In addition, if you want to adopt a new base model, you need to start the fine-tuning process over.
What Method to Use?
If you want to get started quickly with your own data, RAG is a great place to begin. You can pull your data into a vector database, add some prompt engineering to shape the response, and start experimenting. If you have an image recognition task and source images, fine-tuning is an excellent option. Fine-tuning an LLM is more involved but can help a model "think" the way you want it to.
One example of putting these two techniques together is Nvidia's new medical chatbot. The model was probably fine-tuned to have general medical knowledge, while the medical data specific to the patient would be delivered via RAG during a chat session and would never become part of the model, protecting patient privacy.
Conclusion
Out-of-the-box AI models offer amazing functionality but may not be ready for your specific use case. By combining existing models with your own data, you can address the specific challenges of your industry.