Burning To Train Your Own Large Language Model? Here Are Some Important Considerations!

Large Language Models May Not Be All You Want...

You have probably played around with ChatGPT and are wondering: how could I apply something similar to my own business data?

Even though everybody is talking about large language models, or foundational models, by themselves they are not going to give you ChatGPT-like capabilities. A large language model takes in a sequence of words and attempts to predict the next word. That is the objective these models are trained on, using vast amounts of text.
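
To make the "predict the next word" idea concrete, here is a deliberately tiny sketch. It is not how a real large language model works internally (those use neural networks over tokens); it only counts which word follows which in a made-up corpus. The function names and the corpus are assumptions chosen for illustration.

```python
from collections import Counter, defaultdict
from typing import Dict, Optional


def train_bigram(text: str) -> Dict[str, Counter]:
    """Count, for every word, which words follow it in the training text."""
    words = text.lower().split()
    follows: Dict[str, Counter] = defaultdict(Counter)
    for current, nxt in zip(words, words[1:]):
        follows[current][nxt] += 1
    return follows


def predict_next(follows: Dict[str, Counter], word: str) -> Optional[str]:
    """Return the most frequent follower of `word`, or None if it was never seen."""
    counts = follows.get(word.lower())
    if not counts:
        return None
    return counts.most_common(1)[0][0]


# Hypothetical toy corpus; a real model is trained on vastly more text.
corpus = "the cat sat on the mat . the cat ate the fish ."
model = train_bigram(corpus)
print(predict_next(model, "the"))  # "cat" follows "the" most often here
```

A model like this only continues text. Nothing in the training objective tells it what a helpful answer looks like.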

If you started a conversation with a model like that, you would be rather disappointed. The answers would likely not be very useful at all. So how do you go from that to something like ChatGPT?

It turns out that if you nudge a foundational model in a useful direction, it starts producing relevant answers. The secret sauce here is human feedback. OpenAI hired an army of contractors to manually label questions and answers, indicating what constitutes a useful response. These labeled responses can be used in an initial step to fine-tune a large language model.

Even for a Microsoft-funded organization it is difficult to scale humans. So they took the human feedback and built a model out of it. The idea is that you can provide a prompt and a response to this model, and it rates how useful the response is.

This "usefulness" model is called a "reward model". The nice thing about it is that you can use it to keep nudging the original foundational model towards more useful responses. This process is a form of reinforcement learning, commonly referred to as reinforcement learning from human feedback (RLHF).

The more responses humans label, the better the reward model becomes, and the more capable the resulting tuned foundational model can be.
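
As a toy illustration of this nudging, the sketch below treats the "model" as a probability distribution over three canned responses and the "reward model" as a fixed list of scores. Each step moves the distribution towards responses with above-average reward, which is the basic policy-gradient idea. This is not OpenAI's actual training procedure; the responses, scores, and learning rate are all made-up assumptions.

```python
import math
from typing import List


def softmax(logits: List[float]) -> List[float]:
    """Turn raw scores into probabilities that sum to 1."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def policy_gradient_step(logits: List[float], rewards: List[float], lr: float = 0.5) -> List[float]:
    """One exact gradient-ascent step on the expected reward.

    For a softmax policy, d E[reward] / d logit_i = p_i * (r_i - E[reward]).
    """
    probs = softmax(logits)
    expected = sum(p * r for p, r in zip(probs, rewards))
    return [l + lr * p * (r - expected) for l, p, r in zip(logits, probs, rewards)]


# Hypothetical candidate responses and hypothetical reward-model scores.
responses = ["I don't know.", "See the docs.", "Here is a step-by-step answer."]
rewards = [0.1, 0.3, 0.9]

logits = [0.0, 0.0, 0.0]  # start with no preference
for _ in range(200):
    logits = policy_gradient_step(logits, rewards)

probs = softmax(logits)
best = responses[probs.index(max(probs))]
print(best, [round(p, 3) for p in probs])
```

In a real setup the policy is the language model itself and the rewards come from the trained reward model, but the direction of the update is the same: more probability for what the reward model scores highly.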

If you are considering adapting a ChatGPT-like model to your business, it is important to understand that there are two relevant components. One is the foundational model, and the other is the model that tells you what constitutes a useful response (the reward model).

There is another aspect of ChatGPT-like models that matters. The input you provide to the model is called a prompt, and it turns out that the choice of words in your prompt affects the output of the model. By wording the prompt in a certain way you might end up with a more relevant response. This has given rise to what is called prompt engineering: people trying to figure out how to talk to the model to get the most relevant responses.

Options for Applying a Tuned Large Language Model To Domain-Specific Business Data

Now that we have covered some of the basics, let's consider what options you have to tune a large language model so that it works with data specific to your business domain.

Prompt Design

This basically means taking your data and splitting it into chunks of text, which you feed to a model like ChatGPT together with an instruction. For this you can use the OpenAI API, which may be the cheapest and easiest way to get started. If you are mainly after generic capabilities, such as summarizing text or assessing sentiment, this might work out well.

However, if your business domain has unique terminology, or your business data is large, you might not be able to get the desired level of performance.
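
The chunk-and-prompt approach described above can be sketched roughly as follows. The chunk size is counted in words to keep things simple (real models count tokens), and `complete` is a hypothetical stand-in for whatever hosted model call you use; no real API is invoked here.

```python
from typing import Callable, List


def split_into_chunks(text: str, max_words: int = 200, overlap: int = 20) -> List[str]:
    """Split text into word-based chunks, with a small overlap between neighbours."""
    if overlap >= max_words:
        raise ValueError("overlap must be smaller than max_words")
    words = text.split()
    chunks = []
    step = max_words - overlap
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        if start + max_words >= len(words):
            break  # the last chunk already reaches the end of the text
    return chunks


def build_prompt(chunk: str, instruction: str) -> str:
    """Combine an instruction with one chunk of business text."""
    return f"{instruction}\n\nText:\n{chunk}\n\nAnswer:"


def summarize_document(text: str, complete: Callable[[str], str], max_words: int = 200) -> List[str]:
    """Send one prompt per chunk to the model and collect the responses."""
    instruction = "Summarize the text below in one sentence."
    return [complete(build_prompt(c, instruction)) for c in split_into_chunks(text, max_words)]


def fake_complete(prompt: str) -> str:
    """Hypothetical stand-in for a hosted model call, so the sketch runs offline."""
    return f"(model response to a {len(prompt.split())}-word prompt)"


document = " ".join(f"word{i}" for i in range(450))
for summary in summarize_document(document, fake_complete):
    print(summary)
```

The overlap is there so that a sentence cut at a chunk boundary still appears whole in one of the two neighbouring chunks.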

Tuning of a Foundational Model Using Your Own Business Data

For business domains where prompting alone is not adequate, the best option is to tune the large language model on domain-specific data. The advantage here is that not everything needs to be communicated in terms of prompts, and also the underlying language model becomes familiar with any business-specific terminology.

Tuning a model entails taking the original model and continuing to train it on internal business data. Since internal data tends to be much smaller in size compared to the typical training data for a foundational model, the tuning effort is far less expensive than training from scratch.

Note that there are approaches in which only a portion of the weights is trained during tuning. This is done to make the process more efficient.
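
To get a feel for the savings, consider one such approach, low-rank adaptation (LoRA), which is not named above and is used here only as an example. Instead of updating a full weight matrix, it trains two small matrices whose product has the same shape. The layer size and rank below are assumptions picked for illustration.

```python
def full_matrix_params(d_in: int, d_out: int) -> int:
    """Number of weights in one dense layer (ignoring the bias)."""
    return d_in * d_out


def low_rank_adapter_params(d_in: int, d_out: int, rank: int) -> int:
    """Weights in two small matrices of shape (d_in x rank) and (rank x d_out)."""
    return rank * (d_in + d_out)


# Hypothetical layer size and adapter rank.
d_in = d_out = 4096
rank = 8

full = full_matrix_params(d_in, d_out)                # 16,777,216
adapter = low_rank_adapter_params(d_in, d_out, rank)  # 65,536
print(f"trainable fraction: {adapter / full:.2%}")    # 0.39%
```

Under these assumptions, less than half a percent of that layer's weights are trained, while the original weights stay frozen.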


Training a Reward Model

Letting the training of a large language model loose on your own business data will likely help it capture domain-specific language. On a small scale you can tune the model using manually curated prompts. However, there is no guarantee that the resulting model will be useful.

If you recall the discussion from the previous section, the model that indicates usefulness (the reward model) is a separate model. In order to tune a large language model to reflect what is useful to your business, you will need to curate a training data set for the reward model.

The training data for the reward model would be a list of (prompt, response, reward) triplets that indicate how useful each response is. Only embark on training the reward model if you find that tuning the foundational model alone is not sufficient.
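
One way to picture such a data set is sketched below. The field names, the 0-to-1 reward scale, and the one-JSON-object-per-line file layout are assumptions for illustration, not a format required by any particular framework.

```python
import json
from dataclasses import asdict, dataclass
from typing import List


@dataclass
class RewardExample:
    prompt: str
    response: str
    reward: float  # assumed scale: 0.0 (useless) to 1.0 (very useful)

    def __post_init__(self) -> None:
        if not 0.0 <= self.reward <= 1.0:
            raise ValueError("reward must be between 0.0 and 1.0")


def to_jsonl(examples: List[RewardExample]) -> str:
    """Serialize the triplets as one JSON object per line."""
    return "\n".join(json.dumps(asdict(e)) for e in examples)


def from_jsonl(data: str) -> List[RewardExample]:
    """Parse the lines back into validated triplets."""
    return [RewardExample(**json.loads(line)) for line in data.splitlines() if line.strip()]


# Hypothetical labeled examples from a human reviewer.
examples = [
    RewardExample("What is our refund window?", "30 days from delivery.", 0.9),
    RewardExample("What is our refund window?", "Please contact support.", 0.2),
]
print(to_jsonl(examples))
```

Labeling the same prompt with several responses, as in the two rows above, gives the reward model a direct signal about which kind of answer is preferred.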

Prompt Tuning

We have already established that modifying prompts can affect how useful the responses are. With prompt tuning, the original foundational model is kept fixed. Rather than guessing the prompt manually, a model is trained to translate the user prompt into a model prompt in a way that maximizes the relevance of the large language model's output. In other words, prompt engineering is done by a machine learning model.

Training the prompt model involves far fewer weights than tuning the foundational model. This type of tuning can therefore be done very efficiently.

What is interesting with this approach is that you can take a smaller model and tune it to be comparable in performance to a much larger model.
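
The sketch below assumes the "soft prompt" flavor of prompt tuning, where the trained part is a handful of extra vectors placed in front of the embedded user prompt. It shows only how the model input is assembled and how few weights are trainable; the training loop is left out, and the vocabulary, dimensions, and names are made up.

```python
import random
from typing import List

random.seed(0)
EMBED_DIM = 4           # hypothetical, tiny embedding size
NUM_VIRTUAL_TOKENS = 3  # hypothetical number of learned prompt vectors

# Frozen part: the foundational model's embedding table (never updated).
vocab = ["what", "is", "our", "refund", "policy"]
frozen_embeddings = {w: [random.uniform(-1, 1) for _ in range(EMBED_DIM)] for w in vocab}

# Trainable part: the soft prompt, the only weights that prompt tuning updates.
soft_prompt = [[0.0] * EMBED_DIM for _ in range(NUM_VIRTUAL_TOKENS)]


def build_model_input(user_prompt: str) -> List[List[float]]:
    """Prepend the learned soft-prompt vectors to the embedded user prompt."""
    token_vectors = [frozen_embeddings[w] for w in user_prompt.lower().split()]
    return soft_prompt + token_vectors


def prompt_tuning_params(num_virtual_tokens: int, embed_dim: int) -> int:
    """Number of trainable weights in the soft prompt."""
    return num_virtual_tokens * embed_dim


inputs = build_model_input("what is our refund policy")
print(len(inputs))                     # 8 vectors: 3 learned + 5 from the user prompt
print(prompt_tuning_params(20, 4096))  # 81920 trainable weights at a realistic width
```

With 20 virtual tokens and an embedding width of 4096 (both assumed values), the trainable part is under a hundred thousand weights, however large the frozen model is.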

Frameworks and Offerings For Tuning Large Language Models

At this moment there are several managed and open-source options available to businesses interested in deploying custom models.

Azure OpenAI Service

As OpenAI's investor, Microsoft provides the capability on its Azure platform to customize and deploy GPT models. The customization is done by providing prompt/response pairs and fine-tuning the weights of the base model.


NVIDIA NeMo Service

NeMo provides large language models of various sizes depending on need. It also provides:

  • Ability to define guard rails
  • Ability to infuse business data
  • Prompt Tuning
  • Training of Reinforcement Model


Lamini

Lamini is a startup whose services are currently available through a wait list. Their intent is to streamline custom model tuning and deployment. While their capabilities are still in development, they are worth following.


Open Source Tools

For companies that are looking to do everything in house, there are various open-source tools available. The following stand out:

  • DeepSpeed - A framework by Microsoft that can be used to tune large language models
  • TRL - Transformer Reinforcement Learning framework. A framework that can be used to apply reinforcement learning to GPT-2 and BLOOM models.
  • PEFT - Parameter-Efficient Fine-Tuning by Hugging Face. A framework that provides optimized tuning capabilities for various open-source models. It also implements prompt tuning.


Final Considerations

There are various ways in which large language models can be tuned. A lot of work has gone into making the tuning effort more efficient. We have seen several hosted offerings as well as open-source tools.

With all of these optimizations, the cost of fine-tuning a model for a domain-specific dataset is not prohibitive. However, these fine-tuning optimizations don't make deployment any cheaper. Operating a 100B-parameter model, for example, is quite expensive.
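
A back-of-the-envelope calculation shows why. The sketch assumes 16-bit weights (2 bytes per parameter) and accelerators with 80 GB of memory each, and it counts only the memory needed to hold the weights, not activations or caches. Both figures are assumptions; adjust them to your hardware.

```python
import math


def weight_memory_gb(num_params: int, bytes_per_param: int = 2) -> float:
    """Memory needed just to hold the weights, in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bytes_per_param / 1e9


def min_gpus(num_params: int, gpu_memory_gb: int = 80, bytes_per_param: int = 2) -> int:
    """Lower bound on the number of GPUs needed to fit the weights alone."""
    return math.ceil(weight_memory_gb(num_params, bytes_per_param) / gpu_memory_gb)


large = 100_000_000_000  # the 100B-parameter example
small = 7_000_000_000    # a hypothetical smaller custom model

print(weight_memory_gb(large), min_gpus(large))  # 200.0 GB, at least 3 GPUs
print(weight_memory_gb(small), min_gpus(small))  # 14.0 GB, fits on 1 GPU
```

Under these assumptions the large model needs several accelerators running continuously just to stay loaded, while the smaller one fits on a single device.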

For businesses it is therefore advisable to find the smallest possible custom model that will result in an acceptable level of performance. Utilize prompt-tuning to achieve a performance boost. Proper guardrails should also be emphasized along the way to avoid hallucinations, bias or any offensive and irrelevant responses.

Oleg Lavrentyev

Founder at Olearis | Expert in Startup Tech & Software Planning | 15 Years of Industry Leadership

1 year ago

Thank you for sharing the article!

Robert DuWors

Digital Substrate Architect with insight from being there (Retired)

1 year ago

So dislike “predicting” (it isn’t predicting but rather associating during the inference stage) one word at a time (obviously higher and broader level associations are also in play). In refinement something must be unfrozen. The output layer? Are you now effectively mapping between the entire prompt and the response? If so, are you backpropagating a token or a vocabulary output vector at a time?

Devaan Parbhoo, MBA

CTO | Roastmaster & Founder @Crack Coffee Roastery

1 year ago

This is a great piece! Insightful and useful because our organization has just moved beyond the hype, and POC to “Let’s get this to work!”

John McGuire

Product Leader | Search and AI

1 year ago

This is very relevant to my current focus investigating Hybrid search options for Elasticsearch and LLMs

Assaf Kadosh

Your Guide to Explainable Digital Transformation - Translating Tech-Speak Into Transformation Success | Digital Solutions Architect | Digital Creator

1 year ago

That's quite a share Rudy! Really interesting and useful to know.
