Should companies train their own LLM?
Image courtesy: Rajashree Rajadhyax

Enterprises all over the globe have started using Generative AI. They use it to improve communication, enhance the efficiency of their people and serve their customers better, to mention a few of the many areas where GenAI is being applied.

The adoption of GenAI by industry is much faster than earlier AI techniques such as machine learning and computer vision. The credit must go to the capability and versatility of foundation models, especially the large language models. The chief attraction of these models lies in the fact that they can be used directly, without training or modification. This allows users to try them for various applications and settle on the use cases most suitable for them.

In this article, we will discuss an important question about LLMs: should companies train their own? It is true that many LLMs can be used without training, but training can bring some special advantages. We will see both sides of this debate. The discussion should help data science and AI teams decide what is right for their companies.

What are the different LLMs that companies use?

When it comes to LLMs, enterprises now have a wide variety of options to choose from. A vast array of LLMs is available, and new models are being introduced at a frantic pace. The models fall into two main categories:

  1. Proprietary models: These are also called ‘closed models’. They are created by large AI companies and made available through an API. Well-known examples are OpenAI’s GPT-4 series, Google’s Gemini series and Anthropic’s Claude series.
  2. Open source models: These models are available from repositories such as HuggingFace. Anyone can run these models (subject to their license terms) on their machines. Popular examples are the Llama series and the Mistral series.

While the proprietary models are available only through APIs, open source models can be used in two different ways:

  1. Cloud providers such as Microsoft Azure and Amazon AWS offer what they call Model-as-a-Service. They host the open source models on their servers and make them available to users as APIs. This is very similar to the proprietary case, except that you choose the model.
  2. Teams can host open source models on their own servers (cloud or on-premises). In this case, they have to take care of everything, including management of the model and the server. Frameworks such as Ollama make this deployment easier.
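To make the self-hosting option concrete, here is a minimal sketch of calling a locally hosted open source model through Ollama's REST API (which by default listens on localhost:11434). The model name "llama3" is an assumption; substitute whatever model you have pulled.

```python
import json

# Default endpoint of a locally running Ollama server (an assumption;
# adjust the host/port if your deployment differs).
OLLAMA_URL = "http://localhost:11434/api/generate"

def build_generate_request(model: str, prompt: str) -> dict:
    """Build the JSON body for a single, non-streaming completion."""
    return {"model": model, "prompt": prompt, "stream": False}

payload = build_generate_request("llama3", "Summarise our leave policy.")

# Actually sending the request requires Ollama to be running locally:
#   import requests
#   answer = requests.post(OLLAMA_URL, json=payload).json()["response"]
print(json.dumps(payload))
```

The point of the sketch is that self-hosting puts the whole request path, and therefore the data, inside your own infrastructure.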

So far, we spoke about using models as they come, without any training. But large language models are after all neural networks, and we should be able to train them for our purpose. Before we see where training fits in the above scenarios, we will discuss what training LLMs entails.

How are LLMs trained?

LLMs are not trained in one big session; training happens in multiple stages. Broadly, however, there are two main types of training:

  1. Pre-training
  2. Fine-tuning

Let’s see the difference between these two.

Pre-training takes an empty model structure (called a Transformer) and trains it on a huge amount of text. This training gives the model its language capabilities as well as its knowledge of the world. After this basic training, models are also trained to follow instructions, which makes them capable of answering questions. They receive further training to respond in ways that humans find helpful and to avoid objectionable responses. The popular models that we all know and use have gone through all these stages of pre-training.

Fine-tuning is performed on models that have been pre-trained, not on empty shells. It uses a small amount of training data, but this data is carefully chosen for a particular purpose. For example, a pre-trained model may be fine-tuned on financial statements so that it understands financial data well.

Pre-training is an expensive and time-consuming proposition. Most LLMs are very large networks, containing billions of parameters. Learning all these parameters from an equally huge amount of data requires enormous computing power. To give a recent example, Meta’s Llama 3.1 405B model was trained using around 16,000 NVIDIA H100 GPUs.

Fine-tuning, however, is a much more manageable affair. To be sure, it also requires GPUs, but cheaper, consumer-grade GPUs can do the job. Fine-tuning does not change all the parameters of the model; it either adds a few new parameters or updates a selected subset. Thus the computing power and time required are much less.

In short, the expensive pre-training gives LLMs their fundamental, general capabilities. Fine-tuning is far more affordable and makes a model more suitable for a particular purpose.
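The cost gap can be made concrete with rough arithmetic. Adapter-based fine-tuning (LoRA-style) trains two small low-rank matrices per weight matrix instead of the full matrix; the sketch below uses invented layer sizes purely for illustration, not the dimensions of any real model.

```python
# Why fine-tuning is cheaper: full training updates every weight,
# while LoRA-style fine-tuning adds two small matrices (d x r and
# r x d) per weight matrix and trains only those.

def full_params(d_model: int, n_layers: int, mats_per_layer: int = 4) -> int:
    """Parameters updated in full training: every d x d weight matrix."""
    return n_layers * mats_per_layer * d_model * d_model

def lora_params(d_model: int, n_layers: int, rank: int,
                mats_per_layer: int = 4) -> int:
    """Parameters updated with LoRA adapters of the given rank."""
    return n_layers * mats_per_layer * 2 * d_model * rank

full = full_params(d_model=4096, n_layers=32)
lora = lora_params(d_model=4096, n_layers=32, rank=8)
print(f"full: {full:,}  lora: {lora:,}  ratio: {full // lora}x")
# With these illustrative sizes, the adapters hold 256x fewer
# trainable parameters than the full network.
```

The exact ratio depends on the model and the chosen rank, but the orders of magnitude are representative of why fine-tuning fits on consumer-grade hardware.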

With this background, we will now see the arguments for and against training your own model. We will begin with why companies should NOT train models.

Why companies should not train LLMs

The reason why companies should not do pre-training of LLMs is obvious: it is far too costly. An organization that plans to pre-train an LLM must not only buy or lease expensive infrastructure but also hire highly specialized talent. For most companies, the return would not justify such an investment, especially when so many pre-trained models are already available.

Most companies may not even need fine-tuning. A lot of use cases work well with ready, pre-trained models. Though much smaller than pre-training, fine-tuning also involves considerable effort. Here the major part of the effort goes into creating the training data, which must be carefully selected to suit the intended purpose. The company might have to allocate senior and knowledgeable people to create such data. This is a serious commitment for any organization.

Unlike pre-training, fine-tuning is not a one-time effort. LLM technology is evolving continuously, and newer, more capable models appear all the time. The users of a fine-tuned model will naturally want to upgrade, but the new model must be fine-tuned before it can serve the use case. This means the organization has to spend time and money on fine-tuning frequently.

These are the arguments against training your own model. Now let us see the motivations for doing so.

Why companies should train their own LLM

There are some special cases where training an LLM can solve problems that cannot be solved otherwise. Let’s look at some of these cases. I will only discuss fine-tuning in this section; pre-training is a rare requirement for a company that is not in the business of building models, so it is not covered here.

You know that to get the right response from an LLM, we have to prompt it properly. This activity is called prompt engineering. However, most users in an organization will not be skilled at writing good prompts. Fine-tuning can help overcome this challenge: when fine-tuned on the kinds of queries our users actually type, the LLM can respond properly to prompts that would otherwise be inadequate.
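Fine-tuning data for this purpose is typically a set of prompt-response pairs, often stored one JSON object per line (JSONL), a format most fine-tuning tools accept. A minimal sketch, with invented example pairs mapping terse user prompts to well-formed answers:

```python
import json

# Invented examples: the terse prompts real users type, paired with
# the well-formed responses we want the fine-tuned model to produce.
examples = [
    {"prompt": "leave policy?",
     "response": "Employees are entitled to 24 days of paid leave per "
                 "year. Requests go through the HR portal."},
    {"prompt": "expense claim how",
     "response": "To claim an expense, upload the receipt to the HR "
                 "portal within 30 days of the purchase."},
]

# Serialize to JSONL: one record per line.
jsonl = "\n".join(json.dumps(e) for e in examples)
print(jsonl.splitlines()[0])
```

The exact field names ("prompt"/"response" here) vary by tool; check the format your fine-tuning framework expects.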

The most common method of using an LLM in an enterprise is called Retrieval Augmented Generation (RAG). In this method, the LLM is given material retrieved from the organization’s own data and told to use that material to form its answers. RAG thus makes internal data more accessible to users, which is why it has become so popular with corporate LLM users.
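The RAG flow just described can be sketched in a few lines. Here naive keyword overlap stands in for a real vector-similarity search, and the documents are invented:

```python
# Toy RAG pipeline: retrieve the most relevant internal snippet,
# then build a prompt instructing the model to answer from it.

DOCS = [
    "Travel expenses must be filed within 30 days of the trip.",
    "The office VPN requires two-factor authentication.",
    "Quarterly sales reports are published on the first Monday.",
]

def retrieve(question: str, docs: list[str]) -> str:
    """Return the document sharing the most words with the question.
    (A stand-in for embedding-based vector search.)"""
    q_words = set(question.lower().split())
    return max(docs, key=lambda d: len(q_words & set(d.lower().split())))

def build_prompt(question: str, context: str) -> str:
    return (f"Use only the material below to answer.\n"
            f"Material: {context}\n"
            f"Question: {question}")

question = "When must travel expenses be filed?"
ctx = retrieve(question, DOCS)
prompt = build_prompt(question, ctx)
print(prompt)
```

In production the retriever would be an embedding index over the organization's documents, but the shape of the pipeline is the same: retrieve, then prompt.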

While using RAG, the model’s own knowledge is not used; only its language and common-sense abilities are applied. This means we can use much smaller models in RAG configurations.

What is the advantage of smaller models? Larger models such as Llama 70B require huge computational resources to run, which can lead to significant expense, especially at large scale. Smaller models reduce these costs.

But smaller models might not be good at some tasks. To tackle this, we can fine-tune them to be good at exactly the right things. Thus a good formula for a company is to select a smaller model, fine-tune it for the task it is supposed to perform (such as summarization or extraction) and use it in a RAG pipeline. This way the company can achieve good performance at a lower cost.

There are certain tasks that LLMs find hard to perform without fine-tuning. One example is natural language query on databases: the model is expected to take a natural language sentence and convert it into an SQL query that will run on a particular database. The common method is to include the database schema in the prompt. However, it has been observed that fine-tuning the model on the kinds of queries expected improves the accuracy of the generated SQL by a large margin. A similar application is converting a language query into an API call.
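The schema-in-prompt baseline mentioned above looks roughly like this; the table and columns are invented for illustration:

```python
# Baseline for natural-language-to-SQL: include the database schema
# in the prompt and ask the model for a query. Fine-tuning on
# (question, SQL) pairs for this specific schema is what typically
# lifts accuracy beyond this baseline.

SCHEMA = """CREATE TABLE orders (
    id INTEGER PRIMARY KEY,
    customer TEXT,
    amount REAL,
    placed_on DATE
);"""

def text_to_sql_prompt(question: str) -> str:
    """Assemble a prompt pairing the schema with the user's question."""
    return (f"Given this schema:\n{SCHEMA}\n"
            f"Write one SQL query answering: {question}\n"
            f"Return only the SQL.")

prompt = text_to_sql_prompt("Total order amount per customer in 2024?")
print(prompt)
```

A fine-tuning set for this task would pair questions like the one above with the exact SQL expected for this schema, teaching the model the database's naming conventions and quirks.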

Summary

We have seen arguments both for and against companies training their own model. The general conclusion seems to be that you should not train a model if your use cases are well served without training. However, there are use cases where fine-tuning becomes advantageous and sometimes even necessary. There are advanced applications such as agentic workflows in which the fine-tuning plays a big role. We will study these applications in a future article.

Yuliia Butovchenko

2x Founder | GTM 0-1 | Document automation

7mo

Pre-training LLMs from scratch for specific use cases is the only way to achieve the best results. We’re about to launch our ModelEngine platform, which allows NLP data scientists to train LLMs from scratch. And it’s not as expensive as many think! In fact, companies might spend more annually on GPT-4 API calls. We have ongoing projects in stealth mode that prove the savings!

Anand Khandekar

ArcticTurnFoundation | Validus Analytics | ValidusEduTech | IoT | EDGE AI | Sustainability | Research | SDG

7mo

Devesh Rajadhyax thank you for elaborating both sides of the coin. Please continue to throw light on planning a strategy for the efficient use of LLMs by early stage start-ups and Small Scale companies (manufacturing sector). Fine tuning is of course the way out for them, considering costs and human resources. Would love to read more into your thoughts on this.

Ajit Joshi

LinkedIn Top Voice 2024 | Red Hat Partner Ecosystem | FSI Alliances | Fintechs | ISV Expansion | AI, Open Source, Cyber Security, Automation.

7mo

Very informative. Thanks for sharing this! Reposting.

Devesh Rajadhyax

AI startup founder, Author of 'Decoding GPT'

7mo

For in depth understanding of how LLMs work and trained, refer to my book ‘Decoding GPT, an Intuitive Understanding of Large Language Models’ https://www.amazon.in/Decoding-GPT-Intuitive-Understanding-Generative/dp/8119445791

Yohan Bensoussan

Business Tech Leader | Gen AI Architect | IBM Ecosystem and Build

7mo

Fine-tuning will become accessible to everyone, but companies have no chance of training an LLM better than the providers can.
