What I learned from Bloomberg's experience of building their own LLM
Hey fellow AI explorers,
Most people experimenting with AI large language models (LLMs) use readily available pre-trained models such as ChatGPT, LLaMA, and BLOOM, among a constantly growing list of others. Only a handful of companies build their own LLMs from scratch, a process also known as pretraining. What are the pros and cons of undertaking such an endeavor?
I recently gained valuable insights by listening to the TWIML AI podcast [1] (see the 1st comment below) about how Bloomberg developed BloombergGPT, an LLM for internal company experimentation. The podcast demystified the process of creating a custom LLM. The host, Sam Charrington, talks with David Rosenberg, who leads Bloomberg's machine learning strategy team. I highly recommend giving it a listen if the subject intrigues you.
One aspect of LLMs the podcast sheds light on is the terminology used to describe different ways of deploying LLM technology. For example:
The term "foundation model" when used to describe an LLM implies the expectation that this LLM is to serve as the basis of future, fine-tuned LLMs.
The podcast also covers other interesting aspects of LLM pretraining that I will discuss in a follow-up article.
Different approaches to using and building LLMs
The three approaches covered below are listed in order of difficulty, in terms of human effort, time, and expense, from most to least difficult.
Constructing your own LLM from scratch, i.e., pretraining
This approach requires a vast collection of text training data (while some LLMs utilize image data, this article focuses solely on text). The process might span several months and could cost anywhere from $1 million to $100 million.
Ultimately, the model will comprise anywhere from tens of millions to a trillion parameters, the weights of the connections within the intricate, multi-layered architecture of the LLM's deep neural network (DNN).
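As a very rough illustration of where these costs come from, here is a back-of-envelope sketch using the common rule of thumb that training takes about 6 floating-point operations per parameter per training token. The model size, token count, GPU throughput, and price below are assumptions chosen purely for illustration, not figures from the podcast.

```python
# Back-of-envelope pretraining cost estimate using the common ~6 * N * D
# FLOPs rule of thumb. All numbers are illustrative assumptions, not
# figures reported by Bloomberg.
params = 50e9                # assumed model size: 50 billion parameters
tokens = 700e9               # assumed training data: 700 billion tokens
flops = 6 * params * tokens  # total training compute, in FLOPs

gpu_flops_per_sec = 150e12   # assumed sustained throughput per GPU (~150 TFLOP/s)
gpu_cost_per_hour = 2.50     # assumed cloud price per GPU-hour, USD

gpu_hours = flops / gpu_flops_per_sec / 3600
print(f"~{gpu_hours:,.0f} GPU-hours, ~${gpu_hours * gpu_cost_per_hour:,.0f}")
# roughly 390,000 GPU-hours and on the order of $1 million at these assumed rates
```

Plugging in different assumptions easily moves the result by an order of magnitude, which is one reason published cost estimates for pretraining vary so widely.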
Building your own LLM, while expensive and time-consuming, gives you the most control over how the raw input data is processed and offers the most protection for your proprietary data.
Fine-tuning an existing foundation model (FM)
With an FM and a unique dataset, like Bloomberg's extensive financial text data, you can refine the FM to create an updated LLM. This is particularly effective if the fine-tuning data differs significantly from the FM's training data.
Fine-tuning can be more cost-effective (ranging from thousands to tens of thousands of dollars) and quicker (from a few hours to several days) than building from scratch.
To fine-tune an FM, access to the FM's DNN parameters is essential. It's worth noting that, as of this writing, OpenAI hasn't provided access to GPT-4 or GPT-3 parameters, making them unsuitable for fine-tuning. However, other models, like Llama 2, do offer their parameters for fine-tuning.
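To make the fine-tuning option more concrete, here is a heavily simplified sketch using the Hugging Face transformers library on an open-weights model. The model name, the finance_corpus.txt data file, and the hyperparameters are illustrative assumptions, not Bloomberg's actual setup.

```python
# Minimal causal-LM fine-tuning sketch with Hugging Face transformers.
# Model, data file, and hyperparameters are illustrative assumptions.
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

model_name = "meta-llama/Llama-2-7b-hf"  # any open-weights causal LM
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(model_name)

# Assume a plain-text file of domain documents, one example per line.
dataset = load_dataset("text", data_files={"train": "finance_corpus.txt"})

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=512)

tokenized = dataset["train"].map(tokenize, batched=True, remove_columns=["text"])

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="finetuned-model",
        per_device_train_batch_size=1,
        num_train_epochs=1,
        learning_rate=2e-5,
    ),
    train_dataset=tokenized,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```

In practice, parameter-efficient techniques such as LoRA are often layered on top of a setup like this to reduce the memory and compute needed for fine-tuning models of this size.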
Employing in-context learning (ICL)
ICL often involves crafting effective prompts to elicit desired outputs from an LLM chat interface, such as ChatGPT. Including sample inputs and their corresponding outputs in the prompt is a form of ICL. For example, a prompt might contain a few word/definition pairs in a particular format, followed by a final word on its own; the LLM would then be expected to produce that word's definition in the format established earlier in the prompt. See [2]. Variants of ICL include few-shot, one-shot, and zero-shot learning [3]. According to [2], ICL acts as a kind of temporary fine-tuning, although exactly how it works remains an open research question.
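To make the word/definition example above concrete, here is a minimal few-shot prompt sketch using the OpenAI Python client; the example words, model name, and wording are my own illustrative choices.

```python
# Minimal few-shot in-context learning sketch using the OpenAI Python
# client (openai>=1.0). The demonstration pairs and model name are
# illustrative assumptions.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Demonstration pairs establish the input/output format within the prompt itself.
prompt = (
    "Word: ephemeral\nDefinition: lasting for a very short time.\n\n"
    "Word: gregarious\nDefinition: fond of company; sociable.\n\n"
    "Word: laconic\nDefinition:"
)

response = client.chat.completions.create(
    model="gpt-4o-mini",  # any chat-capable model works here
    messages=[{"role": "user", "content": prompt}],
)
print(response.choices[0].message.content)  # expected: a definition in the same format
```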
ICL is the most straightforward way to leverage someone else's LLM, like using OpenAI's ChatGPT or GPT-4 APIs without the need for fine-tuning.
Using an LLM through APIs hosted over the Internet might risk exposing proprietary data to the LLM's host. This is one of the reasons Bloomberg decided to explore pretraining their own LLM: it carried the least risk of exposing their valuable data to a company like OpenAI.
ICL is the most affordable option compared to pretraining and fine-tuning, costing only the token processing and computation fees associated with each prompt submitted.
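For a sense of scale, a rough per-prompt cost calculation looks like the following; the per-token prices are placeholder assumptions, since actual API pricing varies by model and changes over time.

```python
# Rough per-prompt cost of ICL via a hosted API. Prices are placeholder
# assumptions; check the provider's current price list.
price_per_1k_input_tokens = 0.01    # assumed, USD
price_per_1k_output_tokens = 0.03   # assumed, USD

prompt_tokens = 1500     # e.g., instructions plus a few in-context examples
output_tokens = 300

cost = (prompt_tokens / 1000) * price_per_1k_input_tokens \
     + (output_tokens / 1000) * price_per_1k_output_tokens
print(f"~${cost:.4f} per prompt")   # about 2.4 cents at these assumed rates
```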
How much effort and money did Bloomberg put into their LLM?
In the podcast, David Rosenberg of Bloomberg reported that the total cost was just over $1 million. This makes their model comparable to OpenAI's GPT-3, which reportedly cost about $4.6 million to train. Compare that to OpenAI's GPT-4, which reportedly cost over $100 million.
Rosenberg reported that once they had settled on their processing methodology, the final compute run took 53 days. See the podcast [1] for details on the exact training architecture. They tried other processing methodologies first, so the total compute time exceeded those 53 days. The whole project lasted about one year.
The team that worked on this project consisted of nine full-time employees. I believe that four of them did the coding, building the machine learning system and running the experiments and training. The other five reviewed the literature for the latest methods in LLM pretraining and drove the effort to optimize the final LLM.
Conclusion
This article has mainly described the differences between LLM pretraining, fine-tuning, and ICL, as background and in the context of Bloomberg's decision to pretrain their own LLM.
In my next article, I'll share my perspective on why Bloomberg opted to go through the whole process of LLM pretraining. I'll also discuss the customization benefits they gained, such as modifying the tokenizer, which wouldn't have been possible otherwise.
I welcome your comments and will try to answer them.