Understanding LLMs
Okay, we've all heard about ChatGPT, right? Well, it's amazing how marketing allows for the emergence of a 'market leader' when that product may not be the best available. For example, remember the famous fight between VHS and Betamax video tapes? Betamax was the superior technology, but VHS had the better marketing position. The same situation may be playing out for LLMs. ChatGPT was first out of the gate and, as a result, is the de facto market leader. But is it the best LLM, or even the best LLM for you?
BTW, just as an aside, ChatGPT isn't just an LLM. It's a combination of technologies meant to bring a capability to market, not just the GPT LLM.
I'm not going to go into how LLMs work in this article - I'll deal with that later. What I'm going to do is try to explain the different LLMs and what the 'names' mean. First thing I want you to do is go to the Hugging Face website. Hugging Face has a number of capabilities, but the primary thing I'm going to point out to you is the Open LLM Leaderboard, which is what the attached URL will take you to. Anyone creating an LLM can submit it to Hugging Face for independent measurement against a set of benchmark tests, and it gets a score on each one. The aggregate score determines where that LLM sits on the list. The list changes daily as new LLMs are uploaded, but if you were to search the page for ChatGPT (as of August 21, 2023), you wouldn't find it, because it's been a while since an update has occurred.
Leading LLMs
Now, there are three well-known LLMs that I'll start with, and they are developed by the Big Boys. GPT was primarily developed by OpenAI (the providers of ChatGPT), Orca is developed by Microsoft, and Llama is developed by Meta (formerly Facebook). If I were to write a Gartner Magic Quadrant paper, I'd probably add two other primary LLMs: PaLM, developed by Google, and Falcon, developed by the Technology Innovation Institute (TII) out of Abu Dhabi, UAE. But if you look at the core LLMs, you are looking at products put out by the big cloud providers, with a couple of new players.
Model Sizes
With an understanding of what the core LLMs are, it's important to understand what the full names of the different models mean. Let's take Llama 2 (created by Meta) as an example. Currently there are three model sizes available: 7B, 13B, and 70B. There's a reason behind these names - they indicate the number of parameters in the neural network associated with that model, with the B standing for billion. Llama 2 7B has 7 billion parameters, whereas Llama 2 70B has 70 billion parameters. Whenever you see an LLM name with a number followed by a B, it's telling you the size of the neural network.
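To make those parameter counts a bit more concrete, here's a back-of-the-envelope sketch (my own illustration, not anything published by Meta) of how much memory just the weights of each Llama 2 size would occupy. It assumes 2 bytes per parameter (the common fp16/bf16 format) and ignores everything else a running model needs, so treat it as a rough lower bound:

```python
# Rough memory footprint of a model's weights from its parameter count.
# Assumption: 2 bytes per parameter (fp16/bf16). Ignores activations,
# the KV cache, and any optimizer state - weights only.

def weights_gib(num_params: int, bytes_per_param: int = 2) -> float:
    """Approximate size of the model weights in GiB."""
    return num_params * bytes_per_param / (1024 ** 3)

for name, params in [("Llama 2 7B", 7_000_000_000),
                     ("Llama 2 13B", 13_000_000_000),
                     ("Llama 2 70B", 70_000_000_000)]:
    print(f"{name}: ~{weights_gib(params):.0f} GiB of weights at fp16")
```

Even under these generous assumptions, the 70B model needs well over a hundred GiB just to hold its weights - which is why the bigger models demand serious GPU hardware.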
Keep in mind that the bigger the neural network, the more computation it takes to produce the appropriate word combinations. Remember, an LLM is fundamentally about combining words in the appropriate manner. It's NOT about the data, but the language used. Bigger neural networks allow for better, more accurate answers to queries, but it also means responses may be a bit slower. If you are going to use a bigger neural network, try to have a more powerful computer (ideally with GPUs) so that you can deal with the longer processing time.
Pre-Training Data
Okay, we understand the size of the neural network. The next aspect we want to know is the amount of data the neural network was pre-trained on. The larger the training set, the more word combinations the neural network will have seen. Again, looking at the Llama 2 models, Meta indicates they were trained on 2 trillion tokens (NOTE: a token is a word fragment - very roughly 3/4 of an English word on average). So a large neural network with a smaller pre-training token count probably means you can expect less versatile responses. You also want to understand how recent the training data is. ChatGPT is notorious for saying that it was only trained on data up to September 2021.
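To put a number like "2 trillion tokens" into everyday units, here's a quick conversion sketch. The words-per-token ratio varies by tokenizer and by language; ~0.75 words per token is a common rule of thumb for English text, and that's an assumption on my part, not an exact figure:

```python
# Back-of-the-envelope conversion from tokens to English words.
# Assumption: ~0.75 words per token (a common heuristic for English;
# the true ratio depends on the tokenizer and the text).

WORDS_PER_TOKEN = 0.75

def tokens_to_words(tokens: float) -> float:
    """Approximate word count for a given token count."""
    return tokens * WORDS_PER_TOKEN

# Llama 2's reported pre-training corpus of 2 trillion tokens:
print(f"~{tokens_to_words(2e12):.2e} words")  # roughly 1.5 trillion words
```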
Don't get blown away by the term "2 trillion tokens", though, and don't confuse it with the much smaller figures you often see quoted for GPT models: GPT-3 handling ~1,500 words, GPT-3.5 ~3,000 words, GPT-4 ~6,000 words, and GPT-4 (32K) ~24,000 words. Those numbers describe how much text the model can take in at once (its context size, covered in the next section), not how much it was trained on. For perspective, I just asked Google for the number of words in the English language and got 171,476 (not including slang). LLMs have a long way to go.
Context Size
Lastly, you want to be looking at the context size. Again, remember that LLMs are about providing the correct word combinations in response to a query. LLMs do NOT have your underlying data (other than what they picked up during language training). So for an LLM to provide information based on your data, it needs to be fed that information from a data source. Currently, LLMs can't handle large streams of data all at once, so the data has to be broken down into usable 'chunks'. The larger the chunks a model can handle, the more data it can take in as input.
Looking at GPT-3 vs GPT-3.5 vs GPT-4 gives you a sense of the improvements across GPT models. GPT-3 could only make use of 2,049 tokens per chunk, and GPT-3.5 could only take in 4,096 tokens per chunk. GPT-4, on the other hand, can handle 8,192 tokens per chunk, and there's a 32K version that can take in 32,768 tokens per chunk.
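Those context sizes can be turned into approximate word capacities using the same ~0.75 words-per-token heuristic I mentioned earlier (again, an assumption, since the exact ratio depends on the tokenizer and the text):

```python
# Approximate how many English words fit in each GPT context window.
# Assumption: ~0.75 words per token (a rough heuristic, not exact).

WORDS_PER_TOKEN = 0.75

context_tokens = {
    "GPT-3": 2049,
    "GPT-3.5": 4096,
    "GPT-4": 8192,
    "GPT-4 (32K)": 32768,
}

for model, tokens in context_tokens.items():
    words = int(tokens * WORDS_PER_TOKEN)
    print(f"{model}: {tokens} tokens ≈ {words} words per chunk")
```

This is where the ~1,500 / ~3,000 / ~6,000 / ~24,000 word figures come from: they're the context windows expressed in everyday units.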
Summary
When you look at solutions making use of LLMs, understand what you are getting, because the underlying LLM will dictate how good the solution you're using is. If one person offers you a chatbot using GPT-3 and another offers you a chatbot using Llama 2, you will be getting very different quality solutions. Ask the right questions, and also understand that the market is changing SO fast that what you get today may not be the best in a month (yes, it's changing that fast). So also ask how your provider will be updating the solution that you are purchasing.
Hope this helps ...
Neil