Understanding LLMs
Okay, we've all heard about ChatGPT, right? Well, it's amazing how marketing allows for the emergence of a 'market leader' when that product may not be the best available. For example, remember the famous fight between VHS and Betamax video tapes? Betamax was the superior technology, but VHS had the better marketing position. The same situation may be playing out for LLMs. ChatGPT was first out of the gate and, as a result, is the de facto market leader. But is it the best LLM, or even the best LLM for you?
BTW, just as an aside, ChatGPT isn't just an LLM. It's a combination of technologies meant to bring a capability to market, not just the GPT LLM.
I'm not going to go into how LLMs work in this article - I'll deal with that later. What I'm going to do is try to explain the different LLMs and what the 'names' mean. First thing I want you to do is go to the Hugging Face website. Hugging Face has a number of capabilities, but the primary thing I'm going to point out to you is the Open LLM Leaderboard, which is what the attached URL will take you to. Anyone creating an LLM can submit it to Hugging Face for independent measurement against a set of benchmark tests, and it gets a score on each one. The aggregate score determines where that LLM sits on the list. The list changes daily as new LLMs are uploaded, but if you were to search the page for ChatGPT (as of August 21, 2023), you wouldn't find it, because it's been a while since an update has occurred.
Leading LLMs
Now, there are three well-known LLMs that I'll start with, and they are developed by the Big Boys. GPT was primarily developed by OpenAI (the providers of ChatGPT), Orca is developed by Microsoft, and Llama is developed by Meta (formerly Facebook). If I were to write a Gartner Magic Quadrant paper, I'd probably add two other primary LLMs: PaLM, developed by Google, and Falcon, developed by the Technology Innovation Institute (TII) out of Abu Dhabi, UAE. But if you look at the core LLMs, you are looking at products put out by the big cloud providers, with a couple of new players.
Model Sizes
With an understanding of what the core LLMs are, it's important to understand what the full names of the different models mean. Let's take Llama 2 (created by Meta) as an example. Currently there are three model sizes available: 7B, 13B, and 70B. There's a reason behind these names - they indicate the number of parameters in the neural network associated with that model, with the B standing for billion. Llama 2 7B has 7 billion parameters, whereas Llama 2 70B has 70 billion parameters. Whenever you see an LLM name with a number followed by a B, it's telling you the size of the neural network.
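To make those parameter counts a bit more concrete, here's a back-of-the-envelope sketch (my own illustration, not anything published by Meta) of how much memory just the weights of each Llama 2 size would occupy. It assumes 2 bytes per parameter (the common fp16/bf16 format) and ignores everything else a running model needs, so treat it as a rough lower bound:

```python
# Rough memory footprint of a model's weights from its parameter count.
# Assumption: 2 bytes per parameter (fp16/bf16). Ignores activations,
# the KV cache, and any optimizer state - weights only.

def weights_gib(num_params: int, bytes_per_param: int = 2) -> float:
    """Approximate size of the model weights in GiB."""
    return num_params * bytes_per_param / (1024 ** 3)

for name, params in [("Llama 2 7B", 7_000_000_000),
                     ("Llama 2 13B", 13_000_000_000),
                     ("Llama 2 70B", 70_000_000_000)]:
    print(f"{name}: ~{weights_gib(params):.0f} GiB of weights at fp16")
```

Even under these generous assumptions, the 70B model needs well over a hundred GiB just to hold its weights - which is why the bigger models demand serious GPU hardware.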
Keep in mind that the bigger the neural network, the more computation it takes to produce the appropriate word combinations. Remember, an LLM is fundamentally about combining words in the appropriate manner. It's NOT about the data, but the language used. Bigger neural networks allow for better, more accurate answers to queries, but it also means responses may be a bit slower. If you are going to use a bigger neural network, try to have a more powerful computer (ideally with GPUs) so that you can deal with the longer processing time.
Pre-Training Data
Okay, we understand the size of the neural network. The next aspect we want to know is the amount of data the neural network was pre-trained on. The larger the training set, the more word combinations the neural network will have seen. Again, looking at the Llama 2 models, Meta indicates they were trained on 2 trillion tokens (NOTE: a token is a word fragment - very roughly 3/4 of an English word on average). So a large neural network with a smaller pre-training token count probably means you can expect less versatile responses. You also want to understand how recent the training data is. ChatGPT is notorious for saying that it was only trained on data up to September 2021.
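To put a number like "2 trillion tokens" into everyday units, here's a quick conversion sketch. The words-per-token ratio varies by tokenizer and by language; ~0.75 words per token is a common rule of thumb for English text, and that's an assumption on my part, not an exact figure:

```python
# Back-of-the-envelope conversion from tokens to English words.
# Assumption: ~0.75 words per token (a common heuristic for English;
# the true ratio depends on the tokenizer and the text).

WORDS_PER_TOKEN = 0.75

def tokens_to_words(tokens: float) -> float:
    """Approximate word count for a given token count."""
    return tokens * WORDS_PER_TOKEN

# Llama 2's reported pre-training corpus of 2 trillion tokens:
print(f"~{tokens_to_words(2e12):.2e} words")  # roughly 1.5 trillion words
```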
Don't get blown away by the term "2 trillion tokens", though, and don't confuse it with the much smaller figures you often see quoted for GPT models: GPT-3 handling ~1,500 words, GPT-3.5 ~3,000 words, GPT-4 ~6,000 words, and GPT-4 (32K) ~24,000 words. Those numbers describe how much text the model can take in at once (its context size, covered in the next section), not how much it was trained on. For perspective, I just asked Google for the number of words in the English language and got 171,476 (not including slang). LLMs have a long way to go.
Context Size
Lastly, you want to be looking at the context size. Again, remember that LLMs are about providing the correct word combinations in response to a query. LLMs do NOT have your underlying data (other than what they picked up during language training). So for an LLM to provide information based on your data, it needs to be fed that information from a data source. Currently, LLMs can't handle large streams of data all at once, so the data has to be broken down into usable 'chunks'. The larger the chunks a model can handle, the more data it can take in as input.
Looking at GPT-3 vs GPT-3.5 vs GPT-4 gives you a sense of the improvements across GPT models. GPT-3 could only make use of 2,049 tokens per chunk, and GPT-3.5 could only take in 4,096 tokens per chunk. GPT-4, on the other hand, can handle 8,192 tokens per chunk, and there's a 32K version that can take in 32,768 tokens per chunk.
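Those context sizes can be turned into approximate word capacities using the same ~0.75 words-per-token heuristic I mentioned earlier (again, an assumption, since the exact ratio depends on the tokenizer and the text):

```python
# Approximate how many English words fit in each GPT context window.
# Assumption: ~0.75 words per token (a rough heuristic, not exact).

WORDS_PER_TOKEN = 0.75

context_tokens = {
    "GPT-3": 2049,
    "GPT-3.5": 4096,
    "GPT-4": 8192,
    "GPT-4 (32K)": 32768,
}

for model, tokens in context_tokens.items():
    words = int(tokens * WORDS_PER_TOKEN)
    print(f"{model}: {tokens} tokens ≈ {words} words per chunk")
```

This is where the ~1,500 / ~3,000 / ~6,000 / ~24,000 word figures come from: they're the context windows expressed in everyday units.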
Summary
When you look at solutions making use of LLMs, understand what you are getting, because the underlying LLM will dictate how good the solution you're using is. If one person offers you a chatbot using GPT-3 and another offers you a chatbot using Llama 2, you will be getting very different quality solutions. Ask the right questions, and also understand that the market is changing SO fast that what you get today may not be the best in a month (yes, it's changing that fast). So also ask how your provider will be updating the solution that you are purchasing.
Hope this helps ...
Neil