How to Think About LLMs
Photo Credit: Rob Grzywinski


Here’s how I like to think about Large Language Models (LLMs) such as ChatGPT: as a smart colleague who works in a different industry and who has dementia.

Let’s unpack this.

LLMs have been trained on any text that they could get their hands on, from websites to books. Basically, if you can find it on the internet (and it’s not behind a login) then it’s safe to say that your friendly neighborhood LLM has already gobbled it up. Because it takes time and money to train an LLM, there’s typically a cutoff date after which the LLM has no knowledge of new content. (In the case of OpenAI’s ChatGPT, its knowledge ends around mid-2021.) All of this training means that the LLM has knowledge of basically anything and everything that’s ever been written down.

(Quick aside: LLMs are hungry beasts. There is math that says "for a model of size X you must provide Y inputs to optimally train it". As the models have gotten bigger, it’s become harder and harder to find more useful data to feed them. They’ve effectively exhausted the available data already! So where to get more data? If I were to make a guess, there’s a good reason that OpenAI just-so-happened to come out with "a neural net called Whisper that approaches human level robustness and accuracy on English speech recognition" in late 2022.)

(Quicker aside on the Quick Aside: Early in 2022 it was found that the original math for computing the optimal training data size was wrong! Most of the current models, such as ChatGPT, are undertrained: simply training the current models on more data would provide better results. Look up "Chinchilla AI" to learn more.)

The fact that LLMs have been exposed to everything is both a blessing and a curse. This is where the "smart colleague who works in a different industry" comes into play. Think about your last social event where you were introduced to someone new and you were trying to explain what you do. You have to start broadly and help the other person find analogies to things that they understand. ("Nice to meet you too. I'm a CTO at a small software startup." "Oh! So you do IT?" "No. I manage the software developers that write the product." "So you write software?" "Not too much any more. Think of me more as an over-paid babysitter." And so on, passing through the dreaded "I have this problem with my computer..." <shudder>.) This is true with LLMs as well. It can be challenging to provide enough context to the LLM so that it understands what you're talking about and what you're trying to do.
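To make "providing context" concrete, here's a toy sketch of prepending background to a question before sending it to a model. The function and its wording are my own invention for illustration, not any particular API:

```python
def build_prompt(background, question):
    """Prepend background the model can't otherwise know, the same way
    you'd brief a smart colleague from a different industry."""
    return (
        f"Background: {background}\n\n"
        f"Question: {question}"
    )

prompt = build_prompt(
    "I'm a CTO at a small software startup; we build developer tools.",
    "How should I explain what I do to a non-technical audience?",
)
```

Without the background line, the model is the stranger at the party guessing that a CTO "does IT".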

Speaking of context, you are only allowed a certain amount of text (counted in "tokens" -- a unit that is larger than an individual letter but often smaller than a whole word) as input to the LLM. For the GPT models this was initially 2,048 tokens (~1,500 words) and was increased in mid-2022 to around four thousand tokens! Everything that the LLM needs to know about the current task must fit within this window. Outside of this, the LLM has no memory of what occurred.
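A rough rule of thumb (a common heuristic, not an exact tokenizer) is that one token is about three-quarters of an English word, which is how ~1,500 words fills a 2,048-token window:

```python
def estimate_tokens(text, words_per_token=0.75):
    # Rough heuristic: 1 token is roughly 0.75 English words. Real counts
    # depend on the tokenizer; use this only for ballpark budgeting.
    return round(len(text.split()) / words_per_token)

# ~1,500 words works out to ~2,000 tokens -- right at the original limit.
```

For precise counts you'd use the model's actual tokenizer, but this estimate is good enough to know whether a document will fit.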

My father was recently diagnosed with Parkinson's disease and is experiencing dementia. If you ask him questions about things that happened some time ago, he's as sharp as a tack! If you ask him what he had for lunch or what we talked about five minutes ago, then he struggles. This is exactly how it feels when working with LLMs. Many of the same techniques I use when trying to keep a conversation with my father on track work well with LLMs: keep the context simple, or keep repeating it; expect that the conversation will go off into the weeds; highlight what's working and what's not; when you find a topic that's working well, spend some time on it; and finally, expect that sometimes he's going to make crap up to account for the fact that he simply doesn't know what's going on but is trying to be helpful.
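One practical consequence of this fixed window: a chat application has to decide what the model "forgets" once a conversation outgrows it. A minimal sketch of one common strategy, keeping only the most recent turns that fit (the function and names are hypothetical; real services each have their own truncation logic):

```python
def fit_history(messages, token_budget, count_tokens):
    # Walk backwards from the newest message, keeping turns until the
    # budget is spent; everything older falls out of the model's "memory",
    # which is why key context often has to be repeated.
    kept, used = [], 0
    for msg in reversed(messages):
        cost = count_tokens(msg)
        if used + cost > token_budget:
            break
        kept.append(msg)
        used += cost
    return list(reversed(kept))
```

Anything dropped by this window is gone as far as the model is concerned, just like the conversation from five minutes ago.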

(Another aside: There's a whole other topic here to unpack, but I've been fascinated with how much more I've learned about people through the use of LLMs. When you're forced to think about how we communicate and how to make that communication more effective, rather than simply taking communication for granted, it provides a completely different perspective. There are always articles about how LLMs are going to take away jobs, but I can imagine how entire new industries are going to be birthed out of the fact that these new "alien" forms exist and we need to learn the care and feeding of them and learn how to co-exist. There's a future social event where someone is trying to explain that they spend their day caring for depressed LLMs.)

When it comes to LLMs, think of them as a smart colleague who works in a different industry and who has dementia; they have a wealth of knowledge but they struggle with the context. Provide enough context to keep the conversation on track, and be patient. Many of the same techniques you use with people can work well with LLMs. With enough patience and understanding, you can unlock the potential of these remarkable tools.

(1,119 tokens)

Ray Rahman

CEO and Founder at Kaliber.AI - Hiring Full Stack Engineers

2y

Dave J pointed me to your article. Elegant simplicity that lingers and instigates. Looking forward to exploring it with you. We are in the AI for surgery space. We are creating intelligent solutions that guide surgeons in real-time contextually, and a whole bunch of post-operative solutions.

Glen Hastings

Data Science and Analytics Executive | Ex - Meta / Instagram / Facebook / Yahoo! / Accenture

2y

Great perspective and deeply appreciate the use of the analogy.

David Jakubowski

Making Production AI Achievable & Scalable | President @ Union AI | Ex-FB, Microsoft Leader | 3x Successful Exits

2y

Keep 'em coming grz! These are awesome. Have to admit - you got me with the opening, was worried where this was headed at the start. Well played
