How to Think About LLMs
Photo Credit: Rob Grzywinski


Here’s how I like to think about Large Language Models (LLMs) such as ChatGPT: as a smart colleague who works in a different industry and who has dementia.

Let’s unpack this.

LLMs have been trained on any text that they could get their hands on, from websites to books. Basically, if you can find it on the internet (and it’s not behind a login) then it’s safe to say that your friendly neighborhood LLM has already gobbled it up. Because it takes time and money to train an LLM, there’s typically a cutoff date after which the LLM has no knowledge of new content. (In the case of OpenAI’s ChatGPT, its knowledge ends around mid-2021.) All of this training means that the LLM has knowledge of basically anything and everything that’s ever been written down.

(Quick aside: LLMs are hungry beasts. There is math that says "for a model of size X you must provide Y inputs to optimally train it". As the models have gotten bigger, it’s become harder and harder to find more useful data to feed them. They’ve effectively exhausted the available data already! So where to get more data? If I were to make a guess, there’s a good reason that OpenAI just-so-happened to come out with "a neural net called Whisper that approaches human level robustness and accuracy on English speech recognition" in late 2022.)

(Quicker aside on the Quick Aside: Early in 2022 it was found that the original math for computing the optimal training data size was wrong! Most of the current models, such as ChatGPT, are undertrained: simply training the current models on more data would provide better results. Look up "Chinchilla AI" to learn more.)

The fact that LLMs have been exposed to everything is both a blessing and a curse. This is where the "smart colleague who works in a different industry" comes into play. Think about your last social event where you were introduced to someone new and you were trying to explain what you do. You have to start broadly and help the other person find analogies to things that they understand. ("Nice to meet you too. I'm a CTO at a small software startup." "Oh! So you do IT?" "No. I manage the software developers that write the product." "So you write software?" "Not too much any more. Think of me more as an over-paid babysitter." And so on, passing through the dreaded "I have this problem with my computer..." <shudder>.) This is true with LLMs as well. It can be challenging to provide enough context to the LLM so that it understands what you're talking about and what you're trying to do.
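To make "providing context" concrete, here's a toy sketch of prepending background to a question before sending it to a model. The function and its wording are my own invention for illustration, not any particular API:

```python
def build_prompt(background, question):
    """Prepend background the model can't otherwise know, the same way
    you'd brief a smart colleague from a different industry."""
    return (
        f"Background: {background}\n\n"
        f"Question: {question}"
    )

prompt = build_prompt(
    "I'm a CTO at a small software startup; we build developer tools.",
    "How should I explain what I do to a non-technical audience?",
)
```

Without the background line, the model is the stranger at the party guessing that a CTO "does IT".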

Speaking of context, you are only allowed a certain amount of text (counted in "tokens" -- a unit that is larger than an individual letter but often smaller than a whole word) as input to the LLM. For the GPT models this was initially 2,048 tokens (~1,500 words) and was increased in mid-2022 to around four thousand tokens! Everything that the LLM needs to know about the current task must fit within this window. Outside of this, the LLM has no memory of what occurred.
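A rough rule of thumb (a common heuristic, not an exact tokenizer) is that one token is about three-quarters of an English word, which is how ~1,500 words fills a 2,048-token window:

```python
def estimate_tokens(text, words_per_token=0.75):
    # Rough heuristic: 1 token is roughly 0.75 English words. Real counts
    # depend on the tokenizer; use this only for ballpark budgeting.
    return round(len(text.split()) / words_per_token)

# ~1,500 words works out to ~2,000 tokens -- right at the original limit.
```

For precise counts you'd use the model's actual tokenizer, but this estimate is good enough to know whether a document will fit.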

My father was recently diagnosed with Parkinson's disease and is experiencing dementia. If you ask him questions about things that happened some time ago, he's as sharp as a tack! If you ask him what he had for lunch or what we talked about five minutes ago, then he struggles. This is exactly how it feels when working with LLMs. Many of the same techniques I use when trying to keep a conversation with my father on track work well with LLMs: keep the context simple, or keep repeating it; expect that the conversation will go off into the weeds; highlight what's working and what's not; when you find a topic that's working well, spend some time on it; and finally, expect that sometimes he's going to make crap up to account for the fact that he simply doesn't know what's going on but is trying to be helpful.
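One practical consequence of this fixed window: a chat application has to decide what the model "forgets" once a conversation outgrows it. A minimal sketch of one common strategy, keeping only the most recent turns that fit (the function and names are hypothetical; real services each have their own truncation logic):

```python
def fit_history(messages, token_budget, count_tokens):
    # Walk backwards from the newest message, keeping turns until the
    # budget is spent; everything older falls out of the model's "memory",
    # which is why key context often has to be repeated.
    kept, used = [], 0
    for msg in reversed(messages):
        cost = count_tokens(msg)
        if used + cost > token_budget:
            break
        kept.append(msg)
        used += cost
    return list(reversed(kept))
```

Anything dropped by this window is gone as far as the model is concerned, just like the conversation from five minutes ago.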

(Another aside: There's a whole other topic here to unpack, but I've been fascinated with how much more I've learned about people through the use of LLMs. When you're forced to think about how we communicate and how to make that communication more effective, rather than simply taking communication for granted, it provides a completely different perspective. There are always articles about how LLMs are going to take away jobs, but I can imagine how entire new industries are going to be birthed out of the fact that these new "alien" forms exist and we need to learn the care and feeding of them and learn how to co-exist. There's a future social event where someone is trying to explain that they spend their day caring for depressed LLMs.)

When it comes to LLMs, think of them as a smart colleague who works in a different industry and who has dementia; they have a wealth of knowledge but they struggle with the context. Provide enough context to keep the conversation on track, and be patient. Many of the same techniques you use with people can work well with LLMs. With enough patience and understanding, you can unlock the potential of these remarkable tools.

(1,119 tokens)

Ray Rahman

CEO and Founder at Kaliber.AI - Hiring Full Stack Engineers

2y

Dave J pointed me to your article. Elegant simplicity that lingers and instigates. Looking forward to exploring it with you. We are in the AI for surgery space. We are creating intelligent solutions that guide surgeons in real-time contextually, and a whole bunch of post-operative solutions.

Glen Hastings

Data Science and Analytics Executive | Ex - Meta / Instagram / Facebook / Yahoo! / Accenture

2y

Great perspective and deeply appreciate the use of the analogy.

David Jakubowski

Making Production AI Achievable & Scalable | President @ Union AI | Ex-FB, Microsoft Leader | 3x Successful Exits

2y

Keep 'em coming grz! These are awesome. Have to admit - you got me with the opening, was worried where this was headed at the start. Well played
