Insight of the Week: It's GPT-4 or the Highway!


By Kerry Robinson


So you read The Procrastinator's Guide to AI, and you're ready to get started.


But which AI model should you use? GPT-3.5? GPT-4? What about Google's PaLM? Or an open-source model like LLaMA 2?


I think that's easy to answer: "It's GPT-4 or the highway!"


Why? Because GPT-4 is soooo much better than all the rest. Just look at this comparison of GPT-4 vs GPT-3.5 and the best of the rest at the time of GPT-4's launch (referred to as LM SOTA, or state of the art, which at that point was Google's PaLM and Meta's LLaMA). I also included the top-performing open-source LLM for comparison. All are measured against standardized benchmarks of AI capabilities, including language understanding (MMLU), common sense (HellaSwag), and reasoning (ARC):

[Chart: benchmark scores for GPT-4, GPT-3.5, LM SOTA, and the top open-source LLM on MMLU, HellaSwag, and ARC]

GPT-4 is so far ahead it's almost a joke. And while it's expensive and a little slow, that's not a good enough reason to bother with another model. The race to leverage AI has already started. Your competitors are working out how to use it to beat you, and new entrants are figuring out how to completely disrupt your market, so you can't afford to mess around tweaking and tuning inferior models.


That said, Google recently launched v2 of their PaLM model. They report on different benchmarks, so it's hard to draw a direct comparison with this data, but on HellaSwag (common sense) it's roughly level with GPT-3.5. That's still a long way from GPT-4. So again, why would you bother with a lesser model that might constrain your innovation and slow your speed to market?


Now, there's a possible caveat. Anthropic launched Claude 2 a few months back, and while I can't find any directly comparable benchmarks, there are anecdotal reports of it beating GPT-4 on certain logic, coding, and text-generation tasks. I tend to use Claude for processing text because of its longer 'context window': you can stuff 100 thousand tokens (around 65k words) into it and it won't choke. GPT-4 is limited to about a third of that. The other advantage of Claude is that it's available on Amazon, which is a no-go area for the OpenAI models, presumably because of OpenAI's tie-up with Microsoft.
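
If you want to see how that context-window difference plays out in practice, a quick token count tells you whether a document will even fit before you commit to a model. Below is a minimal Python sketch using OpenAI's tiktoken tokenizer; the model names, context limits, and the fits_in_context helper are illustrative assumptions rather than official figures or APIs, and Claude uses its own tokenizer, so its count is only a rough approximation.

```python
# Minimal sketch: check whether a document fits a model's context window.
# Assumptions: the tiktoken library is installed, the context limits below are
# approximate, and fits_in_context is a made-up helper, not a real API.
import tiktoken

# Approximate context windows at the time of writing (illustrative only).
CONTEXT_LIMITS = {
    "gpt-4-32k": 32_768,   # GPT-4's larger context variant (~a third of Claude's)
    "claude-2": 100_000,   # Claude 2's ~100k-token window (roughly 65k words)
}

def fits_in_context(text: str, model: str, reserve_for_reply: int = 1_000) -> bool:
    """Return True if the text, plus headroom for a reply, fits the model's window.

    Tokens are counted with GPT-4's tokenizer; Claude tokenizes differently,
    so treat its figure as an estimate only.
    """
    encoding = tiktoken.encoding_for_model("gpt-4")
    n_tokens = len(encoding.encode(text))
    return n_tokens + reserve_for_reply <= CONTEXT_LIMITS[model]

document = open("long_report.txt").read()  # hypothetical input document
print("Fits GPT-4 (32k)?", fits_in_context(document, "gpt-4-32k"))
print("Fits Claude 2?   ", fits_in_context(document, "claude-2"))
```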


But I think the principle is clear: don't wait, and don't mess around with lesser models. Pick the best, easily accessible model: that's GPT-4 if you're on Microsoft, PaLM 2 on Google, and probably Claude 2 on Amazon. Even then, you should ask yourself whether it would be better to just use GPT-4 and know you're using the very best all-purpose model. Happy innovating!


Kerry Robinson is an Oxford physicist with a Master's in Artificial Intelligence. Kerry is a technologist, scientist, and lover of data with over 20 years of experience in conversational AI. He combines business, customer experience, and technical expertise to deliver IVR, voice, and chatbot strategy and keep Waterfield Tech buzzing.

Subscribe to Kerry's Weekly AI Insights

