Insight of the Week: It's GPT-4 or the Highway!


By Kerry Robinson


So you read The Procrastinator's Guide to AI, and you're ready to get started.


But which AI model should you use? GPT-3.5? GPT-4? What about Google's PaLM? Or an open-source model like LLaMA 2?


I think that's easy to answer: "It's GPT-4 or the highway!"


Why? Because GPT-4 is soooo much better than all the rest. Just look at this comparison of GPT-4 vs GPT-3.5 and the best of the rest at the time of GPT-4's launch (referred to as LM SOTA, or state of the art, which at that point was Google's PaLM and Meta's LLaMA). I also included the top-performing open-source LLM for comparison. All are measured against standardized benchmarks of AI capabilities, including language understanding (MMLU), common sense (HellaSwag), and reasoning (ARC):

[Chart: benchmark scores for GPT-4, GPT-3.5, LM SOTA, and the top open-source LLM on MMLU, HellaSwag, and ARC]

GPT-4 is so far ahead it's almost a joke. And while it's expensive and a little slow, that's not a good enough reason to bother with another model. The race to leverage AI has already started. Your competitors are working out how to use it to beat you, and new entrants are figuring out how to completely disrupt your market, so you can't afford to mess around tweaking and tuning inferior models.


That said, Google recently launched v2 of their PaLM model. They report on different benchmarks, so it's hard to draw a direct comparison with this data, but on HellaSwag (common sense) it's roughly level with GPT-3.5. That's still a long way from GPT-4. So again, why would you bother with a lesser model that might constrain your innovation and slow your speed to market?


Now, there's a possible caveat. Anthropic launched Claude 2 a few months back, and while I can't find any directly comparable benchmarks, there are anecdotal reports of it beating GPT-4 on certain logic, coding, and text-generation tasks. I tend to use Claude for processing text because of its longer 'context window': you can stuff 100 thousand tokens (around 65k words) into it and it won't choke. GPT-4 is limited to about a third of that. The other advantage of Claude is that it's available on Amazon, which is a no-go area for the OpenAI models, presumably because of OpenAI's tie-up with Microsoft.
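
If you want to see how that context-window difference plays out in practice, a quick token count tells you whether a document will even fit before you commit to a model. Below is a minimal Python sketch using OpenAI's tiktoken tokenizer; the model names, context limits, and the fits_in_context helper are illustrative assumptions rather than official figures or APIs, and Claude uses its own tokenizer, so its count is only a rough approximation.

```python
# Minimal sketch: check whether a document fits a model's context window.
# Assumptions: the tiktoken library is installed, the context limits below are
# approximate, and fits_in_context is a made-up helper, not a real API.
import tiktoken

# Approximate context windows at the time of writing (illustrative only).
CONTEXT_LIMITS = {
    "gpt-4-32k": 32_768,   # GPT-4's larger context variant (~a third of Claude's)
    "claude-2": 100_000,   # Claude 2's ~100k-token window (roughly 65k words)
}

def fits_in_context(text: str, model: str, reserve_for_reply: int = 1_000) -> bool:
    """Return True if the text, plus headroom for a reply, fits the model's window.

    Tokens are counted with GPT-4's tokenizer; Claude tokenizes differently,
    so treat its figure as an estimate only.
    """
    encoding = tiktoken.encoding_for_model("gpt-4")
    n_tokens = len(encoding.encode(text))
    return n_tokens + reserve_for_reply <= CONTEXT_LIMITS[model]

document = open("long_report.txt").read()  # hypothetical input document
print("Fits GPT-4 (32k)?", fits_in_context(document, "gpt-4-32k"))
print("Fits Claude 2?   ", fits_in_context(document, "claude-2"))
```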


But I think the principle is clear: don't wait, and don't mess around with lesser models. Pick the best, easily accessible model: that's GPT-4 if you're on Microsoft, PaLM 2 on Google, and probably Claude 2 on Amazon. Even then, you should ask yourself whether it would be better to just use GPT-4 and know you're using the very best all-purpose model. Happy innovating!


Kerry Robinson is an Oxford physicist with a Master's in Artificial Intelligence. Kerry is a technologist, scientist, and lover of data with over 20 years of experience in conversational AI. He combines business, customer experience, and technical expertise to deliver IVR, voice, and chatbot strategy and keep Waterfield Tech buzzing.

Subscribe to Kerry's Weekly AI Insights

