In case anyone's interested in some quick internal benchmarking of GPT-4o for legal tasks: in early results, GPT-4o does seem to be (at least) as accurate as GPT-4-Turbo, and it is much, much faster.
I'm sure everyone saw that OpenAI announced their latest model, GPT-4o, yesterday - touting it as "as smart as GPT-4-Turbo and much more efficient." After some initial testing on legal tasks, those claims appear to be justified.
First, GPT-4o is FAST. Like, damn fast. GPT-4o appears to be at least twice as fast as GPT-4-Turbo and Claude Opus, and even significantly faster than Claude Sonnet. It still trails Claude Haiku in my testing, but produces far more usable results. (Poor Haiku can be pretty dim-witted on all but the most basic tasks.)
More impressively, though, GPT-4o also seems to be (at least) as accurate as GPT-4-Turbo - in fact, in some of my early tests it outperformed both GPT-4-Turbo and (more surprisingly, to my mind) Claude on some of the more difficult analysis tasks.
I'll admit this surprised me: I would have thought those sweet speed increases would come with at least some drop-off in quality. Very impressive results, all in all.
I would note that these initial results are from a fairly small dataset run (basic issue analysis tasks across 15 COIs - if anyone would like to know more, feel free to drop me a line).
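For anyone curious about the shape of the run, here's a rough sketch of the kind of harness I mean: identical task prompts sent to each model, wall-clock latency recorded, and outputs saved for manual accuracy review. The model names, task prompts, and output file below are purely illustrative (and the Claude comparisons would go through Anthropic's SDK instead) - this is not the actual test set.

```python
# Rough sketch (illustrative only): run identical task prompts against each model,
# record wall-clock latency, and save the outputs for manual accuracy scoring.
# Assumes the OpenAI Python SDK and an OPENAI_API_KEY in the environment.
import json
import time

from openai import OpenAI

client = OpenAI()

MODELS = ["gpt-4o", "gpt-4-turbo"]  # Claude runs would use Anthropic's SDK instead
TASKS = [
    "Summarize the key obligations in the following document: ...",
    "Identify any coverage issues raised by the following document: ...",
]

results = []
for model in MODELS:
    for task in TASKS:
        start = time.perf_counter()
        response = client.chat.completions.create(
            model=model,
            messages=[{"role": "user", "content": task}],
        )
        results.append({
            "model": model,
            "latency_seconds": round(time.perf_counter() - start, 2),
            "task": task,
            "output": response.choices[0].message.content,
        })

# Latency comes straight from the timings; accuracy still gets scored by hand.
with open("benchmark_results.json", "w") as f:
    json.dump(results, f, indent=2)
```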
I'd also note that - consistent with the "friendly" introduction demos by OpenAI - GPT-4o often seems to give more extraneous, "chatty" output ("This should be an accurate and complete summary based on the document provided. Let me know if you have any other questions!"). This kind of output is pretty typical in ChatGPT, but in my experience it's less common when accessing models through the API. For use cases where this is an issue, you'll need some extra prompt work to suppress it (which could affect overall accuracy to some extent, although probably not significantly).
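For what it's worth, that "extra prompt work" can be as simple as a system instruction telling the model to skip the pleasantries. A minimal sketch of the idea, assuming the standard OpenAI Python SDK - the instruction wording here is just illustrative, not what I actually used:

```python
# Minimal sketch (illustrative wording): a system instruction asking the model to
# return only the analysis, without conversational filler at the end.
# Assumes the OpenAI Python SDK and an OPENAI_API_KEY in the environment.
from openai import OpenAI

client = OpenAI()

SYSTEM_INSTRUCTION = (
    "You are a legal analysis assistant. Return only the requested analysis. "
    "Do not add closing remarks, offers of further help, or other conversational filler."
)

def analyze(document_text: str, task: str) -> str:
    """Run a single analysis task against GPT-4o with the anti-chattiness instruction."""
    response = client.chat.completions.create(
        model="gpt-4o",
        temperature=0,
        messages=[
            {"role": "system", "content": SYSTEM_INSTRUCTION},
            {"role": "user", "content": f"Document:\n{document_text}\n\nTask: {task}"},
        ],
    )
    return response.choices[0].message.content
```

Whether a stricter instruction like this nudges accuracy at all is exactly the kind of thing worth re-checking on your own eval set.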
Would love to hear everyone's first impressions.