Why ChatGPT-4o Might Just Make Your Custom AI Obsolete
LinkedIn just created this image for me, then chose to redact the text all on its own.

While everyone's been busy obsessing over when ChatGPT-5 might finally arrive, I've been spending some quality time with its younger stepbrother, ChatGPT-4o.

Because OpenAI’s big rollout the other day wasn’t the ChatGPT-5 launch, I feel like a lot of people passed over it.

But from my early testing, ChatGPT-4o isn't just another incremental improvement.

It is a truly multimodal model that integrates text, voice, and vision capabilities. It offers faster response times, enhanced performance, and supports over 50 languages. Key features include real-time emotion detection, voice and facial recognition, and versatile response styles. There’s also a new desktop app for macOS and an API for developers. This model is more cost-effective and aims to make AI interactions more seamless and accessible.

Check out some of the demo videos if you haven’t been paying attention.

And after a solid 30 hours of putting this AI through its paces, I'm starting to think that all the custom AI models I've built might be headed for the scrap heap.

I’ve been working with generative AI for several months to analyze financial statements, SEC filings of public companies, legal documents like purchase agreements, and a lot of general ledger and other financial data. I’ve been impressed with the results for the most part, but ChatGPT-4o took this to the next level.

To test ChatGPT-4o's capabilities, I presented it with a complex scenario analysis assignment. I provided the model with three years of historical financial data, gave it a couple of prompts, and then sat back and watched as it effortlessly generated a baseline forecast using the Seasonal Autoregressive Integrated Moving Average (SARIMA) method – in one shot! Then I instructed ChatGPT-4o to create three distinct scenarios based on price volatility, government regulations, and consumer demand. Impressively, the model produced mild, moderate, and severe versions of each scenario, while maintaining them in memory for seamless comparison.
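For readers who want a feel for the mechanics, here is a minimal sketch of the baseline-plus-scenarios workflow described above. It is not the code ChatGPT-4o produced. To stay dependency-free it swaps SARIMA for a much simpler seasonal-naive forecast with average year-over-year growth, and the shock percentages in SHOCKS, along with the function names, are hypothetical values chosen purely for illustration. It also assumes monthly data with strictly positive values.

```python
from statistics import mean

# Hypothetical shock sizes (fractional change applied to the baseline).
# These numbers are illustrative assumptions, not figures from the article.
SHOCKS = {
    "price_volatility": {"mild": -0.02, "moderate": -0.05, "severe": -0.10},
    "government_regulation": {"mild": -0.01, "moderate": -0.04, "severe": -0.08},
    "consumer_demand": {"mild": -0.03, "moderate": -0.07, "severe": -0.15},
}


def seasonal_baseline(history, season=12, horizon=12):
    """Seasonal-naive forecast scaled by average year-over-year growth.

    A simple stand-in for SARIMA, not an implementation of it.
    """
    if len(history) < 2 * season:
        raise ValueError("need at least two full seasons of history")
    # Average ratio of each observation to the same month one year earlier.
    ratios = [history[i] / history[i - season] for i in range(season, len(history))]
    growth = mean(ratios)
    last_season = history[-season:]
    # Repeat the last observed season, compounding growth once per year ahead.
    return [
        last_season[i % season] * growth ** (i // season + 1)
        for i in range(horizon)
    ]


def build_scenarios(baseline, shocks=SHOCKS):
    """Return one shocked forecast path per (driver, severity) pair."""
    return {
        (driver, severity): [value * (1 + pct) for value in baseline]
        for driver, levels in shocks.items()
        for severity, pct in levels.items()
    }
```

With three years of monthly figures in a list called history, build_scenarios(seasonal_baseline(history)) returns nine paths, one for each combination of driver and severity, which can then be compared side by side.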

Next, I challenged the model to run multiple Monte Carlo simulations to rigorously stress test the scenarios.

I’d tried this several times before 4o, but I hadn’t been able to get any prior model to do it correctly. With 4o, the results were remarkable. It demonstrated superior coding capabilities, increased computational power, and exceptional reasoning skills that surpassed other AI models I’ve used.
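To make the stress-testing step concrete, here is a minimal Monte Carlo sketch in plain Python. It is an illustration under stated assumptions, not the simulation ChatGPT-4o wrote: it assumes each month's value is hit by independent, normally distributed noise, and the 5% volatility, the simulation count, and the function names are hypothetical choices.

```python
import random
from statistics import quantiles


def monte_carlo_totals(path, volatility=0.05, n_sims=10_000, seed=42):
    """Simulate the total of a forecast path under random monthly noise.

    Assumes independent Gaussian shocks per month (an illustrative choice).
    """
    rng = random.Random(seed)  # seeded so runs are reproducible
    totals = []
    for _ in range(n_sims):
        # Perturb every month independently, then sum the simulated path.
        totals.append(sum(v * (1 + rng.gauss(0, volatility)) for v in path))
    return totals


def summarize(totals):
    """Report the 5th percentile, median, and 95th percentile of the totals."""
    cuts = quantiles(totals, n=20)  # 19 cut points at 5%, 10%, ..., 95%
    return {"p5": cuts[0], "median": cuts[9], "p95": cuts[18]}
```

Running summarize(monte_carlo_totals(path)) on each scenario path gives a downside, central, and upside figure per scenario. Real financial series usually have correlated shocks, so treat the independence assumption as a simplification.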

Some early benchmarking results have shown that ChatGPT-4o does pretty well against the other leading frontier models:

  • ChatGPT-4o performs impressively in mathematical computations, especially when using chain-of-thought prompting.
  • The model demonstrated high accuracy, often matching or surpassing the capabilities of Claude 3 Opus.
  • In tests involving complex mathematical problems and logical reasoning, GPT-4o outperformed Claude 3 Opus in several instances, showcasing its advanced computational abilities and improved prompt understanding.
  • OpenAI’s latest model also exhibited superior instruction following abilities compared to Gemini 1.5 Pro and showcased advanced coding skills that outshone Google’s flagship.

The rapid advancements in foundation models like ChatGPT-4o have lent credence to Sam Altman's assertions regarding the future of AI. As the CEO of OpenAI, Altman has long maintained that as AI systems approach Artificial General Intelligence (AGI), general-purpose models will exhibit a greater capacity to handle a wide array of tasks with heightened accuracy and efficiency, surpassing the capabilities of narrowly focused models. He argues that the versatility and adaptability of these advanced models will ultimately prove more beneficial across diverse domains.

This raises questions about the long-term viability of specialized AI models like BloombergGPT.

Remember BloombergGPT?

Developed over a year ago, BloombergGPT is a large language model designed specifically to handle financial data and tasks. The project required a substantial computational investment, with approximately 1.3 million hours of GPU time to train on a proprietary dataset comprising more than 350 billion tokens, supplemented by a nearly equal amount from general datasets.

While BloombergGPT excels at specialized financial tasks such as sentiment analysis and named entity recognition, and would likely outperform even the latest foundation models in niche financial analysis, the sustainability of its competitive edge remains uncertain. Considering Altman's perspective on the potential of massive, multi-modal models, it is worth contemplating whether the substantial cost and complexity associated with developing models like BloombergGPT are justified for broader use cases, particularly for organizations outside of major tech giants.

With the way foundation models like ChatGPT-4o are advancing, I can't help but wonder how long those specialized models will be able to keep up.

I'll be the first to admit that I've poured a lot of time and effort into building custom AI models for finance. Just a few short months ago I excitedly rolled out a whole website that hosts an army of my custom finance and accounting bots.

But now, with ChatGPT-4o in the picture, I'm starting to wonder if all that work was really necessary.

Don't get me wrong, there are still some cases where a specialized AI might be the way to go. If you're working with super niche or proprietary data, for example, you might need to bust out the fine-tuning and RAG techniques. But for most professional applications? I'm not so sure.

At the end of the day, if you've got an AI that can outperform the specialists in just about every domain, what's the point of building a bunch of custom models? It's like having a Swiss Army knife that can do the job of a whole toolbox. Sure, you might need a specialized tool every now and then, but for the most part, that trusty all-purpose gadget has got you covered.

So, if you're curious about what ChatGPT-4o can do, I say give it a whirl. Grab some financial statements, maybe a few SEC filings, and let it loose on the same questions you'd normally ask your financial analyst. I think you might be surprised at just how much this "generalist" AI can accomplish.

Now, I'm not saying that specialized AI is going to disappear overnight. There's still a place for it in certain scenarios. But with the rise of foundation models like ChatGPT-4o, it's getting harder and harder to justify the time and resources required to build and maintain a bunch of custom models.

So, while everyone else is busy hyping up the next big thing in AI, I'll be over here watching ChatGPT-4o (and whatever comes next) move forward without requiring anything from us, and (I guess) trying to figure out how I'm going to stay busy in the post-AGI future.

Ingo Krogmann

Diplom-Kaufmann (FH)

4 months ago

I agree it is impressive. Give ChatGPT some annual reports and ask for insights, or, more impressively, have it test correlations with price curves, and it can give you a forecast that a whole company works on for months. I can't imagine what it could do with a full set of data and not only data from open sources.

Pete Grett

GEN AI Evangelist | #TechSherpa | #LiftOthersUp

4 months ago

Mind-blowing stats. No wonder finance wizards rave about GPT-4o's brilliant complexity handling. Glenn Hopper

Nidhi Pandey

Founder, CMO | Helping Accounting Firms build growth marketing systems & thought leadership personal brands that get consistent leads

4 months ago

Super insightful, Glenn Hopper. Interesting how the 4o version is able to solve complex financial scenarios. However, it needs to be given the right prompt/command to perform the tasks, which many people struggle with. Would love to learn from you about the right way of prompting it (what details to give and how to frame the prompt to get the desired output).


Insightful!! The ability to generate and stress test complex scenarios can truly transform financial planning and analysis. Sounds like the 4o version is pretty cool.
