What is Deep Research AI, and which is Best? Navigating ChatGPT, Google Gemini, Perplexity and X.AI offerings.
Trent Gillespie
AI, Innovation and Growth Keynote Speaker | CEO at Stellis.AI | ex-Amazon | AI-SPRINT Newsletter and Framework
Each week in the AI SPRINT Newsletter, I dive into topics that matter to business leaders and professionals around the subject of AI Adoption in business. This week it's all about Deep Research, and I'm posting on LinkedIn to test this format. Check out more about me at https://trentgillespie.live.
Most people are now familiar with language model-based AI like OpenAI’s GPT-4o or Google’s Gemini. You provide input, the AI interprets your intent using natural language processing, and generates a response based on its training data. These systems excel at handling high-level concepts and widely available information, but their performance deteriorates when addressing niche or lesser-known topics. Researchers have attempted to mitigate this through methods like fine-tuning and multi-shot prompting, yet these approaches remain limited by the one-to-one question-answer framework.
The Emergence of Reasoning AI
To overcome those limitations, in September 2024, OpenAI was the first to announce a “reasoning” language model (called “o1”). Unlike conventional AI models, reasoning models work by breaking down a user's question into smaller parts, planning how to answer it, and then refining their response step-by-step. This process—sometimes called a "chain-of-thought"—helps ensure the answer is clear and on target.
All of that takes time. When you work with a reasoning model, you’ll see it “think”, and you can often monitor that thinking in real time to understand its approach. Reasoning models are essentially more mature versions of the AI models you already use, built to follow user intent more accurately and provide more useful responses. OpenAI leads in reasoning AI today, having improved on o1 with its new o3-mini and o3-mini-high models, launched in January.
Although these models provide significantly better results than non-reasoning models on most research-related tasks, they still suffer from hallucinations: with the exception of Perplexity, they do not validate facts against online sources and still rely on their training data.
The Rise of Deep Research Models
That’s where Deep Research comes in. Shortly after releasing o3-mini, OpenAI introduced Deep Research, a version specifically designed to mitigate hallucinations by incorporating real-time internet validation. Built on the o3 reasoning framework, Deep Research goes beyond existing training data by actively searching for additional information online, cross-checking facts, and providing human-checkable citations. This results in a more research-oriented response style that enhances accuracy and reliability.
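To make the search, cross-check, and cite loop concrete, here is a minimal sketch in Python. It is not how OpenAI or any other vendor implements Deep Research: `CORPUS`, `search`, `deep_research`, and the `min_sources` threshold are all hypothetical names invented for illustration, and the "web" is a three-item in-memory list with placeholder URLs.

```python
from dataclasses import dataclass


@dataclass
class Source:
    url: str
    text: str


# Hypothetical stand-in for the live internet; a real tool would query a search engine.
CORPUS = [
    Source("https://example.com/a", "Chicago's Navy Pier is open year-round."),
    Source("https://example.com/b", "Navy Pier in Chicago is open year-round."),
    Source("https://example.com/c", "The Art Institute of Chicago is closed on Tuesdays."),
]


def search(query: str) -> list[Source]:
    """Return sources whose text contains every query keyword (simple substring match)."""
    words = query.lower().split()
    return [s for s in CORPUS if all(w in s.text.lower() for w in words)]


def deep_research(sub_questions: list[str], min_sources: int = 2) -> list[dict]:
    """For each sub-question, gather sources and flag claims backed by too few of them."""
    findings = []
    for question in sub_questions:
        hits = search(question)
        findings.append({
            "question": question,
            # Cross-check: treat a claim as verified only if enough sources agree.
            "verified": len(hits) >= min_sources,
            # Human-checkable citations.
            "citations": [s.url for s in hits],
        })
    return findings


report = deep_research(["navy pier open", "art institute closed"])
for finding in report:
    status = "verified" if finding["verified"] else "needs review"
    print(f'{finding["question"]}: {status} {finding["citations"]}')
```

The point of the sketch is the shape of the loop: every claim carries its own list of sources, and anything with thin support is flagged for a human rather than stated as fact.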
The other language model providers have followed quickly: Perplexity released its Deep Research model last week, and Elon Musk’s X.AI released its competitor, Grok3, yesterday.
Each Deep Research model significantly outperforms all prior models, and the results are dramatic: OpenAI’s model outperformed its own previous best-performing model by nearly tripling its score on the "Humanity’s Last Exam" benchmark (from 9.07% to 26.6%). Perplexity’s model achieved 20.5%, while Gemini fell significantly behind at 7.2%. Data is not yet published for X.AI’s performance on that benchmark.
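As a quick sanity check on "nearly tripling", the short Python sketch below divides each reported score by the 9.07% baseline quoted above. The numbers are exactly the ones in this article; the variable names are only illustrative.

```python
# Scores on "Humanity's Last Exam" as quoted in this article (percent).
baseline = 9.07  # OpenAI's previous best-performing model
scores = {
    "OpenAI Deep Research": 26.6,
    "Perplexity Deep Research": 20.5,
    "Gemini": 7.2,
}

# Ratio of each score to the baseline, rounded to two decimals.
ratios = {name: round(score / baseline, 2) for name, score in scores.items()}

for name, ratio in ratios.items():
    print(f"{name}: {ratio}x the baseline")
```

OpenAI’s ratio comes out at about 2.93, which is where "nearly tripling" comes from, while Perplexity lands at roughly 2.26 and Gemini at about 0.79 of the baseline.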
Due to the additional steps to iteratively gather and check their work, Reasoning and Deep Research models generally take much longer to provide a response. For example, I asked each OpenAI model to help me plan a weeklong trip to Chicago. GPT-4o provided an answer immediately. o3-mini provided a response in 27 seconds, and Deep Research took 8 minutes. Quality increased with each one, with the Deep Research response being an excellent, tailor-made vacation itinerary.
Reasoning and Deep Research models are quite impressive: the leap in quality and usefulness is as big as the ChatGPT launch just over 2 years ago. This is a generational shift in AI.
Comparing the Top Models
There are now four Reasoning and Deep Research products on the market, from OpenAI, Google, Perplexity and X.AI. Although OpenAI’s product is the hands-down winner, each has different capabilities, and ultimately the best one for you is the one you are willing to pay for.
Here’s a brief overview of each and their capabilities today. Afterward, I’ll go into detail on each, and finish with cautions and tips for beginning your use.
Perplexity:
Gemini Deep Research:
OpenAI ChatGPT:
X.AI’s Grok3:
Which works best, and which should you use?
I tested each one across three different use cases: planning a personal vacation to Chicago, scientific research for a client, and competitor business research. You can check out and compare the results for the first two here; just click on the links to see the results (I am keeping the business research private for my own purposes). The total time each query took and the number of sources checked are listed for comparison.
Family Trip to Chicago
Scientific Research
What is clear from the table is that Grok3 is dramatically faster than any other Deep Research model (27 seconds vs ChatGPT’s 7 minutes). On the surface that looks amazing, but so far the results seem sub-par in comparison: much less depth, and more a set of key points that explain concepts at a high level.
Is your company struggling with AI Adoption or Strategy?
Let me help! I can get your company from zero to AI-enabled in as little as 30 days, with leadership and staff AI workshops, keynote talks, online education, and advisory, all following my AI SPRINT adoption methodology and inspired by Amazon best practices.
Just reply here and we can arrange a short meeting to explore how to advance your AI efforts.
ChatGPT Dominates in Depth and Accuracy
To get it out of the way: ChatGPT is definitely the best today for reasoning and deep research. It has the most flexibility with different models, the deepest answers, the most useful results, and the best-written content, all within the same UI. It truly shows the power of AI, and what will be available to most people as costs continue to go down.
What sets it apart:
Challenges:
Google’s Gemini: The Best Value Option for General Research
I think Google’s Gemini has the best generalized Reasoning and Deep Research tool, available at a totally reasonable cost. If you have only $20 a month, this is the one to go for.
What sets it apart:
Challenges:
Perplexity: The Cheapest Choice
With Deep Research, Perplexity continues its disruptor mentality by giving free use of its deep research capability to anyone. I love the approach, but it comes at a cost, and to manage that cost they seem to have cut down on iterations and verbosity: results are uniformly bullet points, and the research is not as exhaustive as that of the other models. Also, I suggest you avoid their reasoning model based on DeepSeek R1.
Use it if you don’t have access to any other deep research tool or are experimenting.
What sets it apart:
Challenges:
X.AI’s Grok3: Unproven AI with Baggage
Grok3 has been out for only a day, so the jury is still out. Although it was promoted as breaking several benchmarks for math and science, those benchmarks were self-reported, and real-world testing is what matters. There have been many negative reports [more, more]: X.AI has doubled the price of access to $40 a month, and others are seeing lower performance in testing.
My own testing? The model works, but is suspiciously fast, and doesn’t go into enough detail to be “deep research”. It has a unique approach, in that it seems to give a bullet point summary to start, then much more detail on how it came up with its response. It also seems to look for “surprising facts”, which might be useful, but maybe not.
Given the political environment around Elon Musk and the $40-a-month price point, I think there are better models and would put X.AI at the bottom of the list.
What sets it apart:
Challenges:
Final Verdict: Which AI Deep Research is Best?
Here it is:
But this may change quickly: OpenAI has announced that within Q1 it will integrate Deep Research, make model selection much easier, and add charts, graphs, images, and other improvements, which will likely keep that platform the front-runner. I also expect its price to come down, as the others are strong contenders. Google will likely release a new version of Gemini soon, and I bet we’ll see models from Meta and Amazon shortly.
But Remember, They Aren’t Perfect
These models are groundbreaking, but they’re still experimental, evolving rapidly with plenty of kinks to work out before becoming truly mainstream. Here are some key limitations to keep in mind:
Pro Tips for Using AI Research Models
Finally, here are some important best practices to know when working with reasoning models. With these, you’ll be up and running as an AI-empowered research pro in no time!
If you want more content like this, or want to learn about adopting AI in your company, sign-up for my AI SPRINT newsletter at https://ai-sprint.beehiiv.com
A bit about the author:
Trent Gillespie is a leading expert on AI-driven business transformation and the creator of The AI Flywheel and AI SPRINT, powerful frameworks for building high-performance companies in the AI era. With over 25 years in tech leadership, including nearly a decade at Amazon spearheading its global expansion, Last Mile innovation, and Alexa AI privacy efforts, he’s seen firsthand how AI is reshaping industries, who will be left behind, and how businesses can harness AI for real growth and innovation.
Now, as CEO of Stellis AI, Trent helps businesses cut through the AI hype and turn themselves into AI-powered growth engines. His AI Flywheel and AI SPRINT frameworks provide leaders with a repeatable, systematic approach to scaling innovation, driving prosperity, and staying competitive in an AI-driven world.
Through his advisory, leader workshops, and high-energy keynotes, Trent delivers an independent, no-BS roadmap to help leaders unleash AI’s full potential, create self-sustaining innovation, and take back the means of innovation—ensuring that businesses of all sizes can compete, thrive, and shape the future. Learn more about Trent at https://trentgillespie.live
President and Co-Founder @ Quadrant Technologies | Elevating businesses with the best in-class Cloud, Data & Gen AI services | Investor | Philanthropist
4 weeks ago: These new AIs are next-level: they break down complex tasks, use real-time data, and even cite sources. Testing them on things like trips, research, and competitor analysis sounds super practical.
Founder & CMO @ Oblong Pixel | Startup Advisor | Fractional CMO
4 weeks ago: Really great evaluation! One note on DeepSeek R1: Perplexity open-sourced a version that they fine-tuned to remove the bias and censorship without impacting the reasoning capabilities. So theoretically that shouldn’t be as much of a concern.