A Deep Dive into AI Deep Research
Michael Todasco
Visiting Fellow at the James Silberrad Brown Center for Artificial Intelligence at SDSU, AI Writer/Advisor
Here’s the link to a NotebookLM for this article if you prefer to listen to the AI podcast version.
Jurassic Spark: Crichton’s Cognitive Bias
In 2002, Michael Crichton (known for writing Jurassic Park, among many other novels) gave a talk on forecasting and speculation at the International Leadership Forum. In that speech, he coined a term for a cognitive bias now well known in popular psychology: the Murray Gell-Mann Amnesia effect.
What is it? Jump into a thought experiment with me. Imagine you are reading an article about something you know well, a subject in which you are the world's expert. The article, likely written by a generalist journalist or an amateur blogger, has plenty that isn't quite right: inaccuracies, or even outright errors. That makes sense, since no journalist has your level of expertise. Later, you read something else on a subject where you don't have great expertise. You forget about potential inaccuracies and assume the article is entirely accurate. The fact that we forget this when faced with subject matter we don't know much about is the "amnesia" part of the Murray Gell-Mann Amnesia effect.
So, how do we apply this to AI? Ask an AI about a topic you're deeply familiar with, and you'll quickly notice gaps or errors in its reasoning. These are the hallucinations we all know of. And having that cautiousness is a good thing.
Fast forward to December 2024, when Google quietly announced a “Deep Research” feature in Gemini that could analyze dozens of websites and conduct “hours of research in minutes.” In January, DeepSeek launched a similar product, R1 (and US stock markets collapsed). Within a few weeks, OpenAI released ChatGPT o3-mini with Deep Research, xAI launched Grok 3 with DeepSearch, and Perplexity released its own Deep Research. Phew! The race for these Deep Research products was on.
How Deep Research Works
What makes these "Deep Research" tools different from regular AI chatbots? Think of it like the difference between a student answering a question off the top of their head versus one who gets to think about it, research at the library, meticulously analyze sources, and synthesize findings for a week.
Traditional LLM responses are one-shot, meaning the AI makes its best guess based solely on what it learned during training. Deep Research tools, however, follow a multi-step process: they break your question into a research plan, run web searches, read and summarize the sources they find, loop back with new searches when something is missing, and finally synthesize everything into a long, cited report.
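None of the vendors publish their exact pipelines, so the sketch below is only a rough illustration of that loop, not any product's actual implementation. The llm() and web_search() helpers are hypothetical stand-ins you would replace with a real model API and a real search API:

```python
# Rough sketch of a "deep research" loop; not any vendor's actual pipeline.
# llm() and web_search() are hypothetical stand-ins for real APIs.

def llm(prompt: str) -> str:
    raise NotImplementedError("plug in a real model API here")

def web_search(query: str, top_k: int = 3) -> list[str]:
    raise NotImplementedError("plug in a real web-search API here")

def deep_research(question: str, max_rounds: int = 5) -> str:
    notes: list[str] = []  # running summaries of everything read so far

    # 1. Plan: break the question into concrete search queries.
    plan = llm(f"List the web searches needed to answer: {question}")
    queries = [q.strip() for q in plan.splitlines() if q.strip()]

    for _ in range(max_rounds):
        # 2. Search and read: fetch sources and summarize each one.
        for query in queries:
            for page in web_search(query, top_k=3):
                notes.append(llm(
                    f"Summarize what this page says about '{question}':\n{page}"))

        # 3. Reflect: check whether the notes are enough, or whether
        #    new, more specific searches are needed.
        gaps = llm(f"Question: {question}\nNotes: {notes}\n"
                   "List any missing information as new search queries, "
                   "or reply DONE.")
        if gaps.strip() == "DONE":
            break
        queries = [q.strip() for q in gaps.splitlines() if q.strip()]

    # 4. Synthesize: write the final cited report from the notes.
    return llm(f"Write a cited report answering '{question}' "
               f"using only these notes:\n{notes}")
```

The difference from a one-shot answer is in steps 2 and 3: instead of answering from memory, the model keeps checking its working notes against fresh sources before it writes anything.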
One thing appears to be true across these models: the more time they are given to think, and the more ability they are given to check their work, the more accurate they become. Are we at the point where we no longer need to worry about hallucinations? Can you simply have a Deep Research tool generate a report for your boss on zoning regulations in the Twin Cities, hit “send,” and take the rest of the week off? These reports seem impressive, based on what I’ve been using them for. But are they?
One way to find out is to apply the Murray Gell-Mann Amnesia effect in reverse: ask these tools about something I know a lot about and see how well they cover it. And I wanted to ask about the one thing that we are all experts in:
“Me!”
You know more about yourself than anyone else does or can. But how could I use that to put these models to the test? Now, I’m not a public figure, so asking them to “build the Todasco family tree” would be a useless endeavor; that information isn’t out on the open web. But there is one thing in my past that is 100% on the open web, unique (because, luckily, there aren’t any other “Michael Todascos” in the world), and quantifiable.
Patently Curious
In my time at PayPal, I had a bit of a side hobby: filing patents for the company (yes, I’m a nerd). That became a big part of my innovation gig later on, and along the way I was granted a lot of patents. All of this data is publicly available. What a perfect question for the Deep Research tools, and a perfect little experiment.
I asked the same question to five deep research tools:
How many granted patents does Michael Todasco have? Sometimes listed as Michael Charles Todasco. All done at PayPal or eBay
And here are the results with links to all the reports if you want to go… deeper.
Google Gemini 1.5 Pro with Deep Research
This report was a mess. It wasn’t clear in its conclusions, saying I had either 2 or over 100 patents; the latter figure was sourced from me quoting that number. If I started saying I had a thousand or a million, would it believe it? Possibly. In total, Gemini produced 932 words of wrong information and contradictions.
Grade D-
DeepSeek R1
This one took only 32 seconds to get me the wrong answer. It said 8. It’s wrong, but at least it was quick.
Grade F
ChatGPT 4.5 with Deep Research
Initially, this one, too, relied on my proclamations about patent filings instead of the source material. But it then pulled some of the actual filings and extrapolated from them. It smartly categorized some of the filings but ultimately never gave an answer. Of all of them, it was probably the most nicely formatted report: a real consultant’s dream!
Grade C-
xAI Grok 3 DeepSearch
Grok spent over 2 minutes to tell me the answer was 2. But it did nicely analyze those two!
Grade D
Perplexity Deep Research
Perplexity came back and told me the answer was 19. It nicely laid some of these out and even grouped them to reflect the evolution of my career. Perplexity was by far the most flattering of all of them, as it entirely made things up—like really, really made-up stuff.
It claimed that the 2023 PayPal annual report (mind you, I left the company in 2022) said PayPal saved $2.3 billion in fraud costs from my patents alone. (PayPal reported $1.7 billion in fraud and credit losses in 2023. I guess it would have really been a disaster without “my savings.”) If you’re unfamiliar with fraud, patents, or SEC filings, you’ll just have to trust me: that is one of the most bonkers things it could have said. But now that it’s been published on Perplexity, LLMs and web crawlers will surely gobble it up, and who knows, maybe a consensus will start to form that I continued to save billions of dollars for a company long after I left.
Grade F-
So What Is the Real Answer?
AI wasn’t totally useless; I was able to use Claude 3.7 Sonnet to build some old-fashioned software that scraped the USPTO website and counted up all the “Michael Todascos.” I went back and forth with it a few times, and it got to the final answer, 108. (All the code is here if anyone wants to play with it.)
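The article links to that code rather than reproducing it, and the author's script scraped the USPTO website directly. As a purely illustrative sketch of the counting approach, here is roughly what such a tally could look like using the legacy PatentsView patent-search API (a USPTO-backed service); the endpoint and field names are my assumptions, not the author's actual code:

```python
# Illustrative sketch only; the author's actual script (linked above) scraped
# the USPTO website directly. This version assumes the legacy PatentsView
# patent-search API and its query/field names.
import json
import requests

API = "https://api.patentsview.org/patents/query"

def count_granted_patents(inventor_last_name: str) -> int:
    """Count granted US patents listing the given inventor last name."""
    query = {"inventor_last_name": inventor_last_name}
    fields = ["patent_number", "patent_title", "inventor_first_name"]
    total, page = 0, 1
    while True:
        resp = requests.get(API, params={
            "q": json.dumps(query),
            "f": json.dumps(fields),
            "o": json.dumps({"page": page, "per_page": 100}),
        })
        resp.raise_for_status()
        patents = resp.json().get("patents") or []
        total += len(patents)
        if len(patents) < 100:  # last page reached
            break
        page += 1
    return total

# The article notes there are no other "Michael Todascos" out there, so
# filtering by last name alone is close enough for this sketch.
print(count_granted_patents("Todasco"))
```

The point is that this kind of ground truth is a simple, deterministic lookup; the Deep Research tools had access to the same public records and still got the count wrong.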
The Takeaway
If I hadn't known anything about Michael Todasco, I would have believed the Deep Research reports. The answers sound very credible. Does this mean that Deep Research tools aren’t helpful in any circumstances? Of course not. But don't have it run a report and mindlessly drop it off at your boss' or teacher's desk. Just because it "thinks" for a long time doesn't mean it's right.
One Final Experiment: The Ultimate Test
I wanted to give the Deep Research tools one last try. What is the biggest promise of AI? Being able to think of things well beyond our human comprehension. To be super-human in thought. So, I asked ChatGPT’s Deep Research to find a scientific “truth” that we all believe today but that is wrong. (What’s the modern-day version of scientists believing the Earth is at the center of the solar system?) Could it take all the world’s data and reason through it to come up with a credible solution? Could it find a breakthrough that has been evading scientists and slowing human progress?
Deep Research spent about five minutes thinking and returned a beautiful, 5,000-word answer on dark matter, quantum mechanics in biology, and many other things beyond my comprehension. To me, a non-scientist, it sounded quite impressive. Maybe it’s good enough that I’d send this report to get published in Nature.
Or, maybe it's stringing together impressive-sounding technical jargon in a way that's convincing to a physics novice like me, creating the perfect cognitive trap. Just because you understand the Murray Gell-Mann Amnesia effect doesn't mean you're immune to it when faced with AI-generated content.