Generating Novel Research Ideas Using LLMs
Stanford University recently published a paper titled ‘Can LLMs Generate Novel Research Ideas?’ The study found that ideas generated by LLMs were rated significantly more novel than those from human experts.
Novel ideas?
To reach this conclusion, over 100 NLP researchers were asked to come up with new ideas and review both LLM- and human-generated ideas without knowing their source. The results showed that LLM ideas were considered more innovative (with statistical significance, p < 0.05), although they were rated slightly lower in terms of feasibility.
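The headline claim rests on comparing blinded rating distributions and testing whether the gap is statistically significant. As a rough illustration only, here is a minimal one-sided permutation test on invented novelty ratings (the scores, sample sizes, and choice of test are assumptions for this sketch; the paper’s actual statistical analysis is more involved):

```python
import random
import statistics

def permutation_test(a, b, n_iter=10_000, seed=0):
    """One-sided permutation test for a difference in means.

    Returns the fraction of random label shufflings whose mean
    difference is at least as large as the observed one (p-value).
    """
    rng = random.Random(seed)
    observed = statistics.mean(a) - statistics.mean(b)
    pooled = list(a) + list(b)
    count = 0
    for _ in range(n_iter):
        rng.shuffle(pooled)
        diff = statistics.mean(pooled[:len(a)]) - statistics.mean(pooled[len(a):])
        if diff >= observed:
            count += 1
    return count / n_iter

# Hypothetical blinded novelty ratings (NOT the paper's data)
llm_ratings = [7, 6, 8, 7, 6, 7, 8, 6, 7, 7]
human_ratings = [5, 6, 5, 6, 5, 6, 5, 6, 5, 6]

p = permutation_test(llm_ratings, human_ratings)
print(f"one-sided p-value: {p:.4f}")
```

With clearly separated toy samples like these, the shuffled differences almost never reach the observed gap, so the p-value falls well below 0.05, mirroring the kind of significance the study reports.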
The approach is similar to that of Japanese AI startup Sakana AI’s AI Scientist, which automates the entire research lifecycle: it generates novel research ideas, writes the necessary code, executes experiments, summarises results, visualises data, and presents its findings in a complete scientific manuscript.
Interestingly, the startup claimed that each idea can be implemented and developed into a full paper for approximately $15.
Generating new ideas is relatively easy for LLMs, thanks to their extensive training on large datasets and their ability to combine disparate concepts. However, they continue to face challenges with advanced reasoning.
Meanwhile, OpenAI is preparing to release its new model, Strawberry, which is expected to offer improved reasoning capabilities.
Chai Discovery, a biology startup founded by a former OpenAI employee, recently introduced Chai-1, an advanced foundation model that predicts molecular structures crucial for drug discovery. Innovations like these suggest that LLMs are getting closer to driving significant research breakthroughs.
“The ability of LLMs to combine concepts from vast datasets in ways not typically thought of by humans can lead to ideas that are considered more novel. This might be because LLMs aren’t constrained by the same cognitive biases or conventional thinking patterns that humans have,” said DigitalVibes.ai founder Anthony Scaffeo.
He added that LLMs can make connections across different fields or unrelated data points, which might not be intuitive or immediately obvious to human experts.
Another startup, EvolutionaryScale, backed by Amazon and NVIDIA, is using LLM-based models like ESM3 to develop novel proteins for scientific research, aiming to revolutionise drug discovery and materials science through AI-driven protein engineering.
The naysayers
“My student’s comment on the paper about LLMs generating more novel research ideas than humans is making the rounds. I think this says more about NLP researchers than about LLMs. Ouch,” joked Subbarao Kambhampati, professor of computer science at Arizona State University.
“I am not gonna let no LLM beat me in generating novel NLP research ideas,” he quipped. Interestingly, Kambhampati has been quite vocal about LLMs being bad at reasoning and planning.
He said that models like GPT-3, GPT-3.5, and GPT-4 are poor at planning and reasoning, which, in his view, involve time and action. According to him, these models struggle with transitive and deductive closure, the latter being the more complex task of deducing new facts from existing ones.
So far, researchers have not experimented much with LLMs for generating novel ideas; instead, they have predominantly used them to review research papers.
Meta AI chief Yann LeCun argues that while LLMs cannot reason and plan, they are still good tools for reviewing papers. “[Human] reviewers should be able to use the tools they want to help them write reviews. The quality of their reviews should be assessed based on the result, not the process,” he said.
Meta AI launched Galactica, an LLM for research, in November 2022, just weeks before ChatGPT. However, it was taken down after three days due to criticism over generating misleading or offensive information. LeCun remains unhappy about it to this day.
However, not everyone agrees with LeCun.
“AI-generated reviews of scientific papers are increasing, vacuous, and need to be stopped quickly. They reduce the author’s trust in the review process. Proposal: someone who is judged to have submitted such a review is banned from submitting to the same conference/journal for two years,” said Michael Black, director of the Max Planck Institute for Intelligent Systems.
Adding to this perspective, Mukur Gupta, an applied scientist at Apple, recounted his frustrating experience with an LLM-generated review.
“I love AI as an assistant. But after getting an LLM-generated review for my NeurIPS paper last month (which was total crap and useless), I’m a little sceptical about AI discovering true novelty,” said Gupta.
He explained that LLMs could be a game-changer for interdisciplinary research or for uncovering new problems in fields where human experts, limited by their working memory and attention span, may struggle to grasp more than a handful of domains.
“LLMs, with their ever-expanding knowledge base, offer the potential for cross-pollination of ideas. But when it comes to deep, niche, and fundamental breakthroughs, I’m not buying it—hence my disappointment with that NeurIPS review,” he added.
Lately, there has been a growing trend of researchers using LLMs to write papers. According to recent data, the use of the term ‘delve’ in paper abstracts gradually increased through 2022, jumped noticeably in 2023 (when ChatGPT became widely available), and has continued to rise in 2024.
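Trends like the ‘delve’ spike come from simple word-frequency counting over abstracts grouped by year. A minimal sketch of that counting, using invented abstracts (the data below is made up purely for illustration):

```python
from collections import Counter

# Hypothetical abstracts tagged by year (invented for illustration)
abstracts = [
    (2021, "We study attention mechanisms in transformers."),
    (2022, "We delve into the dynamics of fine-tuning."),
    (2023, "In this paper we delve into prompt engineering."),
    (2023, "We delve deeper into retrieval-augmented generation."),
    (2024, "This work delves into multimodal alignment."),
    (2024, "We delve into scaling laws for small models."),
]

counts = Counter()  # abstracts per year containing 'delve'
totals = Counter()  # abstracts per year overall
for year, text in abstracts:
    totals[year] += 1
    if "delve" in text.lower():
        counts[year] += 1

for year in sorted(totals):
    rate = counts[year] / totals[year]
    print(f"{year}: {counts[year]}/{totals[year]} abstracts ({rate:.0%})")
```

Real studies apply the same idea at scale over millions of abstracts, often normalising by total papers per year, which is what makes the post-ChatGPT jump visible.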
The future of research should be a collaboration between humans and LLMs to generate truly innovative ideas. According to Stanford’s paper, human ideas often prioritise feasibility and effectiveness over novelty and excitement, which can limit their creativity.
On the other hand, LLMs struggle to judge the quality of ideas. By combining the strengths of both humans and LLMs, we can pave the way for exciting research.
Enjoy the full story here.
AI Events
NVIDIA and AIM Announce DevPalooza 4.0
What are you waiting for? Register now!
Why Cursor is Ahead of the Curve
A research paper on the productivity impact of GitHub Copilot and GPT-3.5, titled ‘The Effects of Generative AI on High Skilled Work: Evidence from Three Field Experiments with Software Developers’, came out earlier this week. It showed that developer efficiency grew by 26%, while the number of code completions increased by 38%. Read on.
AI Bytes