How Close is Artificial General Intelligence in 2024?
Michael Spencer
A.I. Writer, researcher and curator - full-time Newsletter publication manager.
Hey Everyone,
This is a guest post by Conrad Gray who has a rather impressive Newsletter called Humanity Redefined.
“Our mission is to ensure that artificial general intelligence—AI systems that are generally smarter than humans—benefits all of humanity.” - OpenAI
“Our long-term vision is to build general intelligence, open source it responsibly, and make it widely available so everyone can benefit. We’re bringing our two major AI research efforts (FAIR and GenAI) closer together to support this,” Mark Zuckerberg, Meta
“We’ve come to this view that, in order to build the products that we want to build, we need to build for general intelligence,” Mark Zuckerberg (Meta)
To get the best of A.I. Supremacy, consider joining as a paid supporting member. Now over to Conrad.
2023 was a year of massive advancements in artificial intelligence. With the launch of ChatGPT, followed by the release of GPT-4 and other models and AI-powered products, many people were suddenly exposed to the bleeding edge of AI research and what these systems are capable of - from writing poems and solving math problems to coding and coaching. This exposure prompted some to ask: how close are we to the emergence of artificial general intelligence, AI systems that are as good as a human at almost anything humans can do?
But first, we need to understand what we mean by “artificial general intelligence”, or AGI in short. The problem here is that there is no agreed definition of what AGI is. In the Levels of AGI paper, Google DeepMind researchers list at least nine different definitions, ranging from AI systems capable of passing the Turing Test to “highly autonomous systems that outperform humans at most economically valuable work”, as defined by OpenAI.
The lack of an agreed-upon definition of AGI makes it difficult to declare if a system is AGI or not. The common consensus is that today, we don’t have an AGI system yet. However, there are some people who argue that modern large language models like GPT-4 are already AGI.
They argue that GPT-4 and other models have achieved generality and are therefore AGI. They point to what these models are capable of - from generating text almost indistinguishable from what a human would write, to writing code and solving math problems, to creating images of higher quality than most humans could produce.
They are not wrong in pointing out the capabilities of modern LLMs. But what is missing in these arguments is the performance. GPT-4 can give an answer to a complex question but there is a big chance the answer will be wrong. In other words, GPT-4 has the capability but does not necessarily have the performance required to call it a true AGI.
I expect to see further improvements in the performance of large language models. There are two ways of improving performance - either by increasing the size of the model or by using clever post-training techniques. The first, increasing the number of parameters in the network, is what brought us to where we were at the beginning of 2023. However, there are practical limits to how big a model can be. The bigger the model, the more computing power it requires and the longer it takes to train, which results in higher costs of developing and maintaining the system.
For those reasons, we can expect to see more research done to improve the performance of language models after the training. One such technique is fine-tuning, in which the model is trained to be specialised in certain domains, like legal or medical domains. A properly fine-tuned model can outperform larger, general language models in that specific domain.
Another technique to improve the reasoning capabilities of large language models is Chain of Thought prompting - instead of just asking the language model a question, it is more effective to ask the model to explain, step by step, how to solve the problem before giving the answer. It is a simple but very effective technique that dramatically improves the model's performance. This approach was later improved with Chain of Thought prompting with Self-Consistency, in which the model is asked to generate multiple answers to the same question. The most common answer then becomes the final answer given to the user.
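The self-consistency idea above fits in a few lines of code. This is a toy sketch, not a production implementation: `toy_model` is a stand-in that cycles through canned answers, where a real system would sample an LLM several times with a non-zero temperature.

```python
import itertools
from collections import Counter

def self_consistency(model, prompt, n_samples=5):
    """Sample several chain-of-thought answers and return the majority vote.

    `model` is any callable taking a prompt and returning a final answer;
    a real implementation would sample an LLM with temperature > 0 so the
    reasoning paths (and occasionally the answers) differ between samples.
    """
    answers = [model(prompt) for _ in range(n_samples)]
    most_common_answer, _count = Counter(answers).most_common(1)[0]
    return most_common_answer

# Toy stand-in model: answers "4" on three of every five samples.
_canned = itertools.cycle(["4", "5", "4", "3", "4"])
toy_model = lambda prompt: next(_canned)

print(self_consistency(toy_model, "What is 2 + 2? Think step by step."))  # → 4
```

The intuition behind the majority vote is that incorrect reasoning paths tend to disagree with each other, while correct ones tend to converge on the same answer.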
Researchers from DeepMind then took this concept one step further and instead of chains of thoughts they used trees. The Tree of Thoughts approach begins with several initial thoughts and explores where these thoughts lead the AI agent. ToT allows thoughts to branch off and explore other possibilities.
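Mechanically, Tree of Thoughts can be sketched as a beam search over partial "thoughts": expand each candidate, score the branches, keep only the most promising ones, and repeat. This is a minimal illustration rather than the paper's algorithm - in a real system `expand` and `score` would both be LLM calls, while here they solve a toy digit-building problem.

```python
def tree_of_thoughts(expand, score, root, beam_width=2, depth=3):
    """Breadth-first Tree-of-Thoughts sketch with beam-search pruning.

    `expand(thought)` returns candidate continuations of a partial thought;
    `score(thought)` rates how promising it is. Unpromising branches are
    dropped at every level, so the tree never blows up.
    """
    frontier = [root]
    for _ in range(depth):
        candidates = [t for thought in frontier for t in expand(thought)]
        if not candidates:
            break
        # Prune: keep only the top-scoring branches.
        frontier = sorted(candidates, key=score, reverse=True)[:beam_width]
    return max(frontier, key=score)

# Toy problem: build the largest 3-digit string by appending digits.
expand = lambda s: [s + d for d in "123"]
score = lambda s: int(s) if s else 0
print(tree_of_thoughts(expand, score, ""))  # → 333
```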
Another way to improve the performance of language models is to use different architectures. A good example of how a well-executed architecture can lead to massive improvements in performance is the recently released Mixtral model from Mistral AI.
Mixtral is what is known as a Mixture of Experts model. Instead of one giant, monolithic model, Mixtral combines eight smaller Mistral 7B-scale expert networks working together. The result is that Mixtral is the best open-source model available right now, outperformed only by GPT-4 and, narrowly, by GPT-3.5, according to benchmarks published by HuggingFace. It’s also worth noting that the current understanding is that GPT-4 itself is a Mixture of Experts model, consisting of eight 220B models.
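The core mechanic of a Mixture of Experts layer can be shown in miniature: a router scores every expert, only the top-k experts actually run, and their outputs are blended by the renormalised routing weights. Everything below is an assumption for illustration - a scalar input, a toy linear router, and three hand-written experts - whereas real MoE layers like Mixtral's route each token through learned linear gates over neural experts.

```python
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def moe_forward(x, experts, router_weights, top_k=2):
    """Minimal Mixture-of-Experts forward pass on a scalar input.

    Only the top-k experts are evaluated; the others cost nothing,
    which is why MoE models get big capacity at a modest compute price.
    """
    scores = [w * x for w in router_weights]  # toy linear router
    probs = softmax(scores)
    top = sorted(range(len(experts)), key=lambda i: probs[i], reverse=True)[:top_k]
    norm = sum(probs[i] for i in top)  # renormalise over the chosen experts
    return sum(probs[i] / norm * experts[i](x) for i in top)

experts = [lambda x: x + 1, lambda x: 2 * x, lambda x: x ** 2]
out = moe_forward(3.0, experts, router_weights=[0.1, 0.5, 0.9], top_k=2)
```

With these weights the router prefers the last two experts, so the output is a blend of `2 * 3.0` and `3.0 ** 2` and the first expert is never run.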
How DeepMind and OpenAI approach the problem of deep reasoning
All the techniques I mentioned above - from Chain of Thought and Tree of Thought to new architectures - can improve the reasoning capabilities of language models and therefore improve their performance. They might be good enough for some tasks but I don’t think they will be enough to achieve AGI.
But they hint at the next breakthrough that can elevate language models to the next level - deep reasoning. Here, a model autonomously determines how to solve a problem given only a prompt. We can see hints of a possible solution in DeepMind's recently announced AlphaCode 2 and from the results the OpenAI team shared in the Let’s Verify Step by Step paper in mid-2023.
Let’s start with AlphaCode 2. AlphaCode 2 is an improved version of AlphaCode, the first AI model to reach a competitive level in programming competitions. Released almost exactly a year after its predecessor, AlphaCode 2 combines advanced language models with search and re-ranking mechanisms. When evaluated on the same platform as the original AlphaCode, AlphaCode 2 solved 1.7 times more problems, and performed better than 85% of human competitors.
How AlphaCode 2 compares to AlphaCode and human coders. Source: AlphaCode 2 Technical Report
To test how good AlphaCode 2 is, researchers gave the model some challenges from Codeforces, a platform for competitive programmers full of challenging coding problems.
AlphaCode 2 isn't just a single model; it's a suite of models based on Gemini, Google’s state-of-the-art AI model. Researchers at DeepMind took several Gemini Pro models and fine-tuned them to generate code. Each model was also tweaked to maximise the diversity of generated code.
These models generated up to a million code samples for each coding puzzle. Although not every sample is correct, the sheer volume increases the likelihood that at least some of them will be correct. The samples that do not compile or that do not produce the expected results are filtered out (according to DeepMind, this step removes 95% of the samples).
The remaining samples are clustered based on output similarity, leaving no more than ten for a scoring model to evaluate. The scoring model, also based on Gemini Pro, assigns scores to each solution, with the highest-scoring one selected as the final answer.
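The generate-filter-cluster-score pipeline described above can be sketched end to end. All the callables here (`compiles`, `outputs`, `score`) are hypothetical stand-ins for the real Gemini-based components, and short strings stand in for generated programs.

```python
def alphacode2_style_select(candidates, compiles, outputs, score, max_clusters=10):
    """Sketch of the AlphaCode 2 selection pipeline.

    1. Filter out samples that fail to compile or pass basic checks.
    2. Cluster the survivors by the output they produce.
    3. Let a scoring model rank one representative per cluster.
    """
    survivors = [c for c in candidates if compiles(c)]
    clusters = {}
    for c in survivors:
        clusters.setdefault(outputs(c), []).append(c)
    representatives = [group[0] for group in list(clusters.values())[:max_clusters]]
    return max(representatives, key=score)

candidates = ["a+b", "a  +  b", "return bad", "b+a"]
best = alphacode2_style_select(
    candidates,
    compiles=lambda c: "bad" not in c,   # toy stand-in for a real compiler check
    outputs=lambda c: "same-output",     # toy: every survivor behaves identically
    score=len,                           # toy stand-in for the Gemini scoring model
)
print(best)  # → a+b
```

Because all three surviving candidates produce the same output, they collapse into a single cluster and only its first representative reaches the scoring model - which is the point of the clustering step.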
The impression I get after reading the AlphaCode 2 paper is that at the moment, AlphaCode 2 feels like a brute-force approach powered by sophisticated language models. The system generates up to a million samples, hoping at least one is correct. The results are undeniably impressive but I think DeepMind can do better.
One way of improving AlphaCode 2 could be to use Gemini Ultra, the most capable model in the Gemini family. Gemini Pro, which DeepMind used for AlphaCode 2, is roughly the equivalent of GPT-3.5, which powers the free version of ChatGPT, while Gemini Ultra is the direct competitor to GPT-4. The researchers admit they would like to explore this in the future, though it would increase the computational cost of an already very computationally expensive system.
OpenAI, too, is exploring how to improve the performance of AI models by introducing advanced reasoning capabilities. In May 2023, researchers from OpenAI published a paper titled Let’s Verify Step by Step, in which they explored ways to improve multistep reasoning by evaluating the individual steps of the reasoning process rather than just the final answer. In their experiment, one model, named the generator (a fine-tuned GPT-4 without RLHF), generated steps to solve challenging math problems.
These steps were then verified using two different approaches - Outcome-supervised Reward Models (ORMs) and Process-supervised Reward Models (PRMs). ORMs verify only the final answer given by the model, while PRMs verify each step of the reasoning. Models using the PRM approach solved 78% of problems from the MATH dataset, a collection of 12,500 challenging competition mathematics problems - roughly twice what GPT-4 scored on the same test. Interestingly, the team at OpenAI was able to generalise this result to other fields such as chemistry and physics.
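The difference between the two reward models can be made concrete. In this sketch an ORM scores only the final step, while a PRM aggregates per-step scores by taking their product (one common choice; the paper works with per-step probabilities). Both `step_reward` and `answer_reward` are toy stand-ins for learned reward models.

```python
def orm_score(steps, answer_reward):
    """Outcome-supervised: reward depends only on the final answer."""
    return answer_reward(steps[-1])

def prm_score(steps, step_reward):
    """Process-supervised: every reasoning step is scored, and the
    per-step rewards are multiplied together, so a single bad step
    sinks the whole solution."""
    total = 1.0
    for step in steps:
        total *= step_reward(step)
    return total

# A solution whose final answer looks right but whose middle step is flawed.
steps = ["Rewrite 6*7 as 6*(5+2).", "6*5 = 35, 6*2 = 7 (flawed).", "So 6*7 = 42."]
step_reward = lambda s: 0.2 if "flawed" in s else 0.9
answer_reward = lambda s: 0.9 if "42" in s else 0.1

print(orm_score(steps, answer_reward))  # high: the final answer looks correct
print(prm_score(steps, step_reward))    # low: the flawed middle step is penalised
```

The ORM is fooled because the final answer happens to look right; the PRM catches the broken step, which is exactly why process supervision helped on MATH.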
Source: Let’s Verify Step by Step
Both AlphaCode 2 and Let’s Verify Step by Step also show that there is a lot of performance to gain from improving models after training. The future performance gains will most likely come not from making bigger and bigger models but from clever usage of smaller models.
GPT + AlphaZero = AGI?
Both AlphaCode 2 and the approach described in Let’s Verify Step by Step hint at how top AI labs aim to introduce deep reasoning in their AI systems. Both teams use the fact that it is easier to verify if the answer is correct than to generate a correct answer on the first try. In both cases, this verifier or scoring model is yet another language model.
Both teams also use fine-tuned language models (GPT-4 and Gemini Pro) to generate different answers for the verifier to evaluate. AlphaCode 2 generates up to one million different code samples, while the model in Let’s Verify Step by Step generates up to 1000 solutions per problem. In both cases, we have one set of language models generating a huge number of possible solutions for the verifier to check and pick the correct solution. The results, as we have seen, are impressive and promising.
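Stripped of the scale, the shared recipe - one model proposes many candidates, another picks the best - is best-of-N sampling with a learned verifier. In this sketch both roles are toy callables: `toy_generate` cycles through canned answers and `toy_verify` simply prefers answers close to 42, where a real system would use a trained reward or scoring model.

```python
import itertools

def best_of_n(generate, verify, prompt, n=4):
    """Generate-then-verify: sample n candidate solutions and return the
    one the verifier scores highest. This exploits the asymmetry noted
    above - verifying an answer is easier than producing it correctly
    on the first try."""
    candidates = [generate(prompt) for _ in range(n)]
    return max(candidates, key=verify)

# Toy stand-ins for the generator and verifier models.
_samples = itertools.cycle([40, 43, 42, 10])
toy_generate = lambda prompt: next(_samples)
toy_verify = lambda answer: -abs(answer - 42)

print(best_of_n(toy_generate, toy_verify, "the ultimate question", n=4))  # → 42
```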
So, where do we go next from here? Well, the authors of Let’s Verify Step by Step told us what the next natural step is - fine-tuning the answer-generating model with reinforcement learning. This reminds me of what Demis Hassabis shared in an interview with The Verge: “Planning and deep reinforcement learning and problem-solving and reasoning, those kinds of capabilities are going to come back in the next wave after this [generative AI]”.
I think researchers at DeepMind and OpenAI might incorporate something similar to Tree of Thoughts into their reasoning models. That would enable the generator models to come up with a number of different ideas as a starting point. Each of these initial ideas could form a distinct path, or even a tree, of reasoning. The verifier would check each path at every step; incorrect paths of reasoning would be abandoned. Eventually, the model would find an answer to the question.
However, this approach results in exploring enormous trees of all possible reasoning paths, with the vast majority of them being dead ends. But DeepMind has already encountered a similar problem. The number of all possible boards in Go is larger by orders of magnitude than the number of atoms in the universe. Checking every possible path was not possible.
And yet DeepMind built AlphaGo, which mastered Go beyond the human level. I would not be surprised if there is an experimental model at DeepMind exploring this possibility. There is also speculation that Q*, the rumoured follow-up to Let’s Verify Step by Step that sparked the chain of events leading to the ousting of Sam Altman from OpenAI, might have had some reinforcement learning elements.