The OpenAI o1 Gamble

OpenAI is betting big on inference time scaling to crack the code for AGI. Can they pull it off with o1? Plenty of critics and experts think not, dismissing the bet as doomed to fail. Some seem almost eager to watch Sam Altman & Co. stumble and fall.

Have LLMs Hit the Wall?

According to an article in The Information, OpenAI's progress from GPT-4 to o1 has slowed down. Although o1 has completed only 20% of its training, it’s already matching GPT-4 in intelligence, task fulfilment, and question-answering abilities.

However, the improvement isn’t as dramatic as the leap from GPT-3 to GPT-4. This has led many to wonder: Have LLM improvements hit a dead-end?

Critics Rejoice Too Soon: No one seemed more thrilled about this potential plateau than AI critic Gary Marcus, who promptly posted on X, “Folks, game over. I won. GPT is hitting a period of diminishing returns, just like I said it would.”

However, Uncle Gary may have celebrated a bit too early. One of the article’s authors quickly responded, “With all due respect, the article introduces a new AI scaling law that could replace the old one. The sky isn’t falling.”

Similarly, OpenAI researchers were quick to correct the narrative, asserting that the article inaccurately portrays the progress of their upcoming models.        

Introducing Inference Time Scaling: “There are now two key dimensions of scaling for models like the o1 series—training time and inference time," said Adam Goldberg, a founding member of OpenAI’s go-to-market team. While traditional scaling laws focus on pre-training larger models for longer, there’s now another important factor at play.

“[This] aspect of scale remains foundational. However, the introduction of this second scaling dimension is set to unlock amazing new capabilities,” he added.

A New Way of “Thinking”

OpenAI researcher Noam Brown elaborated that o1 is trained with reinforcement learning (RL) to “think” before responding via a private chain of thought. “The longer it thinks, the better it performs on reasoning tasks,” he explained. This introduces a new dimension to scaling. “We’re no longer bottlenecked by pre-training. We can now scale inference compute as well.”
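For intuition, here is a minimal sketch of what scaling inference compute can look like from the outside: the same model is simply given a larger “thinking” budget before it commits to an answer. The `llm` client, its `generate` method, and the scratchpad prompt are illustrative assumptions, not OpenAI’s API; o1 manages its private chain of thought internally rather than through a knob like this.

```python
# Illustrative sketch only: "scale inference compute" by letting the same model
# think for longer before answering. `llm.generate` is an assumed placeholder.

def answer_with_budget(llm, question: str, thinking_budget: int) -> str:
    """Ask for a scratchpad capped at `thinking_budget` tokens, then return only
    the final answer that follows the 'FINAL:' marker."""
    prompt = (
        f"Question: {question}\n"
        f"Think step by step in a scratchpad of at most {thinking_budget} tokens, "
        "then write 'FINAL:' followed by your answer."
    )
    completion = llm.generate(prompt, max_tokens=thinking_budget + 64)
    return completion.split("FINAL:")[-1].strip()

# Scaling the inference-time axis is then just a sweep over the budget,
# with no retraining involved:
# for budget in (256, 1024, 4096):
#     print(budget, answer_with_budget(llm, "What is 17 * 24?", budget))
```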

Another researcher, Jason Wei, explained how the chain of thought changes with o1. Traditional chain-of-thought reasoning in models like GPT was more mimicry than true thinking: the model would often reproduce reasoning paths it had encountered during pre-training.

With o1, the model engages in a more robust and authentic thinking process. Instead of simply spitting out an answer, it runs an “inner monologue” or “stream of consciousness”, actively considering and evaluating options. “You can see the model backtracking; it says things like ‘alternatively, let’s try’ or ‘wait, but’,” he added.
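To make the contrast concrete, here is a toy version of that backtracking behaviour, written as an explicit draft, critique, and revise cycle. The `llm.generate` interface is an assumption for illustration; in o1 this happens inside a single private chain of thought rather than across separate calls.

```python
# Toy "inner monologue": draft an answer, critique it, and backtrack if a flaw
# is found. Purely illustrative; `llm.generate` is a placeholder, not o1's API.

def solve_with_backtracking(llm, question: str, max_revisions: int = 3) -> str:
    draft = llm.generate(f"Answer this question: {question}")
    for _ in range(max_revisions):
        critique = llm.generate(
            f"Question: {question}\nProposed answer: {draft}\n"
            "Check the reasoning. Reply 'OK' if it holds, otherwise explain the flaw."
        )
        if critique.strip().upper().startswith("OK"):
            break  # no flaw found; keep the current draft
        # Backtrack ("wait, but ..."): try an alternative path informed by the critique.
        draft = llm.generate(
            f"Question: {question}\nA previous attempt failed because: {critique}\n"
            "Try an alternative approach and give a corrected answer."
        )
    return draft
```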

The Power of Test-Time Compute: Peter Welinder, the VP of product at OpenAI, emphasised the underestimated power of test-time compute. “Compute for longer, in parallel, or fork and branch arbitrarily—like cloning your mind 1,000 times and picking the best thoughts,” he said.
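The parallel flavour Welinder describes is essentially best-of-N sampling. A rough sketch follows, with the caveat that the sampler and the selection rule (simple self-consistency voting over final answers) are stand-ins; a production system might score candidates with a learned verifier or reward model instead.

```python
# Sketch of parallel test-time compute: sample N independent reasoning "clones"
# and keep the most common final answer (self-consistency voting).
# `llm.generate` is an assumed placeholder client.

from collections import Counter

def best_of_n(llm, question: str, n: int = 16) -> str:
    candidates = [
        llm.generate(f"Think step by step, then answer: {question}", temperature=1.0)
        for _ in range(n)  # in practice these samples run in parallel
    ]
    finals = [c.splitlines()[-1].strip() for c in candidates]  # last line holds the answer
    best, _count = Counter(finals).most_common(1)[0]
    return best
```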

Earlier, when OpenAI released o1-mini and o1-preview, it mentioned that o1's performance consistently improves with more reinforcement learning (train-time compute) and more time spent thinking (test-time compute).

Regarding inference time scaling, the company said, “The constraints on scaling this approach differ substantially from those of LLM pretraining, and we are continuing to investigate them.”

Coincidentally, Jensen Huang Steps In

When NVIDIA CEO Jensen Huang recently said, “We are going to take everybody with us,” he truly meant it. In a recent podcast with No Priors, Huang shared that one of the major challenges NVIDIA is currently facing is inference time scaling, which involves generating tokens at incredibly low latency.

He explained that, in the future, AI systems will need to perform tasks like tree search, chain of thought, and mental simulations, reflecting on their own answers—all while responding in real-time. This approach subtly points to the capabilities of the o1 system.
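As a rough sketch of the tree-search idea Huang alludes to, the loop below expands a few candidate reasoning steps at each level and keeps only the highest-scoring partial chains (a beam search). `propose_steps` and `score` stand in for model calls, say a policy that drafts steps and a value model that rates them; none of this reflects how o1 is actually built, and meeting the real-time latency bar Huang mentions is precisely the hard part.

```python
# Beam-style tree search over reasoning steps (illustrative only).
# propose_steps(question, chain) -> list[str]: candidate next steps
# score(question, chain)         -> float:     how promising a partial chain looks

def tree_search(question: str, propose_steps, score,
                beam_width: int = 3, depth: int = 4) -> str:
    beams = [""]  # each beam is a partial chain of thought
    for _ in range(depth):
        expansions = [
            chain + "\n" + step
            for chain in beams
            for step in propose_steps(question, chain)
        ]
        if not expansions:
            break
        # Keep only the most promising partial chains for the next level.
        beams = sorted(expansions, key=lambda c: score(question, c), reverse=True)[:beam_width]
    return beams[0]  # best chain found within the compute budget
```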

AGI is Coming in 2025?

While others remain uncertain, CEO Altman is confident that AGI is closer than many think. In a recent interview with Y Combinator’s Garry Tan, Altman suggested that AGI could emerge as soon as 2025. “I think we are going to get there faster than people expect,” he said, underscoring OpenAI's accelerated progress.

He acknowledged that OpenAI had fewer resources than competitors like DeepMind. “So we said, ‘Okay, they are going to try a lot of things and we have just got to pick one and really concentrate’,” he added.

Enjoy the full story here.


A New Era in Protein Prediction Begins

Google DeepMind recently open-sourced the AlphaFold 3 model, making its weights accessible to academic researchers and scientists—for non-commercial use only. Check out the model code here.

“We are excited to see how the research community continues to use AlphaFold to address open questions in biology and new lines of research,” said Google DeepMind’s Pushmeet Kohli. The release follows through on a promise made six months ago to expand AlphaFold 3’s accessibility for the scientific community.

Know more about AlphaFold 3 here.

AI Bytes >>

  • Elon Musk’s Starlink is gearing up to enter India, agreeing to comply with the country’s data localisation and security requirements as it prepares to challenge Jio’s dominance in broadband and compete in an emerging satellite internet market alongside rivals like Amazon’s Project Kuiper and Rivada Networks.
  • Alibaba’s Qwen2.5-Coder-32B-Instruct model sets a new benchmark in open source coding models with advanced code generation, repair, and multilingual support. It challenges closed source models like GPT-4o while aligning closely with human coding preferences.
  • Indian startup Rabbitt AI is looking to transform military operations with GenAI-powered drones, autonomous vehicles, and real-time surveillance systems, reducing human exposure to high-risk zones across India, MENA, and Europe.
  • Salesforce recently released Moirai-MoE, a time series model designed to improve forecasting accuracy with sparse transformers, outperforming previous models like Moirai-Large while using significantly fewer parameters.
  • Ollama recently announced support for Llama 3.2 Vision, allowing users to run the multimodal model locally in 11B and 90B sizes for tasks like OCR, image recognition, and visual data analysis, enhancing privacy by avoiding cloud processing, despite recent vulnerability disclosures (see the sketch after this list).
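For anyone who wants to try the Ollama item above, a hedged example using the `ollama` Python client (installed with `pip install ollama`) after pulling the model with `ollama pull llama3.2-vision`; the model tag, message format, and response shape follow Ollama’s published examples and may change across versions.

```python
# Run Llama 3.2 Vision locally through Ollama: the image never leaves the machine.
# Assumes the Ollama server is running and the model has already been pulled.

import ollama

response = ollama.chat(
    model="llama3.2-vision",  # 11B tag; use "llama3.2-vision:90b" for the larger variant
    messages=[{
        "role": "user",
        "content": "Transcribe any text you can read in this image.",  # OCR-style task
        "images": ["invoice.png"],  # local file path
    }],
)
print(response["message"]["content"])
```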
