Strawberry Season Begins

Recently, OpenAI chief Sam Altman hinted in a cryptic post that the company is working on a project known internally as Project Strawberry, aka Q*. “I love summer in the garden,” Altman wrote, alongside an image of a terracotta pot containing a strawberry plant with lush green leaves and small, ripening strawberries.

So, what is Q*?

OpenAI’s Q* algorithm is a significant advancement in AI, merging Q-learning and A* search to enhance human-like reasoning and problem-solving abilities. This combination enables better goal-oriented thinking and more efficient solutions, particularly for complex mathematical problems, even ones the model has not previously been trained on.

Q-learning, a core concept in reinforcement learning, focuses on identifying the best action in any given state to maximise rewards over time. At its heart is the Q-function, which estimates the expected total reward for taking a specific action in a state and then following the optimal strategy.
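
To make that concrete, here is a minimal tabular Q-learning sketch in Python. It is illustrative only: the state and action counts, hyperparameters, and environment are arbitrary assumptions, and this is not OpenAI’s Q*.

```python
import numpy as np

# Minimal tabular Q-learning sketch (illustrative; not OpenAI's Q*).
# State/action counts and hyperparameters below are arbitrary assumptions.
n_states, n_actions = 16, 4
Q = np.zeros((n_states, n_actions))
alpha, gamma, epsilon = 0.1, 0.99, 0.1  # learning rate, discount, exploration rate

def choose_action(state):
    # Epsilon-greedy: mostly exploit the current Q estimates, sometimes explore.
    if np.random.rand() < epsilon:
        return np.random.randint(n_actions)
    return int(Q[state].argmax())

def q_update(state, action, reward, next_state):
    # Core Q-learning update: nudge Q(s, a) toward the bootstrapped target
    # r + gamma * max_a' Q(s', a'), the expected total reward if we act
    # optimally from the next state onwards.
    target = reward + gamma * Q[next_state].max()
    Q[state, action] += alpha * (target - Q[state, action])
```

Run inside an environment loop, the table converges toward the Q-function described above; A* search, by contrast, contributes the goal-directed, heuristic-guided planning that the Q* name alludes to.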

RLHF is not really RL

Self-supervised learning advocate Yann LeCun wholeheartedly agrees. OpenAI co-founder Andrej Karpathy recently expressed disappointment in RLHF. “Reinforcement Learning from Human Feedback (RLHF) is the third (and last) major stage of training an LLM, after pretraining and supervised finetuning (SFT). My rant on RLHF is that it is just barely RL, in a way that I think is not too widely appreciated,” he posted on X.

He explained that Google DeepMind’s AlphaGo was trained using actual reinforcement learning (RL). The computer played games of Go and optimised its strategy based on rollouts that maximised the reward function (winning the game), eventually surpassing the best human players.

“AlphaGo was not trained with reinforcement learning from human feedback (RLHF). If it had been, it likely would not have performed nearly as well,” said Karpathy.

However, Karpathy acknowledges that for more open-ended tasks, such as summarising an article, answering tricky questions, or rewriting code, it is much harder to define a clear goal or reward. In these cases, there is no easy way to tell the AI what a “win” looks like, and without a simple evaluation signal, applying RL becomes genuinely challenging.
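
Karpathy’s point is easier to see in code. In RLHF, the “reward” is itself a model trained on human preference pairs rather than a ground-truth win condition. A minimal sketch of that preference (Bradley-Terry-style) loss, with all names and values here being illustrative assumptions, might look like this:

```python
import torch
import torch.nn.functional as F

def preference_loss(reward_chosen: torch.Tensor,
                    reward_rejected: torch.Tensor) -> torch.Tensor:
    # Bradley-Terry-style objective used to train RLHF reward models:
    # push the score of the human-preferred response above the rejected one
    # by maximising log sigmoid(r_chosen - r_rejected) over preference pairs.
    return -F.logsigmoid(reward_chosen - reward_rejected).mean()

# Toy usage: scores a reward model might assign to paired candidate answers.
chosen = torch.tensor([1.2, 0.7, 2.1])
rejected = torch.tensor([0.3, 0.9, 1.0])
loss = preference_loss(chosen, rejected)
```

The policy is then optimised against this learned proxy rather than a real objective like winning a game of Go, which is exactly why Karpathy calls RLHF “just barely RL”.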

Ergo, Project Strawberry

Project Strawberry involves a novel approach that allows AI models to plan ahead and navigate the internet autonomously to perform in-depth research. This advancement could address current limitations in the reasoning capabilities of AI models, such as common-sense gaps and logical fallacies, which often lead to inaccurate outputs.

An AI insider who goes by the name ‘Jimmy Apples’ recently revealed that Q* hasn’t been released yet because OpenAI isn’t happy with the latency and other ‘little things’ it wants to optimise further.

OpenAI’s teams are working on Strawberry to improve the models’ ability to perform long-horizon tasks (LHT), which require planning and executing a series of actions over an extended period.

The project involves a specialised “post-training” phase, adapting the base models for enhanced performance. This method resembles Stanford’s 2022 “Self-Taught Reasoner” (STaR), which enables AI to iteratively create its own training data to reach higher intelligence levels.
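
A rough sketch of one STaR-style iteration is below; `generate` and `finetune` stand in for real model calls, and both are assumptions rather than Stanford’s or OpenAI’s actual code:

```python
from typing import Callable, List, Tuple

Example = Tuple[str, str, str]  # (question, rationale, answer)

def star_iteration(
    generate: Callable[[str], Tuple[str, str]],  # question -> (rationale, predicted answer)
    finetune: Callable[[List[Example]], None],   # trains the model on new examples
    problems: List[Tuple[str, str]],             # (question, gold answer) pairs
) -> None:
    # Sample a chain-of-thought rationale per question, keep only the ones
    # whose final answer is correct, and fine-tune on that filtered set.
    kept: List[Example] = []
    for question, answer in problems:
        rationale, prediction = generate(question)
        if prediction.strip() == answer.strip():
            kept.append((question, rationale, answer))
    finetune(kept)  # the next model is trained on its own best reasoning
```

Repeating this loop lets the model bootstrap: each round’s fine-tuned model produces better rationales, which become the next round’s training data.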

OpenAI recently announced DevDay 2024, a global developer event series scheduled to take place in San Francisco on October 1, London on October 30, and Singapore on November 21.

While the company has stated that the focus will be on advancements in the API and developer tools, there is speculation that OpenAI might also preview its next frontier model.

OpenAI loves ‘Attention’

Recently, a new model in the LMSYS Chatbot Arena showed strong performance in maths. Interestingly, GPT-4o and GPT-4o Mini were also spotted in the arena a few days before their release.

An internal document reportedly indicates that Project Strawberry includes a “deep-research” dataset for training and evaluating the models, though the contents of this dataset remain undisclosed.

This innovation is expected to enable AI to conduct research autonomously, using a “computer-using agent” (CUA) to take actions based on its findings.

Additionally, OpenAI plans to test Strawberry’s capabilities in performing tasks typically done by software and machine learning engineers.

Last year, it was reported that Jakub Pachocki and Szymon Sidor, two leading OpenAI researchers, used Sutskever’s work to develop a model called Q* (pronounced “Q-Star”) that achieved an important milestone by solving maths problems it had not previously encountered.

Sutskever raised concerns among some staff that the company didn’t have proper safeguards in place to commercialise such advanced AI models. Notably, he left OpenAI and recently founded his own company, Safe Superintelligence. Following his departure, Pachocki was appointed the new chief scientist.

Meanwhile, OpenAI also recently released a comprehensive system card for GPT-4o, highlighting a proactive and transparent approach to AI development. The card identifies and mitigates key risks associated with the model’s multimodal capabilities, including unauthorised voice generation and ungrounded inferences.

“Our findings indicate that GPT-4o’s voice modality doesn’t meaningfully increase Preparedness risks,” the company stated, underscoring the importance of continuous safety evaluations in deploying advanced AI models responsibly.


Synthetic Data Generation in Simulation is Keeping ML for Science Exciting

As Yann LeCun pointed out, “Data generation through simulation is one reason why the whole idea of ML for science is so exciting.”

Simulations allow researchers to generate vast amounts of synthetic data, which can be critical when real-world data is scarce, expensive, or challenging to obtain. For instance, in fields like aerodynamics or robotics, simulations enable the exploration of scenarios that would be impossible to test physically.
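
As a toy illustration of the idea, a simulator can label as many training examples as you care to sample. The physics formula here is standard projectile motion; everything else, including the dataset size and parameter ranges, is an arbitrary assumption:

```python
import numpy as np

rng = np.random.default_rng(seed=0)
g = 9.81  # gravitational acceleration, m/s^2

def projectile_range(speed, angle_rad):
    # Flat-ground range of a projectile: v^2 * sin(2 * theta) / g.
    # Works elementwise on NumPy arrays, so it labels a whole batch at once.
    return speed**2 * np.sin(2.0 * angle_rad) / g

# Sample 10,000 synthetic (speed, angle) inputs and let the simulator
# provide the labels -- no physical experiments required.
speeds = rng.uniform(5.0, 50.0, size=10_000)
angles = rng.uniform(0.1, np.pi / 2 - 0.1, size=10_000)
X = np.stack([speeds, angles], axis=1)  # features for a downstream ML model
y = projectile_range(speeds, angles)    # simulator-generated targets
```

A model trained on `(X, y)` learns the simulator’s physics cheaply, which is the pattern that makes simulation-driven ML attractive wherever real measurements are scarce.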

Read the full story here.


AI Bytes
