OpenAI has o1

Today, OpenAI introduced its new o1-preview series of AI models, which can solve complex problems in areas such as science, coding, and maths. The models are now available in ChatGPT and the API, as part of an early preview, with regular updates and improvements expected.

“Extremely proud of the team; this was a monumental effort across the entire company. Hope you enjoy it!” posted OpenAI chief Sam Altman on X. He even wrapped up his inside joke with AI insider Jimmy Apples by saying, “No more patience, Jimmy,” to which Apples replied, “It feels good, Sam. Really good.”

“We trained a model and it is good in some things,” said OpenAI’s Jerry Tworek. To this, Altman replied, “But would you rather be a little sad most of the time and super happy occasionally, or a little happy all of the time and very sad once in a while?”, subtly hinting at o1’s ability to reason about complex human emotions.

The o1 series models are trained to spend more time thinking before responding, refining their reasoning process and improving their problem-solving capabilities. In initial tests, the next update of the reasoning model performed on par with PhD students on physics, chemistry, and biology tasks, and also did well in maths and coding competitions. In a qualifying exam for the International Mathematical Olympiad, the model scored 83%, compared to GPT-4o’s 13%.

Despite its advanced reasoning abilities, the o1-preview model lacks some of the practical features found in GPT-4o, such as web browsing and file uploading. However, OpenAI emphasises the model’s potential for tackling complex tasks, particularly in fields requiring multi-step workflows.

As part of the release, OpenAI has implemented a new safety training approach that allows the models to follow safety rules better. In jailbreaking tests, o1-preview outperformed GPT-4o, scoring 84 out of 100, compared to GPT-4o’s 22. OpenAI has also bolstered its safety efforts by partnering with AI safety institutes in the US and UK.

Alongside o1-preview, OpenAI has released a smaller, cost-effective model called o1-mini, designed specifically for developers who need advanced coding capabilities without broad world knowledge. o1-mini is 80% cheaper than o1-preview.

Starting today, ChatGPT Plus and Team users can manually select o1-preview and o1-mini from the model picker, with weekly rate limits of 30 messages for o1-preview and 50 for o1-mini. API users in the highest usage tier can also begin prototyping, although some features, such as function calling and streaming, are not available yet.

OpenAI plans to expand access to o1-mini for ChatGPT free users and will continue adding new features to the o1 series, including browsing and file uploads.

NVIDIA’s Jim Fan lauded OpenAI o1 for its focus on inference-time scaling rather than model size. He emphasised that large models are not necessary for reasoning, as reasoning can be separated from knowledge using a “small reasoning core” and tools like code verifiers.

“You don’t need a huge model to perform reasoning... a small ‘reasoning core’ that knows how to call tools like browser and code verifier can factor out reasoning from knowledge,” he added.
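To make Fan's idea concrete, here is a toy Python sketch of a "reasoning core" that stores no facts of its own and only decides which tool to call. It does not reflect how o1 works internally; the function names (`lookup`, `verify_code`, `reasoning_core`) and the tiny fact table are hypothetical stand-ins for a browser and a code verifier.

```python
from typing import Callable, Dict


def lookup(query: str) -> str:
    """Hypothetical stand-in for a browser/search tool: knowledge lives here."""
    facts = {"boiling point of water in celsius": "100"}
    return facts.get(query.lower(), "unknown")


def verify_code(source: str) -> str:
    """Hypothetical stand-in for a code verifier: checks that the source parses."""
    try:
        compile(source, "<candidate>", "exec")
        return "ok"
    except SyntaxError as err:
        return f"syntax error: {err.msg}"


# Registry of tools the core may call (an assumption made for illustration).
TOOLS: Dict[str, Callable[[str], str]] = {
    "lookup": lookup,
    "verify_code": verify_code,
}


def reasoning_core(task: str, payload: str) -> str:
    """Toy core: picks a tool for the task and holds no facts itself."""
    tool_name = "verify_code" if task == "check" else "lookup"
    return TOOLS[tool_name](payload)


if __name__ == "__main__":
    print(reasoning_core("ask", "Boiling point of water in Celsius"))
    print(reasoning_core("check", "x = 1"))
```

The point of the split is that the fact table or the verifier can be swapped out or scaled up without touching the core, which is one way to read "factoring out reasoning from knowledge".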

Devin’s creator, Cognition Labs, worked closely with OpenAI over the past few weeks to evaluate OpenAI o1’s reasoning capabilities with Devin. They found that the new models represented a significant improvement for agentic systems that deal with code.

A few days earlier, in a cryptic post, Altman had hinted that the company was working on a project internally known as Project Strawberry, also referred to as Q*.

“I love summer in the garden,” wrote Altman on X, posting the image of a terracotta pot containing a strawberry plant with lush green leaves and small, ripening strawberries.

Project Strawberry was said to significantly enhance the reasoning capabilities of OpenAI’s AI models. It now seems fairly clear that o1-preview is Strawberry.

Meanwhile, OpenAI is in talks to raise up to $7 billion, potentially valuing the company at $150 billion, with investment interest from UAE’s MGX, Microsoft, NVIDIA, and Apple.

Can o1 Save GitHub Copilot?

Ever since Cursor and Claude hit the market, developers have been slowly moving away from GitHub Copilot. According to sources, Microsoft has plans to upgrade its capabilities on the VS Code IDE, which would help it compete with Cursor. But what about GitHub Copilot?

GitHub CEO Thomas Dohmke is optimistic. He posted on X a video of GitHub Copilot in VS Code running with OpenAI’s o1 model, which he calls “flat out badass”. The new model has been integrated into GitHub Copilot and is making AI pair programming a lot smarter.

Meanwhile, developers have also started using o1 within Cursor Composer and are already building apps with it. Cursor, being a fork of VS Code, allows much more flexibility in integrating LLMs, which makes it appealing to many developers.

The competition now seems to be head-on between Cursor and GitHub Copilot, as both can now run on o1, which, according to developers, is currently performing better than Claude in certain use cases. Enjoy the full story here.


AMD Tries to Break NVIDIA’s CUDA Ecosystem with UDNA

AMD has announced a significant shift in its GPU architecture strategy with the introduction of UDNA (Unified Data and Neural Architecture). This new architecture aims to merge AMD’s existing RDNA (for gaming) and CDNA (for data centres) architectures into a single, unified platform.

However, users allege that AMD has been uneven in its support, favouring CDNA over RDNA. RDNA requires per-generation optimisation, which means AMD has to put considerably more effort into supporting RDNA users. Read on.


AI Bytes

  • Google has introduced DataGemma, a new open model that integrates LLMs with real-world data from its Data Commons repository, using retrieval-augmented methods like RIG and RAG to reduce AI hallucinations and improve the accuracy of generative AI outputs in research and decision-making contexts.
  • Baidu has rebranded its ERNIE Bot as Wenxiaoyan, bringing advanced AI-driven search capabilities into its chatbot. Users can search for music, maps, articles, and more, alongside features like personalised content scheduling, multimedia search, and expert advice. The chatbot is popular among young users and has over ten million monthly active users.
  • AWS has selected seven Indian startups (Converse, House of Models, Neural Garage, Orbo.ai, Phot.ai, Unscript AI, and Zocket) for its Global Generative AI Accelerator program, offering up to $1 million in credits, mentorship, and technical support to scale their AI innovations.
