Nvidia Brings GenAI to the Physical World with Cosmos

In what was undoubtedly one of the most eagerly anticipated, most closely watched, and most highly attended CES keynotes of all time, Nvidia CEO Jensen Huang once again unveiled an impressively wide-ranging set of announcements across many of the hottest topics in tech, including AI, robotics, autonomous cars, and more.

Clad in a Las Vegas-glitz version of his trademark black leather jacket, the tech industry leader worked his way through the company’s latest GeForce RTX 50 Series graphics cards, new Nemotron AI foundation model families, and AI blueprints for AI-powered agents. He also touted extensions to the company’s Omniverse digital twin/simulation platform that extend AI into the physical world, new safety certifications for its autonomous driving platform, and a new mini desktop-sized AI supercomputer called Project Digits that’s powered by a Grace Blackwell superchip. Needless to say, it was a lot to take in.

One of the most intriguing—but probably least understood—of all the announcements was a set of foundation models and platform capabilities that the company is calling Cosmos. Specifically defined as a set of world foundation models, advanced tokenizers, safety guardrails, and an advanced video processing pipeline, Cosmos is designed to bring the training capabilities and advanced outcomes of generative AI from the digital world into the physical one. In other words, instead of having GenAI create new digital outputs built from its training across billions of documents, images, and other digital content, Cosmos can help generate new physical actions—let’s call them analog outputs—by leveraging data it’s been trained on from digitally simulated environments.

While the concept is complex, the real-world results are both simple and profound. For applications like robotics, autonomous vehicles, and other mechanical systems, this means that Cosmos can help these systems react to physical stimuli in more accurate, safe, and helpful ways. For example, humanoid-style robots can be taught to physically emulate the most effective or safest way to perform a given task—whether it’s flipping an omelet or picking up and putting away a part on a production line. Similarly, an autonomous car can be taught to react dynamically to different types of situations and environments.

Much of this type of training is already going on, but a huge portion of it is done manually: human beings are filmed performing the same action hundreds of times, or autonomous cars are driven millions of miles. Even after that’s done, thousands of people spend enormous amounts of time hand-labeling and tagging those videos. With Cosmos, these types of training methods can be automated, dramatically reducing costs, saving time, and expanding the range of data used in the training process.
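To make the labeling point concrete, here’s a minimal, purely hypothetical Python sketch (none of these names come from Nvidia’s actual tools) of why simulated data arrives pre-labeled: because the generator controls every variable, exact ground truth is recorded at creation time rather than tagged by hand afterwards.

```python
import itertools
import json

# Hypothetical scenario parameters for a simulated driving dataset.
LIGHTING = ["dawn", "noon", "night"]
WEATHER = ["clear", "rain", "fog"]
PEDESTRIANS = [0, 2, 8]


def make_sample(lighting: str, weather: str, pedestrians: int) -> dict:
    """Stand-in for rendering one simulated clip; the 'labels' dict is
    exact ground truth, not a human annotator's best guess."""
    return {
        "clip_id": f"{lighting}-{weather}-{pedestrians}",
        "labels": {
            "lighting": lighting,
            "weather": weather,
            "pedestrian_count": pedestrians,
        },
    }


# 3 x 3 x 3 = 27 fully labeled scenario combinations from one loop,
# where filming each combination on real roads could take weeks.
dataset = [
    make_sample(l, w, p)
    for l, w, p in itertools.product(LIGHTING, WEATHER, PEDESTRIANS)
]
print(json.dumps(dataset[0], indent=2))
print(f"{len(dataset)} labeled samples generated")
```

Scaling the same loop to thousands of parameter combinations is how synthetic pipelines can cover rare, long-tail situations that might take years to encounter on real roads.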

The way it works is that Cosmos acts as an extension to Nvidia’s Omniverse digital simulation environment, taking the digital physics of the models and systems created in Omniverse and translating them into physical actions in the real world. While that may seem like a subtle distinction, it’s a critically important one, because it’s what allows Cosmos to generate its GenAI-powered physical outputs. At the heart of Cosmos is a series of what are called world foundation models, built from millions of hours of video content, that have an understanding of the physical world. Cosmos essentially takes the digital models of physical objects and environments created in Omniverse, places them into these world foundation models, and generates photorealistic video outputs of how those objects are predicted to behave in the real world. These videos, in turn, serve as synthetic data sources that can be used to train the models running in robotic systems, autonomous cars, and other GPU-powered mechanical systems. The end results are systems that can react more effectively across a wide range of different environments.
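As a rough illustration of that flow (a digital scene goes in; predicted video and synthetic training data come out), here’s a hedged Python sketch. Every class and function name below is an invented placeholder, not part of Nvidia’s actual Omniverse or Cosmos APIs; a real system would replace the stubs with genuine scene export and world-model inference.

```python
from dataclasses import dataclass
from typing import Dict, List

# Hypothetical sketch of the pipeline described above: an Omniverse-style
# scene is fed to a world foundation model, which predicts a photorealistic
# video rollout; the rollout (plus free ground-truth labels) becomes
# synthetic training data for a robot or vehicle policy.


@dataclass
class SimScene:
    """A digital twin of a physical setup, e.g. authored in Omniverse."""
    name: str
    objects: List[str]


@dataclass
class LabeledClip:
    """A predicted video rollout plus exact simulation ground truth."""
    frames: List[bytes]
    labels: Dict[str, str]


def world_model_rollout(scene: SimScene, situation: str) -> LabeledClip:
    """Stand-in for world-foundation-model inference: predict how the
    scene evolves under the described situation."""
    # A real model would generate actual video frames here.
    placeholder_frames = [b"<frame>"] * 30  # roughly 1 second at 30 fps
    return LabeledClip(
        frames=placeholder_frames,
        labels={"scene": scene.name, "situation": situation},
    )


def build_training_set(scene: SimScene, situations: List[str]) -> List[LabeledClip]:
    """Sweep many situations to cover conditions that would be slow and
    expensive to stage and film in the real world."""
    return [world_model_rollout(scene, s) for s in situations]


if __name__ == "__main__":
    line = SimScene(name="assembly_line", objects=["part", "bin", "gripper"])
    situations = [
        "part arrives slightly rotated",
        "bin is nearly full",
        "lighting dims mid-task",
    ]
    clips = build_training_set(line, situations)
    print(f"Generated {len(clips)} synthetic clips for policy training")
```

The design point worth noticing is that the world foundation model sits between the simulated scene and the downstream policy, so the fidelity of its learned physics directly determines how trustworthy the synthetic data is.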

One other important note is that Nvidia is making its Cosmos world foundation models available for free to encourage more development in the fields of robotics and autonomous vehicles, as well as further experimentation.

In the short term, the immediate impact of Cosmos will be limited, as it’s primarily targeted at the small group of developers working on advanced robotics and autonomous vehicle applications. Longer term, however, the impact could be profound, as it’s expected to dramatically speed up the development of these product categories and improve the accuracy and safety of these applications. More importantly, it shows how Nvidia continues to look ahead to and plan for bigger tech trends like robotics. It also highlights the ongoing but little-recognized trend of Nvidia transforming itself into a software company that builds platforms for these new applications. For those wondering where the company is headed and how it can continue its impressive growth, these are intriguing and important signs.

Bob O’Donnell is the president and chief analyst of TECHnalysis Research, LLC, a market research firm that provides strategic consulting and market research services to the technology industry and professional financial community. You can follow him on LinkedIn at Bob O’Donnell or on Twitter @bobodtech.

Richard Self

Leadership and Keynote Speaker and member of the Data Science Research Centre at University of Derby

1 month

See also this Nvidia page on Cosmos: https://developer.nvidia.com/blog/advancing-physical-ai-with-nvidia-cosmos-world-foundation-model-platform/ It is clearly a slightly better and more controllable version of systems like Sora. Trained on 20M hours of video (I wonder where from?), the assumption is that the various neural layers will develop world physics models. Note also that the metrics are for the 3Dness of still images, not motion.

Richard Self

Leadership and Keynote Speaker and member of the Data Science Research Centre at University of Derby

1 month

The critical issue is whether the GenAI systems and video analysis systems are expected to develop physics and engineering models from the videos, or whether proper, traditional physics and engineering models are built in. If the former, we will need to test the output of the simulations extremely carefully. So far, video-based learning doesn't seem to have been very successful.

Spot on. The keynote was filled with great announcements, but Cosmos stole the show for me also. Physical AI has such a huge opportunity for society and our world. This is a critical building block to unlock it. Excited for the road ahead!
