Google Unveils RT-2, an AI Powered Robot That Teaches Itself
https://www.deepmind.com/blog/rt-2-new-model-translates-vision-and-language-into-action

The team over at Google DeepMind has once again left the tech world in awe with the unveiling of their latest AI model, the Robotics Transformer 2 (RT-2).

Unlike ChatGPT, this is not a model that simply generates text or images. Instead, it uses artificial intelligence to learn new skills from text and images found on the internet, which allows the robot to carry out a wide variety of tasks in the real world. For instance, it can not only recognize an apple but also understand the context: what makes an apple different from a red ball, and how to pick it up and handle it.

"RT-2 shows that vision-language models (VLMs) can be transformed into powerful vision-language-action (VLA) models, which can directly control a robot by combining VLM pre-training with robotic data."


Web Knowledge Transferred to Robotic Control

The team at DeepMind developed RT-2 as a means of importing web knowledge into robotic control. While chatbots can perform complex tasks in the digital world, transferring that capability to real-world tasks has always been a challenge. Robots must understand and operate in unpredictable environments and execute complex, abstract tasks, a demand that has traditionally required billions of data points about the physical world.

RT-2 changes this approach by building on the capabilities of its predecessor, RT-1, to generalize knowledge across systems. Google claims that RT-2 can now perform complex reasoning tasks with significantly less robot training data.

It can transfer knowledge from a vast corpus of web data and interpret nuanced human requests, such as disposing of a "piece of trash." It understands the concept of "trash" and knows how to dispose of it, even without specific programming for that action.


Surpassing Previous Benchmarks in Robotic Trials

DeepMind engineers conducted over 6,000 "robotic trials" of the RT-2 model to test its capabilities. The results were stunning. On tasks drawn from the training data, the RT-2 models performed on par with the RT-1 models. But in new, unseen scenarios, RT-2 outperformed its predecessor, nearly doubling the completion rate from 32% to 62%. This adaptability in unfamiliar situations substantially pushes the boundaries of AI capabilities.


Google asserts that RT-2 exemplifies how generative AI and large language models (LLMs) are rapidly advancing the robotics sector. There is still much work ahead, but the DeepMind team is optimistic about the path they are paving.

Understanding the Workings of RT-2

The brilliance of RT-2 lies in its architecture. DeepMind adapted two existing models, the Pathways Language and Image model (PaLI-X) and the Pathways Language Model Embodied (PaLM-E), to serve as the backbones of RT-2. To control a robot, the model must output actions. The engineers represented these actions as tokens in the model's output, much like language tokens, and serialized them into strings so the vision-language models could be trained directly on robotic data.
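To make that idea concrete, here is a minimal sketch of how continuous robot actions could be discretized and written out as a string of integer tokens. The dimension count, bin count, and value range below are illustrative assumptions for this sketch, not details taken from the article.

import numpy as np

# Illustrative assumption: an action is a short vector of continuous values
# (e.g. end-effector translation, rotation, gripper opening), each mapped to
# an integer bin so it can be emitted and read back as ordinary text tokens.
NUM_BINS = 256          # assumed number of discretization bins
LOW, HIGH = -1.0, 1.0   # assumed normalized range of each action dimension

def action_to_tokens(action: np.ndarray) -> str:
    """Discretize a continuous action vector into a space-separated token string."""
    clipped = np.clip(action, LOW, HIGH)
    bins = np.round((clipped - LOW) / (HIGH - LOW) * (NUM_BINS - 1)).astype(int)
    return " ".join(str(b) for b in bins)

def tokens_to_action(token_string: str) -> np.ndarray:
    """Map a token string produced by the model back to continuous values."""
    bins = np.array([int(t) for t in token_string.split()], dtype=float)
    return bins / (NUM_BINS - 1) * (HIGH - LOW) + LOW

# Purely illustrative 7-D action: 3 translation, 3 rotation, 1 gripper value.
action = np.array([0.10, -0.25, 0.40, 0.0, 0.05, -0.10, 1.0])
tokens = action_to_tokens(action)
print(tokens)                    # a string of integer bins, e.g. "140 96 178 ..."
print(tokens_to_action(tokens))  # approximately recovers the original values

Because the actions become just another string, the same kind of training pipeline used for text and image data can, in principle, be applied to robot demonstration data, which is the core trick the DeepMind team describes.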

Balancing Progress and Concerns

The emergence of AI like RT-2 isn't without concerns. Critics argue that advancements like this risk tipping the delicate balance between control and autonomy, leading to unpredictable consequences. They also point out the potential for robots learning from the internet to inadvertently adopt dangerous ideologies or act on manipulated information.

Concerns also include the "black box" problem of understanding how decisions are made by increasingly sophisticated AI, and the potential for job displacement as robots become more capable.

No alt text provided for this image


However, these advancements also present enormous benefits. As the landscape continues to evolve, fostering an ongoing dialogue among scientists, policymakers, and the public about the future of AI and robotics is essential. It will help establish guidelines and regulations that promote beneficial uses while minimizing potential risks.

The unveiling of RT-2 marks a milestone in the journey towards building a general-purpose physical robot that can reason, problem solve, and interpret information for performing a diverse range of tasks in the real world. It's an exciting time for AI and robotics, and RT-2 is sure to be at the forefront of that adventure.

This is awesome. It's also a bit of a "Well, that's that, I suppose; come on in, robots!" moment.
