How Google’s Robots Can Learn from the Web Using AI
Google is one of the leading companies in the field of artificial intelligence (AI), developing cutting-edge technologies that can perform various tasks, from understanding natural language to playing complex games. But can AI also help robots learn from the web?
That is the question that Google researchers are trying to answer with their new approach to using large language models (LLMs), which are AI systems that can generate natural language based on massive amounts of text data. The researchers have shown how LLMs can enable robots to write and execute their own code in Python, one of the most popular programming languages, based on instructions from humans.
The new approach builds on Google’s previous work on PaLM-SayCan, a model that allows robots to understand open-ended prompts from humans and respond reasonably and safely in a physical space. For example, if a human asks a robot to “pick up the red ball and put it in the blue box”, the robot can use PaLM-SayCan to parse the request, plan the actions, and execute them.
However, PaLM-SayCan has some limitations. It can only handle simple commands that involve predefined actions and objects. It cannot deal with complex scenarios that require logic, reasoning, or creativity. It also cannot learn from its own experience or improve its performance over time.
To overcome these challenges, the researchers have integrated PaLM-SayCan with another LLM called Codex, which was developed by OpenAI and can generate Python code based on natural language queries. By combining these two models, the researchers have created a system that can translate human instructions into Python code, and then execute the code using a robot.
The system works as follows: First, the human provides a high-level description of what they want the robot to do, such as “sort the balls by colour”. Then, the system uses Codex to generate a Python script that implements the task. Next, the system uses PaLM-SayCan to execute the Python script using a robot arm. The robot arm can interact with the environment and manipulate objects such as balls and boxes. The system also monitors the robot’s actions and provides feedback and corrections if needed.
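To make the flow above concrete, here is a minimal sketch of the instruction-to-code-to-execution loop. It is not the researchers' implementation: `SimulatedArm`, `generate_code`, and `run_instruction` are hypothetical names invented for illustration, the language model is replaced by a canned script, and the robot arm is a simple in-memory simulation.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Ball:
    colour: str
    bin: Optional[str] = None  # name of the bin the ball currently sits in


@dataclass
class SimulatedArm:
    """Hypothetical stand-in for a real robot arm interface."""
    balls: List[Ball]
    log: List[str] = field(default_factory=list)

    def detect_balls(self) -> List[Ball]:
        # A real system would use a camera and perception model here.
        return list(self.balls)

    def move_to_bin(self, ball: Ball, bin_name: str) -> None:
        ball.bin = bin_name
        self.log.append(f"moved {ball.colour} ball to {bin_name} bin")


def generate_code(instruction: str) -> str:
    """Placeholder for the code-writing LLM (assumption: returns Python source)."""
    if instruction == "sort the balls by colour":
        return (
            "for ball in arm.detect_balls():\n"
            "    arm.move_to_bin(ball, ball.colour)\n"
        )
    raise ValueError(f"no canned script for instruction: {instruction!r}")


def run_instruction(instruction: str, arm: SimulatedArm) -> bool:
    # Step 1: turn the high-level instruction into a Python script.
    script = generate_code(instruction)
    # Step 2: execute the script with only the arm exposed to it.
    # Note: emptying __builtins__ is NOT a real sandbox; generated code
    # should be isolated properly before it ever drives real hardware.
    exec(script, {"__builtins__": {}}, {"arm": arm})
    # Step 3: monitor the outcome (here: every ball is in its colour's bin).
    return all(ball.bin == ball.colour for ball in arm.balls)


if __name__ == "__main__":
    arm = SimulatedArm([Ball("red"), Ball("blue"), Ball("red")])
    success = run_instruction("sort the balls by colour", arm)
    print("\n".join(arm.log))
    print("task succeeded:", success)
```

The final check in `run_instruction` stands in for the monitoring step: a real system would compare what the robot actually observes against the goal and trigger corrections when they differ.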
The researchers have tested their system on various tasks, such as sorting objects by shape or size, stacking blocks in a specific order, and drawing shapes on paper. They have found that their system can generate accurate and efficient code for most of these tasks, and that the robot can execute the resulting code successfully.
The researchers claim that their system is the first of its kind to enable robots to learn from the web using LLMs. They believe that this approach can open up new possibilities for robotic applications, such as education, entertainment, or assistance. They also hope that their system can inspire more research on how LLMs can be used for other domains and tasks.
One of the key features of their system is that it can “see” the world around it, “understand” the task, and tell the robot what to do. This is possible because their system uses a vision-language-action (VLA) model called RT-2, which is based on Transformers. Transformers are neural network architectures that can process different types of data, such as text, images, or audio. RT-2 is trained on text and images from the web, which allows it to learn general concepts and skills that can be applied to different situations. For example, RT-2 can recognize trash and know how to dispose of it, even if it has never seen those objects before.
RT-2 is also able to communicate with humans using natural language. It can understand queries and commands from humans, and generate responses or actions accordingly. It can also ask questions or provide feedback to humans if needed. For example, if a human asks RT-2 to “pick up the extinct animal”, RT-2 can locate and pick out a dinosaur figurine from a table. If RT-2 is unsure about something, it can ask for clarification or confirmation from humans.
RT-2 is not only a powerful tool for robot learning, but also a potential companion for humans. It can perform useful tasks for humans, such as cleaning or organizing. It can also entertain humans with games or jokes. It can even learn from humans and improve its skills over time.