Helix: A Vision-Language-Action Model for Generalist Humanoid Control
Ashish Sonawane
Helix is a generalist Vision-Language-Action (VLA) model that integrates perception, language understanding, and learned control to address longstanding challenges in robotics. It brings several firsts to humanoid robotics, including fine-grained whole upper-body control, zero-shot multi-robot coordination, and language-prompted manipulation of novel objects, each covered below.
New Scaling for Humanoid Robotics
Household environments present the greatest challenge for robotics due to the diversity and unpredictability of objects. Unlike structured industrial settings, homes contain an array of items—glassware, clothing, toys—varying in shape, size, colour, and texture. To be useful, robots must dynamically generate intelligent behaviours, especially for objects they have never encountered.
Traditional approaches require extensive human intervention: programming a new skill takes hours of expert coding or thousands of demonstrations, making scalability impractical. However, by leveraging AI advancements in vision-language models (VLMs), Helix introduces a paradigm shift—robots can now acquire new skills instantly through natural language commands, eliminating the need for extensive manual programming.
Helix: A "System 1, System 2" VLA Model for Whole Upper Body Control
Helix introduces a dual-system architecture inspired by the "System 1, System 2" distinction from cognitive science. System 2 (S2) is a vision-language model that handles slower scene understanding and language comprehension, continuously distilling its high-level behavioural intent into a latent vector. System 1 (S1) is a fast visuomotor policy that conditions on that latent vector and real-time robot observations to produce continuous upper-body actions at 200 Hz.
This decoupled architecture allows S2 to handle high-level reasoning while S1 ensures real-time responsiveness. For instance, S1 rapidly adapts to the partner robot’s movements in collaborative scenarios while maintaining S2’s high-level objectives.
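To make the division of labour concrete, here is a minimal Python sketch of the decoupled design; all class and method names are illustrative assumptions, since Figure has not published Helix's code. S2 periodically refreshes a latent intent vector, and S1 conditions every control step on the most recent one.

```python
import numpy as np

LATENT_DIM = 512   # assumed size of the shared latent vector
NUM_DOF = 35       # upper-body degrees of freedom controlled by Helix

class System2:
    """Slow path: vision-language model for scene + language understanding."""

    def update_intent(self, image: np.ndarray, command: str) -> np.ndarray:
        # A VLM forward pass would run here at a few Hz; the returned
        # latent summarizes "what to do" for the fast controller.
        return np.zeros(LATENT_DIM, dtype=np.float32)  # placeholder

class System1:
    """Fast path: reactive visuomotor policy running at ~200 Hz."""

    def act(self, observation: np.ndarray, intent: np.ndarray) -> np.ndarray:
        # Conditions low-level control on S2's latest latent intent and
        # returns one continuous action covering all 35 joints.
        return np.zeros(NUM_DOF, dtype=np.float32)  # placeholder
```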
Key Advantages of Helix

Helix combines the broad generalisation of a vision-language model with the speed of a single-task control policy: one set of learned behaviours covers whole upper-body control, zero-shot coordination between robots, and language-prompted manipulation of objects never seen in training, all running on low-power onboard GPUs.
Model and Training Details
Data Collection
Helix is trained on a dataset of ~500 hours of diverse teleoperated behaviours collected across multiple robots and operators. Natural-language training pairs are generated by an auto-labelling VLM, which analyses segmented video clips and formulates the instruction that matches the observed actions.
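A sketch of that auto-labelling step might look like the following; the prompt wording and the `vlm` callable are assumptions, since the exact pipeline has not been published.

```python
from typing import Any, Callable, List

# Assumed prompt: ask a VLM to invert observed behaviour into an instruction.
LABEL_PROMPT = ("What instruction would you have given the robot to "
                "produce the behaviour shown in these frames?")

def auto_label(frames: List[Any], vlm: Callable[..., str]) -> str:
    """Generate a natural-language training label for one teleop clip."""
    return vlm(frames=frames, prompt=LABEL_PROMPT)

# Hypothetical usage: pair the generated label with the clip's actions.
# instruction = auto_label(clip.frames, vlm=my_vlm_api)
# dataset.append({"frames": clip.frames, "text": instruction,
#                 "actions": clip.actions})
```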
Architecture

Helix pairs a large vision-language backbone (S2), built on a 7-billion-parameter open-source VLM, with a much smaller and faster visuomotor transformer (S1, roughly 80 million parameters) for low-level control. S2 distils scene and language understanding into a latent behavioural-intent vector; S1 combines that vector with the robot's real-time observations to output continuous actions for the full upper body.
Training Strategy
Helix is trained end-to-end, mapping raw pixels and text commands directly to continuous actions with a standard regression loss. A temporal offset between S2's inputs and S1's action targets is introduced during training to match the inference latency of the deployed system, ensuring behaviour transfers smoothly to the real robot.
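In PyTorch terms, one training step under this objective could look like the sketch below; the batch layout, offset size, and plain MSE loss are illustrative assumptions rather than Figure's exact recipe.

```python
import torch
import torch.nn.functional as F

LATENCY_STEPS = 4  # assumed offset matching deployed inference latency

def training_step(model, batch, optimizer):
    """One end-to-end update: (pixels, text) -> continuous actions."""
    # batch["images"]: (B, T, C, H, W); batch["actions"]: (B, T, DOF)
    T = batch["actions"].shape[1]
    t = torch.randint(0, T - LATENCY_STEPS, (1,)).item()

    pred = model(batch["images"][:, t], batch["text"])
    target = batch["actions"][:, t + LATENCY_STEPS]  # temporally offset target

    loss = F.mse_loss(pred, target)  # regression loss on continuous actions
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```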
Optimized Streaming Inference
Helix is deployed on low-power GPUs, with S2 and S1 operating asynchronously. S2 continuously updates a shared latent vector encoding high-level behavioural intent, while S1 processes real-time robot observations for precise motor control. This structure ensures Helix maintains the necessary 200 Hz control loop, making it as fast as traditional single-task imitation learning policies.
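A minimal sketch of that asynchronous loop is shown below, assuming illustrative rates and stub functions for the camera, policy, and actuators; none of these names come from Figure's deployment code.

```python
import threading
import time
import numpy as np

latent = np.zeros(512, dtype=np.float32)  # shared behavioural-intent vector
latent_lock = threading.Lock()

def s2_loop(vlm_step, get_image, get_command, hz=8.0):
    """Slow loop: refresh the shared latent a few times per second."""
    global latent
    while True:
        new_latent = vlm_step(get_image(), get_command())
        with latent_lock:
            latent = new_latent
        time.sleep(1.0 / hz)

def s1_loop(policy_step, get_obs, send_action, hz=200.0):
    """Fast loop: 200 Hz control conditioned on the freshest latent."""
    period = 1.0 / hz  # 5 ms budget per control step
    while True:
        start = time.perf_counter()
        with latent_lock:
            z = latent.copy()
        send_action(policy_step(get_obs(), z))
        # Sleep off whatever remains of the control period.
        time.sleep(max(0.0, period - (time.perf_counter() - start)))

# Hypothetical wiring: both loops run concurrently on the robot.
# threading.Thread(target=s2_loop, args=(vlm, cam, mic), daemon=True).start()
# s1_loop(policy, robot.observe, robot.command)
```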
Results
Fine-Grained Whole Upper-Body Control
Helix enables smooth coordination across 35 degrees of freedom (DoF), including individual finger control, head tracking, and torso adjustments. The robot dynamically modifies its posture for optimal reach while maintaining precise grasping, a significant achievement in humanoid robotics.
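As a rough picture of what a 35-DoF upper-body action vector might contain, the grouping below is an assumption consistent with the capabilities described above, not Figure's published joint specification.

```python
# Hypothetical 35-DoF action layout (illustrative grouping only).
ACTION_LAYOUT = {
    "left_arm": 7,       # shoulder, elbow, and wrist joints
    "right_arm": 7,
    "left_hand": 8,      # individual finger control
    "right_hand": 8,
    "head": 2,           # pan / tilt for gaze tracking
    "torso": 3,          # lean / twist posture adjustments
}
assert sum(ACTION_LAYOUT.values()) == 35
```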
Zero-Shot Multi-Robot Coordination
Helix successfully enables two Figure robots to collaborate in real time on complex tasks, such as putting away groceries that include completely novel objects. The robots are directed through natural-language prompts like "Hand the bag of cookies to the robot on your right," showcasing emergent multi-agent behaviour without explicit role assignments.
Emergent “Pick Up Anything” Capability
Helix-equipped robots can pick up any small household object via simple prompts like “Pick up the toy” or “Pick up the dessert item.” The model translates abstract concepts into precise actions, demonstrating advanced generalisation across diverse environments.
Discussion and Future Prospects
Helix represents a major leap in humanoid robotics, proving that vision-language knowledge can directly translate into real-time motor control. With its efficient training, commercial viability, and generalisation capabilities, Helix sets a new standard for AI-driven robotic systems. Our next steps include refining multi-robot collaboration, expanding object interaction capabilities, and integrating Helix into real-world applications, bringing us closer to truly autonomous humanoid assistants.