FOD#42: Is 2024 the year of advanced robotics?

Turing Post

Newsletter about AI and ML. Sign up for free to get your list of essential AI resources.

Published: February 27, 2024

This post was originally published on our website. Join over 40,000 readers for in-depth knowledge and forward-thinking analysis, to make smarter decisions about AI & ML. Subscribe now for free.

Are we on the brink of debunking Moravec’s Paradox? In the 1980s, AI and robotics researcher Hans Moravec highlighted a counterintuitive aspect of AI: tasks requiring high-level reasoning — like chess or Go — are easier for AI to master than basic sensory and motor skills — such as walking or identifying your mom’s face — which humans find instinctive. Adding complexity, these “simpler” skills actually demand much more computational power. This insight sheds light on the complexity of replicating human-like perception and dexterity, outcomes of millions of years of evolution, as opposed to logical reasoning, a more recent development. In today’s AI and ML landscape, this paradox underscores the challenges in creating robots and AI systems capable of seamlessly navigating and interacting with the physical world.

However, last week, Bernt Børnich, CEO and founder of the humanoid robotics company 1X, wrote, “New progress update on the droids dropping in 4 weeks, looks like Moravec’s paradox might be debunked, and we just didn’t have the data.” I suspect this has something to do with advances in foundation models. These models were originally known for performing a wide range of tasks from a single type of data (like text for language models); they become “multimodal” when they integrate and interpret information across different sensory inputs, which brings them closer to human-like understanding.

Could the embodiment of AI, with all its sensory inputs, plus reasoning-imitation algorithms like LLMs, be the pool of data that disproves Moravec’s paradox?

Another intriguing development caught my attention. Jensen Huang, Nvidia’s CEO, was asked by Wired what current development could change everything. Huang replied, “There are a couple of things. One doesn’t really have a name, but it’s part of the work we’re doing in foundational robotics. If you can generate text and images, can you also generate motion? The answer is probably yes. And if you can generate motion, you can understand the intent and generate a generalized version of articulation. Therefore, humanoid robotics should be right around the corner.”

Something to observe in the coming weeks!

In related news from the robotics universe, Figure AI, a humanoid robotics startup, made headlines by raising approximately $675 million in funding. More impressive still is the list of backers: Amazon, NVIDIA, Microsoft, OpenAI, Intel, LG, and Samsung. This indicates a strong belief in the potential of humanoid robotics to disrupt various sectors.

Yet there are skeptical voices. Rodney Brooks, who coined the term Nouvelle AI, posted last week: “Tele-op robots presented as autonomous, like the Tesla Optimus humanoid folding a shirt, and 1X humanoid robots, are misrepresentations of what robots are actually doing, which can also be called LIES. Note that the Stanford robot cooking and cleaning videos are also tele-operated.”

If 2023 was the year of LLMs, are we ready to move on to embodied AI and make 2024 the year of robots?

Bonus: The freshest research papers from the week of Feb 19 to Feb 25

Enhancing Large Language Models (LLMs)

LongRoPE: Extending LLM Context Window Beyond 2 Million Tokens: Expands the processing capability of LLMs to handle over 2 million tokens, pushing the boundaries of context window sizes for more comprehensive understanding and generation tasks. Read the paper.
OmniPred: Language Models as Universal Regressors: Demonstrates the versatility of LLMs in performing numerical regression tasks, suggesting their potential as universal tools for predictive modeling across a variety of domains. Read the paper.
Divide-or-Conquer? Which Part Should You Distill Your LLM?: Investigates efficient strategies for distilling large models into smaller, more manageable ones, particularly for reasoning tasks, emphasizing the importance of decomposition over problem-solving. Read the paper.
ChunkAttention: Efficient Self-Attention with Prefix-Aware KV Cache and Two-Phase Partition: Proposes a method to enhance the efficiency of self-attention mechanisms in LLMs, crucial for improving performance and reducing resource consumption. Read the paper.
USER-LLM: Efficient LLM Contextualization with User Embeddings: Introduces a framework for personalizing LLM interactions using user embeddings, enhancing the model’s responsiveness to individual user preferences and histories.

Multimodal and Multi-Agent Systems

World Model on Million-Length Video and Language with RingAttention: Explores integrating video and language for advanced AI understanding and interaction, leveraging a novel RingAttention mechanism for efficient multimodal learning. Read the paper.
AgentScope: A Flexible yet Robust Multi-Agent Platform: Develops a multi-agent platform that enhances cooperation and flexibility among agents, addressing the complexity of multi-agent systems and their practical applications. Read the paper.
TinyLLaVA: A Framework of Small-scale Large Multimodal Models: Focuses on the design and analysis of small-scale multimodal models, proving that with strategic optimizations, smaller models can achieve or surpass the performance of larger counterparts. Read the paper.
AnyGPT: Unified Multimodal LLM with Discrete Sequence Modeling: Introduces a versatile multimodal language model that processes speech, text, images, and music, demonstrating the power of discrete representations in unifying various data modalities within a single framework. This approach simplifies the integration of new modalities without needing to modify the underlying architecture or training methodologies. Read the paper.
A Touch, Vision, and Language Dataset for Multimodal Alignment: Presents a novel dataset that enhances multimodal understanding by incorporating touch with vision and language, aiming to advance touch-vision-language alignment and understanding through a tactile encoder and text generation model. Read the paper.

Advancements in Specific Domains

YOLOv9: Learning What You Want to Learn Using Programmable Gradient Information: Introduces a novel object detection model that leverages programmable gradient information for enhanced accuracy and efficiency in learning. Read the paper.
Beyond A*: Better Planning with Transformers via Search Dynamics Bootstrapping: Demonstrates the application of Transformers in complex planning tasks, offering a method that surpasses traditional search algorithms in efficiency and effectiveness. Read the paper.

Developer Tools and APIs

API-BLEND: A Comprehensive Corpora for Training and Benchmarking API LLMs: Presents a vast dataset designed for training LLMs to interact with APIs, addressing the challenge of creating effective models for API usage and integration. Read the paper.
OpenCodeInterpreter: Integrating Code Generation with Execution and Refinement: Develops an open-source system for code generation, execution, and refinement, facilitated by a dataset of multi-turn interactions, aiming to bridge the gap between code generation models and practical coding tasks. Read the paper.

Security and Adversarial Research

Coercing LLMs to do and reveal (almost) anything: Explores the susceptibility of LLMs to a wide range of adversarial attacks, highlighting the need for comprehensive security measures to protect against unintended behaviors and data extraction. Read the paper.

Model Efficiency and Quantization

OneBit: Towards Extremely Low-bit Large Language Models: Discusses a novel framework for quantizing LLM weight matrices to 1-bit to drastically reduce storage and computational demands while maintaining performance, enabling efficient deployment of LLMs on resource-constrained devices. Read the paper.

Instruction Tuning and Data Quality

Reformatted Alignment: Introduces REALIGN, a method for refining instruction data quality for LLMs to better align with human values, emphasizing the importance of instruction data quality in model alignment and suggesting areas for further exploration in LLM science. Read the paper.
Synthetic Data (Almost) from Scratch: Generalized Instruction Tuning for Language Models: Proposes a novel method for instruction tuning that generates synthetic instruction data across all disciplines, showcasing a scalable and customizable approach to instruction tuning without relying on specific training data. Read the paper.
Instruction-tuned Language Models are Better Knowledge Learners: Proposes a pre-instruction-tuning method to enhance LLMs’ knowledge updating capabilities, demonstrating significant improvements in factual knowledge absorption and cross-domain generalization. Read the paper.
