Multi-Agent Week Recap - Molmo Model, Letta, CORE-Bench

A recap of some announcements/papers in the last week (Sept 16 - 24) that I found interesting. Past news items (and other resources) at multiagentbook.com/news.


Molmo and PixMo by Ai2

Sep 24, 2024

Molmo is a new family of state-of-the-art open multimodal AI models that achieve performance comparable to proprietary systems like GPT-4 and Gemini, while being fully open-source.

IMO, what makes Molmo interesting is its novel training dataset - PixMo (Pixels for Molmo). Unlike conventional approaches that rely on noisy web-scraped image-text pairs, PixMo prioritizes quality, using fewer than 1 million highly curated image-text pairs.

The key innovation is in how this data was collected: human annotators were asked to provide detailed 60-90 second spoken descriptions of images, which were then transcribed and refined. This 'modality switching trick' yielded more detailed, higher-quality descriptions than traditional text-based annotation methods.

PixMo also includes diverse supervised fine-tuning datasets, including one that teaches models to interact with images by pointing. This enables Molmo not just to describe images verbally, but also to ground its understanding in precise pixel locations. The pointing capability opens up exciting possibilities for interface agents that tackle complex multi-step tasks by directly controlling interface elements (web, desktop, mobile UI).
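If you want to try the pointing behavior yourself, the sketch below shows roughly how the released checkpoints can be queried via Hugging Face transformers. The model id, the custom processor.process() and generate_from_batch() methods (loaded via trust_remote_code), and the exact point output format are assumptions based on my reading of the model card, so verify against the current docs.

```python
# Rough sketch of querying Molmo for pointing, adapted from the Hugging Face
# model card for allenai/Molmo-7B-D-0924. Treat the model id, the custom
# processor/generation methods, and the point output format as assumptions.
import requests
from PIL import Image
from transformers import AutoModelForCausalLM, AutoProcessor, GenerationConfig

model_id = "allenai/Molmo-7B-D-0924"
processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True,
                                          torch_dtype="auto", device_map="auto")
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True,
                                             torch_dtype="auto", device_map="auto")

image = Image.open(requests.get("https://picsum.photos/id/237/536/354", stream=True).raw)

# Asking the model to "point" makes it answer with pixel coordinates embedded
# in the generated text, rather than a purely verbal description.
inputs = processor.process(images=[image], text="Point to the dog in this image.")
inputs = {k: v.to(model.device).unsqueeze(0) for k, v in inputs.items()}

output = model.generate_from_batch(
    inputs,
    GenerationConfig(max_new_tokens=200, stop_strings="<|endoftext|>"),
    tokenizer=processor.tokenizer,
)
generated = output[0, inputs["input_ids"].size(1):]
print(processor.tokenizer.decode(generated, skip_special_tokens=True))
```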

Molmo can point to/localize objects in images and count them

Demo | Paper

Authors: Matt Deitke, Christopher Clark, Sangho Lee, Rohun Tripathi, Yue Yang, James Park, Reza Salehi, Niklas Muennighoff, Kyle Lo, Luca Soldaini, Jiasen Lu, Ali Farhadi et al.



Letta: Memory Management Platform for Agentic AI

Sep 22, 2024


Letta introduces a memory management platform for the next generation of agentic systems, building on research from the MemGPT project at UC Berkeley. The platform lets developers create and launch stateful agents and serve them behind LLM-style APIs. Key features include an Agent Development Environment (ADE) for developing, debugging, and deploying stateful agents, Letta Cloud for hosted services, and an open-source framework that continues the MemGPT project. Letta emphasizes a model-agnostic approach, allowing flexibility to switch between LLM providers, and focuses on innovations in the stateful layer above base LLMs.
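To make the "stateful layer above the LLM" idea concrete, here is a minimal conceptual sketch (not Letta's actual API; the class and function names are hypothetical) of an agent that keeps core and archival memory outside the model, so the underlying provider can be swapped without losing state.

```python
# Conceptual sketch of the MemGPT/Letta-style stateful layer. Names here
# (StatefulAgent, call_llm, remember) are illustrative, not Letta's API.
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class StatefulAgent:
    """Keeps memory outside the model so the LLM provider can be swapped."""
    call_llm: Callable[[str], str]                      # any provider: OpenAI, local, etc.
    core_memory: str = "The user's name is unknown."    # small, always in the prompt
    archival_memory: List[str] = field(default_factory=list)  # long-term, searchable store
    history: List[str] = field(default_factory=list)

    def remember(self, fact: str) -> None:
        # In MemGPT/Letta the model itself decides when to write memory via
        # tool calls; here it is exposed as a plain method for simplicity.
        self.archival_memory.append(fact)

    def send(self, user_message: str) -> str:
        # Naive keyword retrieval over archival memory (a real system would
        # use embeddings or a database).
        words = user_message.lower().split()
        recalled = [m for m in self.archival_memory
                    if any(w in m.lower() for w in words)][:3]
        parts = [
            "Core memory:\n" + self.core_memory,
            "Recalled facts:\n" + "\n".join(recalled),
            "Conversation so far:\n" + "\n".join(self.history[-6:]),
            f"User: {user_message}\nAssistant:",
        ]
        reply = self.call_llm("\n\n".join(parts))
        self.history += [f"User: {user_message}", f"Assistant: {reply}"]
        return reply


# Usage: plug in any LLM call; state survives across calls and could be
# persisted to disk or a database to run the agent as a long-lived service.
agent = StatefulAgent(call_llm=lambda prompt: "(model reply here)")
agent.remember("The user prefers concise answers.")
print(agent.send("Remind me what the user prefers."))
```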

It's great to see more work on agents as APIs and investment in low-code tools like AutoGen Studio.

Letta Announcement | Early Docs


CORE-Bench: A Benchmark for Computational Reproducibility Agents

Sep 16, 2024

Researchers from Princeton University have introduced CORE-Bench (Computational Reproducibility Agent Benchmark), a benchmark for evaluating AI agents on computational reproducibility tasks: given a paper's code and data, can an agent reproduce its results? CORE-Bench consists of 270 tasks based on 90 scientific papers across computer science, social science, and medicine, spans three difficulty levels, and includes both language-only and vision-language tasks. The authors also provide an evaluation system for measuring agent accuracy efficiently.

They evaluated two baseline agents, AutoGPT and a task-specific CORE-Agent, each using GPT-4o and GPT-4o-mini as the underlying language model. The best agent achieved 21% accuracy on the hardest tasks, which (as with other agent benchmarks) leaves plenty of room for improvement in automating scientific reproducibility. The benchmark aims to foster the development of AI agents that can help verify and improve the reproducibility of scientific research.
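To give a feel for what "accuracy" means here, the sketch below shows one plausible way a reproducibility task could be scored: the agent re-runs a paper's code, reports the numbers it obtained, and the harness checks them against ground truth. This is an illustration under assumed field names and tolerance rules, not the official CORE-Bench evaluation code.

```python
# Illustrative scoring of a computational-reproducibility task: the agent's
# reported values are compared against ground truth. NOT the official
# CORE-Bench harness; field names and the tolerance rule are assumptions.
from typing import Dict, Union

Number = Union[int, float]


def score_task(agent_answers: Dict[str, Number],
               ground_truth: Dict[str, Number],
               rel_tolerance: float = 0.05) -> bool:
    """A task counts as solved only if every reported value matches its
    ground-truth counterpart within a relative tolerance."""
    for question, true_value in ground_truth.items():
        if question not in agent_answers:
            return False
        reported = agent_answers[question]
        if abs(reported - true_value) > rel_tolerance * abs(true_value):
            return False
    return True


# Example: the agent re-ran a paper's training script and extracted metrics.
truth = {"test_accuracy": 0.874, "num_parameters": 11_200_000}
answers = {"test_accuracy": 0.871, "num_parameters": 11_200_000}
print(score_task(answers, truth))  # True: every value is within 5% of truth

# Benchmark accuracy is then the fraction of tasks solved across all papers
# and difficulty levels.
```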

Overall, it's great to see more benchmarks focused on agentic workflows, in addition to existing benchmarks like GAIA, WebArena, SWE-bench, WindowsAgentArena, etc.

See the thread by one of the authors:

Author Twitter Thread | Paper PDF

Authors: Zachary S. Siegel, Sayash Kapoor, Nitya Nadgir, Benedikt Stroebl, Arvind Narayanan


Search, view, and filter previous multi-agent news snippets at multiagentbook.com/news.






