Multi-Agent Week Recap - Molmo Model, Letta, CORE-Bench

A recap of some announcements/papers in the last week (Sept 16 - 24) that I found interesting. Past news items (and other resources) at multiagentbook.com/news.


Molmo and PixMo by Ai2

Sep 24, 2024

Molmo is a new family of state-of-the-art open multimodal AI models that achieve performance comparable to proprietary systems like GPT-4 and Gemini, while being fully open-source.

IMO, what makes Molmo interesting is its novel training dataset - PixMo (Pixels for Molmo). Unlike conventional approaches that rely on noisy web-scraped image-text pairs, PixMo prioritizes quality, using fewer than 1 million highly curated image-text pairs.

The key innovation is in how this data was collected: human annotators were asked to provide detailed 60-90 second spoken descriptions of images, which were then transcribed and refined. This 'modality switching trick' yielded more detailed, higher-quality descriptions than traditional text-based annotation methods.

PixMo also includes diverse supervised fine-tuning datasets, including one that teaches models to interact with images by pointing. This enables Molmo not just to describe images verbally, but also to ground its understanding in precise pixel locations. The pointing capability opens up exciting possibilities for interface agents that tackle complex multi-step tasks by directly controlling interface elements (web, desktop, mobile UI).
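If you want to try the pointing behavior yourself, the sketch below shows roughly how the released checkpoints can be queried via Hugging Face transformers. The model id, the custom processor.process() and generate_from_batch() methods (loaded via trust_remote_code), and the exact point output format are assumptions based on my reading of the model card, so verify against the current docs.

```python
# Rough sketch of querying Molmo for pointing, adapted from the Hugging Face
# model card for allenai/Molmo-7B-D-0924. Treat the model id, the custom
# processor/generation methods, and the point output format as assumptions.
import requests
from PIL import Image
from transformers import AutoModelForCausalLM, AutoProcessor, GenerationConfig

model_id = "allenai/Molmo-7B-D-0924"
processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True,
                                          torch_dtype="auto", device_map="auto")
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True,
                                             torch_dtype="auto", device_map="auto")

image = Image.open(requests.get("https://picsum.photos/id/237/536/354", stream=True).raw)

# Asking the model to "point" makes it answer with pixel coordinates embedded
# in the generated text, rather than a purely verbal description.
inputs = processor.process(images=[image], text="Point to the dog in this image.")
inputs = {k: v.to(model.device).unsqueeze(0) for k, v in inputs.items()}

output = model.generate_from_batch(
    inputs,
    GenerationConfig(max_new_tokens=200, stop_strings="<|endoftext|>"),
    tokenizer=processor.tokenizer,
)
generated = output[0, inputs["input_ids"].size(1):]
print(processor.tokenizer.decode(generated, skip_special_tokens=True))
```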

Molmo can point to/localize objects in images and count them

Demo | Paper

Authors: Matt Deitke, Christopher Clark, Sangho Lee, Rohun Tripathi, Yue Yang, James Park, Reza Salehi, Niklas Muennighoff, Kyle Lo, Luca Soldaini, Jiasen Lu, Ali Farhadi et al.



Letta: Memory Management Platform for Agentic AI

Sep 22, 2024


Letta introduces a memory management platform for the next generation of agentic systems, building on research from the MemGPT project at UC Berkeley. The platform lets developers create and launch stateful agents and serve them behind LLM-style APIs. Key features include an Agent Development Environment (ADE) for developing, debugging, and deploying stateful agents, Letta Cloud for hosted services, and an open-source framework that continues the MemGPT project. Letta emphasizes a model-agnostic approach, allowing flexibility to switch between LLM providers, and focuses on innovations in the stateful layer above base LLMs.
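To make the "stateful layer above the LLM" idea concrete, here is a minimal conceptual sketch (not Letta's actual API; the class and function names are hypothetical) of an agent that keeps core and archival memory outside the model, so the underlying provider can be swapped without losing state.

```python
# Conceptual sketch of the MemGPT/Letta-style stateful layer. Names here
# (StatefulAgent, call_llm, remember) are illustrative, not Letta's API.
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class StatefulAgent:
    """Keeps memory outside the model so the LLM provider can be swapped."""
    call_llm: Callable[[str], str]                      # any provider: OpenAI, local, etc.
    core_memory: str = "The user's name is unknown."    # small, always in the prompt
    archival_memory: List[str] = field(default_factory=list)  # long-term, searchable store
    history: List[str] = field(default_factory=list)

    def remember(self, fact: str) -> None:
        # In MemGPT/Letta the model itself decides when to write memory via
        # tool calls; here it is exposed as a plain method for simplicity.
        self.archival_memory.append(fact)

    def send(self, user_message: str) -> str:
        # Naive keyword retrieval over archival memory (a real system would
        # use embeddings or a database).
        words = user_message.lower().split()
        recalled = [m for m in self.archival_memory
                    if any(w in m.lower() for w in words)][:3]
        parts = [
            "Core memory:\n" + self.core_memory,
            "Recalled facts:\n" + "\n".join(recalled),
            "Conversation so far:\n" + "\n".join(self.history[-6:]),
            f"User: {user_message}\nAssistant:",
        ]
        reply = self.call_llm("\n\n".join(parts))
        self.history += [f"User: {user_message}", f"Assistant: {reply}"]
        return reply


# Usage: plug in any LLM call; state survives across calls and could be
# persisted to disk or a database to run the agent as a long-lived service.
agent = StatefulAgent(call_llm=lambda prompt: "(model reply here)")
agent.remember("The user prefers concise answers.")
print(agent.send("Remind me what the user prefers."))
```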

It's great to see more work on agents as APIs and investment in low-code tools like AutoGen Studio.

Letta Announcement | Early Docs


CORE-Bench: A Benchmark for Computational Reproducibility Agents

Sep 16, 2024

Researchers from Princeton University have introduced CORE-Bench (Computational Reproducibility Agent Benchmark), a benchmark for evaluating AI agents on computational reproducibility tasks: given a paper's code and data, can an agent reproduce its results? CORE-Bench consists of 270 tasks based on 90 scientific papers across computer science, social science, and medicine, spans three difficulty levels, and includes both language-only and vision-language tasks. The authors also provide an evaluation system for measuring agent accuracy efficiently.

They evaluated two baseline agents, AutoGPT and a task-specific CORE-Agent, each using GPT-4o and GPT-4o-mini as the underlying language model. The best agent achieved 21% accuracy on the hardest tasks, which (as with other agent benchmarks) leaves plenty of room for improvement in automating scientific reproducibility. The benchmark aims to foster the development of AI agents that can help verify and improve the reproducibility of scientific research.
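To give a feel for what "accuracy" means here, the sketch below shows one plausible way a reproducibility task could be scored: the agent re-runs a paper's code, reports the numbers it obtained, and the harness checks them against ground truth. This is an illustration under assumed field names and tolerance rules, not the official CORE-Bench evaluation code.

```python
# Illustrative scoring of a computational-reproducibility task: the agent's
# reported values are compared against ground truth. NOT the official
# CORE-Bench harness; field names and the tolerance rule are assumptions.
from typing import Dict, Union

Number = Union[int, float]


def score_task(agent_answers: Dict[str, Number],
               ground_truth: Dict[str, Number],
               rel_tolerance: float = 0.05) -> bool:
    """A task counts as solved only if every reported value matches its
    ground-truth counterpart within a relative tolerance."""
    for question, true_value in ground_truth.items():
        if question not in agent_answers:
            return False
        reported = agent_answers[question]
        if abs(reported - true_value) > rel_tolerance * abs(true_value):
            return False
    return True


# Example: the agent re-ran a paper's training script and extracted metrics.
truth = {"test_accuracy": 0.874, "num_parameters": 11_200_000}
answers = {"test_accuracy": 0.871, "num_parameters": 11_200_000}
print(score_task(answers, truth))  # True: every value is within 5% of truth

# Benchmark accuracy is then the fraction of tasks solved across all papers
# and difficulty levels.
```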

Overall, it's great to see more benchmarks focused on agentic workflows, in addition to existing benchmarks like GAIA, WebArena, SWE-bench, WindowsAgentArena, etc.

See the thread by one of the authors:

Author Twitter Thread | Paper PDF

Authors: Zachary S. Siegel, Sayash Kapoor, Nitya Nadgir, Benedikt Stroebl, Arvind Narayanan


Search, view, and filter previous multi-agent news snippets at multiagentbook.com/news.






