Multi-Agent Week Recap - Molmo, Letta, CORE-Bench
Victor Dibia, PhD
Principal RDSE at Microsoft Research (Generative AI, Agents) | Carnegie Mellon Alumnus
A recap of some announcements/papers in the last week (Sept 16 - 24) that I found interesting. Past news items (and other resources) at multiagentbook.com/news.
Molmo and PixMo by Ai2
Sep 24, 2024
Molmo is a new family of state-of-the-art multimodal AI models that achieve performance comparable to proprietary systems like GPT-4 and Gemini while being fully open source.
IMO, what makes Molmo interesting is its novel training dataset, PixMo, and especially how that data was collected: human annotators provided detailed spoken descriptions of images rather than written ones, and the audio was transcribed into rich captions. This speech-based pipeline discourages annotators from pasting in text generated by existing proprietary VLMs, so the dataset is not simply distilled from closed models.
Authors: Matt Deitke, Christopher Clark, Sangho Lee, Rohun Tripathi, Yue Yang, James Park, Reza Salehi, Niklas Muennighoff, Kyle Lo, Luca Soldaini, Jiasen Lu, Ali Farhadi et al.
Letta: Memory Management Platform for Agentic AI
Sep 22, 2024
Letta introduces a memory management platform for the next generation of agentic systems, building on research from the MemGPT project at UC Berkeley. It lets developers create and deploy stateful agents whose memory and conversation state persist across sessions.
It's great to see more work on agents as APIs and investment in low-code tools like AutoGen Studio.
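To make the "stateful agent" idea concrete, here is a minimal sketch in the MemGPT/Letta spirit: an agent that keeps a small, editable core memory plus an append-only archival memory, both persisted to disk so state survives across runs. The class and method names below are my own illustration, not Letta's actual SDK.

```python
import json
from pathlib import Path

# Hypothetical sketch (not Letta's API): a stateful agent with a small
# always-in-context core memory and an append-only archival store,
# both persisted so state survives across sessions.
class StatefulAgent:
    def __init__(self, state_path: str = "agent_state.json"):
        self.state_path = Path(state_path)
        if self.state_path.exists():
            state = json.loads(self.state_path.read_text())
        else:
            state = {"core_memory": {"persona": "helpful assistant", "user": ""},
                     "archival_memory": []}
        self.core_memory = state["core_memory"]
        self.archival_memory = state["archival_memory"]

    def update_core_memory(self, key: str, value: str) -> None:
        """Edit the small, always-in-context memory (e.g., facts about the user)."""
        self.core_memory[key] = value
        self._save()

    def archive(self, text: str) -> None:
        """Append to long-term memory that would be searched, not kept in context."""
        self.archival_memory.append(text)
        self._save()

    def build_prompt(self, user_message: str) -> str:
        """Assemble the LLM prompt; a real system would also retrieve
        relevant archival entries here."""
        return (f"[core memory] {json.dumps(self.core_memory)}\n"
                f"[user] {user_message}")

    def _save(self) -> None:
        self.state_path.write_text(json.dumps(
            {"core_memory": self.core_memory,
             "archival_memory": self.archival_memory}))

# Usage: state persists across separate runs of this script.
agent = StatefulAgent()
agent.update_core_memory("user", "prefers concise answers")
agent.archive("2024-09-22: user asked about multi-agent benchmarks")
print(agent.build_prompt("What did we discuss last time?"))
```

A platform like Letta takes this further by running such agents behind a server so any client can resume them, which is what makes the "agents as APIs" framing above practical.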
CORE-Bench: A Benchmark for Computational Reproducibility Agents
Sep 16, 2024
Researchers from Princeton University have introduced CORE-Bench (Computational Reproducibility Agent Benchmark), a benchmark designed to evaluate AI agents on computational reproducibility tasks. CORE-Bench consists of 270 tasks based on 90 scientific papers across computer science, social science, and medicine, spans three difficulty levels, and includes both language-only and vision-language tasks. The researchers also provide an evaluation system to measure agent accuracy efficiently.

They evaluated two baseline agents, AutoGPT and a task-specific CORE-Agent, using GPT-4o and GPT-4o-mini as the underlying language models. The best agent achieved 21% accuracy on the hardest tasks, a result in line with other agent benchmarks that highlights how much room remains for improvement in automating scientific reproducibility. The benchmark aims to foster the development of AI agents that can help verify and improve the reproducibility of scientific research.
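For intuition, here is a rough sketch of what evaluating an agent on one reproducibility task could look like: the task points at a paper's code capsule and asks questions about reported results the agent must reproduce, and a harness scores the agent's answers against gold values. The field names, URL, and tolerance rule below are my own illustration, not CORE-Bench's actual schema or scoring code.

```python
from dataclasses import dataclass

# Illustrative only (not CORE-Bench's real schema): one reproducibility task
# asks the agent to run a paper's code and report specific result values.
@dataclass
class ReproTask:
    paper_id: str
    capsule_url: str                 # archive with the paper's code and data
    difficulty: str                  # "easy" | "medium" | "hard"
    questions: dict[str, float]      # question -> gold value from the paper
    tolerance: float = 0.01          # hypothetical relative-error threshold

def score_task(task: ReproTask, agent_answers: dict[str, float]) -> float:
    """Fraction of questions where the agent's reproduced value matches the
    gold value within the (illustrative) relative tolerance."""
    correct = 0
    for question, gold in task.questions.items():
        answer = agent_answers.get(question)
        if answer is not None and abs(answer - gold) <= task.tolerance * max(abs(gold), 1e-9):
            correct += 1
    return correct / len(task.questions)

# Usage with made-up values:
task = ReproTask(
    paper_id="example-2023",
    capsule_url="https://example.org/capsule.tar.gz",
    difficulty="hard",
    questions={"What is the test accuracy of model A?": 0.874},
)
print(score_task(task, {"What is the test accuracy of model A?": 0.871}))  # -> 1.0
```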
Overall, it's great to see more benchmarks focused on agentic workflows.
Authors: Zachary S. Siegel, Sayash Kapoor, Nitya Nadgir, Benedikt Stroebl, Arvind Narayanan
Search and filter previous multi-agent news snippets at multiagentbook.com/news.