Multi-Agent Week Recap - Microsoft Copilot Agents, Salesforce Agentforce, WindowsArena, PaperQA2
Sept 8 - 16 saw a few multi-agent announcements.

The last week (Sept 8 - 16) has been eventful with respect to agents and multi-agent systems, with a few interesting announcements.

Learn more at multiagentbook.com/news.


Microsoft 365 Copilot Wave 2: Pages, Python in Excel, and agents

Microsoft has launched the next wave of Microsoft 365 Copilot, introducing significant updates to enhance AI-powered productivity.


Unveiling Copilot agents built with Microsoft Copilot Studio to supercharge your business.

Key features:

  • Copilot Pages: A dynamic, persistent canvas for multiplayer AI collaboration, designed as the first new digital artifact for the AI age.
  • Major improvements to Copilot in Microsoft 365 apps: advanced data analysis in Excel with Python support; dynamic storytelling in PowerPoint; enhanced meeting capabilities in Teams.
  • Introduction of Copilot agents built in Copilot Studio: Automate and execute business processes, enabling teams to scale their efforts.
  • New agent builder: Allows creating custom Copilot agents right in BizChat or SharePoint.

These enhancements are based on feedback from nearly 1,000 customers, resulting in over 700 product updates and 150 new features. The release blog post is chock-full of details, including speed and customer satisfaction updates:

Copilot responses are more than two times faster on average, and response satisfaction has improved by nearly three times.

IMO, this update represents a significant step towards integrating AI agents into everyday productivity tools, potentially transforming how we collaborate and work with AI assistance.

Salesforce Agentforce

Salesforce is introducing Agentforce, ushering in what they call the "third wave" of the AI revolution. Agentforce represents a shift towards autonomous AI agents capable of reasoning and tackling multi-faceted projects with minimal human oversight.

Key points:

  • Built on the Salesforce Platform, leveraging existing customization capabilities
  • Includes ready-to-deploy agents like Sales Development Rep (SDR) and Service Agent
  • Utilizes the Einstein Trust Layer for secure and responsible AI deployment
  • Aims to free up human workers for more strategic, relationship-building tasks
  • Customizable through low-code Agent Builder, allowing organizations to tailor agents to specific needs

IMO, efforts like this represent careful rollouts of pipelines where LLMs drive some controlled actions. We will need advances in model performance and architecture, plus lots of engineering and experimentation, to achieve value from more autonomous deployments.

Windows Agent Arena


Some of my colleagues at Microsoft recently released the Windows Agent Arena, an open-source benchmark for developing and testing AI agents on Windows operating systems.

Highlights:

  • Provides a scalable framework for evaluating AI agents that can reason, plan, and act on a PC
  • Includes over 150 agent tasks across various applications and domains
  • Allows for parallelized evaluation in Azure, significantly speeding up testing
  • Uses Omniparser to process screenshots and GPT-4V for decision-making
  • Current best agent solves 19.5% of tasks, compared to 74.5% human performance
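
To make the screenshot-parse-decide pattern above concrete, here is a minimal sketch of the kind of loop an interface agent runs, plus the success-rate metric a benchmark would report. This is not the Windows Agent Arena code or API: `run_episode`, `success_rate`, `ToyDesktop`, `toy_parse`, and `toy_policy` are hypothetical names, and the toy parser and policy merely stand in for a screen parser (such as Omniparser) and a vision-language model (such as GPT-4V).

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class UIElement:
    label: str
    x: int
    y: int


@dataclass
class Action:
    kind: str          # "click" or "done"
    target: str = ""   # label of the element to click


def run_episode(
    get_screenshot: Callable[[], str],
    parse_screen: Callable[[str], List[UIElement]],
    choose_action: Callable[[List[UIElement]], Action],
    apply_action: Callable[[Action], None],
    max_steps: int = 10,
) -> bool:
    """Observe, parse, decide, act; return True if the agent declares the task done."""
    for _ in range(max_steps):
        elements = parse_screen(get_screenshot())  # perception step
        action = choose_action(elements)           # decision step
        if action.kind == "done":
            return True
        apply_action(action)                       # act on the UI
    return False                                   # step budget exhausted


def success_rate(results: List[bool]) -> float:
    """Percentage of tasks solved (0.0 for an empty result list)."""
    return 100.0 * sum(results) / len(results) if results else 0.0


class ToyDesktop:
    """Hypothetical stand-in for a desktop: the task is done once 'Save' is clicked."""

    def __init__(self) -> None:
        self.saved = False

    def screenshot(self) -> str:
        # A real screenshot is an image; a string keeps the toy self-contained.
        return "Saved" if self.saved else "File|Edit|Save"

    def apply(self, action: Action) -> None:
        if action.kind == "click" and action.target == "Save":
            self.saved = True


def toy_parse(screenshot: str) -> List[UIElement]:
    # Stand-in for a screen parser: one element per visible label.
    return [UIElement(label, x=i * 50, y=0) for i, label in enumerate(screenshot.split("|"))]


def toy_policy(elements: List[UIElement]) -> Action:
    # Stand-in for a model call: finish if "Saved" is visible, else click "Save".
    if any(e.label == "Saved" for e in elements):
        return Action("done")
    return Action("click", "Save")


if __name__ == "__main__":
    desk = ToyDesktop()
    solved = run_episode(desk.screenshot, toy_parse, toy_policy, desk.apply)
    print(solved, success_rate([solved, False, False, False]))
```

Passing the parser and policy in as plain callables keeps the loop independent of any particular model, which is roughly what lets a benchmark swap agents in and out while scoring them on the same tasks.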


IMO, this project provides critical infrastructure for benchmarking interface agents: agents that address tasks by driving user interfaces designed for humans.

PaperQA2

PaperQA2 is an advanced AI agent designed for conducting comprehensive scientific literature reviews autonomously, reportedly outperforming PhD and postdoc-level researchers in biology.

Key features:

  • Capable of finding, summarizing, and synthesizing relevant scientific literature
  • Refines search parameters based on initial findings
  • Provides cited, factually grounded answers
  • Achieves state-of-the-art performance on LitQA2, part of the LAB-Bench benchmark
  • Open-sourced code available for further research and development
  • Represents a significant step towards AI-assisted scientific research

These developments showcase the rapid advancement of AI agents across different domains, from business operations to scientific research, indicating a trend towards more autonomous and capable AI systems in various fields.


OpenAI o1: A New Era of AI Reasoning

And of course, OpenAI introduced a new series of AI models called OpenAI o1, designed to excel at complex reasoning tasks. The first model in this series, o1-preview, is now available in ChatGPT and via API access.

Key features of OpenAI o1:

  1. Enhanced reasoning: The models are trained to spend more time thinking through problems, refining their thought processes, and recognizing mistakes.
  2. Improved performance: In tests, the upcoming model update performs similarly to PhD students on challenging tasks in physics, chemistry, and biology. It also shows exceptional abilities in math and coding.
  3. Safety focus: A new safety training approach leverages the model's reasoning capabilities to better adhere to safety and alignment guidelines.
  4. Collaboration with safety institutes: OpenAI has formalized agreements with U.S. and U.K. AI Safety Institutes, granting them early access for evaluation and testing.
  5. Specialized versions: Along with o1-preview, OpenAI is releasing o1-mini, a faster and cheaper model optimized for coding tasks.

IMO, models like this could have significant implications for multi-agent systems: better planning, better reasoning, fewer dumb mistakes.


Learn more at multiagentbook.com/news.




Victor Dibia, PhD

Principal RDSE at Microsoft Research (Generative AI, Agents) | Carnegie Mellon Alumnus
