Frontiers of Software Architecture: Breaking Down NVIDIA’s Software AI Innovation - Voyager

In late May, a team at NVIDIA published a paper (and source code) titled “Voyager: An Open-Ended Embodied Agent with Large Language Models.” Voyager is an AI agent designed to continuously explore the world of Minecraft, evolve its own software code, and improve its performance over time through the development of “skills” informed by observations it makes about its environment and feedback it gathers by testing those skills.

Minecraft is a popular sandbox video game that allows players to build and explore virtual worlds using different types of blocks. It offers multiple game modes, including survival mode where players must gather resources to build the world and maintain health, and creative mode where players have unlimited resources.

Voyager shares some similarities with agents that operate in a traditional reinforcement learning software paradigm, as it is able to perform actions within an environment and refine its skills based on feedback from the environment state. However, it also introduces major conceptual innovations to software architecture: (1) Voyager can evolve its own code to create a diverse set of actions, facilitated by its skill library where it stores and retrieves complex behaviors, leading to an unbounded action space. (2) Instead of relying on a reward function to guide its actions, Voyager uses a separate agent called a CurriculumAgent that provides an automatic curriculum of tasks with the goal of helping Voyager to “become the best Minecraft player in the world”. (3) Voyager leverages Large Language Models (in this case primarily GPT-4) as a long-term knowledge store and to generate tasks and code, while also maintaining its own state, including a skills library, records of successfully completed and failed tasks, and recent knowledge acquired.

Voyager's approach is illustrative of how software can be designed to leverage a large language model (LLM) in diverse ways, extending beyond the simple chatbots and content summarization use cases to include task and code generation, and state maintenance.

Breaking Down Voyager

High Level Architecture

At a high level the Voyager system interacts with a variety of remote dependencies, files in the local filesystem and vector datastores.

  • Minecraft: Microsoft provides instances of the Minecraft environment that Voyager interacts with using the Mineflayer JavaScript API. This API allows Voyager to control the player's actions, such as moving, building, and mining, and to receive updates about its surroundings.
  • OpenAI: OpenAI, specifically GPT-4, is a crucial resource for Voyager. Leveraging GPT-4's extensive knowledge of Minecraft, Voyager's agents can suggest tasks, generate code, and provide coaching to help Voyager evolve and create new JavaScript skills to improve its performance in the game. Additionally, OpenAI’s text embeddings are used to encode the content stored in the vector databases for similarity searches.
  • Vector Databases: Voyager uses vector databases to store the embeddings of the skills it learns. These embeddings are used for similarity searches when Voyager needs to retrieve a skill.
  • JSON/JavaScript Files: Voyager uses various JSON and JavaScript files to store, update, and retrieve semi-structured information. This includes records of tasks that have been completed or failed, and the JavaScript code for the skills it has learned. These files are crucial for maintaining Voyager's state and facilitating its continuous learning process.
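The skill lookup described in the list above can be sketched in a few lines. This is a minimal illustration, not Voyager's implementation: the `embed` function is a toy bag-of-words stand-in for OpenAI's text embeddings, and `SkillLibrary`, its methods, and the skill names are hypothetical names chosen for the example.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for a text-embedding model: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class SkillLibrary:
    """Hypothetical in-memory skill store: description embedding plus JavaScript source."""

    def __init__(self) -> None:
        self._skills = {}  # name -> (embedding of description, JavaScript code)

    def add(self, name: str, description: str, code: str) -> None:
        self._skills[name] = (embed(description), code)

    def retrieve(self, query: str, k: int = 1) -> list:
        """Return the names of the k skills whose descriptions best match the query."""
        q = embed(query)
        ranked = sorted(
            self._skills,
            key=lambda name: cosine(q, self._skills[name][0]),
            reverse=True,
        )
        return ranked[:k]


library = SkillLibrary()
library.add(
    "mineWoodLog",
    "mine a wood log from a nearby tree",
    "async function mineWoodLog(bot) { /* ... */ }",
)
library.add(
    "craftWoodenPickaxe",
    "craft a wooden pickaxe at a crafting table",
    "async function craftWoodenPickaxe(bot) { /* ... */ }",
)
top_match = library.retrieve("craft a pickaxe")
```

In a real system the count vectors would be replaced by dense embeddings and the sorted scan by a vector database query; the ranking-by-similarity idea stays the same.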

[Figure: Voyager high-level architecture diagram]

Voyager Software Components

Software Overview

Voyager is composed of multiple components, primarily helper agents of different types. Each agent takes its instructions from pre-authored prompt templates (see Prompt Templates below), uses OpenAI’s large language models (LLMs) to perform a knowledge-related task, and then chains the output of that task with instructions encoded using traditional programming methods. The recently popularized Python LangChain framework is used to manage prompts and LLM usage.

For instance, Voyager’s ActionAgent has a prompt template that instructs the OpenAI large language model to build new JavaScript skills given a certain context. It then follows static Python instructions to connect to the Minecraft server and try the newly created JavaScript-based skill.

The major agents that Voyager defines and relies on are:

  • CurriculumAgent: Voyager is inherently motivated to become “the best Minecraft player in the world”. It relies on the CurriculumAgent to propose next tasks by having OpenAI offer ideas given the current state of play. (Check out the prompt templates associated with the CurriculumAgent linked below to get a sense of the multi-step process the agent uses.)
  • SkillsManager: Once Voyager understands the next task it needs to complete, it looks up the relevant JavaScript to execute within its current set of skills. It does this by performing a similarity search within the skills vector database for skills closely matching the high-level description of how the CurriculumAgent suggests the task should be completed.
  • ActionAgent: The ActionAgent takes the skill and associated JavaScript code that most closely match the task to be completed and attempts to improve the code using the ActionAgent prompt templates with OpenAI’s code generation capabilities. It then attempts to execute this code in the Minecraft environment.
  • CriticAgent: The outcome of the JavaScript code execution is forwarded to the CriticAgent, which determines whether the task was successfully completed. It updates its task datastores accordingly; these serve as input to the CurriculumAgent and improve its ability to propose a suitable next task given the current state of the environment.

If the task is successfully completed, the JavaScript code is added to Voyager’s skills repository by the SkillsManager. At the end of every step, the loop starts again with the CurriculumAgent proposing a new task for Voyager to pursue.
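The loop described above can be sketched as follows. This is a simplified illustration under stated assumptions, not Voyager's code: every function is a hypothetical stand-in (a fixed task list instead of an LLM-driven curriculum, word overlap instead of vector search, placeholder JavaScript instead of generated code, and an environment that always reports success).

```python
from dataclasses import dataclass, field


@dataclass
class VoyagerState:
    """Hypothetical container for the state the article describes."""
    skills: dict = field(default_factory=dict)     # task description -> JavaScript code
    completed: list = field(default_factory=list)  # tasks that succeeded
    failed: list = field(default_factory=list)     # tasks that failed


def propose_task(state):
    """CurriculumAgent stand-in: return the first task not yet completed."""
    curriculum = ["mine wood log", "craft planks", "craft crafting table"]
    for task in curriculum:
        if task not in state.completed:
            return task
    return None


def retrieve_skills(state, task):
    """SkillsManager stand-in: naive word overlap instead of a vector search."""
    words = set(task.split())
    return [code for desc, code in state.skills.items() if words & set(desc.split())]


def generate_code(task, context):
    """ActionAgent stand-in: placeholder JavaScript instead of an LLM call."""
    return f"async function task(bot) {{ /* {task}; reuses {len(context)} skill(s) */ }}"


def execute(code):
    """Environment stand-in: always reports success."""
    return {"ok": True, "code": code}


def critique(task, outcome):
    """CriticAgent stand-in: trust the environment's success flag."""
    return outcome["ok"]


def run(state, max_steps):
    """One pass per step: propose, retrieve, generate, execute, critique, record."""
    for _ in range(max_steps):
        task = propose_task(state)
        if task is None:
            break
        context = retrieve_skills(state, task)
        code = generate_code(task, context)
        outcome = execute(code)
        if critique(task, outcome):
            state.completed.append(task)
            state.skills[task] = code  # only successful code becomes a skill
        else:
            state.failed.append(task)
    return state


final_state = run(VoyagerState(), max_steps=5)
```

Keeping each role behind its own function mirrors the separation of agents: any stand-in can be swapped for an LLM-backed version without changing the loop.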

The software component and sequence diagrams below provide further detail on how Voyager’s software is designed.

Voyager Components

[Figure: Voyager software component diagram]

Voyager Main Loop

[Figure: Voyager main loop sequence diagram]

Prompt Templates

The agents that Voyager depends on are programmed as much in plain-English Prompt Templates as in the Python programming language. Think of Prompt Templates as scripts for each agent to act out a certain responsibility. While agents have a certain amount of freedom in carrying out their work, Prompt Templates can impose detailed instructions, guardrails, and strict structure on what they produce. It is worth reviewing each Prompt Template in detail to get a better sense of agent behavior:

CurriculumAgent Prompts:

SkillsManager Prompts:

ActionAgent Prompts: Action and Response Prompt Templates

CriticAgent Prompts:
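To illustrate how a prompt template can impose strict structure on what an agent produces, here is a minimal sketch using only Python's standard library. The template wording, the `render_prompt` and `parse_reply` helpers, and the JSON reply shape are assumptions made for this example; they are not Voyager's actual prompts, which are managed through LangChain.

```python
import json
from string import Template

# Hypothetical critic-style template: fixed instructions plus slots for context.
CRITIC_TEMPLATE = Template(
    "You are an assistant that judges whether a Minecraft task is complete.\n"
    "Task: $task\n"
    "Inventory: $inventory\n"
    'Respond ONLY with JSON of the form {"success": true|false, "critique": "<text>"}.'
)


def render_prompt(task, inventory):
    """Fill the template slots with the current task and observed inventory."""
    return CRITIC_TEMPLATE.substitute(
        task=task,
        inventory=json.dumps(inventory, sort_keys=True),
    )


def parse_reply(reply):
    """Guardrail: accept the model's reply only if it matches the required structure."""
    data = json.loads(reply)
    if (
        not isinstance(data, dict)
        or not isinstance(data.get("success"), bool)
        or not isinstance(data.get("critique"), str)
    ):
        raise ValueError("reply does not match the required structure")
    return data["success"], data["critique"]


prompt = render_prompt("mine 1 wood log", {"oak_log": 1})
verdict = parse_reply('{"success": true, "critique": ""}')
```

The template carries the plain-English instructions, while the parser is the traditional-programming half: a reply that drifts from the requested format is rejected rather than silently trusted.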

Summary

Voyager's approach demonstrates how LLMs can be integrated into a system to drive dynamic behavior and continuous learning. The use of pre-authored prompt templates and the Python LangChain framework to manage interactions with the LLMs shows how traditional programming methods can be combined with AI to create more flexible and adaptive systems.

The agent-based architecture of Voyager, with specialized agents responsible for different aspects of the learning process, presents a modular approach that could be applied in other contexts. This modularity allows for the separation of concerns, making the system more manageable and scalable. Each agent can be developed, tested, and improved independently, and new agents can be added as needed to extend the system's capabilities.

In terms of broader applications, this approach could be used in any situation where a system needs to learn and adapt over time based on its interactions with a complex environment. This could include other game environments, virtual simulations, or real-world applications such as autonomous vehicles or robotics. The ability to generate and refine code, propose tasks, and assess outcomes could be particularly useful in situations where the system needs to operate autonomously for extended periods, continually improving its performance without human intervention.

Todd Cullen

Senior Director @ Northwestern Mutual | Technology Strategy

1 year ago

Thanks for sharing this. Is this conceptually the same as how AlphaGo was designed by DeepMind? That is a board game vs. an open ended world so the "rules" the prompt templates leverage against the LLM are quite different. Very interesting to think how "self learning" software might be applied towards consumer facing services/products (not games).
