Understanding CoALA (Cognitive Architectures for Language Agents) Through a ReAct Agent Example Using LangChain

With the rise of large language models (LLMs), AI systems are becoming increasingly capable of complex reasoning and interactions. One such framework to organize these capabilities is CoALA (Cognitive Architectures for Language Agents), which aims to create a structured and modular framework for building sophisticated AI agents. The CoALA framework draws inspiration from cognitive science, offering key concepts like memory, action spaces, and decision-making processes to develop general-purpose AI agents.

In this article, we will explore the CoALA framework through the lens of a practical AI agent: the ReAct (Reasoning + Action) agent, built using LangChain. We'll explain how CoALA enhances the ReAct agent and provide a Python implementation to demonstrate how these concepts come together in practice.

What is CoALA?

The CoALA framework (Cognitive Architectures for Language Agents) is designed to organize and structure language agents, offering three core dimensions:

1 - Memory:

  • Working memory: Active information about the task at hand.
  • Long-term memory: Episodic, semantic, and procedural memories for storing knowledge, actions, and previous experiences.


2 - Action Space:

  • Internal actions: Reasoning, retrieval, learning, and updates to internal memory.
  • External actions: Actions that interact with the external environment, such as sending commands to a robot or interacting with APIs.

3 - Decision-making:

  • Planning and Execution: A loop where the agent reasons, proposes actions, selects the best action, executes it, and observes the outcome.


CoALA helps to structure how agents perform complex tasks by organizing their reasoning, memory, and decision-making processes.
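To make the three dimensions concrete, here is a minimal Python sketch of how they might be organized. The class and field names (`Memory`, `CoALAAgent`, and so on) are illustrative only, not part of any library:

```python
from dataclasses import dataclass, field

@dataclass
class Memory:
    """CoALA memory: working memory plus long-term stores."""
    working: dict = field(default_factory=dict)      # active task state
    episodic: list = field(default_factory=list)     # past experiences
    semantic: dict = field(default_factory=dict)     # stored facts
    procedural: dict = field(default_factory=dict)   # known skills/actions

@dataclass
class CoALAAgent:
    memory: Memory = field(default_factory=Memory)
    internal_actions: dict = field(default_factory=dict)  # reasoning, retrieval, learning
    external_actions: dict = field(default_factory=dict)  # tools acting on the environment

    def decide(self, proposals):
        """Decision-making: pick the best of the proposed actions."""
        # Trivial policy for illustration: take the highest-scored proposal.
        return max(proposals, key=lambda p: p["score"])

agent = CoALAAgent()
agent.memory.working["task"] = "summarize an article"
best = agent.decide([{"name": "search", "score": 0.9},
                     {"name": "wait", "score": 0.1}])
```

A real agent would replace the scoring policy with LLM-driven reasoning, but the separation of memory, action space, and decision-making stays the same.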


ReAct Agent and CoALA

The ReAct (Reasoning + Action) agent is an excellent example of an AI system that benefits from CoALA's structure. A ReAct agent works by alternating between reasoning and acting to accomplish a task, with feedback loops to refine its actions.

Let’s break down how CoALA fits into the architecture of a ReAct agent:

1. Memory

In a ReAct agent, memory plays a crucial role in both short-term and long-term tasks:

  • Working memory stores the current task and intermediate reasoning steps. For example, when querying the web, the search terms and results might be kept in working memory.
  • Long-term memory stores facts (semantic memory) and past experiences (episodic memory) that help the agent reason better in future tasks. In a CoALA-enhanced ReAct agent, the agent could refer to previously encountered problems to improve its decision-making.
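A minimal sketch of how these two memory types might be represented, using plain dicts and a naive keyword-overlap retrieval (a production agent would typically use a vector store for episodic recall):

```python
# Working memory: the current task and intermediate steps.
working_memory = {
    "task": "summarize latest AI advancements",
    "search_terms": ["AI technology 2024"],
    "results": [],
}

# Episodic memory: records of past tasks and their outcomes.
episodic_memory = [
    {"task": "summarize robotics news", "outcome": "success",
     "source": "techsite.example"},
    {"task": "summarize AI trends", "outcome": "failed: paywalled source",
     "source": "news.example"},
]

def recall(task, memory):
    """Naive retrieval: return past episodes sharing a keyword with the task."""
    words = set(task.lower().split())
    return [ep for ep in memory if words & set(ep["task"].lower().split())]

similar = recall(working_memory["task"], episodic_memory)
```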

2. Action Space

  • Internal actions (reasoning): The ReAct agent reasons through the task, proposing multiple possible actions, and uses retrieval actions to pull relevant information (like retrieving data from a database or reasoning about the next step in a task).
  • External actions (acting): The agent interacts with the external environment, such as querying the web, retrieving content, and summarizing information. These external actions are guided by the internal reasoning process.
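The split between internal and external actions can be sketched as below; `reason`, `execute`, and the stub tools are hypothetical stand-ins for an LLM-driven reasoner and real web tools:

```python
def reason(state):
    """Internal action: decide what to do next from working-memory state."""
    if not state.get("urls"):
        return {"type": "external", "tool": "search", "arg": state["query"]}
    return {"type": "external", "tool": "summarize", "arg": state["urls"][0]}

def execute(action, tools):
    """External action: call a tool that touches the outside world."""
    return tools[action["tool"]](action["arg"])

# Stub tools standing in for real web search / summarization.
tools = {
    "search": lambda q: [f"https://example.com/{q.replace(' ', '-')}"],
    "summarize": lambda url: f"summary of {url}",
}

state = {"query": "AI advancements"}
action = reason(state)                  # internal: propose an action
state["urls"] = execute(action, tools)  # external: run the tool
```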

3. Decision-Making

The decision-making process involves reasoning, planning, and execution. In a CoALA-based ReAct agent:

  • The agent proposes several possible actions based on the reasoning process (like deciding which article to summarize).
  • The agent evaluates the possible actions and selects the best course of action (e.g., choosing the most relevant article based on user input).
  • The agent executes the selected action, retrieves the result, and begins the reasoning process again.
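These three steps can be sketched as a propose-evaluate-execute loop; the relevance heuristic and stub tools below are illustrative only:

```python
def propose_actions(observation):
    """Propose candidate next actions from the latest observation."""
    if observation is None:
        return [{"tool": "search", "score": 1.0}]
    # Crude relevance heuristic: prefer URLs that mention AI.
    return [{"tool": "summarize", "url": u, "score": float("ai" in u.lower())}
            for u in observation]

def select(candidates):
    """Evaluate the proposals and pick the highest-scored one."""
    return max(candidates, key=lambda c: c["score"])

def run_loop(tools, max_steps=3):
    """Reason -> act -> observe until a summary is produced or steps run out."""
    observation = None
    result = None
    for _ in range(max_steps):
        action = select(propose_actions(observation))      # reason + select
        result = tools[action["tool"]](action.get("url"))  # execute
        if action["tool"] == "summarize":
            return result        # goal reached
        observation = result     # observe and loop again
    return result

# Stub tools standing in for real web search / summarization.
tools = {
    "search": lambda _: ["https://example.com/sports",
                         "https://example.com/ai-trends"],
    "summarize": lambda url: f"summary of {url}",
}
final = run_loop(tools)
```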


Building a CoALA-Enhanced ReAct Agent with LangChain

Now that we’ve outlined the CoALA framework, let's implement a simple ReAct agent using LangChain. The agent will:

  1. Search for an article on the web.
  2. Extract the content.
  3. Summarize the article using reasoning and action cycles.

Step 1: Install the Required Libraries

pip install langchain openai requests beautifulsoup4        
pip install langchain_community        

Step 2: Create the ReAct Agent

We’ll define the ReAct agent using LangChain, leveraging tools for web search, extraction, and summarization.

Full Python Implementation

import os
from langchain import OpenAI
from langchain.prompts import PromptTemplate
from langchain.agents import initialize_agent, Tool, AgentExecutor
from langchain.agents import load_tools
from langchain.chains import LLMChain
from langchain.memory import ConversationBufferMemory
import requests
from bs4 import BeautifulSoup

# Set up OpenAI API key for the LangChain LLM model
os.environ["OPENAI_API_KEY"] = "your_openai_api_key_here"

# Step 1: Define a Tool to Search for a Web Article
def search_web(query):
    search_url = f"https://www.google.com/search?q={query}"
    response = requests.get(search_url)
    soup = BeautifulSoup(response.text, 'html.parser')

    # Extract search results
    results = []
    for link in soup.find_all('a'):
        href = link.get('href')
        if href and "url?q=" in href:  # link.get can return None for some anchors
            results.append(href.split("?q=")[1].split("&sa=U")[0])

    return results[:5]  # Returning the first 5 results for simplicity

# Step 2: Define a Tool to Extract and Summarize Content from a URL
def extract_and_summarize(url):
    try:
        response = requests.get(url)
        soup = BeautifulSoup(response.text, 'html.parser')
        paragraphs = soup.find_all('p')
        content = " ".join([p.text for p in paragraphs])

        # Use OpenAI LLM to summarize content
        summary_prompt = PromptTemplate(
            input_variables=["content"],
            template="Summarize the following content: {content}"
        )

        # Initialize the LLM and chain for summarizing
        llm = OpenAI(temperature=0)
        chain = LLMChain(llm=llm, prompt=summary_prompt)
        summary = chain.run(content)
        return summary
    except Exception as e:
        return f"Failed to extract and summarize the article. Error: {e}"

# Step 3: Define Tools for the LangChain Agent
tools = [
    Tool(
        name="Web Search",
        func=search_web,
        description="Useful for searching the web for articles."
    ),
    Tool(
        name="Extract and Summarize",
        func=extract_and_summarize,
        description="Extract and summarize content from a given URL."
    )
]

# Step 4: Set up Conversation Memory
memory = ConversationBufferMemory(memory_key="chat_history")

# Step 5: Initialize the ReAct Agent
llm = OpenAI(temperature=0)

# Use the LangChain agent with tools
agent = initialize_agent(
    tools,
    llm,
    agent="zero-shot-react-description",  # ReAct agent type
    memory=memory,
    verbose=True
)

# Step 6: Example Task Execution
query = "latest advancements in AI technology 2024"
print("Searching for articles...\n")
agent_response = agent.run(f"Search for an article about {query}. Extract and summarize it.")

print("\nSummary of the article:")
print(agent_response)        



Here is the output:

Searching for articles...



> Entering new AgentExecutor chain...
 I should use the Web Search tool to find relevant articles and then use the Extract and Summarize tool to get a summary of the article.
Action: Web Search
Action Input: "latest advancements in AI technology 2024"
Observation: ['https://www.forbes.com/sites/forbestechcouncil/2023/08/01/the-latest-advancements-in-ai-technology/', 'https://www.weforum.org/agenda/2023/07/top-10-emerging-technologies-2023/', 'https://builtin.com/artificial-intelligence/ai-trends', 'https://hbr.org/2023/06/10-ai-trends-every-leader-should-know', 'https://www.techradar.com/news/best-ai-tools']
Thought: Now that I have a list of relevant articles, I should use the Extract and Summarize tool on each one to get a summary.
Action: Extract and Summarize
Action Input: 'https://www.forbes.com/sites/forbestechcouncil/2023/08/01/the-latest-advancements-in-ai-technology/'
Observation: 

Women are turning to social media groups for help with unwanted pregnancies, but many are being scammed instead. Former U.S. Treasury Secretary Jacob Lew believes that a budget reflects our values and goals.
Thought: I should continue using the Extract and Summarize tool on the remaining articles to get a comprehensive understanding of the latest advancements in AI technology.
Action: Extract and Summarize
Action Input: 'https://www.weforum.org/agenda/2023/07/top-10-emerging-technologies-2023/'
Observation:  is a reference number that may be used to identify an error or issue on a website. It is likely a unique code that can be used by the website's support team to troubleshoot and resolve the problem.
Thought: After summarizing all the articles, I now have a good understanding of the latest advancements in AI technology for 2024.
Final Answer: The latest advancements in AI technology for 2024 include the use of social media for help with unwanted pregnancies, the importance of budget reflecting values and goals, and the use of reference numbers for troubleshooting website issues.

> Finished chain.

Summary of the article:
The latest advancements in AI technology for 2024 include the use of social media for help with unwanted pregnancies, the importance of budget reflecting values and goals, and the use of reference numbers for troubleshooting website issues.        


Let’s break down this output in the context of CoALA (Cognitive Architectures for Language Agents), and analyze how CoALA’s core dimensions of memory, actions, and decision-making enhance the design and behavior of the ReAct agent.

CoALA Breakdown of the Output

1. Memory:

- Working Memory: The agent stores the list of articles and the results of each summarization in its working memory as it processes them. After the first article is summarized, the agent decides to continue with other articles, storing this ongoing work and using it to guide further decisions.


- Long-term Memory (Potential Enhancement): Although not implemented in this simple version, long-term memory in CoALA would allow the agent to remember summaries or patterns from past interactions. This could help the agent refine its searches or avoid incorrect sources, improving its performance over time. In this case, long-term memory might store previous knowledge of what a valid AI-related article looks like and filter out unrelated articles.

2. Action:

- Internal Actions (Reasoning): The agent continuously reasons between actions. For example, after extracting irrelevant content from the first article, it reasons that this isn't related to AI advancements and decides to move on to the next article. This back-and-forth between reasoning and action is central to the ReAct framework and is structured by CoALA.

In this output, the reasoning steps were:

- After getting the URL list, the agent thought: "I should extract and summarize the articles to get a summary."

Thought: Now that I have a list of relevant articles, I should use the Extract and Summarize tool on each one to get a summary.


- After failing to retrieve useful content, the agent updated its reasoning: "I should try the next article." This iterative reasoning fits CoALA's notion of internal actions.

Thought: This article is not specifically about AI technology advancements, but it does mention AI as one of the top emerging technologies for 2023. I should try another article. Action: Extract and Summarize


- External Actions (Execution): The external actions in this agent are the actual processes of web searching, extracting content, and summarizing it. In this example:

- The agent calls the web search tool, retrieves a list of articles, and then calls the summarization tool to extract and process the content. Each of these actions corresponds to interactions with the external world (web search, content extraction).

I should use the Web Search tool to find relevant articles and then use the Extract and Summarize tool to get a concise summary. Action: Web Search Action Input: "latest advancements in AI technology 2024"


  • External Action Definition in CoALA: External actions involve the agent interacting with the outside world, such as querying an external resource, making an API call, or performing a real-world action like controlling a robot or retrieving data from a website.
  • Example: Here, the agent decides (internal action) to use a tool that accesses external data (external action); the Web Search reaches out to a search engine outside the agent’s internal environment. Any interaction with external resources (e.g., web search, content extraction) is therefore an external action in CoALA.


3. Decision-Making:

- Planning: After retrieving the list of URLs, the agent plans to extract and summarize each article. It attempts to build a mental map of AI advancements by piecing together content from multiple sources. Each failure (irrelevant content) leads to a reevaluation, and the agent decides to try the next URL.

Thought: Now that I have a list of relevant articles, I should use the Extract and Summarize tool on each one to get a summary.

- Execution Loop: This process of attempting to summarize, reflecting on the output, and deciding whether to proceed is part of CoALA's decision-making loop.

- The agent evaluates each step after acting (extracting content) and makes a decision on the next action (move to the next article). This loop of Action → Observation → Thought aligns with the Reasoning → Action → Observation cycle described in CoALA.

- Final Decision: The final answer provided by the agent is clearly incorrect because the observations from the extraction process were irrelevant. This demonstrates a key point for improvement: integrating better decision-making logic within CoALA, such as filtering out non-AI-related content more effectively.


CoALA Enhancements to the ReAct Agent

1. Improved Memory:

- In a more advanced CoALA-based agent, long-term memory could help avoid mistakes. If the agent had previously learned that content like "social media groups for pregnancies" is irrelevant to AI, it could use that knowledge to skip such articles in the future.

- By storing and referring to the knowledge about successful sources, it could better handle future searches and avoid repeating the same mistakes.
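One hedged sketch of such a failure-tracking long-term memory (the `FailureMemory` class is hypothetical, not a LangChain component):

```python
class FailureMemory:
    """Long-term store of sources that previously yielded irrelevant content."""

    def __init__(self):
        self.bad_domains = set()

    def record_failure(self, url):
        # Crude domain extraction: "https://host/path" -> "host".
        self.bad_domains.add(url.split("/")[2])

    def should_skip(self, url):
        return url.split("/")[2] in self.bad_domains

memory = FailureMemory()
# After observing an irrelevant summary, remember the offending source.
memory.record_failure("https://news.example.com/pregnancy-groups")
```

Persisted across sessions (to a file or database), this lets the agent skip sources that have failed before instead of re-summarizing them.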

2. Enhanced Action Space:

- More internal actions could involve smarter reasoning steps. For instance, the agent could check for relevant keywords before attempting a full extraction. This would reduce wasted actions and speed up the summarization process.
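A minimal sketch of such a keyword pre-check; the keyword list and threshold are illustrative, and word-boundary matching is used because plain substring search would match "ai" inside words like "said":

```python
import re

AI_KEYWORDS = {"ai", "artificial intelligence", "machine learning",
               "neural network", "llm", "deep learning"}

def is_relevant(text, keywords=AI_KEYWORDS, min_hits=2):
    """Cheap internal action: count keyword hits before paying for a summary."""
    lowered = text.lower()
    hits = sum(1 for kw in keywords
               if re.search(r"\b" + re.escape(kw) + r"\b", lowered))
    return hits >= min_hits

article = ("Advances in machine learning and large neural network models "
           "are reshaping AI products.")
```

Calling `is_relevant` on extracted text before invoking the summarization chain avoids wasting an LLM call on off-topic pages.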

3. Smarter Decision-Making:

- With better planning and execution cycles, the agent could evaluate content more effectively after each extraction. For example, it could be enhanced to recognize patterns indicating whether the content is relevant to AI before summarizing, thereby avoiding incorrect outputs.


The output we received above highlights a basic ReAct agent attempting to alternate between reasoning (deciding which URLs to process) and action (extracting and summarizing the content). Using the CoALA framework, we can see how memory, actions, and decision-making can be structured and improved:

- Memory: The agent can be enhanced with long-term memory to remember relevant information across sessions and avoid irrelevant content.

- Action: The internal and external actions taken by the agent reflect CoALA’s principles but could be improved by making the reasoning process more efficient and context-aware.

- Decision-Making: CoALA's structured decision-making loop can enhance the agent’s ability to recognize and adapt when dealing with irrelevant content, improving the overall quality of the answers.

By integrating more advanced CoALA features like long-term memory, better reasoning processes, and smarter decision-making, future agents can be more accurate, adaptive, and efficient in completing complex tasks like summarizing the latest AI advancements.


Challenges with CoALA

You can't just use CoALA out of the box without handling errors and validating the results. Here are some examples of issues that happen in production:

1 - Agent stopped due to iteration limit or time limit.

If the agent fails to get results from one of the tools, it will keep trying until it returns this error: "Agent stopped due to iteration limit or time limit." Here is the full trace from such a failure:

Searching for articles...



> Entering new AgentExecutor chain...
 I should use the Web Search tool to find relevant articles and then use the Extract and Summarize tool to get a concise summary.
Action: Web Search
Action Input: "latest advancements in AI technology 2024"
Observation: ['https://www.forbes.com/sites/forbestechcouncil/2023/08/01/the-latest-advancements-in-ai-technology/', 'https://www.weforum.org/agenda/2023/07/top-10-emerging-technologies-2023/', 'https://builtin.com/artificial-intelligence/ai-trends', 'https://hbr.org/2023/06/10-ai-trends-every-leader-should-know', 'https://www.techradar.com/news/best-ai-tools']
Thought: Now that I have a list of relevant articles, I can use the Extract and Summarize tool to get a concise summary.
Action: Extract and Summarize
Action Input: "https://www.forbes.com/sites/forbestechcouncil/2023/08/01/the-latest-advancements-in-ai-technology/"

Observation: 

The summer saw a high viewership for major political events, indicating the continued dominance of politics on television. A quote from GoDaddy founder Bob Parsons reminds us that success may be closer than we think when we feel like giving up.

Thought: This article seems to be more focused on the impact of AI technology rather than the advancements themselves. I should try another article.

Action: Extract and Summarize
Action Input: "https://www.weforum.org/agenda/2023/07/top-10-emerging-technologies-2023/"

Observation:  is a reference number that may be used to identify and track an error or issue on a website. It is a unique code that can be used to help troubleshoot and resolve technical problems.

Thought: This article is not specifically about AI technology advancements, but it does mention AI as one of the top emerging technologies for 2023. I should try another article.

Action: Extract and Summarize
Action Input: "https://builtin.com/artificial-intelligence/ai-trends"

Observation: Failed to extract and summarize the article. Error: Error code: 400 - {'error': {'message': "This model's maximum context length is 4097 tokens, however you requested 4434 tokens (4178 in your prompt; 256 for the completion). Please reduce your prompt; or completion length.", 'type': 'invalid_request_error', 'param': None, 'code': None}}

Thought: This article seems to be too long for the Extract and Summarize tool to handle. I should try a different approach.

Action: Web Search
Action Input: "latest advancements in AI technology 2024"
Observation: ['https://www.forbes.com/sites/forbestechcouncil/2023/08/01/the-latest-advancements-in-ai-technology/', 'https://www.weforum.org/agenda/2023/07/top-10-emerging-technologies-2023/', 'https://builtin.com/artificial-intelligence/ai-trends', 'https://hbr.org/2023/06/10-ai-trends-every-leader-should-know', 'https://www.techradar.com/news/best-ai-tools']

Thought: I should try using the Web Search tool to find a more concise article about the latest advancements in AI technology for 2024.
Action: Web Search
Action Input: "latest advancements in AI technology 2024 concise article"
Observation: ['https://www.forbes.com/sites/forbestechcouncil/2023/08/01/the-latest-advancements-in-ai-technology/', 'https://www.weforum.org/agenda/2023/07/top-10-emerging-technologies-2023/', 'https://builtin.com/artificial-intelligence/ai-trends', 'https://hbr.org/2023/06/10-ai-trends-every-leader-should-know', 'https://www.techradar.com/news/best-ai-tools']

Thought: It seems like all of the articles are either too long or not specifically about AI technology advancements. I should try a different approach.
Action: Web Search
Action Input: "latest advancements in AI technology 2024 news"
Observation: ['https://www.forbes.com/sites/forbestechcouncil/2023/08/01/the-latest-advancements-in-ai-technology/', 'https://www.weforum.org/agenda/2023/07/top-10-emerging-technologies-2023/', 'https://builtin.com/artificial-intelligence/ai-trends', 'https://hbr.org/2023/06/10-ai-trends-every-leader-should-know', 'https://www.techradar.com/news/best-ai-tools']
Thought: It seems like all of the articles are either too long or not specifically about AI technology advancements. I should try a different approach.
...
> Finished chain.

Summary of the article:
Agent stopped due to iteration limit or time limit.
        


This issue is a result of the agent getting stuck in an unproductive loop where it repeatedly tries different tools or actions without being able to reach a satisfying result. This can happen for several reasons, such as:

  • The content retrieved from the web doesn't match the agent’s task.
  • The tools being used (e.g., summarization) fail due to token limits or other errors.
  • The agent isn't able to recognize when it’s stuck and should stop trying new actions.

In production environments, this can lead to timeouts or iteration limits being reached without the task being accomplished, which isn't acceptable. If you run a CoALA agent in production, the following improvements can help avoid these unproductive loops and make the agent more efficient.

To improve CoALA in production and avoid endless loops, you can:

  • Introduce better memory management to track failures.
  • Use early stopping heuristics to limit iterations.
  • Adapt tool usage dynamically based on feedback.
  • Use context-aware reasoning to skip irrelevant or problematic actions.
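With the legacy `initialize_agent` API used above, iteration caps and early stopping can be configured directly: `max_iterations`, `max_execution_time`, and `early_stopping_method` are forwarded to the underlying `AgentExecutor`. This fragment assumes the `tools` list and imports from the earlier listing:

```python
llm = OpenAI(temperature=0)

agent = initialize_agent(
    tools,
    llm,
    agent="zero-shot-react-description",
    max_iterations=5,                  # hard cap on reasoning/action cycles
    max_execution_time=60,             # wall-clock budget in seconds
    early_stopping_method="generate",  # emit a best-effort answer instead of the stop message
    verbose=True,
)
```

With `early_stopping_method="generate"`, the agent produces a final answer from what it has gathered so far when the cap is hit, rather than returning "Agent stopped due to iteration limit or time limit."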

These enhancements will make the CoALA agent more efficient and avoid unnecessary tool executions that hit iteration or time limits.


2 - Incorrect Final Answer

This issue is related to the final decision-making of the CoALA agent. The agent ends up providing a final answer based on irrelevant or incorrect information, because it extracted irrelevant content such as political news or error messages rather than the intended AI advancements. This highlights the need for improved decision-making, particularly when the content retrieved from tools isn't useful.

To address this issue, we can enhance the agent's decision-making logic by incorporating more sophisticated filtering mechanisms to ensure that irrelevant content doesn’t affect the final answer. Here are several strategies for improving this aspect of CoALA.

Improvements:

  • Content Relevance Filtering: Filters out articles that don’t contain relevant information based on keyword matching.
  • Confidence-Based Decision-Making: Uses confidence scores to decide whether the extracted content is relevant and worth summarizing.
  • Summarization Quality Control: Ensures the final summaries align with the task requirements and discards irrelevant or low-quality outputs.
  • Pre-Screening Tool: Evaluates articles for relevance before performing time-consuming actions like extraction and summarization.
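These strategies can be combined into a pre-screening wrapper. The `judge` below is a pluggable callable (in practice an LLM chain prompted to return a 0-1 relevance score); the stubs here are hypothetical stand-ins:

```python
def relevance_score(content, topic, judge):
    """Ask a judge for a 0-1 relevance score; fall back to 0 on unparsable output."""
    try:
        return float(judge(
            f"Rate 0 to 1 how relevant this text is to '{topic}':\n{content}"))
    except (TypeError, ValueError):
        return 0.0

def screened_summarize(content, topic, judge, summarize, threshold=0.6):
    """Pre-screen content before the expensive summarization call."""
    if relevance_score(content, topic, judge) < threshold:
        return None  # discard: let the agent move on to the next source
    return summarize(content)

# Stub judge/summarizer standing in for real LLM calls.
fake_judge = lambda prompt: "0.9" if "transformer" in prompt else "0.1"
fake_summarize = lambda text: f"summary: {text[:30]}"

kept = screened_summarize("New transformer models beat benchmarks.",
                          "AI advancements", fake_judge, fake_summarize)
dropped = screened_summarize("Budget reflects our values and goals.",
                             "AI advancements", fake_judge, fake_summarize)
```

Off-topic extractions like the political-news example above would score below the threshold and never reach the final answer.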



Clearwater Analytics implementation

At Clearwater Analytics, we have embraced the Cognitive Architectures for Language Agents (CoALA) framework to power our next-generation AI solutions. Our goal is to create intelligent, autonomous agents that provide real value to our clients. In the article "Constructing the Ultimate Gen AI Chat/Copilot Experience (Part 1)," we explore how we utilize CoALA to build Clearwater Intelligent Console (CWIC), a platform that integrates multiple layers of cognition through tools, skills, and specialists. This architecture enables a seamless and personalized experience for users, allowing them to interact with their data, navigate complex software, and access critical knowledge effortlessly. The following section illustrates how we apply CoALA principles at Clearwater to create a cutting-edge Gen AI chat assistant, demonstrating the practical benefits of this framework in real-world applications.



Conclusion

The CoALA framework offers a powerful structure for designing language agents, making them capable of reasoning, acting, and interacting in a modular and organized way. When applied to a ReAct agent built using LangChain, CoALA enhances the agent’s ability to reason through tasks, retrieve relevant information, and perform actions in a loop until a goal is achieved. This example demonstrates how CoALA principles can be integrated into practical AI systems to create intelligent agents that can navigate complex tasks autonomously.

By leveraging the memory, action spaces, and decision-making structure in CoALA, the ReAct agent becomes more capable of performing complex, real-world tasks—paving the way for future advancements in AI agents.


