Agents: a triad of...

...data, reasoning, and action

I see a lot of ambiguity and confusion around agents of late. Some of the questions I get are:

Is the LLM the agent?
Is the prompt the agent?
Should all agents have an LLM?

Agents are not new, but with Generative AI they have become popular. With popularity came confusion. Many people are unclear about what an agent truly is. In this blog, I attempt to simplify the definition from my perspective.

What Is an Agent?

At its simplest, an agent is an entity that perceives its environment through data, processes that information using reasoning, and performs actions to achieve specific goals. Agents can be software programs, robots, or any system capable of interacting with its environment autonomously. They've been around for decades, from simple rule-based systems to complex AI-driven applications.



The TRIAD of Agents

If we dissect agents into their fundamental components, they become easier to understand.

1. Data (Context)

Data serves as the eyes and ears of an agent. It represents the context: the information an agent collects from its environment. This could be sensory input, user interactions, or any other form of input that provides situational awareness.

  • Example: A weather forecasting agent collects data on temperature, humidity, and atmospheric pressure to predict future weather conditions.

Data is crucial because it sets the stage for everything that follows. Without accurate and relevant data, an agent cannot make informed decisions or perform meaningful actions.
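As a rough illustration, here is a minimal Python sketch of the data-gathering step for the weather example above. The get_context function and the hard-coded readings are hypothetical placeholders standing in for real sensors or APIs.

    # A minimal sketch of the "Data" component: gathering context for a
    # hypothetical weather-forecasting agent. The readings are hard-coded
    # placeholders standing in for real sensor or API calls.

    def get_context() -> dict:
        """Collect the situational data the agent will reason over."""
        return {
            "temperature_c": 21.5,   # e.g. from a thermometer sensor
            "humidity_pct": 64,      # e.g. from a humidity sensor
            "pressure_hpa": 1012,    # e.g. from a barometer
        }

    context = get_context()
    print(context)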

2. Reasoning (Logic)

Reasoning is the brain of the agent. It involves interpreting the relevant data to make decisions. This component uses algorithms, rules, or models to interpret the data (context) and determine the best course of action. It is the link between data (context) and action (work).

  • Example: A chess-playing agent uses reasoning to evaluate possible moves and select the most strategic one.

Traditionally, reasoning in agents was rule-based and static, relying on predefined if-then statements or decision trees. While effective for straightforward tasks, such rule-based agents struggled with complexity and adaptability.
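To make the contrast concrete, here is a minimal sketch of traditional rule-based reasoning: a few hand-written if-then rules over the weather context from the earlier sketch. The thresholds are purely illustrative.

    # A minimal sketch of static, rule-based reasoning: predefined if-then
    # rules map the context to a decision. Thresholds are illustrative only.

    def decide(context: dict) -> str:
        if context["pressure_hpa"] < 1000 and context["humidity_pct"] > 80:
            return "forecast_rain"
        if context["temperature_c"] > 30:
            return "forecast_heat"
        return "forecast_fair"

    print(decide({"temperature_c": 21.5, "humidity_pct": 64, "pressure_hpa": 1012}))
    # -> forecast_fair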

3. Action (Work)

Action is the agent's ability to interact with its environment. Based on the reasoning component's decisions, the agent performs tasks (enabled by tools) to achieve its objectives.

  • Example: A robotic vacuum cleaner moves around a room, cleaning the floor based on the area's layout and detected obstacles.

Actions are the tangible outputs of an agent's processes, closing the loop between perception (data) and effect (action).
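Here is a minimal sketch of the "Action" component: the decision produced by the reasoning step is dispatched to a concrete function or tool. The actions shown are placeholders for whatever the agent can actually do in its environment.

    # A minimal sketch of the "Action" component: mapping a decision to a
    # concrete piece of work. The action functions are placeholders.

    def notify_user(message: str) -> None:
        print(f"[notification] {message}")

    ACTIONS = {
        "forecast_rain": lambda: notify_user("Rain expected - carry an umbrella."),
        "forecast_heat": lambda: notify_user("Heatwave expected - stay hydrated."),
        "forecast_fair": lambda: notify_user("Fair weather expected."),
    }

    def act(decision: str) -> None:
        ACTIONS.get(decision, lambda: notify_user("No action defined."))()

    act("forecast_fair")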

With the advent of Gen AI, we can optionally provide the agent with a memory that helps it continuously improve.
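Putting the triad together, a minimal sketch of the agent loop looks like the following, reusing the placeholder functions from the sketches above. The optional memory is just a list of past (context, decision) pairs here, a deliberately simplified stand-in for real memory mechanisms.

    # A minimal sketch of the full perceive -> reason -> act loop, with an
    # optional memory that records past context/decision pairs. Reuses the
    # placeholder get_context, decide, and act functions sketched earlier.

    def run_agent(steps: int = 3) -> None:
        memory: list[tuple[dict, str]] = []    # optional memory
        for _ in range(steps):
            context = get_context()            # Data: perceive the environment
            decision = decide(context)         # Reasoning: choose what to do
            act(decision)                      # Action: do the work
            memory.append((context, decision)) # remember, to improve over time
        print(f"memory holds {len(memory)} past interactions")

    run_agent()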

Agents Are Not New

Agents have been part of the technological landscape for a long time. Early examples include:

  • Rule-Based Systems: These agents operate on a set of predefined rules. For instance, expert systems in medicine provided diagnoses based on symptom checklists.
  • Finite State Machines: Used in game development for character behavior, these agents transition between states based on input (see the sketch below).
  • Simple Chatbots: Early chatbots could respond to specific keywords or entities but lacked understanding beyond their programming.

While these agents performed their intended tasks, they were limited by their static reasoning capabilities. They couldn't adapt to new information or handle unexpected scenarios effectively.
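As a small illustration of how static such reasoning was, here is a minimal finite-state-machine sketch of the kind used for game character behavior. The states and inputs are invented for the example.

    # A minimal finite state machine sketch, of the kind used for game
    # character behavior. States and events are invented for illustration.

    TRANSITIONS = {
        ("patrol", "player_spotted"): "chase",
        ("chase", "player_lost"): "patrol",
        ("chase", "player_in_range"): "attack",
        ("attack", "player_lost"): "patrol",
    }

    def step(state: str, event: str) -> str:
        """Return the next state; stay in the current state if no rule matches."""
        return TRANSITIONS.get((state, event), state)

    state = "patrol"
    for event in ["player_spotted", "player_in_range", "player_lost"]:
        state = step(state, event)
        print(event, "->", state)
    # patrol -> chase -> attack -> patrol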

The Evolution of Reasoning with LLMs

The introduction of large language models (LLMs) like GPT-4 and, more recently, the Strawberry series from OpenAI has significantly transformed the reasoning component of agents. Here's how:

Adaptive Reasoning

LLMs give agents adaptive reasoning: they can plan and reason through dynamic scenarios that are not scripted by rules.

  • Example: Customer service chatbots powered by LLMs can handle a wide range of queries, learn from interactions, and improve over time without explicit reprogramming.

Adaptive reasoning allows agents to perform well even when faced with situations they weren't explicitly programmed to handle.
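By contrast with the if-then sketch earlier, here is a minimal sketch of LLM-backed reasoning. The llm_complete function is a hypothetical placeholder for whichever model API you use, not a real library call; the point is that the decision comes from a prompt rather than from hand-written rules.

    # A minimal sketch of adaptive, LLM-backed reasoning. `llm_complete` is a
    # hypothetical placeholder for a real model call (e.g. your provider's
    # chat-completion endpoint); it is NOT a real library function.

    def llm_complete(prompt: str) -> str:
        raise NotImplementedError("Plug in your LLM provider here.")

    def decide_with_llm(context: dict, goal: str) -> str:
        prompt = (
            "You are a decision-making agent.\n"
            f"Goal: {goal}\n"
            f"Context: {context}\n"
            "Reply with a single short action name."
        )
        return llm_complete(prompt).strip()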

Natural Language Understanding

LLMs excel at understanding and generating human-like language, enhancing the agent's ability to interpret context and nuance.

This natural language capability makes interactions with agents more intuitive and effective.

Contextual Awareness

With LLMs, agents can maintain context over extended interactions, providing more coherent and relevant responses.

  • Example: An AI tutor can remember a student's previous questions and tailor explanations to their learning style.

Contextual awareness improves the quality of the agent's actions, making them more useful and user-friendly.
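A common way to keep that context is simply to carry the running conversation into each new model call. The sketch below reuses the hypothetical llm_complete placeholder from earlier; real frameworks manage history (and its length) far more carefully.

    # A minimal sketch of contextual awareness: prior turns are replayed into
    # every new prompt so the model can stay coherent across an interaction.
    # Reuses the hypothetical `llm_complete` placeholder from earlier.

    history: list[str] = []

    def chat(user_message: str) -> str:
        history.append(f"Student: {user_message}")
        prompt = "You are a patient AI tutor.\n" + "\n".join(history) + "\nTutor:"
        reply = llm_complete(prompt)
        history.append(f"Tutor: {reply}")
        return reply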

Impact on the Three Components

Enhanced Data Processing

LLMs allow agents to process unstructured data like text, speech, and images more effectively, expanding the range of data sources they can utilize.

  • Example: Sentiment analysis tools can interpret customer emotions from text inputs to improve service delivery.
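As a rough sketch of that idea, unstructured text can be turned into a structured signal with a single prompt. Again, llm_complete is the hypothetical placeholder from earlier, and the label set is just an example schema.

    # A minimal sketch of turning unstructured text into structured data
    # (a sentiment label) via an LLM. `llm_complete` remains a hypothetical
    # placeholder for a real model call.

    def classify_sentiment(customer_text: str) -> str:
        prompt = (
            "Classify the sentiment of the following customer message as "
            "positive, neutral, or negative. Reply with one word only.\n\n"
            f"Message: {customer_text}"
        )
        return llm_complete(prompt).strip().lower()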

Advanced Reasoning

The reasoning component becomes more powerful and flexible with LLMs, capable of handling complex decision-making processes.

  • Example: Financial agents can analyze market trends and news articles to make investment recommendations.

Improved Actions

With better data and reasoning, the actions performed by agents become more accurate and effective.

  • Example: An LLM-powered agent can make use of a math tool to perform quantitative operations.
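A minimal sketch of that pattern: the (hypothetical) LLM extracts the arithmetic it needs, and the agent delegates the actual calculation to a deterministic math tool. Real tool-calling APIs are richer; this only shows the shape of the idea, again reusing the llm_complete placeholder.

    # A minimal sketch of tool use: the agent exposes a math tool and lets the
    # hypothetical LLM decide what to feed it. Tool routing here is deliberately
    # naive compared with real tool-calling APIs.

    import ast
    import operator

    OPS = {ast.Add: operator.add, ast.Sub: operator.sub,
           ast.Mult: operator.mul, ast.Div: operator.truediv}

    def math_tool(expression: str) -> float:
        """Safely evaluate a simple arithmetic expression like '12 * (3 + 4)'."""
        def ev(node):
            if isinstance(node, ast.Constant):
                return node.value
            if isinstance(node, ast.BinOp) and type(node.op) in OPS:
                return OPS[type(node.op)](ev(node.left), ev(node.right))
            raise ValueError("unsupported expression")
        return ev(ast.parse(expression, mode="eval").body)

    def answer(question: str) -> str:
        # Ask the LLM to extract the arithmetic (hypothetical call), then do
        # the actual calculation with the deterministic tool.
        expression = llm_complete(f"Extract the arithmetic expression from: {question}")
        return f"The result is {math_tool(expression)}"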

Conclusion

While agents are not a new concept, the advent of LLMs has revolutionized the reasoning component, making agents more adaptive and less rule-based. This shift enables agents to handle complex tasks, learn from new data, and provide more accurate and personalized actions.
