The Future of the Generative AI Frontier: The Power of Agents
Suresh Bansal
Technical Manager (Generative AI, Vector DB, LLM, Hugging Face, LangChain, LlamaIndex, Azure & AWS) | XLRI
Background
The era of AI / ML began when we started training computers using data instead of code, but during this time applications could only perform the tasks they were trained for, such as classification and object identification. Then, at the end of 2022, OpenAI released ChatGPT, which could generate content and perform a wide range of tasks. It quickly caught the attention of millions of users worldwide and became all the rage. According to the Gartner Hype Cycle for Artificial Intelligence, 2023, Generative AI is at the Peak of Inflated Expectations and is expected to reach the Plateau of Productivity in 5–10 years.
Limitations
Current LLMs (Large Language Models) are good at many tasks, such as generating emails, writing essays, and sentiment analysis, but they are not very good at certain others, such as math calculations or multi-step complex problems. Current LLMs also suffer from a variety of limitations, such as:
· Hallucinations or misleading outputs
· Technical limitations such as limited context length and memory
· Bias in output
· Toxic or harmful speech
· Limited knowledge - ChatGPT 3.5 has a knowledge cutoff of September 2021
But come to think of it, we humans face similar challenges. We are prone to giving out false information (intentionally or unintentionally), suffer from bias, have limited knowledge and memory, and may even give harmful responses. So, how do we manage these shortcomings?
1. We look for information on the internet and use other tools such as Excel, Word, etc.
2. We revise our work repeatedly to fix errors and enhance it until we are satisfied with the output.
3. We seek feedback from peers and mentors and incorporate it.
4. We work in teams and collaborate with each other.
Therefore, we can apply similar concepts to improve outputs from LLMs, which brings us to the concept of Agents.
Generative AI Agents
Agents execute complex tasks by combining LLMs with key modules such as memory, planning, and access to tools. The LLM serves as the brain of the agent, controlling the flow of operations and using memory and various tools to perform the identified tasks.
Key features of an agent:
· Plan and execute tasks.
· Reflect on outcomes.
· Use tools to accomplish specified goals.
· Operate with little or no human intervention.
Some examples of agents:
· A website builder that works from given inputs and prompts.
· A data analyst that surfaces insights from data in an Excel sheet.
· A travel agent that plans a weekend trip of a given number of days in a specified city.
Tools
We perform tasks using tools such as an internet browser, Word, Excel, and other applications.
Similarly, in the world of generative AI, tools are a set of enablers that let an LLM agent interact with external environments and applications, such as internet search, Wikipedia search, a code interpreter, or a math engine. Tools can also access databases, knowledge bases, and external models.
For example, a travel agent will need tools such as the following to perform its tasks:
· Search flights.
· Book flights.
· Search the internet.
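To make this concrete, here is a minimal Python sketch of how such tools might be registered for an agent. The tool names, stubbed flight data, and confirmation format are hypothetical illustrations, not a real booking API.

```python
# A minimal sketch of an agent's tool registry. Each tool carries a
# natural-language description so an LLM (or router) can pick among them.
# All data here is stubbed; a real agent would call live APIs.

def search_flights(origin: str, destination: str) -> list[str]:
    """Return candidate flights (stubbed with static data)."""
    return [f"{origin}->{destination} 09:00", f"{origin}->{destination} 18:30"]

def book_flight(flight: str) -> str:
    """Pretend to book a flight and return a confirmation string."""
    return f"CONFIRMED:{flight}"

# The registry maps a tool name to (callable, description). The agent
# matches the user's task against the descriptions to choose a tool.
TOOLS = {
    "search_flights": (search_flights, "Find flights between two cities."),
    "book_flight": (book_flight, "Book a specific flight."),
}

def run_tool(name: str, *args) -> object:
    func, _description = TOOLS[name]
    return func(*args)
```

The descriptions are what makes this a registry an LLM can use: the model never sees the Python, only the names and descriptions, and replies with which tool to invoke.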
The diagram below lists a few other tools that can be handy for an agent, depending on the objectives or goals it is given.
· Entity Extraction: Extract specific information from an unstructured document, such as the total price, date, and customer name from an invoice. Source documents need not be in a consistent format.
· Chat DB: Business users can get the required information from a database without SQL or DB-schema knowledge.
· Knowledge Bot: Uses RAG (Retrieval-Augmented Generation) to answer questions from a custom knowledge repository, which can be built on unstructured data sources such as documents and files.
· Internet Search: Extract keywords from user queries and fetch content from the internet using any of the available search engines, such as Google, Bing, or DuckDuckGo.
· Summarization: Summarize large documents from the perspective of a specific persona, such as a CEO or CFO.
· Program Execution: Uses PAL (Program-Aided Language models) to write and execute Python code to answer a specific problem.
· Wikipedia Search: Extract keywords from user queries and fetch content from Wikipedia.
· Comparison: Answer comparison questions such as company performance this quarter vs. last quarter, the best mobile phone under Rs. 10,000, or the best-performing equity mutual fund scheme.
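Several of these tools share one mechanic worth seeing in code: Program Execution (PAL) runs model-written Python and reads back the answer. Here is a deliberately restricted sketch; the generated code string is hard-coded for illustration, and a real system would sandbox execution far more carefully than this.

```python
# A restricted sketch of the Program Execution (PAL) tool: the LLM writes
# Python for a math question and the harness evaluates it. Stripping
# __builtins__ blocks most accidental misuse, but is NOT a real sandbox.

def execute_generated_code(code: str) -> object:
    """Run model-generated code (which must assign to `result`)."""
    namespace: dict = {"__builtins__": {}}
    exec(code, namespace)
    return namespace["result"]

# A question like "What does 100 grow to at 7% compounded for 12 years?"
# might yield this generated program:
generated = "result = 100 * (1 + 0.07) ** 12"
```

Arithmetic needs no builtins, so plain formulas like the one above run fine even in the emptied namespace.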
Agentic Design Patterns
Below are a few agentic design patterns, based on lectures by Andrew Ng.
1. Reflection: The LLM examines its own work and comes up with ways to improve it; the crux of Reflection is that the model criticizes its own output to improve its response. We can also implement Reflection with a multi-agent framework, creating two different agents: one prompted to generate good outputs and the other prompted to give constructive criticism. The resulting discussion between the two agents leads to improved responses.
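The two-agent variant of Reflection can be sketched in a few lines of Python. The `call_llm` function below is a stand-in for a real model call and returns canned text, so the loop structure, not the output quality, is the point here.

```python
# A sketch of the Reflection pattern with two cooperating roles: a
# generator that drafts an answer and a critic that suggests fixes.
# `call_llm` is a placeholder for a real LLM API call.

def call_llm(role: str, prompt: str) -> str:
    """Stub LLM: returns canned text per role for illustration."""
    if role == "generator":
        return f"draft: {prompt}"
    return "shorter"  # this stub critic always asks for a tighter answer

def reflect(task: str, rounds: int = 2) -> str:
    answer = call_llm("generator", task)
    for _ in range(rounds):
        critique = call_llm("critic", answer)
        # Feed the critique back so the generator can revise its output.
        answer = call_llm("generator", f"{task} (revise per: {critique})")
    return answer
```

With real model calls, the critic's prompt would include the draft itself, and the loop would stop early once the critic has no further objections.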
2. Tool use: The LLM is given tools such as
a. Web search - {tool: web-search, query: "coffee maker reviews"}
b. Code execution - {tool: python-interpreter, code: "100 * (1 + 0.07) ** 12"}
or any other function that helps it gather information, take action, or process data.
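The structured calls above imply a dispatch step on the harness side: parse the tool name, route the remaining fields to the matching function. A minimal sketch, with stub implementations standing in for real search and interpreter backends:

```python
# A sketch of Tool Use dispatch: the model emits a structured call such
# as {"tool": "web-search", "query": "coffee maker reviews"} and the
# harness routes it to the matching handler. Handlers are stubs.

def web_search(query: str) -> str:
    return f"results for '{query}'"

def python_interpreter(code: str) -> str:
    namespace: dict = {}
    exec(code, namespace)            # code is expected to set `result`
    return str(namespace.get("result"))

HANDLERS = {"web-search": web_search, "python-interpreter": python_interpreter}

def dispatch(call: dict) -> str:
    # Pop the tool name; the remaining keys become keyword arguments.
    name = call.pop("tool")
    return HANDLERS[name](**call)
```

In a production agent the model's output would first be validated against a schema; here the dict is trusted as-is for brevity.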
3. Planning: The LLM comes up with a multistep plan to achieve a goal and then executes it. For example:
a. Writing an outline for an essay
b. Doing online research
c. Writing a draft, and so on.
4. Multi-agent collaboration: More than one AI agent works together, splitting up tasks and discussing and debating ideas, to come up with better solutions than a single agent would. Given a complex task like writing software, a multi-agent approach would break the task into subtasks to be executed by different roles - such as a software engineer, product manager, designer, QA (quality assurance) engineer, and so on - and have different agents accomplish different subtasks.
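The role-splitting idea can be sketched as follows. The roles, subtask phrasing, and the stubbed `agent` function are all illustrative; in a real system each role would be a separately prompted LLM instance.

```python
# A sketch of Multi-agent collaboration: a complex task is split into
# role-scoped subtasks and each role's "agent" (a stubbed LLM call)
# handles its own piece.

ROLES = ["product manager", "software engineer", "qa engineer"]

def agent(role: str, subtask: str) -> str:
    """Stand-in for a role-prompted LLM agent."""
    return f"[{role}] done: {subtask}"

def collaborate(task: str) -> list[str]:
    # Decompose the task by role; a real planner agent would do this step.
    subtasks = {
        "product manager": f"write requirements for {task}",
        "software engineer": f"implement {task}",
        "qa engineer": f"test {task}",
    }
    return [agent(role, subtasks[role]) for role in ROLES]
```

What this sketch omits is the "discussing and debating" part: real multi-agent frameworks pass each agent's output to the others for comment before settling on a result.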
While the first two design patterns give predictable outcomes, the last two are still more experimental.
LLM Agent Framework
Now that we understand agents, tools, and agentic design patterns, we can discuss a variation of the Planning design pattern. At a high level, it works by defining a task or goal and then repeatedly asking these two questions in a feedback loop:
1. Planning: What should the next action be?
2. Action: Execute it using a router agent and tools.
An LLM agent consists of the following core components:
· Short-term memory stores contextual information; it is finite because only a limited context window can be passed to an LLM.
· Long-term memory is an external vector store that provides relevant contextual information to the agent.
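A toy sketch of the two components: short-term memory as a bounded message window (mimicking a finite context length) and long-term memory as a tiny in-process store. Real long-term memory uses embedding vectors and a vector database; here, word overlap stands in for similarity search, purely for illustration.

```python
# Short-term memory: a bounded window of recent turns (finite context).
# Long-term memory: an unbounded store queried by crude word overlap,
# standing in for embedding similarity in a real vector store.

from collections import deque

class AgentMemory:
    def __init__(self, window: int = 4):
        self.short_term = deque(maxlen=window)   # recent turns only
        self.long_term: list[str] = []           # persists everything

    def remember(self, text: str) -> None:
        self.short_term.append(text)
        self.long_term.append(text)

    def recall(self, query: str) -> str:
        """Return the stored text sharing the most words with the query."""
        words = set(query.lower().split())
        return max(self.long_term,
                   key=lambda t: len(words & set(t.lower().split())))
```

The `maxlen` on the deque is what models the context-window constraint: once the window is full, the oldest turn is silently dropped, while long-term memory keeps it retrievable.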
In the ‘Planning’ agentic design pattern, the agent comes up with a multistep plan to achieve a goal and then executes it. This framework is a slight variation: instead of thinking through the whole sequence of steps up front, the LLM plans and executes only the very next step, iterating until the goal is achieved. The diagram below is narrated as follows.
Flow Narrative
1. Problem / Query: The user provides a problem or query.
2. What should be the next action? The agent looks at the query and identifies the immediate next step.
3. Human in the Loop (optional): The user can see and refine the next step planned by the agent.
4. Refine the task / Additional inputs: The user can either refine the next step planned by the agent or provide additional information required for the query.
5. Router Agent / Tools: The router agent has a list of available tools along with a description of each. The descriptions are used to identify the right tool for the task and invoke it to get results.
6. Goal accomplished? Verify whether the user's query is answered. If not, go back to the agent iteratively for the next step; otherwise, return the final answer to the user.
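The flow narrative above reduces to a short loop in code: plan only the next step, route it to a tool, and repeat until the goal check passes. In this sketch the planner and tools are stubs standing in for LLM calls, and the hard-coded step list plays the role of the model's planning; the human-in-the-loop refinement step is omitted.

```python
# A sketch of the iterative plan-then-act loop: the "planner" picks only
# the immediate next step, the "router" runs it, and the loop exits once
# the planner declares the goal accomplished.

def plan_next_step(query: str, done: list[str]) -> str:
    """Stub planner: a real agent would ask the LLM for the next step."""
    steps = ["search flights", "book flight"]
    return steps[len(done)] if len(done) < len(steps) else "finish"

def route_to_tool(step: str) -> str:
    """Stub router: would match the step against tool descriptions."""
    return f"result of '{step}'"

def run_agent(query: str, max_iters: int = 5) -> list[str]:
    done: list[str] = []
    for _ in range(max_iters):          # guard against endless loops
        step = plan_next_step(query, done)
        if step == "finish":            # goal accomplished?
            break
        done.append(route_to_tool(step))
    return done
```

The `max_iters` guard matters in practice: because each iteration replans from scratch, a confused planner can loop forever without it.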
The Diverse Landscape of Agents
The realm of Generative AI Agents is far from monolithic. Here's a glimpse into the different types of agents:
Conclusion
An AI agent interface is like Jarvis in Iron Man: Jarvis does not replace Tony Stark; it just gives him better tools to perform tasks more efficiently.
One section of this blog was generated by ChatGPT 3.5. Can you guess which?
References / Further Readings
1. LLM Agents: https://www.promptingguide.ai/research/llm-agents
2. TEDxPSUT: Generative AI is just the beginning, AI agents are what comes next - Daud Abdel Hadi: https://www.youtube.com/watch?v=z7-fPFtgRE4
3. Andrew Ng @ Sequoia Capital: What's next for AI agentic workflows: https://www.youtube.com/watch?v=sal78ACtGTc
4. Agentic Design Patterns Part 1: https://www.deeplearning.ai/the-batch/how-agents-can-improve-llm-performance/
5. Agentic Design Patterns Part 2, Reflection: https://www.deeplearning.ai/the-batch/agentic-design-patterns-part-2-reflection/
6. Agentic Design Patterns Part 3, Tool Use: https://www.deeplearning.ai/the-batch/agentic-design-patterns-part-3-tool-use/
7. Agentic Design Patterns Part 4, Planning: https://www.deeplearning.ai/the-batch/agentic-design-patterns-part-4-planning/
8. Agentic Design Patterns Part 5, Multi-Agent Collaboration: https://www.deeplearning.ai/the-batch/agentic-design-patterns-part-5-multi-agent-collaboration
9. What's New in Artificial Intelligence from the 2023 Gartner Hype Cycle: https://www.gartner.com/en/articles/what-s-new-in-artificial-intelligence-from-the-2023-gartner-hype-cycle