Agentic AI

One of the most interesting use cases for LLMs is in autonomous agents. LLMs by themselves are great at synthesizing information, writing text, and interpreting logic, but they fall short in what they can actually do beyond producing text. Agentic LLMs, while they still only output text, can use that output to run other functions and get feedback. For example, an agent might search the web, write and send an email, or even turn off the lights in your room. The result is more akin to a really, really smart Siri than to a purely conversational AI.

How it Works

One of the most popular agentic LLMs, and possibly one of the first of its kind, is AutoGPT. Its purpose is to take a task and use the logic and reasoning of an LLM to complete it. For example, if the task were to compile a list of endangered species, the LLM would decide to search Google for something like “list of endangered species.” Then, the LLM would “read” the page and synthesize the result for the user.

In fact, you have probably interacted with more "rudimentary" agents like Siri, Google Home, or Amazon Alexa. These don’t necessarily use the huge language models that we have today, but they can certainly perform tasks like these. The issue arises when you want to do extremely complex tasks that require a higher level of logic and reasoning. Examples include summarizing large portions of text, recalling previous information, or learning new processes through instructions or web search.

For LLMs, modern APIs supply something called function calling, which lets the user describe functions, along with the parameters they require, to the LLM. The LLM can then choose to call one of those functions if it decides it needs to in order to complete the task. If you have ever used ChatGPT with web search or image generation enabled, you have seen this in action. Web search, for example, is probably exposed as a function along the lines of “doWebSearch(query)”: ChatGPT just supplies the query and gets back a webpage with information about it. Similarly, for image generation, ChatGPT is just passing a prompt to DALL-E (an image generation model) to produce the image. This does seem to be changing with GPT-4o having image generation built into the model, but as of right now any image generation is done this way.
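To make this concrete, here is a minimal sketch of function calling with the OpenAI Python SDK. The doWebSearch schema below is hypothetical (the real tool ChatGPT uses is not public), and the model is free to answer without calling it.

from openai import OpenAI

client = OpenAI()

# Describe a (hypothetical) doWebSearch function to the model.
tools = [{
    "type": "function",
    "function": {
        "name": "doWebSearch",
        "description": "Search the web for the given query.",
        "parameters": {
            "type": "object",
            "properties": {"query": {"type": "string"}},
            "required": ["query"],
        },
    },
}]

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Compile a list of endangered species."}],
    tools=tools,
)

# If the model decided to call the function, the request shows up here;
# our code is responsible for actually running the search and replying.
for tool_call in response.choices[0].message.tool_calls or []:
    print(tool_call.function.name, tool_call.function.arguments)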

Designing our own Agent

Function calling has quite a few pitfalls. This article aims to build an agent that avoids as many of them as possible while still ending up with an incredibly capable model.

1. The Model

While function calling can technically be done with any model by prompting it to produce a JSON output containing either text content or a function to call, it works far better with a model specifically tuned for it. There are really only two models that are incredibly good at this right now: GPT-4 by OpenAI and FireFunction by Fireworks.ai. FireFunction is an overall worse model than GPT-4, but when it comes to accurate function calling it is amazing. Take a look at https://fireworks.ai/blog/firefunction-v1-gpt-4-level-function-calling to see how it stacks up against GPT-4.

2. The Pitfalls

While the model is incredibly good, there are certain problems with it. The first is that it sometimes refuses to call any function and simply answers by itself, even if the user specifically tells it not to. For example, here is a simple test in which the model was given a Google search function to use and then asked the prompt shown below.

# note this is a custom LLM / function API. More details later.

@Action.action("search_google",
               "Search Google for the given query.",
               [Param("query", "string", "Query to search Google for")],
               requried_state="webbrowser", change_state="google_search_results")
def search_google(query):
    results = fireweb.search_google(query)
    return f"Here are the results for your query, {query}. Use choose_search_result to click on one: " + ', '.join(results)


self.taskLLM.append_history("user", task)
response, tool_call = self.taskLLM.chat(require_tooling=False)

assert tool_call, "No tool calls returned"

User: tell me about mcdonald's latest and greatest foods
Output (minus stack trace): AssertionError: No tool calls returned

The solution to this is actually pretty simple: just force the model to call a function. Then, if the model decides it doesn’t want to call any function, it has to call a stop_function_calling function instead. If it calls stop_function_calling immediately, it doesn’t have to execute anything else, but it also can’t simply ignore the existence of the other functions.

@Action.action( "stop_function_calling",
                "Stop calling any function. Call this tool once you have completed the task assigned to you or if there is no task to be done.",
                params = [],
                requried_state="any", change_state="none", return_type=None )
def stop_function_calling():
    pass

@Action.action("search_google",
               "Search Google for the given query.",
               [Param("query", "string", "Query to search Google for")],
               requried_state="webbrowser", change_state="google_search_results")
def search_google(query):
    results = fireweb.search_google(query)
    return f"Here are the results for your query, {query}. Use choose_search_result to click on one: " + ', '.join(results)


self.taskLLM.append_history("user", task)
while True:
    response, tool_call = self.taskLLM.chat(require_tooling=True)
    if tool_call.name == "stop_function_calling":
        break
    self.taskLLM.call_action(tool_call)

The second problem is that FireFunction’s accuracy deteriorates as more and more functions are added. This means that if you want your model to handle a lot of tasks (20+ functions), it may not be able to correctly process all of them. To get around this, I have come up with what I am pretty sure is a novel solution: a function state machine.

3. The State Machine

The way the state machine works is that it restricts the functions passed to the model based on its current state. For example, after the model calls the search_google function, the state changes to google_search_results. From that state, the model is no longer able to call search_google, since it obviously just did. However, a new function that requires the google_search_results state, called something like choose_google_search_result, is now passed to the model. This essentially forces a flow of logic on the model, where it must do search_google → choose_google_search_result. It prevents things like feedback loops, in which the model repeatedly searches for things, and also ensures that the model doesn’t randomly go off on a tangent.

This graph shows the flow of functions the model can use (a rough reconstruction of the webbrowser chain appears below). Think of it as a pathway for doing a specific task while still keeping the model as free as possible. For example, in the webbrowser chain, the model cannot search something on Google until it has read the results of the first Google search. However, it can continue to choose search results after reading one of them if it wants to look at more than one of the results. Essentially, this took 11 functions and condensed them in such a way that the model only ever needs to choose from 1-3 at a time.
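Since the original graph image is not reproduced here, below is a rough, hypothetical reconstruction of the webbrowser chain as a state-to-functions mapping. Only search_google and choose_search_result are named in this article; the other state and function names are illustrative guesses.

# Rough, hypothetical reconstruction of the webbrowser chain. Only search_google
# and choose_search_result are named in the article; the rest are guesses.
WEBBROWSER_CHAIN = {
    "webbrowser":            ["search_google", "stop_function_calling"],
    "google_search_results": ["choose_search_result", "stop_function_calling"],
    "webpage":               ["read_page", "choose_search_result",
                              "stop_function_calling"],
}
# Whatever the current state, the model only ever picks from a small handful
# of functions instead of all 11 at once.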

4. LLM-ception

The final piece of the puzzle is enabling the model to plan out its task before it executes it. Studies have shown that this greatly increases the potency of LLMs, since it essentially forces the LLM to think through a process before doing it. Modern LLMs, or any language model for that matter, perform next-token prediction based on the context and what has already been generated. This means the LLM can “reason” with itself before taking a particular action, so long as it explains what it is going to do before it does it. This is called planning.

In our case, planning is best done via another function. The LLM starts in the "plan" state, where the only function it can call is make_plan. The make_plan function then calls another LLM to generate a plan.

We can also take advantage of the fact that the planning LLM doesn’t need to call any functions, so we can use a more powerful model for it.
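As a rough sketch of what this could look like with the API described below, here is a hypothetical make_plan action that delegates to a stronger model. The planLLM instance, the OPENAICLIENT object, the GPT-4 choice, and the change_state value are all assumptions for illustration; the actual implementation is in the repository.

# Hypothetical sketch: a planning action that delegates to a stronger model.
# planLLM, OPENAICLIENT, and the state names here are assumptions.
planLLM = LLM(model="gpt-4",  # a more powerful model; it never calls functions
              system_prompt="You write short, numbered plans for completing tasks.",
              actions=None,
              client=OPENAICLIENT,
              save_history_fp=None)

@Action.action("make_plan",
               "Write a step-by-step plan for the current task before acting on it.",
               [Param("task", "string", "The task to plan for")],
               requried_state="plan", change_state="webbrowser")
def make_plan(task):
    planLLM.append_history("user", f"Create a short plan for this task: {task}")
    plan, _ = planLLM.chat(require_tooling=False)
    return "Follow this plan while completing the task: " + plan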

The Code

In order to create an agent like this, I created an API that makes it easy to add functions along with their required and change states. It works via function decorators in Python. Here is what a search_google function decorator looks like:

@Action.action("search_google",
               "Search Google for the given query.",
               [Param("query", "string", "Query to search Google for")],
               requried_state="webbrowser", change_state="google_search_results")        

These function decorators automatically register the function they are applied to as an “action.”
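The actual registration code lives in the repository, but for intuition, here is roughly how such a decorator might work; the registry layout and argument handling below are a guess, not the real FireAssistant internals.

# Illustrative guess at the decorator mechanics (not the real implementation).
class Param:
    def __init__(self, name, type, description):
        self.name, self.type, self.description = name, type, description

class Action:
    registry = {}  # maps action name -> metadata and the callable itself

    @classmethod
    def action(cls, name, description, params=None, requried_state="any",
               change_state="none", return_type=str):
        def decorator(func):
            cls.registry[name] = {
                "func": func,
                "description": description,
                "params": params or [],
                "requried_state": requried_state,
                "change_state": change_state,
            }
            return func  # the original function stays directly callable
        return decorator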

The other thing I added was an LLM API that wraps both Fireworks and OpenAI models into one class. The LLM class also keeps track of conversation history.

taskLLM = LLM(model="accounts/fireworks/models/firefunction-v1",
              system_prompt="You are a helpful assistant with access to functions. Use them if required.",
              actions=self.actions,
              client=FIREWORKSCLIENT,
              save_history_fp="./task.json" if verbose else None)
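For a rough idea of what that wrapper could look like internally, here is a simplified sketch. It assumes an OpenAI-compatible client (Fireworks exposes an OpenAI-compatible endpoint), and the schemas_for_current_state and run helpers on the actions object are hypothetical; the real class in the repository also handles details like storing tool-call messages in the history.

import json

# Simplified sketch of the wrapper, not the actual FireAssistant class.
class LLM:
    def __init__(self, model, system_prompt, actions=None, client=None, save_history_fp=None):
        self.model = model
        self.actions = actions
        self.client = client
        self.save_history_fp = save_history_fp
        self.history = [{"role": "system", "content": system_prompt}]

    def append_history(self, role, content):
        self.history.append({"role": role, "content": content})
        if self.save_history_fp:
            with open(self.save_history_fp, "w") as f:
                json.dump(self.history, f, indent=2)

    def chat(self, require_tooling=False):
        kwargs = {"model": self.model, "messages": self.history}
        if self.actions is not None:
            # schemas_for_current_state() is a hypothetical helper that returns only
            # the tool schemas allowed by the state machine's current state.
            kwargs["tools"] = self.actions.schemas_for_current_state()
            # The exact value that forces a tool call differs by provider
            # ("required" on OpenAI, "any" on some others).
            kwargs["tool_choice"] = "required" if require_tooling else "auto"
        message = self.client.chat.completions.create(**kwargs).choices[0].message
        self.append_history("assistant", message.content or "")
        tool_call = message.tool_calls[0] if message.tool_calls else None
        return message.content, tool_call

    def call_action(self, tool_call):
        # run() is a hypothetical helper that executes the registered action and
        # returns its string result, which is fed back into the conversation.
        result = self.actions.run(tool_call.function.name,
                                  **json.loads(tool_call.function.arguments))
        if result is not None:
            self.append_history("user", result)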

From there, it’s as simple as defining an execution loop for each task that continuously asks the model if it wants to call a function; once it doesn’t, it stops and waits for the next task.

def execute_task(self, task):
        # self.taskLLM.reset()
        self.actions.state = "plan"
        self.taskLLM.append_history("user", task)
        while True:
            response, tool_call = self.taskLLM.chat(require_tooling=True)
            assert tool_call, "No tool calls returned even though require_tooling is True"
            if tool_call.name == "stop_function_calling":
                if self.verbose:
                    print("Function calling stopped.")
                break
            self.taskLLM.call_action(tool_call)
        response, tool_call = self.taskLLM.chat(require_tooling=False)
        return response        

The full code can be found on my GitHub. I am still actively updating it and hope to add more features and make it a lot better in the future, but for now you can play around with this agent here: https://github.com/MannanB/FireAssistant

Agentic AI might be as close to AGI as we have right now. It is able to transcend the confines of text and allows things akin to self-research, self-learning, and introspection that standalone LLMs just aren't capable of yet. While it is very far off from real AGI, it can definitely mimic it. Not to mention, with voice activation (which I did include in the GitHub repo, it just doesn't work the best), agentic AI makes for great AI assistants that can do whatever actions you want to code for them while also being smarter than any other home assistant. Overall, I think we are definitely going to see an uptick in these kinds of AI systems, especially from Microsoft and Apple as they incorporate them into their own operating systems.
