How to Optimize LLM Performance with AI Agents

Apple popularized the concept of AI agents in 1994. Thirty years later, they are about to be integrated into desktops and mobile devices. In that time, AI evolved from a research discipline into programs for natural language processing (NLP), then generative AI applications, and now autonomous agents. Agents can automate actions and improve the output of large language models (LLMs). Open-source projects including Langchain, LlamaIndex, and Eidolon AI provide frameworks for agents capable of performing many tasks, including retrieval augmented generation (RAG). RAG supplies helpful context for domain-specific questions, which can enhance employee experiences, deliver customer support, and share specialized knowledge.

Agents on the global stage (and in the palms of mobile users)

On June 10, Apple made headlines at its Worldwide Developers Conference by announcing AI-native capabilities for iOS 18, iPadOS 18, and macOS Sequoia, slated to launch in September. The company broke its long-standing silence on its AI development plans with the debut of Apple Intelligence. CEO Tim Cook and team demoed new features, including:

  • Custom image & emoji generation
  • Integrations with OpenAI’s GPT-4o
  • Apple Silicon for on-device processing
  • Private Cloud Compute for AI processing
  • Enhanced NLP and text commands for Siri

Apple revealed the soon-to-be multimodal virtual assistant, which executes actions within and across many applications. Device owners can expect to search for photos, videos, or files with natural language, improving productivity and native app experiences. Furthermore, developers will soon be able to integrate voice and text interactions into third-party applications.

Siri’s new functionality marks a shift from conversational AI to AI agents, which are defined as software programs that perceive their environment and act autonomously to complete a task. Apple placed agents on the world stage (and in the palms of mobile users, except for those in Europe), but it’s hardly a new concept.

Thirty years ago, three Distinguished Research Scientists at Apple described agents as “a persistent software entity dedicated to a specific purpose.” Allen Cypher, David Canfield Smith, and Jim Spohrer emphasized that “‘persistent’ distinguishes agents from subroutines; agents have their own ideas about how to accomplish tasks, their own agendas.” They further elaborated that “‘special purpose’ distinguishes them from entire multifunction applications.”

SVP of Software Engineering Craig Federighi presenting Apple Intelligence

Improving the output of large language models

One may wonder how agents relate to LLMs given the parallel release of Siri’s actions and ChatGPT integrations. Whereas agents complete tasks, generative AI applications built on LLMs predict the next token in a string.
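To make that contrast concrete, here is a toy sketch of next-token prediction in Python, using the small open GPT-2 model from Hugging Face (an illustrative choice, not a model discussed in this article). The model does nothing more than score which token is most likely to come next.

# Toy illustration of next-token prediction with a small open model (GPT-2).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

inputs = tokenizer("Retrieval augmented generation helps language models", return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits  # shape: (batch, sequence_length, vocab_size)

# The highest-scoring entry at the final position is the model's guess for the next token.
print(tokenizer.decode(logits[0, -1].argmax()))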

It’s no secret that Siri struggles with natural language understanding, just as LLMs hallucinate when responding. A 2019 Statista report put the voice assistant’s answer accuracy at 83.1%, while foundation models from OpenAI, Microsoft, Google, Meta, and Anthropic maintained a greater rate of factual consistency, according to the public LLM leaderboard published by Palo Alto startup Vectara. Coincidentally, Apple’s OpenELM-3B-Instruct ranked as the least accurate model evaluated.

LLM leaderboard by Vectara

There are two primary methods to improve the performance of LLMs: fine-tuning and RAG. The two are not mutually exclusive, and which to use depends on the use case and its constraints.

Fine-tuning involves further training a foundation model on domain-specific datasets or modifying its parameters to influence its behavior. Although tuned to produce more specific output, fine-tuned models have been known to generate unexpected answers, and they require additional resources for data labeling, model training, and adjustments.
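For reference, this is roughly what starting a supervised fine-tuning job can look like with the OpenAI Python client. It is a minimal sketch under assumptions not stated in this article: an openai v1.x client, a prepared chat-format train.jsonl dataset, and a placeholder base model name.

# Minimal fine-tuning sketch; the file name and model name are placeholders.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Upload the labeled, domain-specific dataset.
training_file = client.files.create(file=open("train.jsonl", "rb"), purpose="fine-tune")

# Start the fine-tuning job against a base model.
job = client.fine_tuning.jobs.create(
    training_file=training_file.id,
    model="gpt-4o-mini-2024-07-18",
)
print(job.id, job.status)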

RAG employs agents to query knowledge bases so pre-trained models can access recent, reliable, and relevant information. Models grounded in a source of truth and supplemented with contextual understanding exhibit a lower risk of hallucination. Developers seemingly prefer RAG for scenarios that demand subject matter expertise, and it also tends to be less complex, costly, and resource-intensive than fine-tuning.
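The pattern itself is simple: retrieve relevant text, then ground the model’s answer in it. Below is a bare-bones sketch, assuming an openai v1.x client and a tiny in-memory knowledge base; a production system would add a vector database, document chunking, and an agent framework like those discussed next.

# Bare-bones RAG: embed documents, retrieve the closest one, ground the answer in it.
import numpy as np
from openai import OpenAI

client = OpenAI()
docs = [
    "Eidolon AI is an open-source framework for building multi-agent services.",
    "RAG grounds a model's answer in retrieved documents to reduce hallucinations.",
]

def embed(texts):
    resp = client.embeddings.create(model="text-embedding-3-small", input=texts)
    return np.array([d.embedding for d in resp.data])

doc_vectors = embed(docs)

def answer(question):
    # Retrieve: pick the document most similar to the question (cosine similarity).
    q = embed([question])[0]
    scores = doc_vectors @ q / (np.linalg.norm(doc_vectors, axis=1) * np.linalg.norm(q))
    context = docs[int(scores.argmax())]
    # Augment and generate: instruct the model to answer from the retrieved context.
    resp = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": f"Answer using only this context:\n{context}"},
            {"role": "user", "content": question},
        ],
    )
    return resp.choices[0].message.content

print(answer("What is Eidolon AI?"))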

Many open-source projects, including Langchain and LlamaIndex, enable developers to build agents capable of performing RAG. However, Eidolon AI by August Data stands out for its ease of use, modularity, and multi-agent architecture. The organization also demonstrates the value of its solution by publishing the codebase for agents that search and retrieve information about its very own GitHub repository.

Retrieval augmented generation with AI agents

Below are seven steps to implement agents that conduct RAG for the Eidolon AI GitHub repository. Before starting, note that this open-source project only supports macOS and Linux; those using Windows must install Windows Subsystem for Linux (WSL).

1. Set up the developer environment by downloading Python 3.11 or 3.12 and Python Poetry, then obtain a paid OpenAI account with access to an API key.
2. Copy and paste this code snippet into a CLI to clone the Eidolon quickstart repository and download all necessary dependencies locally:

git clone https://github.com/eidolon-ai/eidolon-quickstart.git
cd eidolon-quickstart

3. Run the Eidolon HTTP server in developer mode by entering the following command:

make serve-dev

4. When prompted by the program, input the OpenAI API key, which is accessible from an OpenAI account.

5. Fork the Eidolon chatbot repository, clone it locally, and start the server with this script:

git clone https://github.com/eidolon-ai/eidolon-git-search.git
cd eidolon-git-search
make serve-dev

6. Add a GitHub token, which can be generated in a GitHub account, to avoid any rate limit errors.

7. Navigate to the chatbot UI in a web browser. Select the agent, open a chat, and enter a prompt.

Eidolon AI chatbot UI

Completing these seven steps deploys two agents: the repo expert agent and the repo search agent. The repo expert agent acts as a user-facing copilot that receives and responds to questions about the Eidolon GitHub repository. It retrieves answers from the repo search agent, which converts natural-language queries into vector searches and returns the top result.
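As a rough illustration of that division of labor (and not Eidolon’s actual code), the pattern looks something like the sketch below, with a trivial keyword match standing in for the real vector search and LLM call.

# Conceptual two-agent delegation pattern; a toy stand-in for the real system.
from dataclasses import dataclass

@dataclass
class SearchResult:
    path: str
    snippet: str

# Stand-in for an indexed copy of the repository.
INDEX = [
    SearchResult("README.md", "Eidolon is a framework for building agent services."),
    SearchResult("docs/quickstart.md", "Run make serve-dev to start the dev server."),
]

class RepoSearchAgent:
    """Converts a natural-language query into a search and returns the top result."""

    def search(self, query):
        # Real version: embed the query and rank repository chunks by vector similarity.
        words = set(query.lower().split())
        return max(INDEX, key=lambda r: len(words & set(r.snippet.lower().split())))

class RepoExpertAgent:
    """User-facing copilot that answers questions, grounded by the search agent."""

    def __init__(self, searcher):
        self.searcher = searcher

    def ask(self, question):
        hit = self.searcher.search(question)
        # Real version: an LLM composes the answer from the retrieved context.
        return f"From {hit.path}: {hit.snippet}"

print(RepoExpertAgent(RepoSearchAgent()).ask("How do I start the dev server?"))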

Now that the agents are programmed to answer questions, feel free to ask about Eidolon AI or how to customize its agents for specific use cases. Troubleshoot potential errors by referencing the quickstart guide or the “recipe” for the GitHub repo expert and search agents. Join this Discord channel to contribute to the project or send inquiries to the developers.

Incorporating search agents into user workflows

These instructions not only show how to implement agents for RAG; the agents grounded in the Eidolon GitHub repository also hint at the many applications that can be built with the open-source framework. Developers generally agree that RAG provides the most value when sharing specialized knowledge, enhancing employee experiences, and delivering customer support.

Employee experience agents are critical for operational efficiency because they unlock productivity gains for staff. Glean and Moveworks, for example, have developed products to help end clients search for information in their corporate intranet and learn more about the inner workings of their organization. Customer support agents are crucial for commercial growth, as they are responsible for fielding and addressing questions about technical documentation for software products. Vendors such as Intercom and Aisera offer solutions to help software companies retain users by resolving their confusion.

Companies like Notion are launching AI productivity features that leverage RAG to answer questions from tens of millions of users instantaneously by searching billions of documents, which has reduced operating costs by 60%. Health and legal tech providers like InpharmD and DISCO, respectively, have also leveraged RAG to realize productivity and cost savings while optimizing response times and accuracy. It will be interesting to see how the enterprises, law firms, and healthcare centers of the future adopt RAG for various use cases.

