A Survey on LLM-based Autonomous Agents; Evaluating LLMs on Graphs; Fine-Tuning for GPT-3.5 and GPT-4; and More
Danny Butvinik
Chief Data Scientist | 100K+ Followers | FinCrime | Writer | Author of AI Vanguard Newsletter
Editor's Paper Recommendations
A Survey on Large Language Model-based Autonomous Agents: Autonomous agents have long been a prominent research topic in the academic community. Previous research in this field often focuses on training agents with limited knowledge within isolated environments, which diverges significantly from human learning processes and makes it hard for the agents to achieve human-like decisions. Recently, by acquiring vast amounts of web knowledge, large language models (LLMs) have demonstrated remarkable potential in achieving human-level intelligence. This has sparked an upsurge in studies investigating autonomous agents based on LLMs. Researchers have devised diverse agent architectures tailored to different applications to harness the full potential of LLMs. In this paper, we present a comprehensive survey of these studies, delivering a systematic review of the field of autonomous agents from a holistic perspective. More specifically, we focus on constructing LLM-based agents, for which we propose a unified framework encompassing most of the previous work. Additionally, we summarize the various applications of LLM-based AI agents in social science, natural science, and engineering. Lastly, we discuss the commonly employed evaluation strategies for LLM-based AI agents. Based on previous studies, we also present several challenges and future directions in this field. To keep track of this field and continuously update our survey, we maintain a repository for the related references at this https URL.
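To make the survey's unified framework concrete, here is a minimal, illustrative sketch of the profile/memory/planning/action anatomy that LLM-based agents in this line of work typically share. This is not the paper's code; the `llm` callable and the prompt wording are placeholder assumptions.

```python
# Illustrative sketch of a unified LLM-agent anatomy: a profile that
# conditions behavior, a memory of past steps, a planning call, and an
# action call. The `llm` argument is any text-in/text-out function
# (hypothetical placeholder for a real model client).

from dataclasses import dataclass, field
from typing import Callable, List

@dataclass
class Agent:
    llm: Callable[[str], str]                        # assumed LLM interface
    profile: str                                     # role description for the agent
    memory: List[str] = field(default_factory=list)  # past tasks and outcomes

    def plan(self, task: str) -> str:
        # Planning module: decompose the task, conditioned on profile + memory.
        context = "\n".join(self.memory[-5:]) or "(no history)"
        return self.llm(
            f"You are {self.profile}.\nRecent context:\n{context}\n"
            f"Task: {task}\nList the next steps."
        )

    def act(self, task: str) -> str:
        # Action module: execute the plan and write the outcome back to memory.
        plan = self.plan(task)
        result = self.llm(f"Carry out this plan and report the result:\n{plan}")
        self.memory.append(f"{task} -> {result}")
        return result
```

The point of the sketch is the separation of concerns the survey highlights: the same LLM backs both planning and acting, while the profile and memory supply the persistence that agents trained in isolated environments lacked.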
Evaluating Large Language Models on Graphs: Performance Insights and Comparative Analysis: Large Language Models (LLMs) have garnered considerable interest in both academia and industry. Yet, applying LLMs to graph data remains underexplored. In this study, we evaluate the capabilities of four LLMs in addressing several analytical problems with graph data. We employ four distinct evaluation metrics: Comprehension, Correctness, Fidelity, and Rectification. Our results show that: 1) LLMs effectively comprehend graph data in natural language and reason with graph topology. 2) GPT models can generate logical and coherent results, outperforming alternatives in correctness. 3) All examined LLMs struggle with structural reasoning, with techniques like zero-shot chain-of-thought and few-shot prompting showing diminished efficacy. 4) GPT models often produce erroneous answers in multi-answer tasks, raising concerns about fidelity. 5) GPT models exhibit elevated confidence in their outputs, potentially hindering their rectification capacities. Notably, GPT-4 has demonstrated the capacity to rectify responses from GPT-3.5-turbo and its previous iterations. The code is available at this https URL.
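For readers who want to try this style of evaluation on a small scale, here is a rough sketch of a single Correctness check: serialize a graph into natural language, ask a model a topology question, and compare against a ground truth computed with networkx. This is not the paper's released code; `ask_llm` is a hypothetical stand-in for any chat-completion call.

```python
# Illustrative Correctness probe for LLM graph reasoning: build a small
# graph, describe it in plain English, and check the model's answer
# against a ground truth computed with networkx.

import networkx as nx

def graph_to_text(g: nx.Graph) -> str:
    edges = ", ".join(f"({u}, {v})" for u, v in g.edges)
    return f"An undirected graph has nodes {sorted(g.nodes)} and edges {edges}."

def is_correct(answer: str, truth: int) -> bool:
    # Crude scoring: the true value must appear in the model's answer.
    return str(truth) in answer

g = nx.Graph([(0, 1), (1, 2), (2, 3), (3, 0)])
question = graph_to_text(g) + " What is the shortest path length from node 0 to node 2?"
truth = nx.shortest_path_length(g, 0, 2)   # ground truth: 2

# answer = ask_llm(question)               # hypothetical model call
# print(is_correct(answer, truth))
```

The paper's Fidelity and Rectification metrics vary this recipe, using multi-answer tasks and feeding one model's output back for correction, as in the GPT-4-rectifies-GPT-3.5 finding above.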
Exploring the Intersection of Large Language Models and Agent-Based Modeling via Prompt Engineering: The final frontier for simulation is accurately representing complex, real-world social systems. While agent-based modeling (ABM) seeks to study the behavior and interactions of agents within a larger system, it cannot capture the full complexity of human-driven behavior faithfully. Large language models (LLMs), like ChatGPT, have emerged as a potential solution to this bottleneck by enabling researchers to explore human-driven interactions in previously unimaginable ways. Our research investigates simulations of human interactions using LLMs. Through prompt engineering, inspired by Park et al. (2023), we present two simulations of believable proxies of human behavior: a two-agent negotiation and a six-agent murder mystery game.
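A bare-bones version of the two-agent negotiation can be prompted in a handful of lines. The sketch below is illustrative only, not the authors' setup; `ask_llm` and the role prompts are assumptions.

```python
# Illustrative two-agent negotiation: two role-prompted LLM "agents"
# alternate turns, each seeing the shared transcript so far.

BUYER = ("You are a buyer who wants the lamp for at most $40. "
         "Reply with one short offer or response.")
SELLER = ("You are a seller who wants at least $60 for the lamp. "
          "Reply with one short offer or response.")

def negotiate(ask_llm, rounds: int = 6) -> list:
    transcript = []
    for turn in range(rounds):
        role = BUYER if turn % 2 == 0 else SELLER
        history = "\n".join(transcript) or "(negotiation begins)"
        reply = ask_llm(f"{role}\nConversation so far:\n{history}\nYour turn:")
        transcript.append(reply)
    return transcript
```

The six-agent murder mystery presumably generalizes the same loop, with one role prompt per character and a longer shared transcript.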
Measuring Faithfulness in Chain-of-Thought Reasoning: Large language models (LLMs) exhibit improved performance when utilizing step-by-step "Chain-of-Thought" (CoT) reasoning before providing answers to questions. However, it remains to be seen whether the stated CoT reasoning accurately represents the model's actual reasoning process. To explore the fidelity of CoT reasoning, researchers conducted interventions on the CoT (e.g., introducing mistakes or paraphrasing it) and observed how model predictions changed. The study found significant variation among models in how much they relied on CoT when predicting answers across different tasks. Some models heavily relied on CoT, while others largely ignored it. The performance enhancement from CoT does not result solely from increased test-time computation or specific phrasing in the CoT. As language models grow more capable, they tend to produce less faithful reasoning across most studied tasks. However, the research suggests that under certain circumstances, such as carefully choosing the model size and task, CoT can still provide faithful explanations.
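The intervention logic is easy to prototype. Below is a toy sketch of one such probe: sample a chain of thought, corrupt it, and check whether the final answer moves. `ask_llm` and the digit-swap corruption are assumptions, not the paper's exact procedure.

```python
# Illustrative CoT-faithfulness probe: if the final answer is unchanged
# after the reasoning is corrupted, the stated CoT may not reflect the
# model's true reasoning process.

def final_answer(ask_llm, question: str, cot: str) -> str:
    return ask_llm(f"{question}\nReasoning: {cot}\nTherefore, the answer is:")

def faithfulness_probe(ask_llm, question: str) -> bool:
    cot = ask_llm(f"{question}\nThink step by step:")
    corrupted = cot.replace("2", "7")        # toy intervention: perturb digits
    original = final_answer(ask_llm, question, cot)
    perturbed = final_answer(ask_llm, question, corrupted)
    return original != perturbed             # True: answer depends on the CoT
```

Run over a distribution of tasks, the rate at which answers move under intervention gives a rough per-model reliance-on-CoT score, which is the kind of quantity the study compares across model sizes and tasks.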
Industry Insights
Using Google Vertex AI to Fine-Tune LLM Apps
This event blends presentations with hands-on coding demonstrations led by technical and product experts. You will learn how Vertex AI's various features can be used to enhance LLM applications.
You'll learn:
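As a preview of the kind of workflow the session covers, here is a hedged sketch of launching a supervised tuning job with the Vertex AI Python SDK. The project, bucket, and region values are placeholders, and the parameter names reflect the 2023-era SDK, so check the current Vertex AI docs before running.

```python
# Illustrative Vertex AI tuning sketch (placeholders throughout):
# launch a supervised tuning job for a PaLM text model from a JSONL
# dataset of {"input_text": ..., "output_text": ...} rows in GCS.

import vertexai
from vertexai.language_models import TextGenerationModel

vertexai.init(project="my-project", location="us-central1")  # placeholder project

model = TextGenerationModel.from_pretrained("text-bison@001")
model.tune_model(
    training_data="gs://my-bucket/tuning_data.jsonl",  # placeholder bucket
    train_steps=100,
    tuning_job_location="europe-west4",   # tuning region at the time of writing
    tuned_model_location="us-central1",
)

# The call starts a managed tuning pipeline; once it finishes, the tuned
# model can be queried like the base model, e.g. model.predict("...").
```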
Growth Zone
Great insights on the latest developments in AI, machine learning, deep learning, and analytics! At Good AI Vibes, we delve into practical business applications of AI across various industries in our bi-weekly newsletter. It's a fantastic resource for staying up to date. Join us in this exploration by subscribing here: https://goodaivibes.substack.com/ #artificialintelligence #machinelearning #deeplearning #analytics