LLM-based Survey Autonomous Agents; Evaluating LLM on Graphs; Fine-Tune for GPT-3.5 and GPT-4; and More

Danny Butvinik

Chief Data Scientist | 100K+ Followers | FinCrime | Writer | Author of AI Vanguard Newsletter

Published: September 4, 2023

Editor's Paper Recommendations

A Survey on Large Language Model-based Autonomous Agents: Autonomous agents have long been a prominent research topic in the academic community. Previous research in this field often focuses on training agents with limited knowledge within isolated environments, which diverges significantly from human learning processes and thus makes it hard for the agents to reach human-like decisions. Recently, by acquiring vast amounts of web knowledge, large language models (LLMs) have demonstrated remarkable potential in achieving human-level intelligence. This has sparked an upsurge in studies investigating autonomous agents based on LLMs. Researchers have devised diverse agent architectures tailored to different applications to harness the full potential of LLMs. In this paper, we present a comprehensive survey of these studies, delivering a systematic review of the field of autonomous agents from a holistic perspective. More specifically, we focus on constructing LLM-based agents, for which we propose a unified framework encompassing most of the previous work. Additionally, we summarize the various applications of LLM-based AI agents in social science, natural science, and engineering. Lastly, we discuss the commonly employed evaluation strategies for LLM-based AI agents. Based on previous studies, we also present several challenges and future directions in this field. To keep track of this field and continuously update our survey, we maintain a repository for the related references at this https URL.

Evaluating Large Language Models on Graphs: Performance Insights and Comparative Analysis: Large Language Models (LLMs) have garnered considerable interest within both academia and industry. Yet, applying LLMs to graph data remains under-explored. In this study, we evaluate the capabilities of four LLMs in addressing several analytical problems with graph data. We employ four distinct evaluation metrics: Comprehension, Correctness, Fidelity, and Rectification. Our results show that: 1) LLMs effectively comprehend graph data in natural language and reason with graph topology. 2) GPT models can generate logical and coherent results, outperforming alternatives in correctness. 3) All examined LLMs struggle with structural reasoning, with techniques like zero-shot chain-of-thought and few-shot prompting showing diminished efficacy. 4) GPT models often produce erroneous answers in multi-answer tasks, raising concerns about fidelity. 5) GPT models exhibit elevated confidence in their outputs, potentially hindering their rectification capacities. Notably, GPT-4 has demonstrated the capacity to rectify responses from GPT-3.5-turbo and its previous iterations. The code is available at this https URL.
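
To make the setup concrete, here is a minimal Python sketch of how a graph might be described in natural language for a prompt and how a model's answer could be checked for correctness. It is not the paper's code: the prompt wording, the shortest-path task, and every function name (`describe_graph`, `build_prompt`, `is_correct`) are assumptions chosen for illustration, and no LLM is actually called.

```python
from collections import deque


def describe_graph(edges):
    """Render an undirected edge list as plain-English sentences (hypothetical format)."""
    nodes = sorted({n for edge in edges for n in edge})
    lines = [f"The graph has {len(nodes)} nodes: {', '.join(map(str, nodes))}."]
    lines += [f"Node {u} is connected to node {v}." for u, v in edges]
    return "\n".join(lines)


def shortest_path_length(edges, source, target):
    """Ground-truth answer via breadth-first search; returns None if unreachable."""
    adjacency = {}
    for u, v in edges:
        adjacency.setdefault(u, set()).add(v)
        adjacency.setdefault(v, set()).add(u)
    queue, seen = deque([(source, 0)]), {source}
    while queue:
        node, dist = queue.popleft()
        if node == target:
            return dist
        for nxt in adjacency.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None


def build_prompt(edges, source, target):
    """Combine the textual graph description with one analytical question."""
    question = (
        f"How many edges are on the shortest path from node {source} "
        f"to node {target}? Answer with a single number."
    )
    return describe_graph(edges) + "\n" + question


def is_correct(model_answer, edges, source, target):
    """Compare a model's textual answer with the BFS ground truth."""
    expected = shortest_path_length(edges, source, target)
    return model_answer.strip() == str(expected)
```

The string returned by `build_prompt` would be sent to whichever model is under test, and `is_correct` would score the reply. Exact string matching is a deliberately crude stand-in for the Correctness metric; the other three metrics named in the abstract are not modeled here.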

Exploring the Intersection of Large Language Models and Agent-Based Modeling via Prompt Engineering: The final frontier for simulation is accurately representing complex, real-world social systems. While agent-based modeling (ABM) seeks to study the behavior and interactions of agents within a larger system, it cannot capture the full complexity of human-driven behavior faithfully. Large language models (LLMs), like ChatGPT, have emerged as a potential solution to this bottleneck by enabling researchers to explore human-driven interactions in previously unimaginable ways. Our research investigates simulations of human interactions using LLMs. Through prompt engineering, inspired by Park et al. (2023), we present two simulations of believable proxies of human behavior: a two-agent negotiation and a six-agent murder mystery game.

Measuring Faithfulness in Chain-of-Thought Reasoning: Large language models (LLMs) exhibit improved performance when utilizing step-by-step "Chain-of-Thought" (CoT) reasoning before providing answers to questions. However, it remains to be seen if the stated CoT reasoning accurately represents the model's actual reasoning process. To explore the fidelity of CoT reasoning, researchers conducted interventions on the CoT (e.g., introducing mistakes or paraphrasing them) and observed how model predictions changed. The study found significant variation among models in how much they relied on CoT when predicting answers across different tasks. Some models heavily relied on CoT, while others largely ignored it. The performance enhancement from CoT does not result solely from increased test-time computation or specific phrasing in the CoT. As language models grow more capable, they tend to produce less faithful reasoning across most studied tasks. However, the research suggests that under certain circumstances, such as carefully choosing the model size and task, CoT can still provide faithful explanations.
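
The intervention idea can be sketched in a few lines of Python. This is a simplified illustration, not the study's protocol: `answer_fn` and `corrupt_fn` are hypothetical callables standing in for a model call and a mistake-injection step, and the score is just the fraction of single-step corruptions that change the final answer.

```python
def answer_change_rate(question, cot_steps, answer_fn, corrupt_fn):
    """Fraction of single-step corruptions that alter the model's final answer.

    answer_fn(question, steps) -> str   (hypothetical: queries the model)
    corrupt_fn(step) -> str             (hypothetical: injects a mistake into one step)
    """
    if not cot_steps:
        return 0.0
    # Answer obtained with the unmodified chain of thought.
    baseline = answer_fn(question, cot_steps)
    changed = 0
    for i, step in enumerate(cot_steps):
        # Replace exactly one step with its corrupted version.
        perturbed = cot_steps[:i] + [corrupt_fn(step)] + cot_steps[i + 1:]
        if answer_fn(question, perturbed) != baseline:
            changed += 1
    return changed / len(cot_steps)
```

Under this simplified reading, a rate near zero suggests the model largely ignores its stated reasoning, while a higher rate suggests the answer actually depends on it. A paraphrasing intervention could reuse the same loop with a different `corrupt_fn`.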

Industry Insights

Using Google Vertex AI to Fine-Tune LLM Apps

This event will blend presentations and hands-on coding demonstrations led by technical and product experts. You will come away understanding how Vertex AI's various features can be used to enhance LLM applications.

You’ll learn:

Understanding Vertex AI: Get to know the core components of Google Vertex AI and how they can be applied to LLM applications.
Hands-on Coding with Vertex AI: Experience a live coding demo where you'll learn to implement Vertex AI in real-world scenarios.
Optimizing LLM Applications: Learn the techniques to fine-tune LLM applications using Vertex AI's powerful tools.
Insights from Technical Experts: Gain valuable insights from developer practitioners who successfully implement Vertex AI in practical projects.

Registration Link

Growth Zone
