LLMs and the False Promise of Creativity; LLMs as Optimizers; Running Thousands of LLMs on One GPU; 10 GPTs You Should Know; and More
Danny Butvinik
Chief Data Scientist | 100K+ Followers | FinCrime | Writer | Author of AI Vanguard Newsletter
Editor's Paper Recommendations
Art or Artifice? Large Language Models and the False Promise of Creativity: Researchers have argued that large language models (LLMs) exhibit high-quality writing capabilities, from blogs to stories. However, objectively evaluating the creativity of a piece of writing is challenging. Inspired by the Torrance Test of Creative Thinking (TTCT), which measures creativity as a process, we use the Consensual Assessment Technique [3] and propose the Torrance Test of Creative Writing (TTCW) to evaluate creativity as a product. TTCW consists of 14 binary tests organized into the original dimensions of Fluency, Flexibility, Originality, and Elaboration. We recruit 10 creative writers and conduct a human assessment of 48 stories written by professional authors or LLMs using TTCW. Our analysis shows that LLM-generated stories pass 3-10 fewer TTCW tests than stories written by professionals. In addition, we explore using LLMs as assessors to automate the TTCW evaluation, revealing that none of the LLMs positively correlate with the expert assessments.
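To make the evaluation protocol concrete, here is a minimal Python sketch of the LLM-as-assessor idea the paper explores. All names are hypothetical (`ask_llm` stands in for any chat-model client, and the questions only paraphrase the four TTCW dimensions; the full protocol uses 14 binary tests):

```python
# Hypothetical paraphrases of the four TTCW dimensions; the actual
# protocol defines 14 binary tests across these dimensions.
TTCW_QUESTIONS = {
    "Fluency": "Is the story coherent, well-formed prose? Answer Yes or No.",
    "Flexibility": "Does the story shift perspective or tone meaningfully? Answer Yes or No.",
    "Originality": "Does the story avoid cliche in premise and language? Answer Yes or No.",
    "Elaboration": "Are scenes and characters developed in convincing detail? Answer Yes or No.",
}

def ask_llm(prompt: str) -> str:
    # Stand-in for a real LLM call; hard-coded so the sketch runs.
    return "Yes"

def assess_story(story: str) -> dict[str, bool]:
    # One binary verdict per question; a story's creativity score is
    # simply the number of tests it passes.
    return {
        dimension: ask_llm(f"{question}\n\nStory:\n{story}")
        .strip()
        .lower()
        .startswith("yes")
        for dimension, question in TTCW_QUESTIONS.items()
    }

print(assess_story("Once upon a time..."))
```

A real study would compare these automated verdicts against the expert judgments test by test; the paper reports that no LLM assessor correlated positively with the experts.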
Large Language Models as Optimizers: Optimization is ubiquitous. While derivative-based algorithms have been powerful tools for various problems, the absence of gradients poses challenges for many real-world applications. In this work, we propose Optimization by PROmpting (OPRO), a simple and effective approach to leverage large language models (LLMs) as optimizers, where the optimization task is described in natural language. In each optimization step, the LLM generates new solutions from a prompt that contains previously generated solutions with their values. The new solutions are evaluated and added to the prompt for the next optimization step. We first showcase OPRO on linear regression and traveling salesman problems, then move on to prompt optimization, where the goal is to find instructions that maximize task accuracy. With various LLMs, we demonstrate that the best prompts optimized by OPRO outperform human-designed prompts by up to 8% on GSM8K and up to 50% on Big-Bench Hard tasks.
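The OPRO loop is simple enough to sketch. Below is a hypothetical Python illustration on a toy one-dimensional objective: `query_llm` is a random-sampling placeholder for a real LLM call (an actual run would parse candidate solutions out of the model's completion), and the meta-prompt format is illustrative, not taken from the paper:

```python
import random

def evaluate(x: float) -> float:
    # Toy objective: maximize f(x) = -(x - 3)^2, optimum at x = 3.
    return -(x - 3) ** 2

def query_llm(meta_prompt: str) -> float:
    # Placeholder for a real LLM call; a genuine OPRO step would send
    # the meta-prompt to the model and parse a new candidate from it.
    return random.uniform(-10.0, 10.0)

def opro(num_steps: int = 20, top_k: int = 5) -> float:
    history: list[tuple[float, float]] = []  # (solution, value) pairs
    for _ in range(num_steps):
        # Build the meta-prompt: the task description plus the best
        # previous solutions with their values, sorted ascending so the
        # strongest examples sit closest to the final query.
        history.sort(key=lambda pair: pair[1])
        trajectory = "\n".join(
            f"x={s:.3f}, value={v:.3f}" for s, v in history[-top_k:]
        )
        meta_prompt = (
            "Propose a new x with a higher value than these:\n"
            f"{trajectory}\nNew x:"
        )
        candidate = query_llm(meta_prompt)
        # Score the new solution and feed it back into the next prompt.
        history.append((candidate, evaluate(candidate)))
    return max(history, key=lambda pair: pair[1])[0]

print(f"Best x found: {opro():.3f}")
```

The key design choice is that the optimizer never sees gradients, only a natural-language trajectory of (solution, value) pairs, which is what lets the same loop drive prompt optimization as well.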
A Practical Survey on Zero-shot Prompt Design for In-context Learning: The remarkable advancements in large language models (LLMs) have significantly improved Natural Language Processing (NLP) tasks. This paper comprehensively reviews in-context learning techniques, focusing on different types of prompts, including discrete, continuous, few-shot, and zero-shot, and their impact on LLM performance. We explore various approaches to prompt design, such as manual design, optimization algorithms, and evaluation methods, to optimize LLM performance across diverse tasks. Our review covers key research studies in prompt engineering, discussing their methodologies and contributions to the field. We also delve into the challenges faced in evaluating prompt performance, given the absence of a single "best" prompt and the importance of considering multiple metrics. In conclusion, the paper highlights the critical role of prompt design in harnessing the full potential of LLMs. It provides insights into the combination of manual design, optimization techniques, and rigorous evaluation for more effective and efficient use of LLMs in various NLP tasks.
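To ground the zero-shot versus few-shot distinction the survey draws, here is a small illustrative Python sketch of the two prompt styles for the same sentiment task (the wording is hypothetical, not drawn from the paper):

```python
review = "The battery dies within an hour."

# Zero-shot: the instruction alone, with no solved examples.
zero_shot_prompt = (
    "Classify the sentiment of this review as positive or negative.\n"
    f"Review: {review}\nSentiment:"
)

# Few-shot: the same instruction preceded by in-context demonstrations,
# so the model can infer the expected format and label set.
few_shot_prompt = (
    "Classify the sentiment of each review as positive or negative.\n"
    "Review: Works perfectly, highly recommend.\nSentiment: positive\n"
    "Review: Broke after two days.\nSentiment: negative\n"
    f"Review: {review}\nSentiment:"
)

print(zero_shot_prompt)
print(few_shot_prompt)
```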
--
Are you looking to advertise a product, job opening, or event to an audience of over 40,000 AI researchers and engineers? Get in touch with us on LinkedIn to explore your options.
Enjoy the newsletter? Help us make it bigger and better by sharing it with colleagues and friends.
--
Industry Insights
Growth Zone
Expert Advice
Principal Data Scientist | AI | Data Science | Digital | Automation
1 year ago: Glad to learn about LLMs as optimizers.
Data Scientist & Analyst | AI enthusiast, leveraging the power of Generative AI tools like ChatGPT and Bard to drive business success | #AI tools #Data Analysis #Data Science #music (Smule)
1 year ago: Very glad to learn about the latest advancements in LLM optimizers and GPTs. Thank you for sharing a much-needed newsletter in the current AI era, Danny Butvinik.