Let Me Speak Freely? A Study on the Impact of Format Restrictions on Performance of Large Language Models
Today's paper investigates how format restrictions impact the performance of large language models (LLMs) across various tasks. It examines whether constraining LLMs to produce structured outputs (like JSON or XML) affects their reasoning and knowledge comprehension abilities. The study reveals surprising declines in LLM performance under strict format constraints, especially for reasoning tasks.
Overview
The study compares three approaches to structured generation, each with progressively relaxed constraints:

1. Constrained decoding (JSON-mode): decoding is restricted so the output must conform to the requested structure (e.g., syntactically valid JSON).
2. Format-Restricting Instructions (FRI): the model is only instructed in the prompt to answer in a given schema (such as JSON, XML, or YAML), with no decoding-level enforcement.
3. NL-to-Format: the model first answers freely in natural language, and a second step converts that answer into the target format.

These three setups are sketched in code below.
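To make them concrete, here is a minimal sketch assuming an OpenAI-compatible Python client; the model name, prompt wording, and the reason/answer schema are illustrative placeholders, not the paper's exact configuration.

```python
# Sketch of the three structured-generation setups (illustrative, not the paper's exact prompts).
from openai import OpenAI

client = OpenAI()
MODEL = "gpt-3.5-turbo"  # placeholder model
question = "A baker made 24 cookies and sold 9. How many cookies are left?"

def ask(messages, **kwargs):
    """Small helper around the chat completions endpoint."""
    response = client.chat.completions.create(model=MODEL, messages=messages, **kwargs)
    return response.choices[0].message.content

# 1) Constrained decoding (JSON-mode): the API enforces syntactically valid JSON.
json_mode_output = ask(
    [{"role": "user",
      "content": f"{question}\nAnswer in JSON with the keys 'reason' and 'answer'."}],
    response_format={"type": "json_object"},
)

# 2) Format-restricting instructions (FRI): the schema is requested only in the prompt.
fri_output = ask(
    [{"role": "user",
      "content": f"{question}\nReply only with a JSON object containing 'reason' and 'answer'."}]
)

# 3) NL-to-Format: answer freely first, then convert the free-text answer to the schema.
free_text = ask([{"role": "user", "content": question}])
converted_output = ask(
    [{"role": "user",
      "content": ("Convert the following answer into a JSON object with the keys "
                  f"'reason' and 'answer':\n{free_text}")}]
)
```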
They evaluate these methods across various datasets that test different skills, including mathematical reasoning (GSM8K), symbolic manipulation (Last Letter Concatenation), and classification tasks (DDXPlus, MultiFin, etc.). The study uses multiple LLMs, including GPT-3.5-turbo, Claude-3-haiku, and open-source models like LLaMA-3 and Gemma-2.
To account for prompt sensitivity, they test multiple prompt variations for each task and format. They also use an LLM-based "perfect parser" to extract final answers, so that comparisons across output formats are not skewed by parsing failures.
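As a rough illustration of that extraction step, the sketch below asks an LLM to pull the final answer out of an arbitrarily formatted response; the prompt wording and the numeric normalization are assumptions, not the paper's exact parser.

```python
# Illustrative LLM-based answer extraction ("perfect parser") and lenient scoring,
# so formatting quirks are not counted as mistakes. Prompt wording is an assumption.
import re

def extract_final_answer(client, model: str, question: str, raw_output: str) -> str:
    """Ask an LLM to return only the final answer contained in `raw_output`."""
    prompt = (
        "Below are a question and a model's response in an arbitrary format.\n"
        "Return ONLY the final answer, with no explanation or extra text.\n\n"
        f"Question: {question}\n\nResponse:\n{raw_output}"
    )
    parsed = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
    )
    return parsed.choices[0].message.content.strip()

def numbers_match(predicted: str, gold: str) -> bool:
    """Compare numeric answers after stripping everything except digits, signs, and dots."""
    normalize = lambda s: re.sub(r"[^\d.\-]", "", s)
    return normalize(predicted) == normalize(gold)
```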
Results
The key results are:

1. Stricter format constraints cause larger drops on reasoning tasks: JSON-mode hurts GSM8K and Last Letter Concatenation the most, FRI less so, and the two-step NL-to-Format approach largely recovers free-form performance.
2. Classification tasks show the opposite trend in several cases, where restricting the output space can actually help the model select a valid label.
3. The size of the effect varies considerably across models and prompt phrasings, underscoring the paper's point about prompt sensitivity.
Conclusion
The paper demonstrates that format restrictions can significantly impact LLM performance, with effects varying by task type. While structured outputs can benefit downstream processing, overly restrictive schemas may hinder LLMs' reasoning abilities. For more information, please consult the full paper.
Congrats to the authors for their work!
Tam, Zhi Rui, et al. "Let Me Speak Freely? A Study on the Impact of Format Restrictions on Performance of Large Language Models." arXiv preprint arXiv:2408.02442 (2024).