Expanding Context Lengths in LLMs; Towards CausalGPT; Perplexity vs. Bard vs. GPT; Meet TinyLlama; Leveraging QLoRA For Fine-Tuning; and More.


Editor's Paper Recommendations

On the Unexpected Abilities of Large Language Models: Large language models can display a wide range of abilities that are not directly connected with the task for which they are trained: predicting the next words of human-written texts. In this article, I discuss the nature of this indirect acquisition process and its relation to other known indirect processes. An important side effect of such indirect acquisition is the development of integrated abilities. I discuss the extent to which the abilities developed by large language models are predictable. Finally, I briefly discuss the relation between the cognitive skills acquired by these systems and human cognition.

Giraffe: Adventures in Expanding Context Lengths in LLMs: Modern large language models (LLMs) that rely on attention mechanisms are typically trained with fixed context lengths, which enforce upper limits on the length of input sequences they can handle at evaluation time. To use these models on sequences longer than the train-time context length, one might employ techniques from the growing family of context length extrapolation methods -- most of which focus on modifying the system of positional encodings used in the attention mechanism to indicate where tokens or activations are located in the input sequence. We conduct a wide survey of existing methods of context length extrapolation on a base LLaMA or LLaMA 2 model and introduce some designs of our own as well -- in particular, a new truncation strategy for modifying the basis of the position encoding. We test these methods using three new evaluation tasks (FreeFormQA, AlteredNumericQA, and LongChat-Lines) as well as perplexity, which we find to be a less fine-grained measure of the long-context performance of LLMs. We release the three tasks publicly as datasets on HuggingFace. We discover that linear scaling is the best method for extending context length and show that further gains can be achieved by using longer scales at evaluation time. We also observe promising extrapolation capabilities with the truncated basis. To support further research, we release three new 13B-parameter long-context models called Giraffe: 4k and 16k context models trained from base LLaMA-13B, and a 32k context model trained from base LLaMA2-13B. We also release the code to replicate our results.
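The linear-scaling result above refers to position-interpolation-style rescaling of rotary position embeddings, in which evaluation-time positions are compressed so they stay within the range seen during training. Below is a minimal sketch of that idea, assuming a standard RoPE setup; the function names and the chosen scale factor are illustrative and are not taken from the Giraffe code release.

```python
import torch

def rope_frequencies(head_dim: int, base: float = 10000.0) -> torch.Tensor:
    """Standard RoPE inverse frequencies for a given attention head dimension."""
    return 1.0 / (base ** (torch.arange(0, head_dim, 2).float() / head_dim))

def rotary_angles(seq_len: int, head_dim: int, scale: float = 1.0) -> torch.Tensor:
    """Angles for each (position, frequency) pair.

    scale > 1 implements linear position interpolation: positions are divided
    by `scale`, so a model trained with a 4k context can be evaluated on
    scale * 4k tokens without any position exceeding the training range.
    """
    inv_freq = rope_frequencies(head_dim)
    positions = torch.arange(seq_len).float() / scale   # the linear-scaling step
    return torch.outer(positions, inv_freq)              # (seq_len, head_dim // 2)

# Example: a model trained with a 4k context evaluated on 16k tokens.
angles_train = rotary_angles(4096, head_dim=128, scale=1.0)
angles_eval = rotary_angles(16384, head_dim=128, scale=4.0)
assert torch.isclose(angles_eval.max(), angles_train.max(), rtol=1e-3)
```

The point of the final assertion is that the rescaled 16k positions occupy roughly the same angular range as the original 4k positions, which is why the model's learned positional behaviour still applies.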

Towards CausalGPT: A Multi-Agent Approach for Faithful Knowledge Reasoning via Promoting Causal Consistency in LLMs: Despite advancements in LLMs, knowledge-based reasoning remains a longstanding issue due to the fragility of knowledge recall and inference. Existing methods primarily encourage LLMs to plan and solve problems autonomously or to extensively sample reasoning chains, without addressing the conceptual and inferential fallacies involved. To alleviate inferential fallacies, and drawing inspiration from multi-agent collaboration, we present a framework to increase the faithfulness and causality of knowledge-based reasoning. Specifically, we propose employing multiple intelligent agents (i.e., reasoners and a causal evaluator) that work collaboratively in a reasoning-and-consensus paradigm for elevated reasoning faithfulness. The reasoners focus on providing solutions with human-like causality for open-domain problems. The causal evaluator agent, on the other hand, scrutinizes whether the answer in a solution is causally deducible from the question and vice versa, substituting a counterfactual answer for the original to test the reverse direction. In extensive and comprehensive evaluations on various knowledge reasoning tasks (e.g., science question answering and commonsense reasoning), our framework outperforms all compared state-of-the-art approaches by large margins.
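A rough sketch of the reasoning-and-consensus loop described above follows, with a placeholder ask_llm function standing in for an actual model call; the prompts, the decomposition into `reasoner` and `causal_evaluator`, and the acceptance rule are assumptions made for illustration, not the authors' implementation.

```python
def ask_llm(prompt: str) -> str:
    """Placeholder for a call to whatever LLM API is actually used."""
    raise NotImplementedError

def reasoner(question: str) -> str:
    """Reasoner agent: produce a solution with an explicit, human-like causal chain."""
    return ask_llm(
        "Answer the question step by step, making each causal link explicit.\n"
        f"Question: {question}\nAnswer:"
    )

def causal_evaluator(question: str, solution: str) -> bool:
    """Evaluator agent: check the answer is causally deducible from the question,
    and that a counterfactual answer would not be supported by the same reasoning."""
    forward = ask_llm(
        f"Question: {question}\nProposed solution: {solution}\n"
        "Is the final answer causally deducible from the question? Reply YES or NO."
    )
    reverse = ask_llm(
        f"Question: {question}\nProposed solution: {solution}\n"
        "Replace the final answer with a plausible counterfactual answer. "
        "Would the same reasoning still support that answer? Reply YES or NO."
    )
    return forward.strip().upper().startswith("YES") and \
           reverse.strip().upper().startswith("NO")

def reason_with_consensus(question: str, n_reasoners: int = 3, max_rounds: int = 2) -> str:
    """Sample several reasoners and accept the first solution the evaluator validates."""
    fallback = ""
    for _ in range(max_rounds):
        for _ in range(n_reasoners):
            solution = reasoner(question)
            fallback = fallback or solution
            if causal_evaluator(question, solution):
                return solution
    return fallback  # no causally validated solution was found
```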

Boosting Logical Reasoning in Large Language Models through a New Framework: The Graph of Thought: Recent advancements in large-scale models, such as GPT-4, have showcased remarkable capabilities in addressing standard queries. However, when facing complex problems that require multi-step logical reasoning, their accuracy decreases dramatically. Current research has explored prompt engineering to bolster the inferential capacities of these models. Our paper unveils a pioneering prompting technique, dubbed Graph of Thoughts (GoT). Through testing on a trio of escalating challenges -- the 24-point game, resolution of high-degree polynomial equations, and derivation of formulas for recursive sequences -- our method outperformed GPT-4, achieving accuracy improvements of 89.7%, 86%, and 56% on the respective tasks. Moreover, when compared with the state-of-the-art (SOTA) prompting method, Tree of Thought (ToT), our approach registered average accuracy boosts of 23%, 24%, and 15%, respectively.
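As a rough illustration of what a graph-structured prompting loop can look like, here is a minimal sketch in which partial solutions are graph nodes that can be expanded and, crucially, merged across branches (the operation a tree-structured search lacks); the node structure, prompts, scoring scheme, and beam size are assumptions for illustration, not the paper's implementation.

```python
from dataclasses import dataclass, field

def ask_llm(prompt: str) -> str:
    """Placeholder for an actual LLM call."""
    raise NotImplementedError

@dataclass
class Thought:
    content: str
    parents: list = field(default_factory=list)  # edges: thoughts this one builds on
    score: float = 0.0

def expand(problem: str, thought: Thought, k: int = 2) -> list:
    """Ask the model for k distinct next reasoning steps extending this thought."""
    reply = ask_llm(
        f"Problem: {problem}\nCurrent reasoning: {thought.content}\n"
        f"Propose {k} distinct next steps, one per line."
    )
    lines = [line.strip() for line in reply.splitlines() if line.strip()]
    return [Thought(line, parents=[thought]) for line in lines[:k]]

def merge(problem: str, a: Thought, b: Thought) -> Thought:
    """Graph-specific operation: combine two partial lines of reasoning into one node."""
    reply = ask_llm(
        f"Problem: {problem}\nCombine these partial solutions into a single, stronger one:\n"
        f"1) {a.content}\n2) {b.content}"
    )
    return Thought(reply, parents=[a, b])

def score(problem: str, thought: Thought) -> float:
    """Ask the model to rate how promising a partial solution is (0-10)."""
    reply = ask_llm(
        f"Problem: {problem}\nPartial solution: {thought.content}\nRate 0-10, digits only."
    )
    try:
        return float(reply.strip())
    except ValueError:
        return 0.0

def graph_of_thoughts(problem: str, rounds: int = 3, beam: int = 4) -> Thought:
    """Expand, merge, score, and prune thoughts for a fixed number of rounds."""
    frontier = [Thought("Start")]
    for _ in range(rounds):
        children = [child for t in frontier for child in expand(problem, t)]
        if len(children) >= 2:  # merging across branches is what turns the tree into a graph
            children.append(merge(problem, children[0], children[1]))
        for child in children:
            child.score = score(problem, child)
        frontier = sorted(children, key=lambda t: -t.score)[:beam]
    return frontier[0]
```

Without the merge step, the loop above reduces to a beam-searched tree of thoughts; allowing nodes to have multiple parents is what the graph formulation adds.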

Industry Insights


--

Are you looking to advertise a product, job opening, or event to an audience of over 35,000 AI researchers and engineers? Get in touch with us on LinkedIn to explore your options.

Enjoy the newsletter? Help us make it bigger and better by sharing it with colleagues and friends.

--

Growth Zone

