Top ML Papers of the Week
Welcome to the Top ML Papers of the Week (May 27 - June 2).
1). Contextual Position Encoding - proposes a new position encoding method, CoPE, that conditions position on context by incrementing the position only on certain tokens; because the encoding is context-dependent, it can represent different levels of position abstraction; this more general scheme lets the model attend to the i-th particular word, noun, or sentence; improves perplexity on language modeling and coding tasks. (paper | tweet)
2). Symbolic Chain-of-Thought - proposes a method that improves the logical reasoning capabilities of LLMs by integrating symbolic expressions and logical rules with chain-of-thought (CoT) prompting; the prompting technique is called Symbolic Chain-of-Thought and it’s a fully LLM-based framework with the following key steps: 1) translates the natural language context into a symbolic format, 2) derives a step-by-step plan to solve the problem following symbolic logical rules, and 3) uses a verifier to check the translation and the reasoning chain. (paper | tweet)
3). Abacus Embeddings - achieves 99% accuracy on 100-digit addition problems by training on only 20-digit numbers with a single GPU; the main challenge this work addresses is the inability of transformers to track the exact position of digits; they do this by adding an embedding to each digit that encodes its position relative to the start of the number; these gains also transfer to multi-step reasoning tasks that include sorting and multiplication. (paper | tweet)
4). Introduction to Vision-Language Modeling - presents an introduction to vision-language models along with key details of how they work and how to effectively train these models. (paper | tweet)
5). GNN-RAG - combines the language understanding abilities of LLMs with the reasoning abilities of GNNs in a RAG style; the GNN extracts useful and relevant graph information while the LLM takes the information and leverages its capabilities to perform question answering over knowledge graphs (KGQA); GNN-RAG improves vanilla LLMs on KGQA and outperforms or matches GPT-4 performance with a 7B tuned LLM. (paper | tweet)
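To make item 1 (CoPE) more concrete, here is a minimal sketch of "incrementing position only on certain tokens". It assumes each query i has a gate value in [0, 1] for every earlier key j, and defines the position of key j as the sum of the gates from j up to i. The function name `contextual_positions` and the hard period-based gates are illustrative assumptions, not the paper's code; in a learned model the gates would come from the queries and keys.

```python
def contextual_positions(gates):
    """gates[i][j] is a value in [0, 1] for query i and key j (j <= i).

    Returns p where p[i][j] = gates[i][j] + gates[i][j+1] + ... + gates[i][i],
    i.e. the position of key j as seen from query i.
    """
    positions = []
    for i, gate_row in enumerate(gates):
        row = [0.0] * (i + 1)
        running = 0.0
        # Walk backwards from the query so each key sees the gates between it and the query.
        for j in range(i, -1, -1):
            running += gate_row[j]
            row[j] = running
        positions.append(row)
    return positions


# Hypothetical hard gates: the position only advances on sentence-ending periods.
tokens = ["A", "b", ".", "C", "d", ".", "E"]
gates = [[1.0 if tokens[j] == "." else 0.0 for j in range(i + 1)]
         for i in range(len(tokens))]
positions = contextual_positions(gates)
print(positions[-1])  # [2.0, 2.0, 2.0, 1.0, 1.0, 1.0, 0.0]
```

With these gates, the positions seen from the last token count sentence boundaries rather than tokens, which is the kind of "i-th sentence" addressing the summary describes.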
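For item 3 (Abacus Embeddings), the core idea of giving each digit an index relative to the start of its number can be sketched as below. The helper name `abacus_position_ids` is hypothetical, and the sketch only produces the indices; in a real model each index would select a learned embedding vector that is added to the digit's token embedding.

```python
def abacus_position_ids(text):
    """Assign each digit its 1-based index within its own number; non-digits get 0."""
    ids = []
    run_length = 0
    for ch in text:
        if ch.isdigit():
            run_length += 1
            ids.append(run_length)
        else:
            run_length = 0  # a non-digit ends the current number
            ids.append(0)
    return ids


print(abacus_position_ids("123+45="))  # [1, 2, 3, 0, 1, 2, 0]
```

Because the index restarts at every number, digits with the same place in different operands share the same id, no matter where the number sits in the sequence.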
Sponsor message
Prolific is a platform that connects AI researchers with a pool of 150k+ active participants and domain specialists.
Through Prolific, AI researchers collect rich, reliable data that reflects the breadth of humanity, easily and within a matter of hours, giving them the insights to train models in the race to AGI.
6). Attention as an RNN - presents a new attention mechanism that can be trained in parallel (like Transformers) and updated efficiently with new tokens using constant memory at inference time (like RNNs); the attention formulation is based on the parallel prefix scan algorithm, which enables efficient computation of attention’s many-to-many RNN output; achieves comparable performance to Transformers on 38 datasets while being more time- and memory-efficient. (paper | tweet)
7). Aya 23 - a family of multilingual language models that serves 23 languages; it intentionally focuses on fewer languages and allocates more capacity to each of them; shows that it can outperform other massively multilingual models on those specific languages. (paper | tweet)
8). Are Long-LLMs A Necessity For Long-Context Tasks? - claims that long-LLMs are not a necessity to solve long-context tasks; proposes a reasoning framework to enable short-LLMs to address long-context tasks by adaptively accessing and utilizing the context based on the presented tasks; it decomposes the long context into short contexts and processes them using a decision-making process. (paper | tweet)
9). Financial Statement Analysis with LLMs - claims that LLMs can generate useful insights from their analysis of trends and financial ratios; shows that GPT-4 performs on par with narrowly specialized models; reports a profitable trading strategy based on GPT’s predictions. (paper | tweet)
10). SimPO - a simpler and more effective approach for preference optimization with a reference-free reward; uses the average log probability of a sequence as an implicit reward (i.e., no reference model required) which makes it more compute and memory efficient; demonstrates that it outperforms existing approaches like DPO and claims to produce the strongest 8B open-source model. (paper | tweet)
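For item 6 (Attention as an RNN), the constant-memory update for a single query can be sketched with a running maximum, numerator, and denominator of the softmax. This only illustrates the recurrent, token-by-token view; it does not implement the parallel prefix scan used for training. The class name `RecurrentAttention` and the toy inputs are assumptions for illustration.

```python
import math


class RecurrentAttention:
    """Softmax attention for one fixed query, updated one (key, value) pair at a time."""

    def __init__(self, query):
        self.query = query
        self.max_score = -math.inf  # running max of scores, for numerical stability
        self.numerator = None       # running sum of exp(score - max) * value
        self.denominator = 0.0      # running sum of exp(score - max)

    def update(self, key, value):
        score = sum(q * k for q, k in zip(self.query, key))
        new_max = max(self.max_score, score)
        old_scale = math.exp(self.max_score - new_max)  # rescales the previous state
        new_weight = math.exp(score - new_max)
        if self.numerator is None:
            self.numerator = [0.0] * len(value)
        self.numerator = [old_scale * n + new_weight * v
                          for n, v in zip(self.numerator, value)]
        self.denominator = old_scale * self.denominator + new_weight
        self.max_score = new_max
        return [n / self.denominator for n in self.numerator]


def softmax_attention(query, keys, values):
    """Reference: ordinary attention over all keys and values at once."""
    scores = [sum(q * k for q, k in zip(query, key)) for key in keys]
    top = max(scores)
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    return [sum(w * v[d] for w, v in zip(weights, values)) / total
            for d in range(len(values[0]))]


query = [0.5, -1.0]
keys = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0]]
values = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]

cell = RecurrentAttention(query)
for key, value in zip(keys, values):
    streamed = cell.update(key, value)
full = softmax_attention(query, keys, values)
```

After the last update, `streamed` should match `full` up to floating-point error, while the recurrent version stores only three running quantities regardless of how many tokens it has seen.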
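For item 10 (SimPO), the reference-free reward can be sketched as the length-averaged log probability of a response, scaled by a factor beta, with the preferred response trained to beat the rejected one by a target margin gamma. The values of `beta` and `gamma` and the per-token log probabilities below are illustrative assumptions, not tuned settings.

```python
import math


def simpo_reward(token_logprobs, beta):
    """Implicit reward: beta times the average per-token log probability."""
    return beta * sum(token_logprobs) / len(token_logprobs)


def simpo_loss(chosen_logprobs, rejected_logprobs, beta=2.0, gamma=1.0):
    """-log(sigmoid(reward_chosen - reward_rejected - gamma)); no reference model needed."""
    margin = (simpo_reward(chosen_logprobs, beta)
              - simpo_reward(rejected_logprobs, beta)
              - gamma)
    # Numerically stable softplus(-margin), which equals -log(sigmoid(margin)).
    return max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))


# Hypothetical per-token log probabilities under the policy being trained.
chosen = [-0.1, -0.2]
rejected = [-1.0, -2.0, -3.0]
loss_good = simpo_loss(chosen, rejected)
loss_bad = simpo_loss(rejected, chosen)  # preference labels swapped
```

Here `loss_good` is small because the policy already favors the chosen response, while `loss_bad` is large; averaging over tokens keeps a long response from being rewarded or penalized simply for its length.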
Reach out to [email protected] if you would like to promote with us. Our newsletter is read by over 55K AI Researchers, Engineers, and Developers.