登录查看更多内容

Top AI/ML Papers of the Week [15/07 - 21/07]

Bruno Lopes e Silva

Artificial Intelligence | National Award-Winning Engineer ???? | Professor | Speaker | PhD Candidate in AI | Podcast Host ???

发布日期: 2024年7月26日

Last week, I picked out eight scientific articles that I found noteworthy to share with you. Each will be showcased with a short synopsis and a link to investigate the subject further. At the end, a reflection on how these advances may impact your projects or companies in the future will be presented!

[1] AgentPoison: Red-teaming LLM Agents via Poisoning Memory or Knowledge Bases

LLM agents excel in various applications due to their advanced reasoning and ability to interact with external knowledge and tools. However, their reliance on unverified knowledge bases raises safety and trustworthiness concerns. This study introduces AgentPoison, the first backdoor attack targeting generic and RAG-based LLM agents by poisoning their long-term memory or RAG knowledge base. The trigger generation process uses constrained optimization to ensure a high probability retrieval of malicious demonstrations when a user instruction contains the backdoor trigger, while benign instructions remain unaffected. AgentPoison requires no additional model training, demonstrating high transferability, coherence, and stealthiness. Experiments show an average attack success rate of over 80% on three real-world LLM agents with minimal impact on benign performance. [Link ]

[2] Multiobjective Vehicle Routing Optimization with Time Windows: A Hybrid Approach Using Deep Reinforcement Learning and NSGA-II

This paper introduces a weight-aware deep reinforcement learning (WADRL) approach for addressing the multiobjective vehicle routing problem with time windows (MOVRPTW). The approach uses a single deep reinforcement learning (DRL) model to solve the entire optimization problem. It combines a transformer-based policy network with the NSGA-II algorithm to enhance solution quality. The proposed model balances minimizing travel costs and maximizing customer satisfaction. Extensive experiments show that WADRL outperforms traditional methods, significantly reducing the time required to generate initial solutions for NSGA-II while improving scalability and overall solution quality. The weight-aware strategy also reduces DRL training time and enhances results. [Link ]

[3] Scaling Laws with Vocabulary: Larger Models Deserve Larger Vocabularies

Research on scaling LLMs has mainly focused on model parameters and training data size, often neglecting vocabulary size. This study explores the impact of vocabulary size on LLM scaling laws by training models from 33M to 3B parameters on up to 500B characters with various vocabulary configurations. Three approaches—IsoFLOPs analysis, derivative estimation, and parametric fit of the loss function—suggest that the optimal vocabulary size depends on the compute budget and that larger models benefit from larger vocabularies. For instance, the optimal vocabulary size for Llama2-70B should be at least 216K, much larger than its current 32K. Empirical validation with 3B parameter models shows that increasing vocabulary size improves performance. Increasing the vocabulary from 32K to 43K raised ARC-Challenge scores from 29.1 to 32.0 using the same FLOPs budget. This work highlights the need to consider both model parameters and vocabulary size for efficient scaling. [Link ]

[4] GoldFinch: High Performance RWKV/Transformer Hybrid with Linear Pre-Fill and Extreme KV-Cache Compression

Introducing GoldFinch, a hybrid Linear Attention/Transformer model that efficiently generates a highly compressed and reusable KV-Cache in linear time and space relative to sequence length. GoldFinch combines the new GOLD transformer with an enhanced version of the Finch (RWKV-6) architecture. Models up to 1.5B parameters were trained, showing significantly improved performance over Finch and Llama. The cache size savings scale with model layers, up to 2550 times smaller than traditional transformer caches, allowing for inference of large context lengths on limited hardware. The pre-fill computation of the initial cache state costs O(1) time per token due to the use of a recurrent neural network (RNN). [Link ]

[5] Spectra: A Comprehensive Study of Ternary, Quantized, and FP16 Language Models

Post-training quantization is key for addressing memory bottlenecks in LLM inference but struggles below 4-bit precision. Training models directly at low bitwidth, like binary or ternary models, is an alternative, though their performance and dynamics are less understood. The Spectra LLM suite, with 54 models ranging from 99M to 3.9B parameters trained on 300B tokens, addresses this. Spectra includes FloatLMs, post-training quantized QuantLMs (3, 4, 6, 8 bits), and ternary LLMs (TriLMs). TriLMs outperform previous ternary models, matching half-precision models at scale. TriLM 3.9B matches the performance of FloatLM 3.9B in reasoning benchmarks but retains similar levels of toxicity and stereotyping. TriLM excels on cleaner datasets like Lambada and PennTreeBank but lags in perplexity on noisier validation splits. [Link ]

[6] Qwen2 Technical Report

The Qwen2 series, the latest in large language and multimodal models, includes foundational and instruction-tuned models ranging from 0.5 to 72 billion parameters, featuring dense and Mixture-of-Experts models. Qwen2 surpasses previous models, including Qwen1.5, and competes with proprietary models in language understanding, generation, multilingual proficiency, coding, mathematics, and reasoning. The flagship model, Qwen2-72B, achieves high scores on benchmarks like MMLU (84.2), GPQA (37.9), HumanEval (64.6), GSM8K (89.5), and BBH (82.4). The instruction-tuned Qwen2-72B-Instruct scores 9.1 on MT-Bench, 48.1 on Arena-Hard, and 35.7 on LiveCodeBench. Qwen2 also excels in multilingual capabilities across 30 languages. Qwen2 model weights and supplementary materials are available on Hugging Face, ModelScope, and GitHub, supporting community innovation and application development. [Link ]

[7] Human-like Episodic Memory for Infinite Context LLMs

LLMs excel in many areas but struggle with extensive contexts, limiting their coherence over long sequences. EM-LLM, a new approach inspired by human episodic memory and event cognition, addresses this by organizing token sequences into coherent episodic events using Bayesian surprise and graph-theoretic boundary refinement. It retrieves these events through a two-stage process, enhancing efficiency and relevance. EM-LLM outperforms the InfLLM model on the LongBench dataset, with a 4.3% improvement overall and a 33% boost in the PassageRetrieval task. This method aligns well with human event perception and bridges AI with human memory mechanisms, fostering interdisciplinary research. [Link ]

[8] SpreadsheetLLM: Encoding Spreadsheets for Large Language Models

Spreadsheets, with their complex grids and diverse formats, challenge LLMs. SpreadsheetLLM introduces an efficient encoding method to enhance LLMs' understanding and reasoning on spreadsheets. Initially using a vanilla serialization approach, limitations arose due to token constraints. To address this, SheetCompressor was developed, featuring structural-anchor-based compression, inverse index translation, and data-format-aware aggregation. It improved performance in spreadsheet table detection by 25.6% in GPT-4's setting. Fine-tuned with SheetCompressor, LLMs achieved a 25x compression ratio and a state-of-the-art 78.9% F1 score, outperforming existing models by 12.3%. SpreadsheetLLM effectively leverages spreadsheet layout and structure across various tasks. [Link ]

How might these advances impact the future?

AgentPoison, a novel backdoor attack on LLM agents, highlights vulnerabilities in memory and RAG-based systems. It stresses the need for robust defenses against such threats to ensure safe and trustworthy AI applications.

Weight-aware deep reinforcement learning (WADRL) enhances multiobjective vehicle routing by integrating a transformer-based policy network with NSGA-II optimization. This approach balances travel costs and customer satisfaction, improving scalability and solution quality while reducing training time.

Vocabulary size optimization in LLMs shows that larger vocabularies improve performance, suggesting a shift in scaling strategies to include both model parameters and vocabulary size. This finding could lead to more efficient and capable LLMs.

GoldFinch introduces a hybrid Linear Attention/Transformer model that generates a compressed KV-Cache, significantly reducing memory requirements. This innovation enables large context inference on limited hardware, broadening the accessibility of advanced LLMs.

The Spectra LLM suite offers insights into low-bitwidth model training, particularly ternary models, which match half-precision models in performance. This approach addresses memory bottlenecks without significant performance loss, enhancing LLM deployment in resource-constrained environments.

Qwen2 series, with models up to 72 billion parameters, excels in language and multimodal tasks, rivaling proprietary models. Its open availability on platforms like Hugging Face promotes community-driven innovation and application development.

EM-LLM integrates human-like episodic memory into LLMs, improving coherence over long sequences. This approach bridges AI and cognitive science, enhancing LLM capabilities in extended context processing and fostering interdisciplinary research.

SpreadsheetLLM optimizes LLM performance on spreadsheet tasks using SheetCompressor, improving table detection and overall task performance. This method leverages spreadsheet structures, making LLMs more effective in handling complex data formats.

In conclusion, these advancements set the stage for:

Enhanced security measures for LLM agents;
Improved optimization strategies in vehicle routing and LLM scaling;
More efficient memory use in LLMs through advanced caching and low-bitwidth models;
Open-source models promoting collaborative development;
Integration of human-like memory mechanisms in LLMs;
Better handling of complex data formats like spreadsheets;
Superior multilingual and multimodal capabilities in LLMs;
Advanced mathematical reasoning and episodic memory processing in LLMs.

By leveraging these innovations, the scientific community and various industries can unlock new levels of creativity, efficiency, and engagement in AI-driven solutions, significantly impacting how we interact with technology and each other in the digital age.

If you found value in these insights and reflections, please don't forget to share and interact. Your participation helps spread the information and contributes to a more engaged and informed community.??

AI Review: Top Weekly Papers

1,772 位关注者

要查看或添加评论，请登录

Bruno Lopes e Silva的更多文章

Top AI/ML Papers of the Week [18/11 - 24/11]

2024年11月20日

Top AI/ML Papers of the Week [18/11 - 24/11]

Last week, I picked out eight scientific articles that I found noteworthy to share with you. Each will be showcased…
Top AI/ML Papers of the Week [23/09 - 29/09]

2024年10月3日

Top AI/ML Papers of the Week [23/09 - 29/09]

Last week, I picked out eight scientific articles that I found noteworthy to share with you. Each will be showcased…

1 条评论
Top AI/ML Papers of the Week [26/08 - 01/09]

2024年9月4日

Top AI/ML Papers of the Week [26/08 - 01/09]

Last week, I picked out eight scientific articles that I found noteworthy to share with you. Each will be showcased…
Top AI/ML Papers of the Week [19/08 - 25/08]

2024年8月27日

Top AI/ML Papers of the Week [19/08 - 25/08]

Last week, I picked out eight scientific articles that I found noteworthy to share with you. Each will be showcased…
Top AI/ML Papers of the Week [12/08 - 18/08]

2024年8月21日

Top AI/ML Papers of the Week [12/08 - 18/08]

Last week, I picked out eight scientific articles that I found noteworthy to share with you. Each will be showcased…
Top AI/ML Papers of the Week [05/08 - 11/08]

2024年8月16日

Top AI/ML Papers of the Week [05/08 - 11/08]

Last week, I picked out eight scientific articles that I found noteworthy to share with you. Each will be showcased…

1 条评论
Top AI/ML Papers of the Week [29/07 - 04/08]

2024年8月6日

Top AI/ML Papers of the Week [29/07 - 04/08]

Last week, I picked out eight scientific articles that I found noteworthy to share with you. Each will be showcased…

2 条评论
Top AI/ML Papers of the Week [22/07 - 28/07]

2024年8月1日

Top AI/ML Papers of the Week [22/07 - 28/07]

Last week, I picked out eight scientific articles that I found noteworthy to share with you. Each will be showcased…
Top AI/ML Papers of the Week [08/07 - 14/07]

2024年7月18日

Top AI/ML Papers of the Week [08/07 - 14/07]

Last week, I picked out eight scientific articles that I found noteworthy to share with you. Each will be showcased…
Top AI/ML Papers of the Week [01/07 - 07/07]

2024年7月10日

Top AI/ML Papers of the Week [01/07 - 07/07]

Last week, I picked out eight scientific articles that I found noteworthy to share with you. Each will be showcased…

See all articles

AI Review: Top Weekly Papers

1,772 位关注者

Bruno Lopes e Silva的更多文章

Top AI/ML Papers of the Week [18/11 - 24/11]

Top AI/ML Papers of the Week [23/09 - 29/09]

Top AI/ML Papers of the Week [26/08 - 01/09]

Top AI/ML Papers of the Week [19/08 - 25/08]

Top AI/ML Papers of the Week [12/08 - 18/08]

Top AI/ML Papers of the Week [05/08 - 11/08]

Top AI/ML Papers of the Week [29/07 - 04/08]

Top AI/ML Papers of the Week [22/07 - 28/07]

Top AI/ML Papers of the Week [08/07 - 14/07]

Top AI/ML Papers of the Week [01/07 - 07/07]