AI This Week: Revolutionizing Language Models and More!
Top News
The paper “Scalable MatMul-free Language Modeling” has taken Twitter by storm, generating 2.3 million impressions due to its innovative approach that eliminates matrix multiplication (MatMul) from Large Language Models (LLMs). Traditionally, MatMul is essential for processing dense layers and implementing self-attention mechanisms in neural networks, but it demands substantial computational power and memory.
Because of these demands, LLMs can typically be deployed only in environments with high-end hardware. The research introduces a method that replaces MatMul with simpler operations, dramatically reducing resource consumption while maintaining model performance.
In dense layers, the method replaces MatMul with ternary accumulation: weights take only the values -1, 0, or +1, so multiply-accumulate reduces to additions and subtractions. In place of self-attention, it uses a MatMul-free Linear Gated Recurrent Unit (MLGRU) that operates solely on element-wise products. For channel mixing, it employs modified Gated Linear Units (GLUs) built from BitLinear layers with ternary weights, mixing information across channels with reduced computational overhead.
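To make the ternary trick concrete, here is a minimal NumPy sketch (an illustration only, not the paper's fused kernels): once weights are restricted to -1, 0, and +1, each output element is just a sum and a difference of inputs, so no multiplications are required.

```python
import numpy as np

def ternary_dense(x, w_ternary):
    """Dense layer with ternary weights in {-1, 0, +1}.

    Because every weight is -1, 0, or +1, the usual multiply-accumulate
    collapses into additions and subtractions: add the inputs whose
    weight is +1, subtract those whose weight is -1, skip the zeros.
    """
    out = np.zeros((x.shape[0], w_ternary.shape[1]), dtype=x.dtype)
    for j in range(w_ternary.shape[1]):
        plus = w_ternary[:, j] == 1    # input columns to add
        minus = w_ternary[:, j] == -1  # input columns to subtract
        out[:, j] = x[:, plus].sum(axis=1) - x[:, minus].sum(axis=1)
    return out

# Toy usage: a batch of 2 inputs with 4 features, projected to 3 outputs.
x = np.random.randn(2, 4).astype(np.float32)
w = np.random.choice([-1, 0, 1], size=(4, 3)).astype(np.float32)
assert np.allclose(ternary_dense(x, w), x @ w, atol=1e-5)
```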
Removing MatMul from the calculations in large language models means these models don’t need powerful computers to run. This change allows them to work on simpler devices, like smaller servers or even some personal computers, making advanced AI tools available to more people and places.
Memory usage during inference drops by more than 10x compared with unoptimized models, training speed increases by 25.6%, and overall memory requirements fall by 61% relative to conventional approaches. Custom FPGA accelerators demonstrate the practicality of the method, processing billion-parameter models at just 13 watts.
Top Papers
Diffusion models for image generation often struggle with maintaining image diversity and quality, especially in lower-probability regions of the data distribution. Existing methods like classifier-free guidance (CFG) increase prompt alignment and image quality but reduce variation.
The paper introduces autoguidance, a method where a diffusion model is guided by a less trained or smaller version of itself. This approach aims to improve control over image quality without compromising image diversity, unlike traditional CFG.
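A rough sketch of the idea in code, assuming two denoisers that map a noisy image and a noise level to a denoised estimate (the function and argument names are illustrative placeholders, not the paper's interface):

```python
import torch

def autoguided_denoise(strong_model, weak_model, x_t, sigma, guidance_weight=2.0):
    """Blend a diffusion model's prediction with that of a weaker copy of itself.

    With guidance_weight > 1, the output extrapolates away from the weak
    model's prediction, similar in spirit to classifier-free guidance but
    without dropping the conditioning signal.
    """
    d_strong = strong_model(x_t, sigma)  # estimate from the main model
    d_weak = weak_model(x_t, sigma)      # estimate from the smaller / less-trained model
    return d_weak + guidance_weight * (d_strong - d_weak)

# Toy usage with stand-in "models" so the sketch runs end to end.
strong = lambda x, s: 0.9 * x   # pretend denoiser
weak = lambda x, s: 0.7 * x     # pretend weaker denoiser
x_t = torch.randn(1, 3, 64, 64)
print(autoguided_denoise(strong, weak, x_t, sigma=1.0).shape)  # torch.Size([1, 3, 64, 64])
```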
Autoguidance achieved state-of-the-art results on ImageNet-512 with a Fréchet Inception Distance (FID) of 1.25. It also set a new benchmark on ImageNet-64 with an FID of 1.01, significantly enhancing image quality while preserving diversity.
Transformers, while effective in computer vision, suffer from high computational costs because self-attention scales quadratically with the number of tokens, which is especially costly for high-resolution images.
Vision-LSTM (ViL) adapts the xLSTM architecture for vision tasks, stacking mLSTM blocks that process the flattened image-patch sequence in alternating directions, giving bidirectional context at linear computational complexity.
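The sketch below conveys only the alternating-direction idea, under a loose assumption: an ordinary nn.LSTM stands in for xLSTM's mLSTM blocks so the example runs, and residual connections and normalization are omitted.

```python
import torch
import torch.nn as nn

class AlternatingSequenceEncoder(nn.Module):
    """Illustrative stand-in for Vision-LSTM's alternating-direction blocks.

    Real ViL uses mLSTM blocks from xLSTM; here an ordinary nn.LSTM is a
    placeholder. The point is the direction handling: odd-indexed blocks
    see the patch sequence reversed, so information flows both ways across
    the image at linear cost in sequence length.
    """
    def __init__(self, dim=192, depth=4):
        super().__init__()
        self.blocks = nn.ModuleList(
            nn.LSTM(dim, dim, batch_first=True) for _ in range(depth)
        )

    def forward(self, tokens):          # tokens: (batch, num_patches, dim)
        for i, block in enumerate(self.blocks):
            if i % 2 == 1:              # every other block runs right-to-left
                tokens = torch.flip(tokens, dims=[1])
            tokens, _ = block(tokens)
            if i % 2 == 1:              # restore the original patch order
                tokens = torch.flip(tokens, dims=[1])
        return tokens

# Toy usage: 196 patch tokens (a 14x14 grid) of width 192.
x = torch.randn(2, 196, 192)
print(AlternatingSequenceEncoder()(x).shape)  # torch.Size([2, 196, 192])
```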
ViL outperforms standard vision transformers on ImageNet-1K classification. ViL-T achieves 77.3% accuracy, outdoing DeiT-T at 72.2%. Even in heavily optimized transformer setups, ViL demonstrates competitive performance, with ViL-B reaching 81.6% accuracy versus DeiT-B’s 81.8%.
AI models are vulnerable to adversarial attacks, which compromise model outputs, posing a significant reliability and safety issue. Current defenses like adversarial training fail to generalize against novel attacks and often degrade model performance.
The paper introduces “Short Circuiting,” a technique that manipulates internal model representations to prevent harmful outputs without specific attack training. This method, based on representation engineering, disrupts harmful processes by rerouting them towards safe states, effectively making the model attack-agnostic.
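A simplified sketch of a representation-rerouting objective in this spirit follows; the paper's exact losses, target layers, and data mixtures differ, and the tensor names here are illustrative.

```python
import torch
import torch.nn.functional as F

def rerouting_loss(h_harmful_new, h_harmful_orig, h_benign_new, h_benign_orig,
                   alpha=1.0):
    """Sketch of a representation-rerouting objective.

    On harmful prompts, penalize remaining alignment between the fine-tuned
    model's hidden states and the frozen original model's hidden states,
    redirecting the harmful "circuit". On benign prompts, keep the
    representations close so capabilities are preserved.
    Tensor shapes: (batch, seq_len, hidden_dim).
    """
    # Push harmful-prompt representations away from their original direction.
    cos = F.cosine_similarity(h_harmful_new, h_harmful_orig, dim=-1)
    reroute = F.relu(cos).mean()
    # Keep benign-prompt representations unchanged.
    retain = (h_benign_new - h_benign_orig).norm(dim=-1).mean()
    return reroute + alpha * retain

# Toy shapes: batch of 2 prompts, 8 tokens, hidden size 16.
h = lambda: torch.randn(2, 8, 16)
print(rerouting_loss(h(), h(), h(), h()))
```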
Short Circuiting reduced compliance with harmful requests by up to 90% on Llama-3-8B-Instruct, with minimal performance impact (less than a 1% drop on capability benchmarks). The technique outperforms traditional refusal and adversarial training, maintaining robustness against a wide range of unseen adversarial attacks.
Subscribe to the newsletter: https://lnkd.in/guxfrUSM