AI This Week: Revolutionizing Language Models and More!
Top News
The paper “Scalable MatMul-free Language Modeling” has taken Twitter by storm, generating 2.3 million impressions due to its innovative approach that eliminates matrix multiplication (MatMul) from Large Language Models (LLMs). Traditionally, MatMul is essential for processing dense layers and implementing self-attention mechanisms in neural networks, but it demands substantial computational power and memory.
Because of these demands, LLMs can typically be deployed only in environments with high-end hardware. The research introduces a method that replaces MatMul with simpler operations, dramatically reducing resource consumption while maintaining model performance.
In dense layers, the method replaces MatMul with ternary accumulation: weights take only the values -1, 0, or +1, so multiply-accumulate reduces to additions and subtractions. In place of self-attention, it uses a MatMul-free Linear Gated Recurrent Unit (MLGRU) that operates solely on element-wise products. For channel mixing, it employs modified Gated Linear Units (GLUs) built from BitLinear layers with ternary weights, mixing information across channels with reduced computational overhead.
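To make the ternary trick concrete, here is a minimal NumPy sketch (an illustration only, not the paper's fused kernels): once weights are restricted to -1, 0, and +1, each output element is just a sum and a difference of inputs, so no multiplications are required.

```python
import numpy as np

def ternary_dense(x, w_ternary):
    """Dense layer with ternary weights in {-1, 0, +1}.

    Because every weight is -1, 0, or +1, the usual multiply-accumulate
    collapses into additions and subtractions: add the inputs whose
    weight is +1, subtract those whose weight is -1, skip the zeros.
    """
    out = np.zeros((x.shape[0], w_ternary.shape[1]), dtype=x.dtype)
    for j in range(w_ternary.shape[1]):
        plus = w_ternary[:, j] == 1    # input columns to add
        minus = w_ternary[:, j] == -1  # input columns to subtract
        out[:, j] = x[:, plus].sum(axis=1) - x[:, minus].sum(axis=1)
    return out

# Toy usage: a batch of 2 inputs with 4 features, projected to 3 outputs.
x = np.random.randn(2, 4).astype(np.float32)
w = np.random.choice([-1, 0, 1], size=(4, 3)).astype(np.float32)
assert np.allclose(ternary_dense(x, w), x @ w, atol=1e-5)
```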
Removing MatMul from the calculations in large language models means these models don’t need powerful computers to run. This change allows them to work on simpler devices, like smaller servers or even some personal computers, making advanced AI tools available to more people and places.
Memory usage during inference drops by more than 10x compared with unoptimized models, training speed increases by 25.6%, and overall memory requirements fall by 61% relative to conventional approaches. Custom FPGA accelerators demonstrate the practicality of the method, processing billion-parameter models at just 13 watts.
Top Papers
Diffusion models for image generation often struggle with maintaining image diversity and quality, especially in lower-probability regions of the data distribution. Existing methods like classifier-free guidance (CFG) increase prompt alignment and image quality but reduce variation.
The paper introduces autoguidance, a method where a diffusion model is guided by a less trained or smaller version of itself. This approach aims to improve control over image quality without compromising image diversity, unlike traditional CFG.
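A rough sketch of the idea in code, assuming two denoisers that map a noisy image and a noise level to a denoised estimate (the function and argument names are illustrative placeholders, not the paper's interface):

```python
import torch

def autoguided_denoise(strong_model, weak_model, x_t, sigma, guidance_weight=2.0):
    """Blend a diffusion model's prediction with that of a weaker copy of itself.

    With guidance_weight > 1, the output extrapolates away from the weak
    model's prediction, similar in spirit to classifier-free guidance but
    without dropping the conditioning signal.
    """
    d_strong = strong_model(x_t, sigma)  # estimate from the main model
    d_weak = weak_model(x_t, sigma)      # estimate from the smaller / less-trained model
    return d_weak + guidance_weight * (d_strong - d_weak)

# Toy usage with stand-in "models" so the sketch runs end to end.
strong = lambda x, s: 0.9 * x   # pretend denoiser
weak = lambda x, s: 0.7 * x     # pretend weaker denoiser
x_t = torch.randn(1, 3, 64, 64)
print(autoguided_denoise(strong, weak, x_t, sigma=1.0).shape)  # torch.Size([1, 3, 64, 64])
```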
Autoguidance achieved state-of-the-art results on ImageNet-512 with a Fréchet Inception Distance (FID) of 1.25. It also set a new benchmark on ImageNet-64 with an FID of 1.01, significantly enhancing image quality while preserving diversity.
Transformers, while effective in computer vision, suffer from high computational costs because self-attention scales quadratically with the number of tokens, which is especially costly for high-resolution images.
Vision-LSTM (ViL) adapts the xLSTM architecture for vision tasks, stacking mLSTM blocks that process the flattened image-patch sequence in alternating directions, giving bidirectional context at linear computational complexity.
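The sketch below conveys only the alternating-direction idea, under a loose assumption: an ordinary nn.LSTM stands in for xLSTM's mLSTM blocks so the example runs, and residual connections and normalization are omitted.

```python
import torch
import torch.nn as nn

class AlternatingSequenceEncoder(nn.Module):
    """Illustrative stand-in for Vision-LSTM's alternating-direction blocks.

    Real ViL uses mLSTM blocks from xLSTM; here an ordinary nn.LSTM is a
    placeholder. The point is the direction handling: odd-indexed blocks
    see the patch sequence reversed, so information flows both ways across
    the image at linear cost in sequence length.
    """
    def __init__(self, dim=192, depth=4):
        super().__init__()
        self.blocks = nn.ModuleList(
            nn.LSTM(dim, dim, batch_first=True) for _ in range(depth)
        )

    def forward(self, tokens):          # tokens: (batch, num_patches, dim)
        for i, block in enumerate(self.blocks):
            if i % 2 == 1:              # every other block runs right-to-left
                tokens = torch.flip(tokens, dims=[1])
            tokens, _ = block(tokens)
            if i % 2 == 1:              # restore the original patch order
                tokens = torch.flip(tokens, dims=[1])
        return tokens

# Toy usage: 196 patch tokens (a 14x14 grid) of width 192.
x = torch.randn(2, 196, 192)
print(AlternatingSequenceEncoder()(x).shape)  # torch.Size([2, 196, 192])
```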
ViL outperforms standard vision transformers on ImageNet-1K classification. ViL-T achieves 77.3% accuracy, outdoing DeiT-T at 72.2%. Even in heavily optimized transformer setups, ViL demonstrates competitive performance, with ViL-B reaching 81.6% accuracy versus DeiT-B’s 81.8%.
AI models are vulnerable to adversarial attacks, which compromise model outputs, posing a significant reliability and safety issue. Current defenses like adversarial training fail to generalize against novel attacks and often degrade model performance.
The paper introduces “Short Circuiting,” a technique that manipulates internal model representations to prevent harmful outputs without specific attack training. This method, based on representation engineering, disrupts harmful processes by rerouting them towards safe states, effectively making the model attack-agnostic.
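A simplified sketch of a representation-rerouting objective in this spirit follows; the paper's exact losses, target layers, and data mixtures differ, and the tensor names here are illustrative.

```python
import torch
import torch.nn.functional as F

def rerouting_loss(h_harmful_new, h_harmful_orig, h_benign_new, h_benign_orig,
                   alpha=1.0):
    """Sketch of a representation-rerouting objective.

    On harmful prompts, penalize remaining alignment between the fine-tuned
    model's hidden states and the frozen original model's hidden states,
    redirecting the harmful "circuit". On benign prompts, keep the
    representations close so capabilities are preserved.
    Tensor shapes: (batch, seq_len, hidden_dim).
    """
    # Push harmful-prompt representations away from their original direction.
    cos = F.cosine_similarity(h_harmful_new, h_harmful_orig, dim=-1)
    reroute = F.relu(cos).mean()
    # Keep benign-prompt representations unchanged.
    retain = (h_benign_new - h_benign_orig).norm(dim=-1).mean()
    return reroute + alpha * retain

# Toy shapes: batch of 2 prompts, 8 tokens, hidden size 16.
h = lambda: torch.randn(2, 8, 16)
print(rerouting_loss(h(), h(), h(), h()))
```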
Short Circuiting reduced compliance with harmful requests by up to 90% on Llama-3-8B-Instruct, with minimal performance impact (less than a 1% drop on capability benchmarks). The technique outperforms traditional refusal and adversarial training, maintaining robustness against a wide range of unseen adversarial attacks.
Subscribe to the newsletter: https://lnkd.in/guxfrUSM