OpenAI's Extracting Concepts from GPT-4 & Meta AI's No Language Left Behind - Scaling Human-Centered Machine Translation

OpenAI's Extracting Concepts from GPT-4

Using scalable methods to decompose GPT-4's internal representations into 16 million often-interpretable patterns.

The challenge of interpreting neural networks

To understand and interpret neural networks, we first need to find useful building blocks for neural computations. Unfortunately, the activations inside a language model fire with unpredictable patterns, seemingly representing many concepts simultaneously. They also activate densely, meaning each activation is firing on nearly every input. But real-world concepts are very sparse: in any given context, only a small fraction of all concepts are relevant.

This motivates the use of sparse autoencoders, a method for identifying a handful of "features" in the neural network that are important to producing any given output, akin to the small set of concepts a person might have in mind when reasoning about a situation. These features display sparse activation patterns that naturally align with concepts humans find easy to understand, even without direct incentives for interpretability.
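As a rough illustration of the idea, the sketch below shows a minimal sparse autoencoder in PyTorch: it reconstructs a model's internal activations through a wide bottleneck whose activations are pushed towards zero, so only a handful of features fire on any given input. This is a hedged sketch, not OpenAI's implementation; the layer shapes, the bias handling, and the L1 coefficient are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SparseAutoencoder(nn.Module):
    """Minimal sparse autoencoder over model activations (illustrative sketch)."""

    def __init__(self, d_model: int, n_features: int):
        super().__init__()
        self.encoder = nn.Linear(d_model, n_features)   # activation -> many candidate features
        self.decoder = nn.Linear(n_features, d_model)   # features -> reconstructed activation
        self.pre_bias = nn.Parameter(torch.zeros(d_model))

    def forward(self, x: torch.Tensor):
        # Non-negative feature activations give "this feature fires" a clear meaning.
        f = F.relu(self.encoder(x - self.pre_bias))
        x_hat = self.decoder(f) + self.pre_bias
        return x_hat, f

def sae_loss(x, x_hat, f, l1_coeff=1e-3):
    # Reconstruction error plus an L1 penalty that pushes most feature
    # activations to exactly zero (the sparsity incentive). The coefficient
    # value here is an assumed placeholder.
    return F.mse_loss(x_hat, x) + l1_coeff * f.abs().sum(dim=-1).mean()
```

In this baseline formulation the L1 coefficient trades reconstruction quality against sparsity, which is exactly the tuning burden the k-sparse variant described later removes.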

Large language models represent a huge number of concepts, and our autoencoders may need to be correspondingly huge to get close to full coverage of the concepts in a frontier model. Learning a large number of sparse features is challenging, and past work has not been shown to scale well.

Research Progress Towards Large-Scale Autoencoder Training

OpenAI developed new state-of-the-art methodologies that allow sparse autoencoders to scale to tens of millions of features on frontier AI models. The methodology demonstrates smooth and predictable scaling, with better returns to scale than prior techniques, and also introduces several new metrics for evaluating feature quality.

This recipe was used to train a variety of autoencoders on GPT-2 small and GPT-4 activations, including a 16 million feature autoencoder on GPT-4. To check the interpretability of a feature, you can visualize it by showing the documents where it activates; a small visualization sketch follows the list. Here are some interpretable features -

1. Human Imperfection - Phrases relating to things (especially humans) being flawed

2. Price Increases - Ends of phrases related to price increases

3. X and Y - GPT-2 small feature: phrases of the form X and Y

4. Training Logs - Machine learning training logs

5. Rhetorical Questions - rhetorical/exasperated questions

6. Algebraic Rings

7. Who/What the

8. Dopamine
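To inspect features like the ones above, one simple recipe is to score a corpus with the trained autoencoder and surface the passages where a chosen feature activates most strongly. The sketch below assumes the SparseAutoencoder from earlier plus a placeholder get_activations callable that returns the model's activations for a text; it is not the released visualizer.

```python
import torch

@torch.no_grad()
def top_activating_texts(sae, get_activations, texts, feature_idx, top_n=5):
    """Rank texts by the maximum activation of a single SAE feature.

    sae             -- a trained sparse autoencoder (as sketched earlier)
    get_activations -- placeholder callable: text -> tensor [seq_len, d_model]
    """
    scored = []
    for text in texts:
        acts = get_activations(text)        # [seq_len, d_model] model activations
        _, features = sae(acts)             # [seq_len, n_features] feature activations
        scored.append((features[:, feature_idx].max().item(), text))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_n]

# Hypothetical usage: print the documents where feature 1234 fires most strongly.
# for score, text in top_activating_texts(sae, get_activations, corpus, feature_idx=1234):
#     print(f"{score:.2f}  {text[:80]}")
```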

Limitations -

  • The sparse autoencoder does not capture all the behavior of the original model.
  • Currently, passing GPT-4’s activations through the sparse autoencoder results in performance equivalent to a model trained with roughly 10x less compute; a sketch of this kind of downstream comparison follows the list.
  • To fully map the concepts in frontier LLMs, we may need to scale to billions or trillions of features, which would be challenging even with our improved scaling techniques.
  • Sparse autoencoders can find features at one point in the model, but that’s only one step towards interpreting the model. Much further work is required to understand how the model computes those features and how those features are used downstream in the rest of the model.
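The downstream comparison mentioned in the second limitation can be set up roughly as follows: run the model normally, then run it again with one layer's activations replaced by their autoencoder reconstructions, and compare language-modeling loss. This is only a sketch under assumptions; model, sae, the .loss attribute, and the layers[layer_idx] hook point are hypothetical stand-ins, not OpenAI's evaluation code.

```python
import torch

@torch.no_grad()
def loss_with_and_without_sae(model, sae, tokens, layer_idx):
    """Compare LM loss before and after substituting SAE reconstructions.

    Assumes a hypothetical model whose forward pass returns an object with a
    .loss attribute and whose layer outputs can be intercepted with a hook.
    """
    baseline_loss = model(tokens).loss

    def replace_with_reconstruction(module, inputs, output):
        x_hat, _ = sae(output)   # reconstruct the layer's activations
        return x_hat             # returning a value from a hook overrides the output

    handle = model.layers[layer_idx].register_forward_hook(replace_with_reconstruction)
    try:
        patched_loss = model(tokens).loss
    finally:
        handle.remove()

    # A large gap between the two losses means the autoencoder is dropping
    # behaviorally relevant information from the original activations.
    return baseline_loss.item(), patched_loss.item()
```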

Research on Scaling and Evaluating Sparse Autoencoders -

Sparse autoencoders provide a promising unsupervised approach for extracting interpretable features from a language model by reconstructing activations from a sparse bottleneck layer. Since language models learn many concepts, autoencoders need to be very large to recover all relevant features. However, studying the properties of autoencoder scaling is difficult due to the need to balance reconstruction and sparsity objectives and the presence of dead latents.

The paper proposes using k-sparse autoencoders to directly control sparsity, which simplifies tuning and improves the reconstruction-sparsity frontier. It additionally proposes modifications that result in few dead latents, even at the largest scales tried. Using these techniques, the authors find clean scaling laws with respect to autoencoder size and sparsity, and introduce several new metrics for evaluating feature quality based on the recovery of hypothesized features, the explainability of activation patterns, and the sparsity of downstream effects.
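As a rough sketch of the k-sparse idea (illustrative only, with assumed shapes and none of the paper's dead-latent mitigations): instead of an L1 penalty, the encoder keeps only the k largest feature pre-activations per input and zeroes out the rest, so the sparsity level is set directly rather than tuned through a coefficient.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopKSparseAutoencoder(nn.Module):
    """k-sparse autoencoder sketch: sparsity enforced by a TopK activation."""

    def __init__(self, d_model: int, n_features: int, k: int):
        super().__init__()
        self.k = k
        self.encoder = nn.Linear(d_model, n_features)
        self.decoder = nn.Linear(n_features, d_model)

    def forward(self, x: torch.Tensor):
        pre = self.encoder(x)
        # Keep only the k largest pre-activations per example; zero the rest.
        values, indices = torch.topk(pre, self.k, dim=-1)
        f = torch.zeros_like(pre).scatter_(-1, indices, F.relu(values))
        x_hat = self.decoder(f)
        return x_hat, f

def topk_sae_loss(x, x_hat):
    # With exactly k active features per input, plain reconstruction error
    # suffices; there is no sparsity coefficient left to tune.
    return F.mse_loss(x_hat, x)
```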

These metrics all generally improve with autoencoder size. To demonstrate the scalability of the approach, the authors trained a 16 million latent autoencoder on GPT-4 activations for 40 billion tokens, and released the code, autoencoders for open-source models, and a feature visualizer.

Reference Links -

OpenAI Blog - https://openai.com/index/extracting-concepts-from-gpt-4/

Feature-Visualizer - https://openaipublic.blob.core.windows.net/sparse-autoencoder/sae-viewer/index.html

Github - https://github.com/openai/sparse_autoencoder

Paper Reading Link - https://cdn.openai.com/papers/sparse-autoencoders.pdf

Meta AI's No Language Left Behind - Scaling Human-Centered Machine Translation

In No Language Left Behind, the authors first explored and contextualized the need for low-resource language translation support through exploratory interviews with native speakers. They then created datasets and models aimed at narrowing the performance gap between low- and high-resource languages.

More specifically, they developed a conditional compute model based on a Sparsely Gated Mixture of Experts, trained on data obtained with novel and effective data mining techniques tailored for low-resource languages.
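To make the conditional-compute idea concrete, here is a minimal sketch of a sparsely gated Mixture-of-Experts layer in PyTorch. It is a generic top-2-gating illustration under assumed sizes and without load balancing, not the NLLB-200 architecture: a router scores every expert for each token, and only the top-scoring experts actually run.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SparselyGatedMoE(nn.Module):
    """Minimal top-k gated Mixture-of-Experts feed-forward layer (sketch)."""

    def __init__(self, d_model: int, d_ff: int, n_experts: int, top_k: int = 2):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(d_model, n_experts)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.ReLU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        )

    def forward(self, x: torch.Tensor):            # x: [n_tokens, d_model]
        gate_logits = self.router(x)                # [n_tokens, n_experts]
        weights, expert_idx = torch.topk(gate_logits, self.top_k, dim=-1)
        weights = F.softmax(weights, dim=-1)        # renormalize over the chosen experts

        out = torch.zeros_like(x)
        for slot in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = expert_idx[:, slot] == e     # tokens routed to expert e in this slot
                if mask.any():
                    out[mask] += weights[mask, slot].unsqueeze(-1) * expert(x[mask])
        return out
```

Because each token only passes through a couple of experts, total model capacity can grow with the number of experts while per-token compute stays roughly constant.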

They proposed multiple architectural and training improvements to counteract overfitting while training on thousands of tasks, and critically evaluated the performance of over 40,000 different translation directions using a human-translated benchmark, Flores-200, combining human evaluation with a novel toxicity benchmark covering all languages in Flores-200 to assess translation safety.
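At a small scale, per-direction scoring of the kind described above can be approximated with the sacreBLEU library. The file names and the plain-BLEU choice below are assumptions; the paper itself primarily reports spBLEU and chrF++ on Flores-200.

```python
# Requires: pip install sacrebleu
import sacrebleu

def score_direction(hypotheses, references):
    """Corpus-level BLEU for one translation direction (lists of aligned sentences)."""
    return sacrebleu.corpus_bleu(hypotheses, [references]).score

# Hypothetical usage: one sentence per line, hypothesis and reference files aligned.
# with open("hyp.eng_Latn-fra_Latn.txt") as h, open("ref.fra_Latn.txt") as r:
#     print(f"BLEU: {score_direction(h.read().splitlines(), r.read().splitlines()):.1f}")
```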

The model achieves an improvement of 44% BLEU relative to the previous state-of-the-art, laying important groundwork towards realizing a universal translation system.

Reference Links -

Paper Reading Link - https://arxiv.org/pdf/2207.04672

Github - https://github.com/facebookresearch/fairseq/tree/nllb

For more information on AI research papers, you can visit my GitHub profile -

https://github.com/aditikhare007/AI_Research_Junction_Aditi_Khare

To receive the latest updates on advancements in AI research across Gen-AI, Quantum AI & Computer Vision, you can subscribe to my AI Research Papers Summaries Newsletter using the link below -

https://www.dhirubhai.net/newsletters/7152631955203739649/

Thank you & Happy Reading!


