What if… learning was compression
Imagine you have a large file on your computer that you want to send to a friend. The file is too big to fit in an email, so you need to make it smaller. How do you do that?
One way is to use a program like zip, which can compress the file into a shorter code. Zip works by finding patterns in the file and encoding them more compactly: repeated sequences are replaced with short references to their earlier occurrences, and frequent symbols get shorter codes than rare ones. For example, if the file contains many instances of the word "the", zip only needs to store it in full once and can refer back to it cheaply every time it reappears. This way, zip can reduce the size of the file without losing any information.
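To make this concrete, here is a tiny Python sketch using the standard zlib module (which implements the same DEFLATE algorithm as zip). It shows that repetitive text shrinks dramatically under compression while random-looking text of the same length barely shrinks at all; the exact byte counts will vary.

```python
import random
import zlib

# Repetitive text compresses far better than random-looking text of the same
# length, because DEFLATE (the algorithm behind zip and gzip) replaces repeated
# sequences with short references to earlier occurrences.
repetitive = ("the cat sat on the mat and the cat liked the mat. " * 20).encode()

random.seed(0)
random_text = bytes(random.randrange(32, 127) for _ in range(len(repetitive)))

print(len(repetitive), "->", len(zlib.compress(repetitive)))    # e.g. 1000 -> well under 100 bytes
print(len(random_text), "->", len(zlib.compress(random_text)))  # e.g. 1000 -> close to the original size
```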
Now imagine you have a text that you want to label based on its content or sentiment. The text is about cats, and you want to know if it is positive or negative. How do you do that?
One way is to use a program like gzip, which compresses data with essentially the same algorithm as zip. The useful observation is this: if two texts share a lot of content, compressing them together takes barely more space than compressing one of them alone, because the compressor can reuse the patterns it has already seen. For example, if both texts repeat the word "cat" and talk about the same things, the second text adds very little to the compressed size. This way, gzip can be used to measure how similar two texts are, based only on their compressed sizes.
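Here is a minimal illustration of that intuition in Python, using the standard gzip module. The example texts are invented, and with such short strings the numbers are noisy, but the pattern usually holds: a similar text adds fewer bytes to the compressed size than an unrelated one.

```python
import gzip

def compressed_size(text: str) -> int:
    """Size in bytes of the gzip-compressed text."""
    return len(gzip.compress(text.encode()))

a = "cats are wonderful pets and cats love to sleep all day long"
b = "cats make wonderful pets because cats sleep for most of the day"
unrelated = "the stock market fell sharply after the interest rate decision"

# How many extra bytes does a second text add once we already have the first?
# A similar text usually adds fewer bytes than an unrelated one.
print(compressed_size(a + " " + b) - compressed_size(a))          # smaller increase
print(compressed_size(a + " " + unrelated) - compressed_size(a))  # larger increase
```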
But what does this have to do with learning?
Learning is the process of acquiring knowledge or skills from data or experience. Learning can help us understand data and make better decisions.
In this article, we will explore a simple idea that connects compression and learning: learning is compression. By this we mean that when we learn from data, we try to find the simplest and most compact way to represent it: we try to capture the most important and relevant information, and ignore the rest.
We will show you how this idea is related to a powerful theory called Solomonoff's theory of inductive inference: a mathematical way of making predictions from data using logic and probability, which explains why simpler explanations that stay consistent with the data are more likely to be true.
We will also show you two examples of how this idea can be used in real-world applications. The first example is MEmCom, a method for compressing the embedding tables used in apps such as voice assistants and app recommendation systems. The second example is a compressor-based method for classifying texts that can be used for spam detection or sentiment analysis.
We will also tell you about the Hutter Prize, which is a challenge for data compression and AI research based on Solomonoff's theory. The Hutter Prize rewards people who can compress a large text file from Wikipedia, with the goal of encouraging research in artificial intelligence.
We will conclude by summarizing the main points of our article, highlighting the contributions and implications of Solomonoff's theory for data compression and AI, and suggesting some directions for future research.
MEmCom: Compressing embeddings for faster and better apps
Have you ever used a voice assistant like Siri or Alexa? Have you ever downloaded an app from a store like Google Play or Apple's App Store? If so, you have probably used embeddings.
Embeddings are lists of numbers (vectors) that represent things like words, apps, or movies. Because similar things get similar vectors, embeddings let us measure how alike two things are, or how much we might like them. For example, embeddings can help us find synonyms for words, or recommend apps that we might enjoy.
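As a toy illustration, here is how similarity between embeddings is typically measured, using cosine similarity. The three-dimensional vectors below are invented for the example; real embeddings are learned from data and have hundreds of dimensions.

```python
import math

# Toy, hand-made embeddings: each item is a short list of numbers.
embeddings = {
    "cat": [0.9, 0.8, 0.1],
    "dog": [0.8, 0.9, 0.2],
    "car": [0.1, 0.2, 0.9],
}

def cosine_similarity(u, v):
    """Cosine of the angle between two vectors: close to 1 means very similar."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(x * x for x in v))
    return dot / (norm_u * norm_v)

print(cosine_similarity(embeddings["cat"], embeddings["dog"]))  # high: similar things
print(cosine_similarity(embeddings["cat"], embeddings["car"]))  # lower: different things
```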
Embeddings are very useful, but they also have a problem: they are very large. A model typically stores one vector for every word, app, or movie it knows about, and these tables can take up a lot of space on our devices, which can slow down our apps or drain our batteries.
To solve this problem, Pansare et al. proposed a new method for compressing embeddings, called MEmCom (Memory Efficient Compressed Embeddings). Instead of storing one full vector per item, MEmCom learns a much smaller set of shared parameters from which each item's embedding can be rebuilt on the fly, so the embeddings can be stored and looked up with far less memory.
Roughly, MEmCom works as follows: each thing (word, app, movie) is given an index in a table; that index is mapped into a much smaller shared table of vectors (for example by taking the index modulo the table size), so many things share the same base vector; each thing also keeps a small number of values of its own that are adjusted during training; the shared vector combined with these per-item values becomes the thing's embedding.
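The sketch below shows what such a compressed embedding layer could look like in PyTorch, under the shared-table-plus-per-item-weight reading given above. It is an illustration of the idea, not the authors' reference implementation; the class name, the compression factor, and the exact combination rule are our own assumptions.

```python
import torch
import torch.nn as nn

class CompressedEmbedding(nn.Module):
    """Sketch of a MEmCom-style compressed embedding (details are assumptions).

    Many items share rows of a small base table (via a modulo mapping), and each
    item keeps one trainable scalar of its own, so representations stay distinct
    while the parameter count drops from n*d to roughly (n/k)*d + n.
    """

    def __init__(self, num_items: int, dim: int, compression: int = 16):
        super().__init__()
        self.base = nn.Embedding(num_items // compression + 1, dim)  # shared rows
        self.scale = nn.Embedding(num_items, 1)                      # one value per item
        nn.init.ones_(self.scale.weight)

    def forward(self, ids: torch.Tensor) -> torch.Tensor:
        shared = self.base(ids % self.base.num_embeddings)  # many ids share a row
        return shared * self.scale(ids)                     # per-item adjustment

emb = CompressedEmbedding(num_items=1_000_000, dim=64)
vectors = emb(torch.tensor([3, 42, 999_999]))
print(vectors.shape)  # torch.Size([3, 64])
```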
MEmCom makes the embedding tables much smaller while still keeping each item's representation distinct and meaningful, because the per-item values are learned during training rather than fixed in advance. It can be used with different kinds of embeddings, such as word2vec, GloVe, or BERT embeddings, and it can also handle new things that are added to the table later.
MEmCom has been tested on several tasks and datasets in natural language processing (NLP) and recommender systems. The results show that it can perform as well as or better than uncompressed embeddings and other compression methods while using much less space, and that it also speeds up apps on mobile devices compared to those alternatives.
MEmCom is an example of how learning is compression: it learns a compact representation of embeddings while preserving their meaning and function. It also demonstrates how compression can improve the performance and quality of apps that need to run fast and offline.
A parameter-free classification method with compressors: Using gzip for text classification
Have you ever received a spam email or come across a fake news article? Have you ever read a movie review or a product review? If so, you have probably encountered text classification.
Text classification is a common task in natural language processing (NLP), where the goal is to assign labels to texts based on their content or sentiment. Text classification has many uses, such as spam detection, sentiment analysis, topic modeling, and more.
Text classification can be done using different methods, such as hand-written rules, classical machine learning models, or deep learning models. These methods can be powerful, but they come at a cost: they need a substantial amount of data and computation. Acquiring large labeled datasets for training and fine-tuning can be difficult and expensive, and training and running these models can be computationally heavy, especially in low-resource settings.
To address this challenge, Jiang et al. proposed a method for text classification that requires no model training and no trainable parameters. It uses a general-purpose compressor like gzip to measure the similarity between texts and a simple nearest-neighbor rule to assign labels.
The method works as follows: for each text in the test set, it computes a compression-based distance to every text in the training set, by comparing the compressed size of the two texts taken together with the compressed sizes of each text on its own (similar texts compress well together). It then finds the most similar training texts for that test text and assigns the label that is most common among them.
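A compact version of this procedure fits in a few lines of Python using gzip and the normalized compression distance (NCD). The helper names and the toy training set below are ours, and details such as how the two texts are joined may differ from the published implementation.

```python
import gzip
from collections import Counter

def clen(text: str) -> int:
    """Compressed length of a text in bytes (a stand-in for C(x))."""
    return len(gzip.compress(text.encode()))

def ncd(x: str, y: str) -> float:
    """Normalized compression distance: small when x and y compress well together."""
    cx, cy, cxy = clen(x), clen(y), clen(x + " " + y)
    return (cxy - min(cx, cy)) / max(cx, cy)

def classify(test_text: str, train_set: list[tuple[str, str]], k: int = 3) -> str:
    """Assign the majority label among the k training texts closest to test_text."""
    nearest = sorted(train_set, key=lambda pair: ncd(test_text, pair[0]))[:k]
    labels = [label for _, label in nearest]
    return Counter(labels).most_common(1)[0][0]

# Toy labeled data, invented for illustration.
train = [
    ("the cat purred happily on my lap, what a lovely and gentle animal", "positive"),
    ("adorable kittens playing with yarn made my whole day better", "positive"),
    ("the cat scratched the sofa and ruined it completely, so annoying", "negative"),
    ("my cat knocked everything off the shelf again, very frustrating", "negative"),
]
print(classify("such a lovely cat purring gently on my lap", train, k=3))
# likely "positive", though gzip similarity is noisy on very short texts
```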
The method needs no training and no trainable parameters, as it relies only on gzip and this similarity measure. It also requires no language-specific knowledge or pre-training, so it can be applied to any kind of text data, regardless of language or topic.
The method has been tested on six standard datasets and five challenging datasets for text classification. The standard datasets are common benchmarks for English text classification tasks, such as IMDB movie reviews and AG news articles. The challenging datasets involve different languages and topics, including Arabic news articles and Chinese product reviews.
The results demonstrate that this method performs as well as or better than non-pretrained deep learning methods on the standard datasets and outperforms BERT on the challenging datasets, including those in less common languages. The method also shows promising results with limited labeled data, where deep learning methods may not be as effective.
This method exemplifies how learning is compression, as it uses gzip to measure how similar texts are in content. It also highlights how compression can enable effective text classification without large models, extensive training data, or heavy computation.
Solomonoff's theory of inductive inference: A simple and powerful way of making predictions
Have you ever wondered how we can make predictions based on data? How can we anticipate what will happen next? How can we determine the most plausible explanation for what we observe?
One powerful theory that tackles these questions is Solomonoff's theory of inductive inference. This theory provides a mathematical framework for making predictions based on data, employing principles of logic and probability.
Solomonoff's theory assigns a probability to each possible explanation of the data based on its simplicity: the shorter the program needed to describe an explanation, the higher its prior probability. For instance, if we are shown the sequence 2, 4, 6, 8, the explanation "the sequence lists the even numbers" can be written as a very short program, while the explanation "the sequence lists 2, 4, 6, 8 and then continues with arbitrary numbers chosen case by case" must spell out every term, so the former receives a higher probability.
Solomonoff's theory then updates the probability of each explanation based on how well it fits the observed data: explanations that contradict the data are discarded, and among those that remain consistent, the simpler ones keep the largest share of the probability. For example, if the next number we observe is 10, the "even numbers" explanation is confirmed and its probability grows, while explanations that predicted something else are ruled out.
Solomonoff's theory combines two fundamental ingredients: logic and probability. Logic, in the form of computation, lets us reason from hypotheses to the predictions they make, while probability lets us quantify the uncertainty and confidence of our beliefs and update them with Bayes' rule as data arrives. By combining these ingredients, Solomonoff's theory offers a rational and consistent approach to making predictions from data.
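For readers who want the formal core: Solomonoff's universal prior weights every program p that reproduces the observed data on a universal machine U by two to the power of minus its length, and predictions follow from ratios of these weights. In standard notation:

```latex
% Universal a priori probability of a string x:
% sum over all programs p whose output on the universal machine U starts with x.
M(x) = \sum_{p \,:\, U(p) = x\ast} 2^{-\ell(p)}

% Probability that the data x is continued by the symbol a:
M(a \mid x) = \frac{M(xa)}{M(x)}
```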
Solomonoff's theory exemplifies how learning is compression, as it employs simplicity to gauge the informativeness and plausibility of explanations. Furthermore, it demonstrates how compression aids in identifying the best explanations for data, enabling optimal decision-making.
Conclusion
In this article, we have explored the idea of learning as compression and its connection to Solomonoff's theory of inductive inference. We have demonstrated how this theory explains the higher likelihood of simpler and more consistent explanations. Additionally, we have provided two real-world examples, MEmCom and the parameter-free classification method, showcasing the practical applications of learning as compression. Moreover, we have introduced the Hutter Prize, which serves as a platform to encourage research in data compression and AI based on Solomonoff's theory.
We hope this article has deepened your understanding of learning as compression and sparked your interest in this fascinating field. For further exploration of the topic, you may find the following sources valuable:
Thank you for taking the time to read our article!