What if… learning was compression
Imagine you have a large file on your computer that you want to send to a friend. The file is too big to fit in an email, so you need to make it smaller. How do you do that?
One way is to use a program like zip, which can compress the file into a shorter code. Zip works by finding patterns in the file and encoding them more compactly: repeated sequences are replaced with short references to their earlier occurrences, and frequent symbols get shorter codes than rare ones. For example, if the file contains many instances of the word "the", zip only needs to store it in full once and can refer back to it cheaply every time it reappears. This way, zip can reduce the size of the file without losing any information.
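To make this concrete, here is a tiny Python sketch using the standard zlib module (which implements the same DEFLATE algorithm as zip). It shows that repetitive text shrinks dramatically under compression while random-looking text of the same length barely shrinks at all; the exact byte counts will vary.

```python
import random
import zlib

# Repetitive text compresses far better than random-looking text of the same
# length, because DEFLATE (the algorithm behind zip and gzip) replaces repeated
# sequences with short references to earlier occurrences.
repetitive = ("the cat sat on the mat and the cat liked the mat. " * 20).encode()

random.seed(0)
random_text = bytes(random.randrange(32, 127) for _ in range(len(repetitive)))

print(len(repetitive), "->", len(zlib.compress(repetitive)))    # e.g. 1000 -> well under 100 bytes
print(len(random_text), "->", len(zlib.compress(random_text)))  # e.g. 1000 -> close to the original size
```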
Now imagine you have a text that you want to label based on its content or sentiment. The text is about cats, and you want to know if it is positive or negative. How do you do that?
One way is to use a program like gzip, which compresses data with essentially the same algorithm as zip. The useful observation is this: if two texts share a lot of content, compressing them together takes barely more space than compressing one of them alone, because the compressor can reuse the patterns it has already seen. For example, if both texts repeat the word "cat" and talk about the same things, the second text adds very little to the compressed size. This way, gzip can be used to measure how similar two texts are, based only on their compressed sizes.
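Here is a minimal illustration of that intuition in Python, using the standard gzip module. The example texts are invented, and with such short strings the numbers are noisy, but the pattern usually holds: a similar text adds fewer bytes to the compressed size than an unrelated one.

```python
import gzip

def compressed_size(text: str) -> int:
    """Size in bytes of the gzip-compressed text."""
    return len(gzip.compress(text.encode()))

a = "cats are wonderful pets and cats love to sleep all day long"
b = "cats make wonderful pets because cats sleep for most of the day"
unrelated = "the stock market fell sharply after the interest rate decision"

# How many extra bytes does a second text add once we already have the first?
# A similar text usually adds fewer bytes than an unrelated one.
print(compressed_size(a + " " + b) - compressed_size(a))          # smaller increase
print(compressed_size(a + " " + unrelated) - compressed_size(a))  # larger increase
```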
But what does this have to do with learning?
Learning is the process of acquiring knowledge or skills from data or experience. Learning can help us understand data and make better decisions.
In this article, we will explore a simple idea that connects compression and learning: learning is compression. By this we mean that when we learn from data, we try to find the simplest and most compact way to represent it: we try to capture the most important and relevant information, and ignore the rest.
We will show you how this idea is related to a powerful theory called Solomonoff's theory of inductive inference: a mathematical way of making predictions from data using logic and probability, which explains why simpler explanations that stay consistent with the data are more likely to be true.
We will also show you two examples of how this idea can be used in real-world applications. The first example is MEmCom, a method for compressing the embedding tables used in apps such as voice assistants and app recommendation systems. The second example is a compressor-based method for classifying texts that can be used for spam detection or sentiment analysis.
We will also tell you about the Hutter Prize, which is a challenge for data compression and AI research based on Solomonoff's theory. The Hutter Prize rewards people who can compress a large text file from Wikipedia, with the goal of encouraging research in artificial intelligence.
We will conclude by summarizing the main points of our article, highlighting the contributions and implications of Solomonoff's theory for data compression and AI, and suggesting some directions for future research.
MEmCom: Compressing embeddings for faster and better apps
Have you ever used a voice assistant like Siri or Alexa? Have you ever downloaded an app from a store like Google Play or Apple's App Store? If so, you have probably used embeddings.
Embeddings are lists of numbers (vectors) that represent things like words, apps, or movies. Because similar things get similar vectors, embeddings let us measure how alike two things are, or how much we might like them. For example, embeddings can help us find synonyms for words, or recommend apps that we might enjoy.
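As a toy illustration, here is how similarity between embeddings is typically measured, using cosine similarity. The three-dimensional vectors below are invented for the example; real embeddings are learned from data and have hundreds of dimensions.

```python
import math

# Toy, hand-made embeddings: each item is a short list of numbers.
embeddings = {
    "cat": [0.9, 0.8, 0.1],
    "dog": [0.8, 0.9, 0.2],
    "car": [0.1, 0.2, 0.9],
}

def cosine_similarity(u, v):
    """Cosine of the angle between two vectors: close to 1 means very similar."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(x * x for x in v))
    return dot / (norm_u * norm_v)

print(cosine_similarity(embeddings["cat"], embeddings["dog"]))  # high: similar things
print(cosine_similarity(embeddings["cat"], embeddings["car"]))  # lower: different things
```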
Embeddings are very useful, but they also have a problem: they are very large. A model typically stores one vector for every word, app, or movie it knows about, and these tables can take up a lot of space on our devices, which can slow down our apps or drain our batteries.
To solve this problem, Pansare et al. proposed a new method for compressing embeddings, called MEmCom (Memory Efficient Compressed Embeddings). Instead of storing one full vector per item, MEmCom learns a much smaller set of shared parameters from which each item's embedding can be rebuilt on the fly, so the embeddings can be stored and looked up with far less memory.
Roughly, MEmCom works as follows: each thing (word, app, movie) is given an index in a table; that index is mapped into a much smaller shared table of vectors (for example by taking the index modulo the table size), so many things share the same base vector; each thing also keeps a small number of values of its own that are adjusted during training; the shared vector combined with these per-item values becomes the thing's embedding.
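The sketch below shows what such a compressed embedding layer could look like in PyTorch, under the shared-table-plus-per-item-weight reading given above. It is an illustration of the idea, not the authors' reference implementation; the class name, the compression factor, and the exact combination rule are our own assumptions.

```python
import torch
import torch.nn as nn

class CompressedEmbedding(nn.Module):
    """Sketch of a MEmCom-style compressed embedding (details are assumptions).

    Many items share rows of a small base table (via a modulo mapping), and each
    item keeps one trainable scalar of its own, so representations stay distinct
    while the parameter count drops from n*d to roughly (n/k)*d + n.
    """

    def __init__(self, num_items: int, dim: int, compression: int = 16):
        super().__init__()
        self.base = nn.Embedding(num_items // compression + 1, dim)  # shared rows
        self.scale = nn.Embedding(num_items, 1)                      # one value per item
        nn.init.ones_(self.scale.weight)

    def forward(self, ids: torch.Tensor) -> torch.Tensor:
        shared = self.base(ids % self.base.num_embeddings)  # many ids share a row
        return shared * self.scale(ids)                     # per-item adjustment

emb = CompressedEmbedding(num_items=1_000_000, dim=64)
vectors = emb(torch.tensor([3, 42, 999_999]))
print(vectors.shape)  # torch.Size([3, 64])
```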
MEmCom makes the embedding tables much smaller while still keeping each item's representation distinct and meaningful, because the per-item values are learned during training rather than fixed in advance. It can be used with different kinds of embeddings, such as word2vec, GloVe, or BERT embeddings, and it can also handle new things that are added to the table later.
MEmCom has been tested on several tasks and datasets in natural language processing (NLP) and recommender systems. The results show that it can perform as well as or better than uncompressed embeddings and other compression methods while using much less space, and that it also speeds up apps on mobile devices compared to those alternatives.
MEmCom is an example of how learning is compression: it learns a compact representation of embeddings while preserving their meaning and function. It also demonstrates how compression can improve the performance and quality of apps that need to run fast and offline.
A parameter-free classification method with compressors: Using gzip for text classification
Have you ever received a spam email or come across a fake news article? Have you ever read a movie review or a product review? If so, you have probably encountered text classification.
Text classification is a common task in natural language processing (NLP), where the goal is to assign labels to texts based on their content or sentiment. Text classification has many uses, such as spam detection, sentiment analysis, topic modeling, and more.
Text classification can be done using different methods, such as hand-written rules, classical machine learning models, or deep learning models. These methods can be powerful, but they come at a cost: they need a substantial amount of data and computation. Acquiring large labeled datasets for training and fine-tuning can be difficult and expensive, and training and running these models can be computationally heavy, especially in low-resource settings.
To address this challenge, Jiang et al. proposed a method for text classification that requires no model training and no trainable parameters. It uses a general-purpose compressor like gzip to measure the similarity between texts and a simple nearest-neighbor rule to assign labels.
The method works as follows: for each text in the test set, it computes a compression-based distance to every text in the training set, by comparing the compressed size of the two texts taken together with the compressed sizes of each text on its own (similar texts compress well together). It then finds the most similar training texts for that test text and assigns the label that is most common among them.
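A compact version of this procedure fits in a few lines of Python using gzip and the normalized compression distance (NCD). The helper names and the toy training set below are ours, and details such as how the two texts are joined may differ from the published implementation.

```python
import gzip
from collections import Counter

def clen(text: str) -> int:
    """Compressed length of a text in bytes (a stand-in for C(x))."""
    return len(gzip.compress(text.encode()))

def ncd(x: str, y: str) -> float:
    """Normalized compression distance: small when x and y compress well together."""
    cx, cy, cxy = clen(x), clen(y), clen(x + " " + y)
    return (cxy - min(cx, cy)) / max(cx, cy)

def classify(test_text: str, train_set: list[tuple[str, str]], k: int = 3) -> str:
    """Assign the majority label among the k training texts closest to test_text."""
    nearest = sorted(train_set, key=lambda pair: ncd(test_text, pair[0]))[:k]
    labels = [label for _, label in nearest]
    return Counter(labels).most_common(1)[0][0]

# Toy labeled data, invented for illustration.
train = [
    ("the cat purred happily on my lap, what a lovely and gentle animal", "positive"),
    ("adorable kittens playing with yarn made my whole day better", "positive"),
    ("the cat scratched the sofa and ruined it completely, so annoying", "negative"),
    ("my cat knocked everything off the shelf again, very frustrating", "negative"),
]
print(classify("such a lovely cat purring gently on my lap", train, k=3))
# likely "positive", though gzip similarity is noisy on very short texts
```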
The method needs no training and no trainable parameters, as it relies only on gzip and this similarity measure. It also requires no language-specific knowledge or pre-training, so it can be applied to any kind of text data, regardless of language or topic.
The method has been tested on six standard datasets and five challenging datasets for text classification. The standard datasets are common benchmarks for English text classification tasks, such as IMDB movie reviews and AG news articles. The challenging datasets involve different languages and topics, including Arabic news articles and Chinese product reviews.
The results demonstrate that this method performs as well as or better than non-pretrained deep learning methods on the standard datasets and outperforms BERT on the challenging datasets, including those in less common languages. The method also shows promising results with limited labeled data, where deep learning methods may not be as effective.
This method exemplifies how learning is compression, as it uses gzip to measure how similar texts are in content. It also highlights how compression can enable effective text classification without large models, extensive training data, or heavy computation.
Solomonoff's theory of inductive inference: A simple and powerful way of making predictions
Have you ever wondered how we can make predictions based on data? How can we anticipate what will happen next? How can we determine the most plausible explanation for what we observe?
One powerful theory that tackles these questions is Solomonoff's theory of inductive inference. This theory provides a mathematical framework for making predictions based on data, employing principles of logic and probability.
Solomonoff's theory assigns a probability to each possible explanation of the data based on its simplicity: the shorter the program needed to describe an explanation, the higher its prior probability. For instance, if we are shown the sequence 2, 4, 6, 8, the explanation "the sequence lists the even numbers" can be written as a very short program, while the explanation "the sequence lists 2, 4, 6, 8 and then continues with arbitrary numbers chosen case by case" must spell out every term, so the former receives a higher probability.
Solomonoff's theory then updates the probability of each explanation based on how well it fits the observed data: explanations that contradict the data are discarded, and among those that remain consistent, the simpler ones keep the largest share of the probability. For example, if the next number we observe is 10, the "even numbers" explanation is confirmed and its probability grows, while explanations that predicted something else are ruled out.
Solomonoff's theory combines two fundamental ingredients: logic and probability. Logic, in the form of computation, lets us reason from hypotheses to the predictions they make, while probability lets us quantify the uncertainty and confidence of our beliefs and update them with Bayes' rule as data arrives. By combining these ingredients, Solomonoff's theory offers a rational and consistent approach to making predictions from data.
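For readers who want the formal core: Solomonoff's universal prior weights every program p that reproduces the observed data on a universal machine U by two to the power of minus its length, and predictions follow from ratios of these weights. In standard notation:

```latex
% Universal a priori probability of a string x:
% sum over all programs p whose output on the universal machine U starts with x.
M(x) = \sum_{p \,:\, U(p) = x\ast} 2^{-\ell(p)}

% Probability that the data x is continued by the symbol a:
M(a \mid x) = \frac{M(xa)}{M(x)}
```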
Solomonoff's theory exemplifies how learning is compression, as it employs simplicity to gauge the informativeness and plausibility of explanations. Furthermore, it demonstrates how compression aids in identifying the best explanations for data, enabling optimal decision-making.
Conclusion
In this article, we have explored the idea of learning as compression and its connection to Solomonoff's theory of inductive inference. We have demonstrated how this theory explains the higher likelihood of simpler and more consistent explanations. Additionally, we have provided two real-world examples, MEmCom and the parameter-free classification method, showcasing the practical applications of learning as compression. Moreover, we have introduced the Hutter Prize, which serves as a platform to encourage research in data compression and AI based on Solomonoff's theory.
We hope this article has deepened your understanding of learning as compression and sparked your interest in this fascinating field. For further exploration of the topic, you may find the following sources valuable:
Thank you for taking the time to read our article!