AI-Generated Entropy vs Human-Generated Entropy. A quick glimpse!

Entropy is a fundamental concept in the fields of physics and information theory.

Entropy in the field of Physics

Definition 1 > Entropy is a measure of disorderliness, and the declaration that entropy is always on the rise — known as the second law of thermodynamics — is among nature’s most inescapable commandments.

Definition 2 > Entropy is a measure of disorder. It corresponds with how many possible microscopic configurations can underlie an overall state.

Definition 3 > Entropy can also be considered a measure of uncertainty. The more disordered a set of particles is, the more uncertain their exact arrangement.

The word entropy comes from the Greek word for transformation. The German physicist Rudolf Clausius coined the term and laid out what became known as the second law of thermodynamics: “The entropy of the universe tends to a maximum.”

Entropy in the field of Information Theory

Claude Shannon, an American mathematician who has been called the father of information theory, understood entropy as uncertainty.

During World War II, Shannon worked on encrypting communication channels for the highest levels of the US and UK governments, an experience that led him to think deeply about the fundamentals of communication in the years that followed. Shannon sought to measure the amount of information contained in a message. He did so in a roundabout way, by treating knowledge as a reduction in uncertainty.

From this work, Shannon came up with an equation that has nothing to do with steam engines. Given a set of possible characters in a message, Shannon’s formula defines the uncertainty about which character will appear next as the negative of the sum, over all characters, of the probability of each character appearing multiplied by the logarithm of that probability.
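
To make that concrete, here is a minimal Python sketch (mine, not Shannon’s and not from any source cited here) that estimates the entropy of a string from its empirical character frequencies, following H = -sum(p * log2(p)):

```python
import math
from collections import Counter

def shannon_entropy(text: str) -> float:
    """Empirical Shannon entropy in bits per character: H = -sum(p * log2(p))."""
    counts = Counter(text)
    total = len(text)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

print(round(shannon_entropy("hello world"), 3))  # roughly 2.85 bits per character
```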

But if every character is equally probable, Shannon’s formula simplifies and takes the same form as Boltzmann’s formula for entropy.
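
As a quick check of that claim, the sketch below (an illustration of mine, not the paper’s code) shows that with N equally likely characters the sum collapses to log2(N), the same logarithm-of-the-number-of-possibilities form as Boltzmann’s S = k·ln(W):

```python
import math

N = 26  # e.g. lowercase English letters, each assumed equally likely
uniform_entropy = -sum((1 / N) * math.log2(1 / N) for _ in range(N))
print(round(uniform_entropy, 3), round(math.log2(N), 3))  # both print ~4.7 bits
```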

Just as thermodynamic entropy describes the efficiency of an engine, information entropy captures the efficiency of communication. It corresponds with the number of yes-or-no questions needed to figure out the contents of a message.

A high-entropy message is a patternless one: with no way to guess the next character, the message requires many questions to be fully revealed. A message with many patterns contains less information and is easier to guess.
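
A toy illustration of that contrast, using the same character-frequency estimate as the earlier sketch (the example strings are made up purely for illustration):

```python
import math
from collections import Counter

def h_bits(text: str) -> float:
    """Per-character Shannon entropy, same formula as the earlier sketch."""
    n = len(text)
    return -sum((c / n) * math.log2(c / n) for c in Counter(text).values())

print(h_bits("abababababababab"))  # 1.0 bit/char: only two symbols, heavily patterned
print(h_bits("q7#kd!zp3m&xw2vu"))  # 4.0 bits/char: 16 distinct symbols, no repeats
```

Note that a character-frequency estimate only sees which symbols appear and how often, not their order; richer n-gram estimates, discussed later in this article, capture more of the patterning.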

It’s a beautiful, interlocking picture of information and entropy: entropy is the information we don’t know, while information is the information we do know.

Notions of entropy developed in disparate contexts fit together neatly

A rise in entropy corresponds to a loss of information about microscopic details. In statistical mechanics, for instance, as particles in a box get mixed up and we lose track of their positions and momenta, the “Gibbs entropy” increases.

As particles become entangled with their environment in quantum mechanics, thus scrambling their quantum state, the “von Neumann entropy” rises. As matter falls into a black hole and information about it gets lost to the outside world, the "Bekenstein-Hawking entropy" goes up.

What entropy consistently measures is ignorance: a lack of knowledge about the motion of particles, the next digit in a string of code, or the exact state of a quantum system.

Calculated Entropy – Human/AI Text

In the paper “Significance of Entropy in Combating AI-Driven Disinformation,” published in the Journal for High Schoolers in 2023, H. Widjaja, S. Das, G.D. Dixon, and F. Basher ran an experiment to compare the entropy of human-generated and AI-generated text.

Entropy as a differentiating benchmark in text generation

As AI becomes more integrated into the web, the amount of AI-produced content online has grown sharply. In a world awash in misinformation, we lack a baseline method of media authentication. AI technologies, which can easily be used to generate false content, make the problem worse: fake or mis-contextualized content can be used to push harmful agendas or spread propaganda.

This motivates the creation of a benchmark for comparing human-generated and AI-generated text, a key step toward designing content-authenticity checks for the media.

This is where the concept of entropy comes in, as a tool for gauging the efficiency of a text sample.

For example, let’s say languages A and B exist, where both languages often communicate the same meaning through different word choices. Whereas Language A often uses words such as “hi”, “great”, or “bad”, Language B elects to use words such as “greetings”, “fantastic”, or “awful”.

These languages will have different entropies. Language A will have a lower entropy, as it often communicates the same message as Language B while using fewer bits to do so. Because it takes fewer bits for Language A to communicate the same concepts as Language B, it is more efficient in its communicative ability and therefore has lower disorder and lower entropy.
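
A minimal sketch of that comparison, using hypothetical one-phrase “messages” built from the example words above (the printed numbers are illustrative only, not results from the paper):

```python
import math
from collections import Counter

def h_bits(text: str) -> float:
    """Per-character Shannon entropy of a string."""
    n = len(text)
    return -sum((c / n) * math.log2(c / n) for c in Counter(text).values())

msg_a = "hi great bad"               # Language A: shorter words
msg_b = "greetings fantastic awful"  # Language B: same meaning, longer words

for name, msg in [("Language A", msg_a), ("Language B", msg_b)]:
    h = h_bits(msg)  # bits per character
    print(f"{name}: {h:.2f} bits/char, about {h * len(msg):.0f} bits total")
```

In this toy example, Language A needs fewer total bits to convey the same meaning, which is the sense in which the passage calls it more efficient.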

Explanation of an N-gram

An n-gram is a sequence of a certain number of characters drawn from a set that can include the alphabet, punctuation, and digits, as well as combinations of them. For example, 1-grams represent all single characters (such as a, b, c, etc.). In English, considering only lowercase letters, there are 26 1-grams. Likewise, 2-grams represent all 2-character pairs (such as aa, ab, ac, etc.), so there are 26^2 = 676 possible 2-grams.

N-grams account for all possible character combinations of a given length. In their experiment, the authors used only lowercase alphabetical 1- and 2-grams to produce preliminary results; including 2-grams yields a more accurate estimate of textual entropy. A sketch of such an estimate follows below.
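
Below is one plausible way to estimate 1- and 2-gram entropy over lowercase alphabetical characters in Python. The paper’s exact preprocessing and normalization may differ, so treat this as a sketch rather than the authors’ method:

```python
import math
from collections import Counter

def ngram_entropy(text: str, n: int = 2) -> float:
    """Empirical Shannon entropy (bits) over lowercase alphabetical n-grams."""
    letters = [ch for ch in text.lower() if ch.isascii() and ch.isalpha()]
    grams = ["".join(letters[i:i + n]) for i in range(len(letters) - n + 1)]
    total = len(grams)
    return -sum((c / total) * math.log2(c / total) for c in Counter(grams).values())

sample = "entropy measures the uncertainty of a message"
print(round(ngram_entropy(sample, 1), 3), round(ngram_entropy(sample, 2), 3))
```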

The conclusions of the experiment were the following

In their experiment, the calculated 2-gram entropy of human-generated text (3.883) was higher than that of AI-generated text (3.119). This signifies that AI text has lower uncertainty and higher efficiency, with roughly 20% lower uncertainty than human text.

The diagram of the entropic separation of AI- and human-generated text shows how texts are evaluated and the thresholds at which a text is considered AI-generated or human-generated. If a text’s entropy is below 3.119, it is most likely AI-generated; if its entropy is above 3.883, it is likely human-generated. The diagram also shows the roughly 20% gap between these two values, within which the type of text cannot be determined with full certainty.
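
The decision rule the diagram describes can be sketched as a small function. The thresholds are the paper’s reported values; treating the in-between band as undecided is my assumption, since the article only states the two bounds:

```python
def classify_by_entropy(h2: float,
                        ai_threshold: float = 3.119,
                        human_threshold: float = 3.883) -> str:
    """Toy classifier using the paper's reported 2-gram entropy thresholds."""
    if h2 < ai_threshold:
        return "likely AI-generated"
    if h2 > human_threshold:
        return "likely human-generated"
    return "uncertain (falls in the ~20% gap between the thresholds)"

print(classify_by_entropy(2.9))  # likely AI-generated
print(classify_by_entropy(4.1))  # likely human-generated
print(classify_by_entropy(3.5))  # uncertain
```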

Conclusion

In the end, the conclusion we draw from all of this is that every system running in our world, whether physical, chemical, mechanical, electrical, or informational, is ruled by the second law of thermodynamics, and entropy only tends to increase.

Taking a final reflection from author Zack Savitsky, a contributing writer for Quanta Magazine:

"The trend toward messiness is what powers all our machines. While the decay of useful energy does limit our abilities, sometimes a new perspective can reveal a reservoir of order hidden in the chaos. Furthermore, a disordered cosmos is one that’s increasingly filled with possibility.

We cannot circumvent uncertainty, but we can learn to manage it — and maybe even embrace it. After all, ignorance is what motivates us to seek knowledge and construct stories about our experiences. Entropy, in other words, is what makes us human.

You can bemoan the inescapable collapse of order, or you can embrace uncertainty as an opportunity to learn, to sense and deduce, to make better choices, and to capitalize on your motive power of"
