What Do Aliens Look Like? The Limits of Data Fitting and Imagining the Unseen

What do aliens look like? We'll answer this question at the end.

Our Brains Love to Fill in the Gaps

The benevolent see benevolence in it, and the wise see wisdom in it. — I Ching, Xi Ci I (周易·系辞上)

In 1731, King Frederick I of Sweden received a unique gift: a lion, the first in Scandinavia. After the lion died, the king wanted it preserved. However, the taxidermist had never seen a real lion and had to rely on artistic depictions and heraldic symbols. The resulting specimen, known today as the "Gripsholm Castle Lion," looks comically inaccurate and is still displayed at Gripsholm Castle in Sweden.

In the 1970s, NASA's Viking 1 mission captured a peculiar image on Mars, resembling a human face, known as the "Face on Mars." This sparked significant excitement, with many speculating it was evidence of an extraterrestrial civilization.

Years later, the Mars Global Surveyor took a higher resolution image of the same site, revealing it was just a play of light and shadow, not a real face on Mars.

Let's consider another example. The skeleton shown here appears to belong to an unfamiliar creature. Can you guess what animal it is? You might imagine something like the monster pictured on the right.

In fact, it is a rabbit's skeleton. Stripped of flesh and skin, a skeleton's shape differs dramatically from the animal's living appearance, making it hard to determine what an animal really looked like from its bones alone.

Logical Consistency, Stories Prevail

We should regard all laws and theories as hypotheses or conjectures. — Karl Popper

Whether it’s the 18th-century Swedish king’s lion specimen, the face on Mars, or reconstructing a monster from a rabbit’s skeleton, these are results of fitting data under insufficient information.

These examples show that the same data can support many "correct" interpretations, each potentially very different. Until we acquire more data, each explanation is "correct" in the sense that we lack the evidence to refute it. Add cognitive bias, and such interpretations harden into hypotheses, conspiracy theories, ideologies, and beliefs. The explanation that wins acceptance is often the most compelling and widely spread story, not necessarily the one that withstands future scrutiny. Some stories can never be falsified at all, which is a topic for another article.

Only when we gather more data does the truth become clearer. This additional information can reveal errors in our initial assumptions and fittings. Increasing data not only allows us to describe reality more accurately but also helps distinguish between random patterns and more enduring "laws."

AI's Way of "Filling the Gaps": Interpolation and Extrapolation

All models are wrong, but some are useful. — George Box

We fill in data gaps with our understanding and biases; AI models do the same with statistics. There are two main methods: interpolation and extrapolation. Interpolation uses known data points to predict unknown ones within their range; extrapolation predicts data points outside the known range.

Interpolation is easy to grasp. Imagine missing an episode in a TV series. You might guess what happened based on the episodes before and after, which is the essence of interpolation.

AI interpolation tool ToolCrafter: given start and end frames, the tool automatically fills in the intermediate animation frames.
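To make interpolation concrete, here is a minimal Python sketch with made-up data points (not from any real dataset): a measurement at x = 3 is missing, and numpy's linear interpolation estimates it from the known points on either side.

```python
import numpy as np

# Hypothetical measurements at x = 0..5, with the value at x = 3 missing
x_known = np.array([0, 1, 2, 4, 5])
y_known = np.array([0.0, 0.8, 0.9, -0.8, -1.0])

# Linear interpolation: x = 3 lies inside the range of known data,
# so its value is estimated from the neighboring points at x = 2 and x = 4
y_missing = np.interp(3, x_known, y_known)
print(f"interpolated value at x = 3: {y_missing:.2f}")
```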

Extrapolation relies on the assumption that existing patterns continue. The simplest form is linear extrapolation. For example, while NVIDIA's stock keeps rising, financial institutions keep raising their expectations; similarly, during a bear market, expectations are continually lowered.

This is a meme from the 2021 Fed stimulus period, where new investors based their conclusions on the past decade’s observations.
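As a minimal sketch of linear extrapolation, the Python snippet below uses made-up price numbers rather than real market data: it fits a straight line to an observed window and extends the same trend beyond it.

```python
import numpy as np

# Made-up prices for 10 observed days: a rising trend plus a little noise
days = np.arange(10)
prices = 100 + 2.5 * days + np.random.default_rng(0).normal(0.0, 1.0, size=10)

# Fit a straight line (degree-1 polynomial) to the observed window
slope, intercept = np.polyfit(days, prices, 1)

# Extrapolate the same trend to day 15 -- this is only valid
# if the past pattern actually continues into the future
day_future = 15
forecast = slope * day_future + intercept
print(f"fitted slope: {slope:.2f}, forecast for day {day_future}: {forecast:.2f}")
```

Swapping the straight line for any other fitted curve changes the numbers but not the underlying assumption: the forecast is only as good as the belief that the past pattern persists.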

When "Filling the Gaps" Goes Wrong

Believing everything in books is worse than not reading them at all. — Mencius

When using AI chatbots, we are often amazed by their extensive knowledge and may eventually take their responses as indisputable facts.

The models behind these chatbots answer questions through a mix of memorization, interpolation, and extrapolation. The quality of an answer depends on the "density" of the training data in that domain: when a question falls within a dense knowledge area, we get a good answer; in sparse areas, the model starts to "imagine." Interested readers can check out another article: "The Habsburg Curse from Data: The Self-Iteration Vortex of AI."

As training data and model complexity increase, even the model creators can’t easily identify knowledge blind spots. However, there are ways to address this, as discussed in my previous article: "Acknowledging Ignorance: A Virtue for Humans, A Necessity for Machines." Ideally, models should be enthusiastic, courteous, and admit when they don't know.

The Pólya Conjecture

The Pólya Conjecture, proposed by Hungarian mathematician George Pólya in 1919, states: for any integer n greater than 1, among the positive integers up to n, those with an odd number of prime factors (counted with multiplicity) are at least as numerous as those with an even number.

If you verify it for n from 2 to 16, it holds true.

Verification of the Pólya Conjecture for the first 16 values of n.
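To reproduce this small-n check, here is a minimal Python sketch; the helper names omega and polya_holds are my own, not from any library. It counts each integer's prime factors with multiplicity and compares how many integers up to n have an odd count versus an even count.

```python
def omega(k: int) -> int:
    """Number of prime factors of k, counted with multiplicity."""
    count, d = 0, 2
    while d * d <= k:
        while k % d == 0:
            k //= d
            count += 1
        d += 1
    if k > 1:
        count += 1
    return count

def polya_holds(n: int) -> bool:
    """True if, among 1..n, integers with an odd number of prime factors
    are at least as numerous as those with an even number (1 counts as even)."""
    odd = sum(1 for k in range(1, n + 1) if omega(k) % 2 == 1)
    even = n - odd
    return odd >= even

# Reproduce the small-n check: the conjecture holds for every n up to 16.
# (Naive trial division is far too slow to reach the counterexample region
# near 9 * 10**8; a sieve over omega would be needed to go that far.)
for n in range(2, 17):
    print(n, polya_holds(n))
```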

However, in 1960, a specific counterexample was found: n = 906,180,359, where the Pólya Conjecture does not hold.

This example shows that even initially supported hypotheses can be overturned as data increases and complexity rises. Just as AI can make errors in sparse knowledge areas, we humans can also struggle to judge right from wrong without sufficient validation. The "failure" of the Pólya Conjecture is a perfect example.

Mathematics is pure logical reasoning and can be entirely theoretical. The laws of the physical world are even harder to predict and cannot be purely validated by reasoning. The laws we derive from observations can be overturned by the next observation.

Generative AI shines in image and language processing because these domains have high tolerance for errors, making interpolation and extrapolation flexible. However, generating specific knowledge without validation is unreliable.

Is AI-Generated Knowledge Unworkable?

Not really. At least for now, AI can be used as a copilot. It can generate quality hypotheses, which we can then verify.

This is similar to the candidate set generation part of recommendation systems, but the candidate set is generated rather than selected. Moreover, it is much broader, encompassing the entirety of world knowledge. AI's current strength is in measuring and finding related knowledge points, which can help us easily identify hypotheses that require cross-disciplinary knowledge.

In this way, AI can help accelerate scientific discovery and broaden research horizons, provided we use it rationally and rigorously. This aligns with the fundamental scientific research principle of "boldly hypothesize, carefully verify."

In Conclusion

The limits of data and the challenges faced by humans and AI in processing information remind us to remain vigilant. Through the Pólya Conjecture and other examples, we see how initially plausible hypotheses can be overturned with more data. This applies not only to mathematics and science but also to AI-generated knowledge.

AI can be a valuable tool for generating quality hypotheses, but we must verify them to ensure their accuracy. This is the basic principle of scientific research and the best way to use AI to advance knowledge. Personally, I believe AI will first help research as a Copilot, then evolve into an Autopilot. In terms of disciplines, it will start with knowledge-connected fields like the humanities, expand to logic-reasoning fields like mathematics, and eventually to fields requiring experimental evidence.

So, what do aliens look like?

Now to answer the initial question: What do aliens look like? For those who have read science fiction, the aliens you imagined are exactly how they should look. Through our imagination and reasoning, we collectively build this strange and mysterious world.
