Quick Intuition for Understanding GenAI by Thinking in Dimensionality

Introduction: How to Learn AI for Non-Data Scientists - Bottom-Up or Top-Down?

I recently saw a short video. The gist went something like this:

If you're new to AI and want to learn about it, you've likely encountered two learning approaches:

  • Bottom-up: Start with the foundations—learn the math, understand the code, and slowly build up your knowledge base. Many online classes follow this structure.
  • Top-down: Start by building things—experiment with tools, see how they work, and dive into the underlying principles only when you get stuck.

The video's takeaway: the top-down approach works a lot faster.

This resonated with me. Looking back, I did a bit of both. Like most of us, I don’t have a Ph.D. in applied math or information theory. I spent a couple of years flunking physics in undergrad before switching to English, though I did take courses like multivariable calculus and differential equations. But that was decades ago, and I haven’t touched those books since. So, I’m no math whiz. For me, the top-down approach makes a lot more sense—and it's far less frustrating.

Learning AI today is a bit like learning to drive a car in the early 20th century. Back then, driving was a highly technical job, as drivers had to know how clutches and carburetors worked together (and there were no automatic transmissions). However, they didn’t need to be engineers, because cars were reliable enough for practical use. Similarly, today’s AI tools don’t require you to be a mathematician—you just need to know enough to get things working.

Take, for example, the seminal 2017 paper, Attention Is All You Need. It wouldn’t have been possible without earlier mathematicians like:

  • Pierre-Simon Laplace (Bayesian Probability & Softmax)
  • David Hilbert (Hilbert Spaces & Vector Representations): Hilbert’s inner product space theory influences dot-product attention, which is central to how attention networks (like LLMs) work. Embeddings, represented as high-dimensional vectors, follow principles from Hilbert spaces.
  • Joseph Fourier (Fourier Transform & Positional Encoding)
  • Other contributions: Calculus (Newton, Leibniz), Linear Algebra (Euler, Gauss).

Learning all of these foundational theories would take years. But if you focus on key areas that matter to you, gaining a practical understanding is entirely doable.

Personally, I’ve got a decent intuition for embeddings, but the key/query/value part was challenging—so I spent more time reviewing vector spaces and inner products. On the other hand, the Fourier transform is complex, but I consider it less critical because PyTorch handles it for you.

Here are some quick tips that worked for me:

  • Developing a good intuition is more important than knowing every detail.
  • When learning GenAI (enough to build tools), two key intuitions matter most: dimensionality and levels of abstraction.

In this article, we’ll focus on dimensionality.


Dimensionality in Everyday Life

Remember high school algebra? The x, y, and z axes (1st, 2nd, and 3rd dimensions), and maybe the 4th dimension—popularized by shows like Star Trek and movies like Back to the Future. But beyond that, what do dimensions have to do with everyday life?

Here are two examples:

Example 1: Imagine flat images of disks—some are perfect circles, others are ellipses. In just two dimensions (x and y), it’s hard to interpret them. But now, if we add the third dimension (z-axis) and imagine these disks are saucers rotating in space, the shapes on a flat surface are just shadows. Suddenly, everything becomes clearer. If we need to calculate how they expand and shrink, it's as simple as using sine/cosine equations.
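The shadow intuition can be checked with a couple of lines of arithmetic (a toy illustration of my own, not from any particular textbook): a disk tilted away from the viewer casts an elliptical shadow whose width shrinks by the cosine of the tilt angle.

```python
import math

def shadow_width(diameter, tilt_radians):
    """Width of the elliptical shadow cast by a disk tilted away from the viewer."""
    return diameter * math.cos(tilt_radians)

print(shadow_width(1.0, 0.0))          # face-on: the shadow is a full circle, width 1.0
print(shadow_width(1.0, math.pi / 3))  # tilted 60 degrees: a narrow ellipse, width 0.5
```

Seen purely in 2-D, the shrinking ellipse looks mysterious; with the third dimension and the tilt angle in hand, it reduces to one cosine.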

Key takeaway: Adding a dimension often simplifies complex problems.


Example 2: German philosopher Hegel is well known for his concept of dialectics, a method of solving problems by examining opposing viewpoints (thesis and antithesis) and synthesizing them into a new, refined idea. You may have heard of it, though Marx and Nietzsche are more widely read in the U.S., despite Hegel and Kant having had a greater impact on modern science.

Within Hegel’s dialectics, there’s a concept called sublation (Aufhebung), which means both to preserve and to negate at the same time. Hegel uses the growth of a plant to explain this in Phenomenology of Spirit:

"The bud disappears when the blossom breaks through... in the same way, when the fruit comes, the blossom may be explained to be a false form of the plant’s existence, for the fruit appears as the truth in place of the blossom."


In this process:

  • The bud represents an initial idea.
  • The blossom negates it (a new stage emerges).
  • The fruit (sublation) transcends both, preserving and negating the previous stages while forming a higher understanding.

Dimensionality Explained: Sublation can be thought of as "lifting" to a higher dimension, where a system integrates previous stages. For instance, human babies replace over 90% of their cells within a year or two, yet they are still considered the same person. Through this, we comprehend growth and change—similar to how AI models "transcend" different concepts by adding or reducing dimensions as they refine predictions.


Vectors & Matrices in GenAI

Vectors: A vector is a list of numbers, but you can think of it as representing both magnitude and direction in a complex space.

Matrices: A matrix is a collection of vectors that forms a shape in space, which can be transformed by:

  • Shifting to a new coordinate system, stretching or rotating.
  • Expanding to higher dimensions (e.g., from a disk to a sphere) or reducing dimensions (e.g., from a cube to a rectangle).
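Both kinds of transformation are one matrix multiplication away. Here is a minimal NumPy sketch (my own toy example): rotating a pair of 2-D points, then lifting the same points into 3-D.

```python
import numpy as np

# Two 2-D points forming a simple shape (one point per row).
points = np.array([[1.0, 0.0],
                   [0.0, 1.0]])

# Rotation by 90 degrees: a shift to a new coordinate system.
theta = np.pi / 2
rotate = np.array([[np.cos(theta), -np.sin(theta)],
                   [np.sin(theta),  np.cos(theta)]])
rotated = points @ rotate.T  # each row is the rotated point

# Lifting to a higher dimension: append a z-coordinate of 0.
lift = np.array([[1.0, 0.0],
                 [0.0, 1.0],
                 [0.0, 0.0]])
lifted = points @ lift.T  # shape (2, 3): the same shape, now living in 3-D

print(rotated.round(6))  # [[ 0.  1.] [-1.  0.]]
print(lifted.shape)      # (2, 3)
```

The same pattern, scaled up to hundreds of dimensions, is what happens inside a transformer layer.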


What Is Word Embedding?

Simply put, embedding converts a word into a series of numbers (a vector). Unlike basic encoding (which turns text into binary), embedding is far more data-intensive.


For example, in standard encoding, "artificial intelligence" is represented as a string of numbers based on the ASCII values of its characters. But in embedding, each word is represented by a vector with hundreds or thousands of dimensions, packing much more information. Different models pack embeddings differently, with varying dimensional sizes.

Here are some examples:

  • OpenAI (text-embedding-ada-002): 1536 dimensions
  • spaCy (en_core_web_md): 300 dimensions
  • HuggingFace BERT (bert-base-uncased): 768 dimensions
  • LLaMA 3-8B: 4096 dimensions (larger models go up to 8192 dimensions)
  • Google Gemini: Starts at 2048 dimensions, going up to 16,384 (similar to Microsoft's largest models)
  • NVIDIA Megatron-LM (530B): 32,768 dimensions


What Is Attention?

The attention mechanism allows models to focus on the most relevant parts of an input sequence. Popularized by the paper Attention Is All You Need, which introduced the transformer architecture, this concept is central to modern models like GPT-3.

The equation is:

Attention(Q, K, V) = softmax(QKᵀ / √dₖ) V

  • V (Value): Represents the embeddings (word meanings) combined with positional encoding (word order). V is a matrix made up of vectors, where each vector represents a word in the sentence.
      • Embeddings: The model looks up the vector for each word from a pre-trained model (like OpenAI or LLaMA), similar to looking up definitions in a dictionary.
      • Positional Encoding: Since word order matters, the model adjusts these vectors based on the position of each word in the sentence. This ensures the model understands the sequence, not just the individual words.

In the end, you get a matrix that combines the meaning of each word with its position in the sentence.

  • K (Key): Drives the "self-attention" comparison, where the model compares each word with every other word (via the dot product QKᵀ) to assign weights based on relevance. For example, "artificial" is compared with "artificial", "intelligence", "is", "good", and so on, producing a matrix that shows how each word relates to the others. Word combinations that naturally occur together get more weight: "artificial" followed by "intelligence" will score high because the pair makes sense, while the reversed order, "intelligence" followed by "artificial", may score lower.
  • Q (Query): The context or focus of the attention mechanism, depending on external inputs or tasks.

The softmax function normalizes these comparisons into probabilities, allowing the model to focus more on important words and less on irrelevant ones.
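Putting the pieces together, scaled dot-product attention is only a few lines of NumPy. This is a minimal sketch with random toy matrices; real models add learned projection weights, masking, and batching.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the row max for numerical stability before exponentiating.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V):
    """softmax(Q K^T / sqrt(d_k)) V"""
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)     # how strongly each query matches each key
    weights = softmax(scores, axis=-1)  # each row sums to 1: a probability per key
    return weights @ V                  # weighted mix of the value vectors

rng = np.random.default_rng(42)
seq_len, d_k = 4, 8                     # 4 tokens, 8-dim toy vectors
Q, K, V = (rng.standard_normal((seq_len, d_k)) for _ in range(3))
out = attention(Q, K, V)
print(out.shape)  # (4, 8): one context-mixed vector per token
```

Each output row is a blend of all the value vectors, weighted by how relevant the other words are to that position.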

Finally, there’s multi-head attention, which splits the process into multiple “heads,” each focusing on different parts of the input. For example, with an embedding size of 512 split across 8 heads, each head processes 64 dimensions. By combining these perspectives, the model gets a more complete understanding of the input, much like the blind men in the parable who each describe a different part of an elephant.


Summary of the Process:

  1. The input sentence is tokenized into words.
  2. Each word is converted into a vector (embedding), forming a matrix of word vectors.
  3. The matrix is split across 8 heads (each head processes part of the vectors).

For each head:

  • The position-adjusted embedding vectors give you V.
  • The query and key vectors are compared to each other (dot product QKᵀ) to create a matrix that captures the relationships between words. This matrix is scaled by √dₖ and normalized with softmax.
  • The resulting weight matrix is then used to transform V, producing the head's output.

Finally, all 8 heads are combined and passed through the decoder to generate the final result.
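The head-splitting step above can be sketched as a reshape: a 512-dimensional embedding is divided into 8 slices of 64 dimensions, processed per head, and concatenated back together. This is a simplified sketch of my own; real transformers also apply separate learned projection matrices per head.

```python
import numpy as np

seq_len, d_model, n_heads = 10, 512, 8
head_dim = d_model // n_heads  # 64 dimensions per head

x = np.random.default_rng(1).standard_normal((seq_len, d_model))

# Split: (seq, 512) -> (heads, seq, 64). Each head sees its own 64-dim slice.
heads = x.reshape(seq_len, n_heads, head_dim).transpose(1, 0, 2)
print(heads.shape)  # (8, 10, 64)

# ...each head would run attention on its slice independently here...

# Combine: concatenate the head outputs back into (seq, 512).
combined = heads.transpose(1, 0, 2).reshape(seq_len, d_model)
print(np.allclose(combined, x))  # True: split and merge are exact inverses here
```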

Compared to earlier approaches, this process involves iterations of high-dimensional computations. The higher the dimensions, the more complexity it can handle. For now, large language models (LLMs) speed this up using GPUs, which can efficiently handle matrix operations.

Ultimately, if singularity ever happens, it will likely be in the quantum computing age - when computers can dial up and down to hundreds of thousands or even millions of dimensions.


Conclusion: Dimensionality Isn’t Intimidating – We Use It Daily!

Dimensionality is everywhere—in how we understand growth, relationships, and learning. Just as adding an extra dimension makes complex problems easier to solve, modern AI models use higher dimensions to refine their predictions. AI is constantly integrating information across dimensions to understand our world. And as daunting as these concepts might seem at first glance, we interact with them intuitively in everyday life.

So, while you don’t need to be an expert in Hilbert spaces or Fourier transformations, developing a good intuition about dimensionality can take you a long way in understanding and working with AI.
