Quick Intuition for Understanding GenAI by Thinking in Dimensionality

Introduction: How to Learn AI for Non-Data Scientists - Bottom-Up or Top-Down?

I recently saw a short video. The gist went something like this:

If you're new to AI and want to learn about it, you've likely encountered two learning approaches:

  • Bottom-up: Start with the foundations—learn the math, understand the code, and slowly build up your knowledge base. Many online classes follow this structure.
  • Top-down: Start by building things—experiment with tools, see how they work, and dive into the underlying principles only when you get stuck.

The video's takeaway: the top-down approach works a lot faster.

This resonated with me. Looking back, I did a bit of both. Like most of us, I don’t have a Ph.D. in applied math or information theory. I spent a couple of years flunking physics in undergrad before switching to English, though I did take courses like multivariable calculus and differential equations. But that was decades ago, and I haven’t touched those books since. So, I’m no math whiz. For me, the top-down approach makes a lot more sense—and it's far less frustrating.

Learning AI today is a bit like learning to drive a car in the early 20th century. Back then, driving was a highly technical job, as drivers had to know how clutches and carburetors worked together (and there were no automatic transmissions). However, they didn’t need to be engineers, because cars were reliable enough for practical use. Similarly, today’s AI tools don’t require you to be a mathematician—you just need to know enough to get things working.

Take, for example, the seminal 2017 paper, Attention Is All You Need. It wouldn’t have been possible without earlier mathematicians like:

  • Pierre-Simon Laplace (Bayesian Probability & Softmax)
  • David Hilbert (Hilbert Spaces & Vector Representations): Hilbert’s inner product space theory influences dot-product attention, which is central to how attention networks (like LLMs) work. Embeddings, represented as high-dimensional vectors, follow principles from Hilbert spaces.
  • Joseph Fourier (Fourier Transform & Positional Encoding)
  • Other contributions: Calculus (Newton, Leibniz), Linear Algebra (Euler, Gauss).

Learning all of these foundational theories would take years. But if you focus on key areas that matter to you, gaining a practical understanding is entirely doable.

Personally, I’ve got a decent intuition for embeddings, but the key/query/value part was challenging—so I spent more time reviewing vector spaces and inner products. On the other hand, the Fourier transform is complex, but I consider it less critical because PyTorch handles it for you.

Here are some quick tips that worked for me:

  • Developing a good intuition is more important than knowing every detail.
  • When learning GenAI (enough to build tools), two key intuitions matter most: dimensionality and levels of abstraction.

In this article, we’ll focus on dimensionality.


Dimensionality in Everyday Life

Remember high school algebra? The x, y, and z axes (1st, 2nd, and 3rd dimensions), and maybe the 4th dimension—popularized by shows like Star Trek and movies like Back to the Future. But beyond that, what do dimensions have to do with everyday life?

Here are two examples:

Example 1: Imagine flat images of disks—some are perfect circles, others are ellipses. In just two dimensions (x and y), it’s hard to interpret them. But now, if we add the third dimension (z-axis) and imagine these disks are saucers rotating in space, the shapes on a flat surface are just shadows. Suddenly, everything becomes clearer. If we need to calculate how they expand and shrink, it's as simple as using sine/cosine equations.
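The shadow intuition can be checked with a couple of lines of arithmetic (a toy illustration of my own, not from any particular textbook): a disk tilted away from the viewer casts an elliptical shadow whose width shrinks by the cosine of the tilt angle.

```python
import math

def shadow_width(diameter, tilt_radians):
    """Width of the elliptical shadow cast by a disk tilted away from the viewer."""
    return diameter * math.cos(tilt_radians)

print(shadow_width(1.0, 0.0))          # face-on: the shadow is a full circle, width 1.0
print(shadow_width(1.0, math.pi / 3))  # tilted 60 degrees: a narrow ellipse, width 0.5
```

Seen purely in 2-D, the shrinking ellipse looks mysterious; with the third dimension and the tilt angle in hand, it reduces to one cosine.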

Key takeaway: Adding a dimension often simplifies complex problems.


Example 2: German philosopher Hegel is well known for his concept of dialectics, a method of solving problems by examining opposing viewpoints (thesis and antithesis) and synthesizing them into a new, refined idea. You may have heard of it, though Marx and Nietzsche are more widely read in the U.S., despite Hegel and Kant having had a greater impact on modern science.

Within Hegel’s dialectics, there’s a concept called sublation (Aufhebung), which means both to preserve and to negate at the same time. Hegel uses the growth of a plant to explain this in Phenomenology of Spirit:

"The bud disappears when the blossom breaks through... in the same way, when the fruit comes, the blossom may be explained to be a false form of the plant’s existence, for the fruit appears as the truth in place of the blossom."


In this process:

  • The bud represents an initial idea.
  • The blossom negates it (a new stage emerges).
  • The fruit (sublation) transcends both, preserving and negating the previous stages while forming a higher understanding.

Dimensionality Explained: Sublation can be thought of as "lifting" to a higher dimension, where a system integrates previous stages. For instance, human babies replace over 90% of their cells within a year or two, yet they are still considered the same person. Through this, we comprehend growth and change—similar to how AI models "transcend" different concepts by adding or reducing dimensions as they refine predictions.


Vectors & Matrices in GenAI

Vectors: A vector is a list of numbers, but you can think of it as representing both magnitude and direction in a complex space.

Matrices: A matrix is a collection of vectors that forms a shape in space, which can be transformed by:

  • Shifting to a new coordinate system, stretching or rotating.
  • Expanding to higher dimensions (e.g., from a disk to a sphere) or reducing dimensions (e.g., from a cube to a rectangle).
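Both kinds of transformation are one matrix multiplication away. Here is a minimal NumPy sketch (my own toy example): rotating a pair of 2-D points, then lifting the same points into 3-D.

```python
import numpy as np

# Two 2-D points forming a simple shape (one point per row).
points = np.array([[1.0, 0.0],
                   [0.0, 1.0]])

# Rotation by 90 degrees: a shift to a new coordinate system.
theta = np.pi / 2
rotate = np.array([[np.cos(theta), -np.sin(theta)],
                   [np.sin(theta),  np.cos(theta)]])
rotated = points @ rotate.T  # each row is the rotated point

# Lifting to a higher dimension: append a z-coordinate of 0.
lift = np.array([[1.0, 0.0],
                 [0.0, 1.0],
                 [0.0, 0.0]])
lifted = points @ lift.T  # shape (2, 3): the same shape, now living in 3-D

print(rotated.round(6))  # [[ 0.  1.] [-1.  0.]]
print(lifted.shape)      # (2, 3)
```

The same pattern, scaled up to hundreds of dimensions, is what happens inside a transformer layer.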


What Is Word Embedding?

Simply put, embedding converts a word into a series of numbers (a vector). Unlike basic encoding (which turns text into binary), embedding is far more data-intensive.


For example, in standard encoding, "artificial intelligence" is represented as a string of numbers based on the ASCII values of its characters. But in embedding, each word is represented by a vector with hundreds or thousands of dimensions, packing much more information. Different models pack embeddings differently, with varying dimensional sizes.

Here are some examples:

  • OpenAI (text-embedding-ada-002): 1536 dimensions
  • spaCy (en_core_web_md): 300 dimensions
  • HuggingFace BERT (bert-base-uncased): 768 dimensions
  • LLaMA 3-8B: 4096 dimensions (larger models go up to 8192 dimensions)
  • Google Gemini: Starts at 2048 dimensions, going up to 16,384 (similar to Microsoft's largest models)
  • NVIDIA Megatron-LM (530B): 32,768 dimensions


What Is Attention?

The attention mechanism allows models to focus on the most relevant parts of an input sequence. Popularized by the paper Attention Is All You Need, which introduced the transformer architecture, this concept is central to modern models like GPT-3.

The equation is:

Attention(Q, K, V) = softmax(QKᵀ / √dₖ) V

  • V (Value): Represents the embeddings (word meanings) combined with positional encoding (word order). V is a matrix made up of vectors, where each vector represents a word in the sentence.
      • Embeddings: The model looks up the vector for each word from a pre-trained model (like OpenAI or LLaMA), similar to looking up definitions in a dictionary.
      • Positional Encoding: Since word order matters, the model adjusts these vectors based on the position of each word in the sentence. This ensures the model understands the sequence, not just the individual words.

In the end, you get a matrix that combines the meaning of each word with its position in the sentence.

  • K (Key): Drives the "self-attention" comparison, where the model compares each word with every other word (via the dot product QKᵀ) to assign weights based on relevance. For example, "artificial" is compared with "artificial", "intelligence", "is", "good", and so on, producing a matrix that shows how each word relates to the others. Word combinations that naturally occur together get more weight: "artificial" followed by "intelligence" will score high because the pair makes sense, while the reversed order, "intelligence" followed by "artificial", may score lower.
  • Q (Query): The context or focus of the attention mechanism, depending on external inputs or tasks.

The softmax function normalizes these comparisons into probabilities, allowing the model to focus more on important words and less on irrelevant ones.
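Putting the pieces together, scaled dot-product attention is only a few lines of NumPy. This is a minimal sketch with random toy matrices; real models add learned projection weights, masking, and batching.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the row max for numerical stability before exponentiating.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V):
    """softmax(Q K^T / sqrt(d_k)) V"""
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)     # how strongly each query matches each key
    weights = softmax(scores, axis=-1)  # each row sums to 1: a probability per key
    return weights @ V                  # weighted mix of the value vectors

rng = np.random.default_rng(42)
seq_len, d_k = 4, 8                     # 4 tokens, 8-dim toy vectors
Q, K, V = (rng.standard_normal((seq_len, d_k)) for _ in range(3))
out = attention(Q, K, V)
print(out.shape)  # (4, 8): one context-mixed vector per token
```

Each output row is a blend of all the value vectors, weighted by how relevant the other words are to that position.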

Finally, there’s multi-head attention, which splits the process into multiple “heads,” each focusing on different parts of the input. For example, with an embedding size of 512 split across 8 heads, each head processes 64 dimensions. By combining these perspectives, the model gets a more complete understanding of the input, much like the blind men in the parable who each describe a different part of an elephant.


Summary of the Process:

  1. The input sentence is tokenized into words.
  2. Each word is converted into a vector (embedding), forming a matrix of word vectors.
  3. The matrix is split across 8 heads (each head processes part of the vectors).

For each head:

  • The position-adjusted embedding vectors give you V.
  • The query and key vectors are compared to each other (dot product QKᵀ) to create a matrix that captures the relationships between words. This matrix is scaled by √dₖ and normalized with softmax.
  • The resulting weight matrix is then used to transform V, producing the head's output.

Finally, all 8 heads are combined and passed through the decoder to generate the final result.
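The head-splitting step above can be sketched as a reshape: a 512-dimensional embedding is divided into 8 slices of 64 dimensions, processed per head, and concatenated back together. This is a simplified sketch of my own; real transformers also apply separate learned projection matrices per head.

```python
import numpy as np

seq_len, d_model, n_heads = 10, 512, 8
head_dim = d_model // n_heads  # 64 dimensions per head

x = np.random.default_rng(1).standard_normal((seq_len, d_model))

# Split: (seq, 512) -> (heads, seq, 64). Each head sees its own 64-dim slice.
heads = x.reshape(seq_len, n_heads, head_dim).transpose(1, 0, 2)
print(heads.shape)  # (8, 10, 64)

# ...each head would run attention on its slice independently here...

# Combine: concatenate the head outputs back into (seq, 512).
combined = heads.transpose(1, 0, 2).reshape(seq_len, d_model)
print(np.allclose(combined, x))  # True: split and merge are exact inverses here
```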

Compared to earlier approaches, this process involves iterations of high-dimensional computations. The higher the dimensions, the more complexity it can handle. For now, large language models (LLMs) speed this up using GPUs, which can efficiently handle matrix operations.

Ultimately, if singularity ever happens, it will likely be in the quantum computing age - when computers can dial up and down to hundreds of thousands or even millions of dimensions.


Conclusion: Dimensionality Isn’t Intimidating – We Use It Daily!

Dimensionality is everywhere—in how we understand growth, relationships, and learning. Just as adding an extra dimension makes complex problems easier to solve, modern AI models use higher dimensions to refine their predictions. AI is constantly integrating information across dimensions to understand our world. And as daunting as these concepts might seem at first glance, we interact with them intuitively in everyday life.

So, while you don’t need to be an expert in Hilbert spaces or Fourier transformations, developing a good intuition about dimensionality can take you a long way in understanding and working with AI.
