AI Can Now ‘See’ What You’re Thinking About. This Is How Engineers At Meta Made It Possible.
Generated with DALL-E 3

Decoding how the brain works has always been a hot topic of research.

Frustratingly, conventional research methods have been largely ineffective, and progress in this field has been slow.

But with the advent of AI and the growing amount of compute available to us, this field has started progressing at a near-exponential pace.

A few months ago, research engineers at Meta accomplished something significant, a result seen as an important milestone toward fully decoding the inner workings of the human brain.

They trained an AI model that can reconstruct the images perceived and processed by a person’s brain in real time.

In other words, it can recreate images that a person was viewing and thinking about.

This story is a deep dive into how this was made possible.


An Introduction To Brain Imaging

Brain Imaging or Neuroimaging involves using different techniques to learn how the brain looks and functions.

The advent of these imaging techniques allows us to study the human brain non-invasively (or without actually dissecting one’s body in a lab).

Neuroimaging is divided into two broad categories:

  • Structural imaging — used to study the brain’s structure (examples being CT and MRI scans)
  • Functional imaging — used to study the brain’s function (examples being EEG, MEG and fMRI scans)

To learn about the AI model developed at Meta, we first need to learn about MRI, fMRI and MEG in a bit more detail.


What Is MRI & How It Works

MRI, or Magnetic Resonance Imaging, is a technique that uses strong magnetic fields to visualise the body’s internal structures in extremely high detail.

MRI images of the human brain (GIF by Dwayne Reed at English Wikipedia)

The human body is full of hydrogen atoms (present as a part of water, carbohydrates, proteins and more).

The protons in these hydrogen atoms have an intrinsic property called Spin that makes them behave like tiny magnets.

(Note that Spin has nothing to do with any actual physical spinning.)

Under normal conditions, these tiny proton “magnets” point in random directions, so the net magnetic field produced by all the hydrogen atoms in the body is zero.

This is where the MRI scanner comes in.

An MRI scanner uses magnets producing fields as strong as 3 tesla, roughly 50,000 times the strength of the Earth’s magnetic field.

When the magnetic field produced by these magnets is applied to a body part, the protons in the hydrogen atoms present there align either parallel or anti-parallel to it.

Next, the MRI scanner applies a radiofrequency pulse to these protons that tilts them away from their alignment with the magnetic field.

When the radiofrequency pulse is turned off, the tilted protons spiral back to their original alignment.

This spiralling motion induces a current signal that is read by the MRI scanner and reconstructed into an image of the body part’s internal structure using mathematical functions such as Fourier transforms.

Different body parts (e.g., muscles, bones, brain) have different relaxation times, and this produces the contrast between these structures in the final image.
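Incidentally, the frequency of the radiofrequency pulse is not arbitrary: to tilt the protons, it must match their precession (Larmor) frequency, which scales with the field strength. As a quick worked example using the well-known gyromagnetic ratio of hydrogen:

$$f = \frac{\gamma}{2\pi} B_0 \approx 42.58\ \frac{\text{MHz}}{\text{T}} \times 3\ \text{T} \approx 128\ \text{MHz}$$

So a 3-tesla scanner pulses its radio waves at roughly 128 MHz.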


What Is fMRI & How It Works

fMRI stands for Functional Magnetic Resonance Imaging.

It is a special type of MRI scan that helps visualise activity in different parts of one’s brain in real-time.

fMRI is based on the principle that when a brain area becomes active, it consumes more oxygen and thus there is an increase in the supply of oxygen-rich blood to this area.

Oxygen-rich blood contains oxyhemoglobin, which is diamagnetic (weakly repelled by a magnetic field), whereas oxygen-poor blood contains paramagnetic deoxyhemoglobin. The change in the ratio of the two alters the magnetic properties of the active brain area, which the fMRI machine detects.

fMRI images showing parts of the brain lighting up on seeing houses and other parts on seeing faces (Source: Image modified from Wikimedia Commons)

fMRI offers a high spatial resolution (the ability to differentiate between two points in space), typically in the range of 1 to 3 mm.

But it has a lower temporal resolution (the smallest unit of time over which changes in brain activity can be reliably detected), typically around 1–4 seconds.

This makes fMRI a poor choice for studying brain changes in real time.

We therefore need a faster method for detecting and recording real-time changes in the brain.


What Is MEG & How It Works

MEG, or Magnetoencephalography, is another functional brain imaging technique that directly measures brain activity by detecting the magnetic fields generated by the electrical currents in brain cells (neurons).

Although its spatial resolution is coarser, ranging from a few millimetres to about 1 cm, its sampling rate is around 5,000 Hz.

This means that it can measure brain activity 5000 times per second.
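To put that in perspective, compare the time between consecutive measurements for the two techniques:

$$\Delta t_{\text{MEG}} = \frac{1}{5000\ \text{Hz}} = 0.2\ \text{ms} \qquad \text{vs.} \qquad \Delta t_{\text{fMRI}} \approx 1\text{–}4\ \text{s}$$

That makes MEG thousands of times faster at tracking changes than fMRI.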

Hence, MEG is a great choice for studying brain activity as it unfolds!

Image of the first MEG scan ever obtained (Source: Sherrykhan78, own work, CC BY-SA 4.0)

Now that we know how brain imaging works, let’s talk about the AI model that lets researchers visualise what someone else is thinking about.


An Overview Of The AI Model

The researchers’ objective while building this model was to decode the MEG-recorded brain activity of human participants as they were shown natural images.

To solve this challenge, they created an AI model consisting of three core modules —

  1. Image Module: This module works with pre-trained image embeddings created by large deep-learning models like VGG-19 & CLIP. The idea behind this module is to convert large numbers of images into dense representations that the AI model can work with.
  2. Brain Module: This module processes the MEG signals and aims to learn the mapping between these signals and the pre-trained image embeddings.
  3. Generation Module: This is a Diffusion module that takes the latent representations created by the previous modules and generates images from them.

Overview of the AI model (Image by author)

Let’s learn about each of these modules in detail.


Image Module

The core idea behind this module is to represent images from large datasets as embeddings.

These image embeddings come from the following sources (a short extraction sketch follows the list) —

  • Supervised learning models such as VGG-19, which capture high-level visual patterns in image embeddings
  • Image-text alignment models such as CLIP, which capture semantic information about images in embeddings
  • Variational Autoencoders (VAEs), which capture the visual features of images in a compressed form
  • Self-supervised Vision Transformers such as DINOv1 & DINOv2, which learn the visual features of images without the need for labelled data
  • Human-engineered features that do not use deep learning
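As an illustration of one of these sources, here is a minimal sketch of extracting CLIP image embeddings with OpenAI’s clip package. The model variant ("ViT-B/32") and the image path are illustrative assumptions, not necessarily what the paper used.

```python
import torch
import clip  # OpenAI's CLIP: pip install git+https://github.com/openai/CLIP.git
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"

# Load a pre-trained CLIP model; "ViT-B/32" is an illustrative choice.
model, preprocess = clip.load("ViT-B/32", device=device)

# "stimulus.jpg" is a placeholder path for one image a participant saw.
image = preprocess(Image.open("stimulus.jpg")).unsqueeze(0).to(device)

with torch.no_grad():
    embedding = model.encode_image(image)  # shape: (1, 512) for ViT-B/32

print(embedding.shape)
```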

Brain Module

The core idea behind this module is to map image embeddings to participants’ brain activity obtained via MEG.

It uses dilated residual convolutional layers, which can process the temporal sequences of MEG brain recordings.
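The paper’s full brain module includes components this sketch omits; the block below is only a minimal illustration of the dilated residual convolution idea, with every channel count, dilation rate, and layer size chosen for illustration.

```python
import torch
import torch.nn as nn

class DilatedResidualBlock(nn.Module):
    """A residual block of 1D convolutions with a fixed dilation rate.
    Dilation grows the temporal receptive field without downsampling."""
    def __init__(self, channels: int, dilation: int):
        super().__init__()
        self.conv1 = nn.Conv1d(channels, channels, kernel_size=3,
                               dilation=dilation, padding=dilation)
        self.conv2 = nn.Conv1d(channels, channels, kernel_size=3,
                               dilation=dilation, padding=dilation)
        self.act = nn.GELU()

    def forward(self, x):  # x: (batch, channels, time)
        h = self.act(self.conv1(x))
        h = self.conv2(h)
        return self.act(x + h)  # residual connection

# MEG input: (batch, sensors, time). Project the sensors to hidden
# channels, stack blocks with growing dilation (1, 2, 4, 8), then
# pool over time and map to the image-embedding dimension.
brain_module = nn.Sequential(
    nn.Conv1d(272, 128, kernel_size=1),  # 272 MEG sensors (illustrative)
    *[DilatedResidualBlock(128, 2 ** k) for k in range(4)],
    nn.AdaptiveAvgPool1d(1),
    nn.Flatten(),
    nn.Linear(128, 512),                 # 512-dim embedding (illustrative)
)

x = torch.randn(8, 272, 181)  # 8 recordings, 272 sensors, 181 time samples
print(brain_module(x).shape)  # torch.Size([8, 512])
```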

Generation Module

The core idea behind this module is to generate images that closely resemble the original images shown to the participants, using the embeddings predicted from their brain activity.

It consists of a latent diffusion model conditioned on the latent spaces of the AutoKL Variational Autoencoder and CLIP.
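In the full pipeline, a latent diffusion process refines the image. As a minimal sketch of just the final step, here is how predicted KL-autoencoder latents could be decoded back into pixels with a pre-trained VAE from the diffusers library; the checkpoint name, latent shape, and scaling constant are assumptions for illustration, not taken from the paper.

```python
import torch
from diffusers import AutoencoderKL

# Load a pre-trained KL-regularised VAE (checkpoint chosen for
# illustration; it is not necessarily the paper's AutoKL autoencoder).
vae = AutoencoderKL.from_pretrained("stabilityai/sd-vae-ft-mse")
vae.eval()

# Stand-in for latents predicted from MEG by the brain module.
# The (1, 4, 64, 64) shape matches this VAE's latent space for
# 512x512 images (an assumption for this sketch).
predicted_latents = torch.randn(1, 4, 64, 64)

with torch.no_grad():
    # Latents in this pipeline are conventionally stored scaled by
    # 0.18215, so we undo the scaling before decoding.
    image = vae.decode(predicted_latents / 0.18215).sample  # (1, 3, 512, 512)

print(image.shape)
```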


Dataset

The THINGS-MEG dataset consists of MEG recordings from participants as they viewed a wide range of images from the THINGS database.

This dataset was used to train the model.


Training Process

The dataset is first divided into —

  • Train set (for model training)
  • Validation set (for hyperparameter tuning)
  • Test sets (for model evaluation)

The MEG data from the training dataset is pre-processed.

Similarly, the images from the training dataset are converted into embeddings using the Image module.

Next, the Brain module is trained on the MEG data with two objectives —

  1. To pick the right image out of a bank of candidate images (trained using the CLIP loss)

CLIP loss (Source: Original research paper):

$$\mathcal{L}_{\text{CLIP}} = -\frac{1}{B}\sum_{i=1}^{B} \log \frac{\exp\left(s(\hat{z}_i, z_i)/\tau\right)}{\sum_{j=1}^{B} \exp\left(s(\hat{z}_i, z_j)/\tau\right)}$$

where:

  • s is the cosine similarity
  • B is the batch size
  • z_i and ẑ_i = f_θ(X_i) are the latent representation and the corresponding MEG-based prediction, respectively
  • τ is a learned temperature parameter

2. To generate images (trained using a standard Mean Squared Error (MSE) loss)

MSE loss (Source: Original research paper):

$$\mathcal{L}_{\text{MSE}} = \frac{1}{B}\sum_{i=1}^{B} \left\lVert z_i - \hat{z}_i \right\rVert_2^2$$

Both of these are combined for training using a convex combination:

$$\mathcal{L} = \lambda\,\mathcal{L}_{\text{CLIP}} + (1 - \lambda)\,\mathcal{L}_{\text{MSE}}$$

where λ is a hyperparameter that balances the relative importance of the two losses (Source: Original research paper).
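To make this concrete, here is a minimal PyTorch sketch of the combined objective, assuming the standard batch-contrastive (InfoNCE-style) formulation of the CLIP loss; tensor shapes and variable names are illustrative, not the paper’s exact implementation.

```python
import torch
import torch.nn.functional as F

def combined_loss(z, z_hat, log_tau, lam=0.5):
    """Convex combination of a CLIP-style contrastive loss and an MSE loss.

    z:       (B, D) target image embeddings
    z_hat:   (B, D) MEG-based predictions from the brain module
    log_tau: learned log-temperature (kept in log space for stability)
    lam:     hyperparameter balancing the two losses (lambda above)
    """
    # CLIP loss: cosine similarity between every prediction and every
    # candidate embedding in the batch; the matching pair is the target.
    sim = F.normalize(z_hat, dim=-1) @ F.normalize(z, dim=-1).T  # (B, B)
    targets = torch.arange(z.size(0), device=z.device)
    clip_loss = F.cross_entropy(sim / log_tau.exp(), targets)

    # MSE loss: push each prediction toward its own target embedding.
    mse_loss = F.mse_loss(z_hat, z)

    # Convex combination of the two objectives.
    return lam * clip_loss + (1.0 - lam) * mse_loss

# Usage with dummy tensors:
B, D = 16, 512
z = torch.randn(B, D)
z_hat = torch.randn(B, D, requires_grad=True)
log_tau = torch.tensor(0.0, requires_grad=True)  # tau = exp(0) = 1.0
loss = combined_loss(z, z_hat, log_tau)
loss.backward()
```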

The Adam optimizer is used to train the overall model.

Hyperparameter tuning is performed using the Validation set.


Evaluation Process

The performance of the model is evaluated on the Test set using —

  1. Retrieval metrics (such as relative median rank & top-5 accuracy) — to evaluate the probability of identifying the correct image given the model’s predictions
  2. Generation metrics — to evaluate how closely the generated images resemble the images the participants were actually shown (a sketch of one retrieval metric follows the list)
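Here is a minimal sketch of top-5 retrieval accuracy: given embeddings predicted from MEG, check whether the true image’s embedding is among the 5 most cosine-similar candidates. The paper’s exact evaluation protocol may differ.

```python
import torch
import torch.nn.functional as F

def top5_accuracy(z, z_hat):
    """z: (N, D) true image embeddings; z_hat: (N, D) MEG-based predictions.
    Returns the fraction of predictions whose true image appears among
    the 5 most cosine-similar candidates in the retrieval bank."""
    sim = F.normalize(z_hat, dim=-1) @ F.normalize(z, dim=-1).T  # (N, N)
    top5 = sim.topk(5, dim=-1).indices                           # (N, 5)
    targets = torch.arange(z.size(0)).unsqueeze(1)               # (N, 1)
    return (top5 == targets).any(dim=-1).float().mean().item()

# With random embeddings, top-5 accuracy should hover near 5/N:
print(top5_accuracy(torch.randn(100, 512), torch.randn(100, 512)))
```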


How Well Does The Model Work

When MEG recordings taken while participants viewed different images were fed into the trained model, it reconstructed many of those images remarkably well.

Look at the examples shown below. They are spectacular!

Successful generations-1 (Source: Original research paper)
Successful generations-2 (Source: Original research paper)

But the model wasn’t right all the time; some of its failed generations are shown below.

Failed generations-1 (Source: Original research paper)
Failed generations-2 (Source: Original research paper)

The results also showed that the embedding representations created using DINOv2, a self-supervised learning model, led to the best decoding performance.

This shows how well the visual representations of this self-supervised model align with those of the actual human brain, even without any human annotations!

Image retrieval performance of the model on the test datasets (Source: Original research paper)

This is some phenomenal work by Meta researchers!

Although it raises serious ethical questions about the potential threat to our mental privacy, on the positive side it will help us better understand the brain and develop futuristic devices that were once only imagined in sci-fi movies.

What are your thoughts on this? Let me know in the comments below!


Join my newsletter mailing list along with hundreds of other curious individuals who read it weekly!

其他会员也浏览了