AI Can Now ‘See’ What You’re Thinking About. This Is How Engineers At Meta Made It Possible.
Generated with DALL-E 3

Decoding how the brain works has always been a hot topic of research.

Frustratingly, conventional research methods have been largely ineffective, and progress in this field has been slow.

But with the advent of AI and the growing amount of compute available to us, this field has started progressing at a near-exponential pace.

A few months ago, research engineers at Meta accomplished something significant, a result seen as an important milestone toward fully decoding the inner workings of the human brain.

They trained an AI model that can reconstruct the images perceived and processed by a person’s brain in real time.

In other words, it can recreate images that a person was viewing and thinking about.

This story is a deep dive into how this was made possible.


An Introduction To Brain Imaging

Brain Imaging or Neuroimaging involves using different techniques to learn how the brain looks and functions.

The advent of these imaging techniques allows us to study the human brain non-invasively (or without actually dissecting one’s body in a lab).

Neuroimaging is divided into two broad categories:

  • Structural imaging — used to study the brain’s structure (examples being CT and MRI scans)
  • Functional imaging — used to study the brain’s function (examples being EEG, MEG and fMRI scans)

To learn about the AI model developed at Meta, we first need to learn about MRI, fMRI and MEG in a bit more detail.


What Is MRI & How It Works

MRI, or Magnetic Resonance Imaging, is a technique that uses strong magnetic fields to visualise the body’s internal structures in extremely high detail.

MRI images of the human brain (GIF by Dwayne Reed at English Wikipedia)

The human body is full of hydrogen atoms (present as a part of water, carbohydrates, proteins and more).

The protons in these hydrogen atoms have an intrinsic property called Spin that makes them behave like tiny magnets.

(Note that Spin has nothing to do with any actual physical spinning.)

Under normal conditions, these tiny proton “magnets” point in random directions, so the net magnetic field produced by all the hydrogen atoms in the body is zero.

This is where the MRI scanner comes in.

An MRI scanner uses magnets producing fields as strong as 3 tesla, roughly 50,000 times the strength of the Earth’s magnetic field.

When the magnetic field produced by these magnets is applied to a body part, the protons in the hydrogen atoms present there align either parallel or anti-parallel to it.

Next, the MRI scanner applies a radiofrequency pulse to these protons that tilts them away from their alignment with the magnetic field.

When the radiofrequency pulse is turned off, the tilted protons spiral back to their original alignment.

This spiralling motion induces a current signal that is read by the MRI scanner and reconstructed into an image of the body part’s internal structure using mathematical functions such as Fourier transforms.

Different body parts (e.g., muscles, bones, brain) have different relaxation times, and this produces the contrast between these structures in the final image.
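Incidentally, the frequency of the radiofrequency pulse is not arbitrary: to tilt the protons, it must match their precession (Larmor) frequency, which scales with the field strength. As a quick worked example using the well-known gyromagnetic ratio of hydrogen:

$$f = \frac{\gamma}{2\pi} B_0 \approx 42.58\ \frac{\text{MHz}}{\text{T}} \times 3\ \text{T} \approx 128\ \text{MHz}$$

So a 3-tesla scanner pulses its radio waves at roughly 128 MHz.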


What Is fMRI & How It Works

fMRI stands for Functional Magnetic Resonance Imaging.

It is a special type of MRI scan that helps visualise activity in different parts of one’s brain in real-time.

fMRI is based on the principle that when a brain area becomes active, it consumes more oxygen and thus there is an increase in the supply of oxygen-rich blood to this area.

Oxygen-rich blood contains oxyhemoglobin, which is diamagnetic (weakly repelled by a magnetic field), whereas oxygen-poor blood contains paramagnetic deoxyhemoglobin. The change in the ratio of the two alters the magnetic properties of the active brain area, which the fMRI machine detects.

fMRI images showing parts of the brain lighting up on seeing houses and other parts on seeing faces (Source: Image modified from Wikimedia Commons)

fMRI offers a high spatial resolution (the ability to differentiate between two points in space), typically in the range of 1 to 3 mm.

But it has a lower temporal resolution (the smallest unit of time over which changes in brain activity can be reliably detected), typically around 1–4 seconds.

This makes fMRI a poor choice for studying brain changes in real time.

We therefore need a faster method for detecting and recording real-time changes in the brain.


What Is MEG & How It Works

MEG, or Magnetoencephalography, is another functional brain imaging technique that directly measures brain activity by detecting the magnetic fields generated by the electrical currents in brain cells (neurons).

Although its spatial resolution is coarser, ranging from a few millimetres to about 1 cm, its sampling rate is around 5,000 Hz.

This means that it can measure brain activity 5000 times per second.
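To put that in perspective, compare the time between consecutive measurements for the two techniques:

$$\Delta t_{\text{MEG}} = \frac{1}{5000\ \text{Hz}} = 0.2\ \text{ms} \qquad \text{vs.} \qquad \Delta t_{\text{fMRI}} \approx 1\text{–}4\ \text{s}$$

That makes MEG thousands of times faster at tracking changes than fMRI.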

Hence, MEG is a great choice for studying brain activity as it unfolds!

Image of the first MEG scan ever obtained (Source: Sherrykhan78, own work, CC BY-SA 4.0)

Now that we know how brain imaging works, let’s talk about the AI model that lets researchers visualise what someone else is thinking about.


An Overview Of The AI Model

The researchers’ objective while building this model was to decode the MEG-recorded brain activity of human participants as they were shown natural images.

To solve this challenge, they created an AI model consisting of three core modules —

  1. Image Module: This module works with pre-trained image embeddings created by large deep-learning models like VGG-19 & CLIP. The idea behind this module is to convert large numbers of images into dense representations that the AI model can work with.
  2. Brain Module: This module processes the MEG signals and aims to learn the mapping between these signals and the pre-trained image embeddings.
  3. Generation Module: This is a Diffusion module that takes the latent representations created by the previous modules and generates images from them.

Overview of the AI model (Image by author)

Let’s learn about each of these modules in detail.


Image Module

The core idea behind this module is to represent images from large datasets as embeddings.

These image embeddings come from the following sources (a short extraction sketch follows the list) —

  • Supervised learning models such as VGG-19, which capture high-level visual patterns in image embeddings
  • Image-text alignment models such as CLIP, which capture semantic information about images in embeddings
  • Variational Autoencoders (VAEs), which capture the visual features of images in a compressed form
  • Self-supervised Vision Transformers such as DINOv1 & DINOv2, which learn the visual features of images without the need for labelled data
  • Human-engineered features that do not use deep learning
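As an illustration of one of these sources, here is a minimal sketch of extracting CLIP image embeddings with OpenAI’s clip package. The model variant ("ViT-B/32") and the image path are illustrative assumptions, not necessarily what the paper used.

```python
import torch
import clip  # OpenAI's CLIP: pip install git+https://github.com/openai/CLIP.git
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"

# Load a pre-trained CLIP model; "ViT-B/32" is an illustrative choice.
model, preprocess = clip.load("ViT-B/32", device=device)

# "stimulus.jpg" is a placeholder path for one image a participant saw.
image = preprocess(Image.open("stimulus.jpg")).unsqueeze(0).to(device)

with torch.no_grad():
    embedding = model.encode_image(image)  # shape: (1, 512) for ViT-B/32

print(embedding.shape)
```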

Brain Module

The core idea behind this module is to map image embeddings to participants’ brain activity obtained via MEG.

It uses dilated residual convolutional layers, which can process the temporal sequences of MEG brain recordings.
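The paper’s full brain module includes components this sketch omits; the block below is only a minimal illustration of the dilated residual convolution idea, with every channel count, dilation rate, and layer size chosen for illustration.

```python
import torch
import torch.nn as nn

class DilatedResidualBlock(nn.Module):
    """A residual block of 1D convolutions with a fixed dilation rate.
    Dilation grows the temporal receptive field without downsampling."""
    def __init__(self, channels: int, dilation: int):
        super().__init__()
        self.conv1 = nn.Conv1d(channels, channels, kernel_size=3,
                               dilation=dilation, padding=dilation)
        self.conv2 = nn.Conv1d(channels, channels, kernel_size=3,
                               dilation=dilation, padding=dilation)
        self.act = nn.GELU()

    def forward(self, x):  # x: (batch, channels, time)
        h = self.act(self.conv1(x))
        h = self.conv2(h)
        return self.act(x + h)  # residual connection

# MEG input: (batch, sensors, time). Project the sensors to hidden
# channels, stack blocks with growing dilation (1, 2, 4, 8), then
# pool over time and map to the image-embedding dimension.
brain_module = nn.Sequential(
    nn.Conv1d(272, 128, kernel_size=1),  # 272 MEG sensors (illustrative)
    *[DilatedResidualBlock(128, 2 ** k) for k in range(4)],
    nn.AdaptiveAvgPool1d(1),
    nn.Flatten(),
    nn.Linear(128, 512),                 # 512-dim embedding (illustrative)
)

x = torch.randn(8, 272, 181)  # 8 recordings, 272 sensors, 181 time samples
print(brain_module(x).shape)  # torch.Size([8, 512])
```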

Generation Module

The core idea behind this module is to generate images that closely resemble the original images shown to the participants, using the embeddings predicted from their brain activity.

It consists of a latent diffusion model conditioned on the latent spaces of the AutoKL Variational Autoencoder and CLIP.
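In the full pipeline, a latent diffusion process refines the image. As a minimal sketch of just the final step, here is how predicted KL-autoencoder latents could be decoded back into pixels with a pre-trained VAE from the diffusers library; the checkpoint name, latent shape, and scaling constant are assumptions for illustration, not taken from the paper.

```python
import torch
from diffusers import AutoencoderKL

# Load a pre-trained KL-regularised VAE (checkpoint chosen for
# illustration; it is not necessarily the paper's AutoKL autoencoder).
vae = AutoencoderKL.from_pretrained("stabilityai/sd-vae-ft-mse")
vae.eval()

# Stand-in for latents predicted from MEG by the brain module.
# The (1, 4, 64, 64) shape matches this VAE's latent space for
# 512x512 images (an assumption for this sketch).
predicted_latents = torch.randn(1, 4, 64, 64)

with torch.no_grad():
    # Latents in this pipeline are conventionally stored scaled by
    # 0.18215, so we undo the scaling before decoding.
    image = vae.decode(predicted_latents / 0.18215).sample  # (1, 3, 512, 512)

print(image.shape)
```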


Dataset

The THINGS-MEG dataset consists of MEG recordings from participants as they viewed a wide range of images from the THINGS database.

This dataset was used to train the model.


Training Process

The dataset is first divided into —

  • Train set (for model training)
  • Validation set (for hyperparameter tuning)
  • Test sets (for model evaluation)

The MEG data from the training dataset is pre-processed.

Similarly, the images from the training dataset are converted into embeddings using the Image module.

Next, the Brain module is trained on the MEG data with two objectives —

  1. To pick the right image out of a bank of candidate images (trained using the CLIP loss)

CLIP loss (Source: Original research paper):

$$\mathcal{L}_{\text{CLIP}} = -\frac{1}{B}\sum_{i=1}^{B} \log \frac{\exp\left(s(\hat{z}_i, z_i)/\tau\right)}{\sum_{j=1}^{B} \exp\left(s(\hat{z}_i, z_j)/\tau\right)}$$

where:

  • s is the cosine similarity
  • B is the batch size
  • z_i and ẑ_i = f_θ(X_i) are the latent representation and the corresponding MEG-based prediction, respectively
  • τ is a learned temperature parameter

2. To generate images (trained using a standard Mean Squared Error (MSE) loss)

MSE loss (Source: Original research paper):

$$\mathcal{L}_{\text{MSE}} = \frac{1}{B}\sum_{i=1}^{B} \left\lVert z_i - \hat{z}_i \right\rVert_2^2$$

Both of these are combined for training using a convex combination:

$$\mathcal{L} = \lambda\,\mathcal{L}_{\text{CLIP}} + (1 - \lambda)\,\mathcal{L}_{\text{MSE}}$$

where λ is a hyperparameter that balances the relative importance of the two losses (Source: Original research paper).
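To make this concrete, here is a minimal PyTorch sketch of the combined objective, assuming the standard batch-contrastive (InfoNCE-style) formulation of the CLIP loss; tensor shapes and variable names are illustrative, not the paper’s exact implementation.

```python
import torch
import torch.nn.functional as F

def combined_loss(z, z_hat, log_tau, lam=0.5):
    """Convex combination of a CLIP-style contrastive loss and an MSE loss.

    z:       (B, D) target image embeddings
    z_hat:   (B, D) MEG-based predictions from the brain module
    log_tau: learned log-temperature (kept in log space for stability)
    lam:     hyperparameter balancing the two losses (lambda above)
    """
    # CLIP loss: cosine similarity between every prediction and every
    # candidate embedding in the batch; the matching pair is the target.
    sim = F.normalize(z_hat, dim=-1) @ F.normalize(z, dim=-1).T  # (B, B)
    targets = torch.arange(z.size(0), device=z.device)
    clip_loss = F.cross_entropy(sim / log_tau.exp(), targets)

    # MSE loss: push each prediction toward its own target embedding.
    mse_loss = F.mse_loss(z_hat, z)

    # Convex combination of the two objectives.
    return lam * clip_loss + (1.0 - lam) * mse_loss

# Usage with dummy tensors:
B, D = 16, 512
z = torch.randn(B, D)
z_hat = torch.randn(B, D, requires_grad=True)
log_tau = torch.tensor(0.0, requires_grad=True)  # tau = exp(0) = 1.0
loss = combined_loss(z, z_hat, log_tau)
loss.backward()
```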

The Adam optimizer is used to train the overall model.

Hyperparameter tuning is performed using the Validation set.


Evaluation Process

The performance of the model is evaluated on the Test set using —

  1. Retrieval metrics (such as relative median rank & top-5 accuracy) — to evaluate the probability of identifying the correct image given the model’s predictions
  2. Generation metrics — to evaluate how closely the generated images resemble the images the participants were actually shown (a sketch of one retrieval metric follows the list)
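Here is a minimal sketch of top-5 retrieval accuracy: given embeddings predicted from MEG, check whether the true image’s embedding is among the 5 most cosine-similar candidates. The paper’s exact evaluation protocol may differ.

```python
import torch
import torch.nn.functional as F

def top5_accuracy(z, z_hat):
    """z: (N, D) true image embeddings; z_hat: (N, D) MEG-based predictions.
    Returns the fraction of predictions whose true image appears among
    the 5 most cosine-similar candidates in the retrieval bank."""
    sim = F.normalize(z_hat, dim=-1) @ F.normalize(z, dim=-1).T  # (N, N)
    top5 = sim.topk(5, dim=-1).indices                           # (N, 5)
    targets = torch.arange(z.size(0)).unsqueeze(1)               # (N, 1)
    return (top5 == targets).any(dim=-1).float().mean().item()

# With random embeddings, top-5 accuracy should hover near 5/N:
print(top5_accuracy(torch.randn(100, 512), torch.randn(100, 512)))
```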


How Well Does The Model Work

When MEG recordings taken while participants viewed different images were fed into the trained model, it reconstructed many of those images remarkably well.

Look at the examples shown below. They are spectacular!

Successful generations-1 (Source: Original research paper)
Successful generations-2 (Source: Original research paper)

But the model wasn’t right all the time; some of its failed generations are shown below.

Failed generations-1 (Source: Original research paper)
Failed generations-2 (Source: Original research paper)

The results also showed that the embedding representations created using DINOv2, a self-supervised learning model, led to the best decoding performance.

This shows how well the visual representations of this self-supervised model align with those of the actual human brain, even without any human annotations!

Image retrieval performance of the model on the test datasets (Source: Original research paper)

This is some phenomenal work by Meta researchers!

Although it raises serious ethical questions about the potential threat to our mental privacy, on the positive side it will help us better understand the brain and develop futuristic devices that were once only imagined in sci-fi movies.

What are your thoughts on this? Let me know in the comments below!


Join my newsletter mailing list along with hundreds of other curious individuals who read it weekly!

其他会员也浏览了