AI Can Now ‘See’ What You’re Thinking About. This Is How Engineers At Meta Made It Possible.
Dr. Ashish Bamania
Decoding how the brain works has always been a hot topic of research.
Frustratingly, conventional research methods have been largely ineffective, and progress in this field has been slow.
But with the advent of AI and the greater ‘compute’ available to us all, this field has started progressing rapidly.
A few months ago, research engineers at Meta accomplished something significant that is being seen as an important milestone towards completely decoding the inner workings of the human brain.
They trained an AI model that can reconstruct the images perceived and processed by a person’s brain in real time.
In other words, it can recreate images that a person was viewing and thinking about.
This story is a deep dive into how this was made possible.
An Introduction To Brain Imaging
Brain Imaging or Neuroimaging involves using different techniques to learn how the brain looks and functions.
The advent of these imaging techniques allows us to study the human brain non-invasively (that is, without surgically opening the body).
Neuroimaging is divided into two broad categories:

1. Structural imaging, which captures the anatomy of the brain (MRI is the classic example)

2. Functional imaging, which captures brain activity as it changes over time (fMRI and MEG fall into this category)
To learn about the AI model developed at Meta, we first need to learn about MRI, fMRI and MEG in a bit more detail.
What Is MRI & How It Works
An MRI or Magnetic Resonance Imaging is an imaging technique that uses strong magnetic fields to visualise the body’s internal structures in extremely high detail.
The human body is full of hydrogen atoms (present as a part of water, carbohydrates, proteins and more).
The protons in these hydrogen atoms have an intrinsic property called Spin that makes them behave like tiny magnets.
(Note that Spin has nothing to do with any actual physical spinning.)
Under normal conditions, these tiny proton ‘magnets’ point in random directions, so the net magnetic field produced by all hydrogen atoms in the body is zero.
Here comes the MRI scanner.
An MRI scanner uses magnets as powerful as 3 Tesla, which is roughly 50,000 times stronger than the Earth’s magnetic field.
When the magnetic field produced by these magnets is applied to a body part, the protons in the hydrogen atoms present there align either parallel or anti-parallel to it.
Next, the MRI scanner applies a radiofrequency pulse to these protons that tilts them away from their alignment with the magnetic field.
When the radiofrequency pulse is turned off, the tilted protons spiral back (relax) to their original alignment.
This spiralling motion induces a current in the scanner’s receiver coils, which is reconstructed into an image of the body part’s internal structure using mathematical functions such as Fourier transforms.
Different tissues (e.g., muscle, bone, brain) relax at different rates, and this produces the contrast between structures in the final image.
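To make the radiofrequency part concrete, here is a quick back-of-the-envelope calculation (not from the original article) of the frequency at which protons precess, using the well-known Larmor relation f = γ · B:

```python
# Larmor precession frequency: f = gamma * B
GAMMA_MHZ_PER_T = 42.58  # gyromagnetic ratio of the hydrogen proton (MHz per Tesla)

def larmor_frequency_mhz(field_tesla: float) -> float:
    """Frequency (in MHz) at which protons precess in a given magnetic field."""
    return GAMMA_MHZ_PER_T * field_tesla

# A 3 T clinical scanner must therefore tip protons with an RF pulse near 127.7 MHz
print(f"{larmor_frequency_mhz(3.0):.1f} MHz")  # -> 127.7 MHz
```

This is why the radiofrequency pulse works at all: only a pulse tuned to this exact frequency can resonate with (and tilt) the protons.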
What Is fMRI & How It Works
fMRI stands for Functional Magnetic Resonance Imaging.
It is a special type of MRI scan that helps visualise activity in different parts of one’s brain in real-time.
fMRI is based on the principle that when a brain area becomes active, it consumes more oxygen, and the supply of oxygen-rich blood to this area therefore increases.
Oxygen-rich blood contains Oxyhemoglobin, a diamagnetic compound (weakly repelled by a magnetic field), whereas oxygen-poor blood contains paramagnetic Deoxyhemoglobin. The shift in the balance between the two changes the magnetic properties of the active brain area, and this change is what the fMRI machine detects.
fMRI offers a high spatial resolution (the ability to differentiate between two points in space), typically in the range of 1 to 3 mm.
But it has a lower temporal resolution (the smallest unit of time in which changes in brain activity can be reliably detected), typically around 1–4 seconds.
This makes fMRI far from ideal for studying brain changes in real time.
Therefore, we need a faster method than this for detecting and recording real-time changes in the brain.
What Is MEG & How It Works
MEG or Magnetoencephalography is another functional brain imaging technique that directly measures brain activity by interpreting the magnetic fields generated by the electrical currents in brain cells (neurons).
Although its spatial resolution is coarser, ranging from a few millimetres to about 1 cm, it samples brain activity at around 5,000 Hz.
This means that it can measure brain activity 5000 times per second.
Hence, MEG is a great choice for studying brain activity as it unfolds!
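To put the two techniques side by side, here is a tiny illustrative calculation (the 1.5-second stimulus duration is an assumption made up purely for this example):

```python
# Rough comparison of how many measurements each technique yields during
# one image-viewing window (numbers taken from the article: MEG samples
# at ~5000 Hz, fMRI resolves changes roughly every 1-4 seconds).

MEG_SAMPLING_HZ = 5000
FMRI_TEMPORAL_RESOLUTION_S = 2.0  # mid-range of the 1-4 s quoted above

window_s = 1.5  # assumed stimulus duration, for illustration only
meg_samples = int(MEG_SAMPLING_HZ * window_s)
fmri_samples = window_s / FMRI_TEMPORAL_RESOLUTION_S

print(f"MEG:  {meg_samples} samples")       # 7500 snapshots of brain activity
print(f"fMRI: {fmri_samples:.2f} samples")  # less than one usable time point
```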
Now that we know how brain imaging works, let’s talk about the AI model that lets researchers visualise what someone else is thinking about.
An Overview Of The AI Model
The researchers’ objective while building this model was to decode the MEG-recorded brain activity of human participants while they were being shown natural images.
To solve this challenge, they created an AI model that consists of 3 core modules: an Image module, a Brain module, and a Generation module.
Let’s learn about each of these modules in detail.
Image Module
The core idea behind this module is to represent images from large datasets as embeddings (dense numerical vectors that capture an image’s content).
These image embeddings come from several pretrained encoders, including CLIP, the self-supervised DINOv2 model, and the AutoKL Variational Autoencoder (whose latents are later reused for image generation).
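As a flavour of what this looks like in practice, here is a minimal sketch of extracting an image embedding with an off-the-shelf CLIP encoder (the checkpoint name and image path are illustrative choices, not taken from the paper):

```python
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

# Load a pretrained CLIP image encoder; any of the encoders named above
# could be swapped in, CLIP is shown here because its API is simple.
model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

image = Image.open("stimulus.jpg")  # hypothetical image shown to a participant
inputs = processor(images=image, return_tensors="pt")

with torch.no_grad():
    embedding = model.get_image_features(**inputs)  # shape: (1, 512)

print(embedding.shape)
```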
Brain Module
The core idea behind this module is to map image embeddings to participants’ brain activity obtained via MEG.
It uses dilated residual convolutional layers, which can process the temporal sequences of MEG brain recordings.
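Meta’s exact architecture is more elaborate (it also includes subject-specific layers), but a simplified PyTorch sketch of one dilated residual convolutional block over MEG time series might look like this (the sensor and time-step counts are illustrative assumptions):

```python
import torch
import torch.nn as nn

class DilatedResidualBlock(nn.Module):
    """One residual block of dilated 1-D convolutions over MEG time series.

    Input shape: (batch, channels, time) - MEG sensors act as channels.
    """
    def __init__(self, channels: int, dilation: int):
        super().__init__()
        # padding = dilation keeps the temporal length unchanged for kernel_size=3
        self.conv1 = nn.Conv1d(channels, channels, kernel_size=3,
                               dilation=dilation, padding=dilation)
        self.conv2 = nn.Conv1d(channels, channels, kernel_size=3,
                               dilation=dilation, padding=dilation)
        self.act = nn.GELU()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        residual = x
        x = self.act(self.conv1(x))
        x = self.conv2(x)
        return self.act(x + residual)  # the skip connection makes it "residual"

# Stacking blocks with growing dilation widens the temporal receptive field.
brain_module = nn.Sequential(*[DilatedResidualBlock(270, d) for d in (1, 2, 4, 8)])
meg = torch.randn(8, 270, 361)  # a batch of 8 recordings: 270 sensors, 361 time steps
print(brain_module(meg).shape)  # torch.Size([8, 270, 361])
```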
Generation Module
The core idea behind this module is to generate images that closely resemble the originals shown to the participants, using the embeddings predicted from their brain activity.
It consists of a Latent diffusion model that is conditioned on the latent spaces of the AutoKL Variational Autoencoder and CLIP.
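The full diffusion process is beyond a short snippet, but its very last step, decoding predicted AutoKL-style latents back into pixels, can be sketched with the diffusers library (the checkpoint name and the random latents are placeholders, not the Brain module’s real outputs):

```python
import torch
from diffusers import AutoencoderKL

# "AutoKL" corresponds to the KL-regularised autoencoder used by latent
# diffusion models; this particular checkpoint is an illustrative choice.
vae = AutoencoderKL.from_pretrained("stabilityai/sd-vae-ft-mse")

# Pretend these latents were predicted from MEG data; (1, 4, 64, 64) is the
# standard latent size for a 512x512 image. Real pipelines also rescale
# latents by the VAE's scaling factor before decoding.
predicted_latents = torch.randn(1, 4, 64, 64)

with torch.no_grad():
    image = vae.decode(predicted_latents).sample  # tensor of shape (1, 3, 512, 512)

print(image.shape)
```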
Dataset
The THINGS-MEG dataset consists of MEG recordings from participants as they viewed a wide range of images from the THINGS database.
This dataset was used to train the model.
Training Process
The dataset is first divided into a Training set, a Validation set, and a Test set.
The MEG data from the training dataset is pre-processed.
Similarly, the images from the training dataset are converted into embeddings using the Image module.
Next, the Brain module (learning from the MEG data) is trained with two objectives:

1. To retrieve the correct images (trained using a contrastive, CLIP-style loss that pulls each predicted embedding towards the embedding of the image the participant actually saw)

2. To generate images (trained using a standard Mean Squared Error (MSE) loss between the predicted and the true image embeddings)

Both of these losses are combined for training using a convex combination, L = λ · L_retrieval + (1 − λ) · L_MSE, where λ is a weight between 0 and 1.
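A minimal PyTorch sketch of such a combined loss (the λ and temperature values are unspecified hyperparameters, chosen here purely for illustration):

```python
import torch
import torch.nn.functional as F

def combined_loss(pred: torch.Tensor, target: torch.Tensor,
                  lam: float = 0.5, temperature: float = 0.07) -> torch.Tensor:
    """Convex combination of a CLIP-style retrieval loss and an MSE loss.

    pred:   embeddings predicted by the Brain module, shape (batch, dim)
    target: embeddings of the images actually seen,   shape (batch, dim)
    lam:    weight between the two objectives
    """
    # Retrieval objective: each predicted embedding should be most similar
    # to its own image's embedding among all images in the batch.
    logits = F.normalize(pred, dim=-1) @ F.normalize(target, dim=-1).T / temperature
    labels = torch.arange(pred.size(0))
    retrieval_loss = F.cross_entropy(logits, labels)

    # Generation objective: predicted embeddings should match the targets.
    mse_loss = F.mse_loss(pred, target)

    # Convex combination: lam * retrieval + (1 - lam) * generation
    return lam * retrieval_loss + (1 - lam) * mse_loss
```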
The Adam optimizer is used to train the overall model.
Hyperparameter tuning is performed using the Validation set.
Evaluation Process
The performance of the model is evaluated on the Test set using retrieval metrics (how often the correct image ranks among the model’s top matches for a given brain recording) and image-similarity metrics that compare the generated images against the originals.
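For instance, a top-k retrieval accuracy can be computed in a few lines (a generic sketch, not the paper’s exact evaluation code):

```python
import torch
import torch.nn.functional as F

def top_k_retrieval_accuracy(pred: torch.Tensor, target: torch.Tensor,
                             k: int = 5) -> float:
    """Fraction of trials where the true image is among the k candidates
    most similar (by cosine similarity) to the predicted embedding."""
    sims = F.normalize(pred, dim=-1) @ F.normalize(target, dim=-1).T
    top_k = sims.topk(k, dim=-1).indices             # (n_trials, k)
    truth = torch.arange(pred.size(0)).unsqueeze(1)  # correct index per trial
    return (top_k == truth).any(dim=-1).float().mean().item()
```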
How Well Does The Model Work?
When MEG recordings, taken while participants were shown different images, were fed into the trained model, it could decode many of them very well.
Look at the examples shown below. They are spectacular!
But the model wasn’t right all the time, and some of its failed generations are shown below.
The results also showed that the embedding representations created using DINOv2, a self-supervised learning model, led to the best decoding performance.
This shows how closely the visual representations of this self-supervised model align with those of the actual human brain, even though it was trained without any human annotations!
This is some phenomenal work by Meta researchers!
Although it raises serious ethical questions about the potential threat to our mental privacy, on the positive side, it will help humans better understand the brain and develop futuristic devices that were once only imagined in sci-fi movies.
What are your thoughts on this? Let me know in the comments below!