Can AI Reconstruct What We See?
Dr. Mario Javier Pérez Rivas
Director of AI and Cloud Infrastructure Services
Introduction
As we navigate the era of rapid technological advancement, a new frontier is emerging at the confluence of artificial intelligence (AI) and neuroscience. This article explores how AI, the prodigy of modern science, is learning to decode the intricate language of the human brain.
Unraveling the Complexities: The Motivation Behind Advancing Vision Decoding Techniques
AI in Neuroimaging: An Odyssey from Brain Activity to Visual Perception
The first chapter of this narrative focuses on an extraordinary application of AI in the realm of neuroimaging. Here, AI plays the role of an interpreter, translating the cryptic signals from functional magnetic resonance imaging (fMRI) into discernible images. This pioneering endeavor involves training AI models to learn a mapping between the fMRI readings of cerebral blood flow and the images viewed by the human subjects.
What could be the implications of such a breakthrough?
Because the visual cortex is thought to remain active during dreaming, we stand on the precipice of a future in which AI could decode and visualize our dreams, offering unprecedented insight into the complex landscape of human cognition.
Vision Decoding: Unraveling the Secrets of Brain Activity
Vision decoding is a fascinating process that extracts meaningful information from the patterns of brain activity evoked by visual stimuli. It is largely facilitated by advanced neuroimaging techniques such as functional magnetic resonance imaging (fMRI) and electroencephalography (EEG). The study by Qiao et al., 2019, demonstrated an effective application of fMRI in vision decoding (see Fig. 1), and Gómez-Hernández et al., 2021, used EEG effectively in their research (see Fig. 2). Combined with machine learning algorithms, these neuroimaging techniques become a powerful tool for decoding the content of a visual experience and identifying the specific features of the stimulus that the brain is perceiving or representing.
The approaches to vision decoding vary, but they are all anchored to the same fundamental principle: identifying the neural activity patterns associated with different visual stimuli. These distinct patterns serve as a unique neural fingerprint for each visual stimulus. By learning these fingerprints, we can reconstruct the original visual stimulus or infer its meaning.
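As a toy illustration of the fingerprint idea (a minimal sketch on simulated data, not any published pipeline), a noisy trial can be decoded by matching it against stored activity templates:

```python
import numpy as np

rng = np.random.default_rng(0)
n_voxels = 200

# Hypothetical "neural fingerprints": one mean activity pattern per stimulus.
fingerprints = {label: rng.normal(size=n_voxels)
                for label in ["face", "house", "tool"]}

def decode(trial, fingerprints):
    """Return the stimulus whose fingerprint correlates best with the trial."""
    scores = {label: np.corrcoef(trial, fp)[0, 1]
              for label, fp in fingerprints.items()}
    return max(scores, key=scores.get)

# A noisy trial evoked by viewing a "house" stimulus.
trial = fingerprints["house"] + rng.normal(scale=0.5, size=n_voxels)
print(decode(trial, fingerprints))  # prints "house"
```

Real decoders replace the correlation step with trained classifiers or deep networks, but the principle is the same: match observed activity against learned stimulus-specific patterns.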
MinD-Vis: A Quantum Leap in Neuroimaging and Vision Decoding
MinD-Vis (Sparse Masked Brain Modeling with a Double-Conditioned Diffusion Model for Vision Decoding) is a groundbreaking approach to vision decoding that uses a diffusion model to generate images from noise. As detailed in the study by Chen et al., 2023, this AI technique was trained on a vast collection of brain scans and is capable of producing images with accurate and meaningful details.
MinD-Vis Architecture
Stage A (left, yellow coloured zone): This stage involves pre-training on fMRI with Sparse-Coded Masked Brain Modeling (SC-MBM). The fMRI data is divided into patches, randomly masked, and then tokenized into large embeddings. An autoencoder (consisting of an Encoder Masked Brain Model (EMBM) and a Decoder Masked Brain Model (DMBM)) is trained to recover the masked patches.
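To make the SC-MBM data flow concrete, here is a minimal NumPy sketch of the patch, mask, and tokenize steps. All sizes, the 75% masking ratio, and the random projection are illustrative assumptions, not the values or trained weights from the paper:

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative sizes: 4096 voxels, patches of 16 voxels, 75% masking,
# 1024-dimensional token embeddings.
n_voxels, patch_len, mask_ratio, embed_dim = 4096, 16, 0.75, 1024

fmri = rng.normal(size=n_voxels)                 # one flattened fMRI sample

# 1. Divide the fMRI vector into non-overlapping patches.
patches = fmri.reshape(-1, patch_len)            # (256, 16)

# 2. Randomly mask most patches; the encoder only sees the visible ones.
n_patches = patches.shape[0]
n_keep = int(n_patches * (1 - mask_ratio))
keep_idx = rng.permutation(n_patches)[:n_keep]
visible = patches[keep_idx]                      # (64, 16)

# 3. Tokenize visible patches into large embeddings via a linear projection
#    (random here; learned in the real model).
W = rng.normal(size=(patch_len, embed_dim)) * patch_len ** -0.5
tokens = visible @ W                             # (64, 1024)

print(patches.shape, visible.shape, tokens.shape)
```

The autoencoder is then trained to reconstruct the masked patches from these tokens, which is what forces it to learn a useful representation of the fMRI signal.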
Stage B (right, blue coloured zone): This stage involves integration with the Latent Diffusion Model (LDM) through double conditioning. The fMRI latent (LfMRI) is projected into the LDM conditioning space along two paths via a latent dimension projector (PfMRI→Cond). One path connects directly to the cross-attention heads in the LDM; the other adds the fMRI latent to the time embeddings. The LDM operates on a low-dimensional, compressed version of the original image (i.e., the image latent), although the original image is shown in the figure for illustration purposes.
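The two conditioning paths can be sketched as follows. This is a hypothetical NumPy illustration with made-up dimensions and random weights; in the real model the projections are trained and sit inside a U-Net denoiser:

```python
import numpy as np

rng = np.random.default_rng(2)

# Illustrative sizes (assumptions, not the paper's): a 512-d fMRI latent,
# 4 conditioning tokens of width 64, and 16 image-latent tokens.
d_fmri, n_cond, d_cond, n_img = 512, 4, 64, 16

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

# Latent dimension projector (PfMRI->Cond): maps the fMRI latent into
# the LDM conditioning space.
l_fmri = rng.normal(size=d_fmri)
P = rng.normal(size=(d_fmri, n_cond * d_cond)) * d_fmri ** -0.5
cond = (l_fmri @ P).reshape(n_cond, d_cond)          # conditioning tokens

# Path 1: the conditioning tokens act as keys/values for a cross-attention
# head whose queries come from the image-latent tokens.
q = rng.normal(size=(n_img, d_cond))
attn_out = softmax(q @ cond.T / np.sqrt(d_cond)) @ cond   # (16, 64)

# Path 2: a pooled copy of the conditioning signal is added to the
# diffusion time-step embedding (pooling is a simplification here).
t_emb = rng.normal(size=d_cond)
t_emb_conditioned = t_emb + cond.mean(axis=0)

print(attn_out.shape, t_emb_conditioned.shape)
```

Feeding the same fMRI signal in through both routes is what the paper calls double conditioning: it influences both what the denoiser attends to and how each denoising step is modulated.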
Final Thoughts: Can AI see what we see?
At present, AI can interpret the patterns recorded by fMRI scans, which track blood flow in the brain. However, it's important to clarify that AI doesn't 'see' the way humans do. What it does instead is use these fMRI data to reconstruct the visual stimuli we are perceiving. So, while AI doesn't see in the traditional sense, it is becoming increasingly proficient at interpreting our neural responses to visual stimuli.
Glossary
fMRI stands for functional magnetic resonance imaging. It is a non-invasive technique that uses magnetic fields and radio waves to measure changes in blood flow in the brain.
ROI stands for region of interest, the subset of voxels in a brain scan selected for analysis.
ResNet-50 is a convolutional neural network (CNN) introduced by He et al. in the paper “Deep Residual Learning for Image Recognition” (2015). With 50 layers, it is one of the most popular CNNs for image recognition.