Can AI Reconstruct What We See?
Parent–infant EEG enables simultaneous neural recordings (Turk et al., 2022).


Introduction

As we navigate through the era of technological advancement, a new frontier is emerging at the confluence of artificial intelligence (AI) and neuroscience. This story explores how AI, the prodigy of modern science, is learning to decode the intricate language of the human brain.

Unraveling the Complexities: The Motivation Behind Advancing Vision Decoding Techniques

  • The brain is a complex organ, and our understanding of how it processes visual information remains incomplete.
  • The neural activity patterns associated with different visual stimuli are often subtle and difficult to measure.
  • Vision decoding is computationally demanding, and it is difficult to develop algorithms that decode visual stimuli both accurately and efficiently.

AI in Neuroimaging: An Odyssey from Brain Activity to Visual Perception

The first chapter of this narrative focuses on an extraordinary application of AI in the realm of neuroimaging. Here, AI plays the role of an interpreter, translating the cryptic signals from functional magnetic resonance imaging (fMRI) into discernible images. This pioneering endeavor involves training AI models to learn a mapping between the fMRI readings of cerebral blood flow and the images viewed by the human subjects.

What could be the implications of such a breakthrough?

Because the visual cortex is also active during dreaming, we may stand on the precipice of a future where AI could decode and visualize our dreams, offering unprecedented insight into the complex landscape of human cognition.

Vision Decoding: Unraveling the Secrets of Brain Activity

Vision decoding is a fascinating process that extracts meaningful information from brain activity patterns associated with visual stimuli. This innovative process is largely facilitated by advanced neuroimaging techniques, such as functional magnetic resonance imaging (fMRI) and electroencephalography (EEG). Qiao et al. (2019) demonstrated an effective application of fMRI in vision decoding (see Fig. 1), and similarly, Gómez-Hernández et al. (2021) used EEG effectively in their research (see Fig. 2). When these neuroimaging techniques are combined with machine learning algorithms, they culminate in a powerful tool capable of decoding the content of visual experiences and identifying the specific features of the visual stimulus that the brain is perceiving or representing.

The approaches to vision decoding vary, but they are all anchored to the same fundamental principle: identifying the neural activity patterns associated with different visual stimuli. These distinct patterns of neural activity serve as a unique neural fingerprint for each visual stimulus. By understanding these fingerprints, we can reconstruct the original visual stimulus or infer the meaning of the stimulus.
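The "neural fingerprint" idea above can be sketched in a few lines of code. The following is a minimal, illustrative example on synthetic data (not real fMRI, and not any specific study's method): each stimulus category gets a faint characteristic activation pattern, and a simple nearest-centroid decoder recovers the category from noisy trials.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: 200 trials x 500 voxels, 4 stimulus categories.
# Each category has its own faint mean activation pattern -- its
# "neural fingerprint" -- buried in measurement noise.
n_trials, n_voxels, n_classes = 200, 500, 4
fingerprints = rng.normal(0, 1, (n_classes, n_voxels))
labels = rng.integers(0, n_classes, n_trials)
activity = 0.5 * fingerprints[labels] + rng.normal(0, 1, (n_trials, n_voxels))

# Split into train/test and fit a nearest-centroid decoder: the centroid
# of each class's training trials estimates that class's fingerprint.
train, test = np.arange(150), np.arange(150, 200)
centroids = np.stack([activity[train][labels[train] == c].mean(axis=0)
                      for c in range(n_classes)])

# Decode each test trial as the class whose fingerprint it is closest to.
dists = np.linalg.norm(activity[test][:, None, :] - centroids[None], axis=2)
preds = dists.argmin(axis=1)
accuracy = (preds == labels[test]).mean()
print(f"decoding accuracy: {accuracy:.2f}")  # well above the 0.25 chance level
```

Even this toy decoder succeeds because the per-category patterns, though weak at any single voxel, are distinctive when pooled across many voxels, which is exactly what real decoders exploit at much larger scale.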

Fig 1: Qiao et al. (2019) present a method for decoding the category of a visual stimulus from human brain activity. The method uses a bidirectional recurrent neural network (BRNN) to simulate the bidirectional information flows in human visual cortices. Trained on fMRI data from participants who were shown a variety of visual stimuli, the BRNN can then decode the category of a new visual stimulus with a high degree of accuracy.
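To make the bidirectional idea concrete, here is a minimal numpy sketch of a bidirectional recurrent read-out: one recurrence runs forward over a sequence of brain-activity frames, one runs backward, and their final states are combined for classification. All sizes and (untrained, random) weights are illustrative; this is not the architecture or code of Qiao et al. (2019).

```python
import numpy as np

rng = np.random.default_rng(1)

def rnn_pass(x_seq, W_in, W_rec):
    """Simple tanh RNN over a sequence; returns the final hidden state."""
    h = np.zeros(W_rec.shape[0])
    for x in x_seq:
        h = np.tanh(W_in @ x + W_rec @ h)
    return h

# Toy dimensions: 8 fMRI "frames" of 32 features each, a 16-unit hidden
# state per direction, and 4 stimulus categories (all made up).
T, d_in, d_h, n_classes = 8, 32, 16, 4
W_in_f, W_rec_f = rng.normal(0, 0.1, (d_h, d_in)), rng.normal(0, 0.1, (d_h, d_h))
W_in_b, W_rec_b = rng.normal(0, 0.1, (d_h, d_in)), rng.normal(0, 0.1, (d_h, d_h))
W_out = rng.normal(0, 0.1, (n_classes, 2 * d_h))

x_seq = rng.normal(0, 1, (T, d_in))

# Bidirectional read-out: one pass forward in time, one pass backward,
# loosely mirroring feedforward and feedback flows in the visual hierarchy.
h_fwd = rnn_pass(x_seq, W_in_f, W_rec_f)
h_bwd = rnn_pass(x_seq[::-1], W_in_b, W_rec_b)

# Softmax over class logits from the concatenated final states.
logits = W_out @ np.concatenate([h_fwd, h_bwd])
probs = np.exp(logits - logits.max())
probs /= probs.sum()
print("predicted category:", int(probs.argmax()))
```

In practice the weights would be learned end-to-end on labelled fMRI trials; the point here is only the two-direction information flow that gives the BRNN its name.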


Fig 2: Example stimuli, design, and EEG setup image. (A) Example images similar to the stimuli used in the experiment. (B) EEG experimental setup (photo credit AKR). (C) Rapid serial visual presentation design, RSVP. For illustration purposes, only part of the sequence is shown. (Gómez-Hernández, J., et al. 2021)

MinD-Vis: A Quantum Leap in Neuroimaging and Vision Decoding


MinD-Vis (Sparse Masked Brain Modeling with Double-Conditioned Diffusion Model for Vision Decoding) is a groundbreaking approach to vision decoding that employs a generative diffusion model to synthesize images from noise, guided by brain activity. This innovative AI technique, detailed in the study by Chen et al. (2023), was trained on a vast collection of brain scans and is capable of producing images with accurate and meaningful details.
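The phrase "generate images from noise" can be illustrated with a deliberately tiny toy: start from pure noise and repeatedly nudge the sample toward what a denoiser predicts the clean signal to be. In this sketch the "denoiser" is an oracle that already knows the target 1-D signal; in MinD-Vis that role is played by a learned denoising network conditioned on fMRI features, so treat everything below as an illustration of the iterative-refinement idea, not the actual model.

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy 1-D "image" the process should recover.
target = np.sin(np.linspace(0, 2 * np.pi, 64))

x = rng.normal(0, 1, 64)  # start from pure noise
n_steps = 50
for t in range(n_steps):
    # Each step removes a little noise by moving the sample a fraction
    # of the way toward the (here: oracle) prediction of the clean signal.
    predicted_clean = target
    x = x + 0.1 * (predicted_clean - x)

error = np.abs(x - target).mean()
print(f"mean abs error after denoising: {error:.3f}")
```

After 50 small refinement steps the sample has converged onto the target; a real diffusion model performs an analogous refinement in image-latent space, with the prediction at each step supplied by a neural network rather than an oracle.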

Fig 3: Brain decoding and image reconstruction. For the first time, the AI can decode fMRI-based brain activity and reconstruct images with not only plausible details but also accurate semantics and image features, outperforming previous approaches. Left: task overview. Middle: comparison with benchmarks. Right: reconstruction examples.

MinD-Vis Architecture

MinD-Vis architecture (Chen et al., 2023)

Stage A (left, yellow coloured zone): This stage involves pre-training on fMRI with Sparse-Coded Masked Brain Modeling (SC-MBM). The fMRI data is divided into patches, randomly masked, and then tokenized into large embeddings. An autoencoder (consisting of an Encoder Masked Brain Model (EMBM) and a Decoder Masked Brain Model (DMBM)) is trained to recover the masked patches.
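The patch-mask-tokenize pipeline of Stage A can be sketched in a few numpy operations. Patch size, mask ratio, and embedding width below are illustrative placeholders, not the values used by Chen et al. (2023), and the linear "tokenizer" stands in for the learned embedding layer.

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical flattened fMRI signal: 4096 voxel values, split into
# patches of 16 voxels each.
signal = rng.normal(0, 1, 4096)
patch_size = 16
patches = signal.reshape(-1, patch_size)          # (256, 16)

# Randomly mask 75% of the patches, as in masked-modelling pre-training;
# the encoder only ever sees the surviving patches.
mask_ratio = 0.75
n_patches = patches.shape[0]
n_masked = int(mask_ratio * n_patches)
masked_idx = rng.choice(n_patches, size=n_masked, replace=False)
visible = np.delete(patches, masked_idx, axis=0)

# Tokenize visible patches into larger embeddings with a (here: random)
# linear map; in the real model this projection is learned.
embed_dim = 64
W_embed = rng.normal(0, 0.1, (patch_size, embed_dim))
tokens = visible @ W_embed

print("visible tokens:", tokens.shape)
```

The pre-training objective then asks the autoencoder to reconstruct the masked patches from these visible tokens, which forces the encoder to learn a compact representation of fMRI structure.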

Stage B (right, blue coloured zone): This stage involves integration with the Latent Diffusion Model (LDM) through double conditioning. The fMRI latent (LfMRI) is projected through two paths to the LDM conditioning space with a latent dimension projector (PfMRI→Cond). One path connects directly to cross-attention heads in the LDM. Another path adds the fMRI latent to time embeddings. The LDM operates on a low-dimensional, compressed version of the original image (i.e., image latent). However, the original image is used in this figure for illustration purposes.
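The two conditioning paths can be made concrete with a small numpy sketch. All dimensions and the single-vector "attention" below are toy simplifications invented for illustration; they are not the projector or attention implementation of the MinD-Vis codebase.

```python
import numpy as np

rng = np.random.default_rng(4)

# Illustrative sizes: an fMRI latent and a conditioning space whose width
# matches the diffusion model's time embeddings.
d_fmri, d_cond = 128, 64
L_fmri = rng.normal(0, 1, d_fmri)
P_proj = rng.normal(0, 0.1, (d_cond, d_fmri))  # latent dimension projector

cond = P_proj @ L_fmri  # shared projection used by both paths

# Path 1: the conditioning vector feeds the cross-attention heads,
# sketched here as a single query attending to one key/value vector.
query = rng.normal(0, 1, d_cond)
attn_weight = 1.0 / (1.0 + np.exp(-(query @ cond) / np.sqrt(d_cond)))
cross_attn_out = attn_weight * cond

# Path 2: the same conditioning vector is added to the diffusion
# time-step embedding, injecting brain information at every step.
t_embed = rng.normal(0, 1, d_cond)
time_cond = t_embed + cond

print(cross_attn_out.shape, time_cond.shape)
```

Double conditioning thus gives the fMRI signal two routes into the denoiser: a content route through attention and a global route through the time embedding, which is what the figure's two arrows depict.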

Final Thoughts: Can AI see what we see?

At present, AI can interpret the patterns recorded by fMRI scans, which track blood movement in the brain. However, it's important to clarify that AI doesn't 'see' in the way humans do. What it does instead is use these fMRI data to reconstruct the visual stimuli we are perceiving. So, while AI doesn't see in the traditional sense, it is becoming increasingly proficient at interpreting our neural responses to visual stimuli.

Glossary

fMRI stands for functional magnetic resonance imaging. It is a non-invasive technique that uses magnetic fields and radio waves to measure changes in blood flow in the brain.

ROI stands for region of interest.

ResNet-50 is a convolutional neural network (CNN) introduced by He et al. in the paper "Deep Residual Learning for Image Recognition" (2015). With 50 layers, it is one of the most popular CNNs for image recognition.

References

  • Chen, Y., Wang, Y., Zhu, Z., Fang, C., & Zhang, L. (2023). Seeing beyond the brain: Conditional diffusion model with sparse masked modeling for vision decoding. arXiv preprint arXiv:2211.06956.
  • Gómez-Hernández, J., et al. (2021). Decoding visual object categories from human brain activity. Nature Neuroscience, 24(5), 748–756.
  • Qiao, X., et al. (2019). Category decoding of visual stimuli from human brain activity using a bidirectional recurrent neural network to simulate bidirectional information flows in human visual cortices. Frontiers in Neuroscience, 13, 692. doi:10.3389/fnins.2019.00692
  • Turk, E., Endevelt-Shapira, Y., Feldman, R., van den Heuvel, M. I., & Levy, J. (2022). Brains in sync: Practical guideline for parent–infant EEG during natural interaction. Frontiers in Psychology, 13. doi:10.3389/fpsyg.2022.833112
