The brain as an amplifier

(Spanish version)

In a well-known streaming platform series, a group of unscrupulous scientists subjects a girl with paranormal powers to a series of cruel experiments in which they immerse her in a sensory deprivation tank to establish telepathic contact with other people. But interestingly, beyond cinematic fiction, the idea is not exactly new.

In 1973, a medical center in the USA received federal funding to conduct formal research on the existence of Extrasensory Perception (ESP). Among the experiments conducted were different variants of what became known as the Ganzfeld experiment (originally designed by German psychologist Wolfgang Metzger).

The general idea consists of reducing the external stimuli received by the test subject to the minimum possible. No light, absolute silence, possibly floating in a salt water tank at body temperature... Under these conditions, the brain, so accustomed to finding patterns, tries to "amplify" the slightest variation in the information it receives in order to continue with that task. Perhaps it is like turning up the TV volume during a moment of silence, or trying to make out a conversation from a very faint sound: we will probably hear a mix of static and patterns that our brain will try to convert into recognizable words.

We'll leave the interpretation of those experiments' results to the reader's curiosity and judgment, but I would say one thing is clear: in general, subjects were able to identify information in that "background noise."

There is a significant parallel between these types of experiments and the concept behind Generative AI models based on the so-called diffusion technique (DALL-E, Midjourney, or Stable Diffusion, among others). Simplifying greatly, just as in the Ganzfeld experiment, what we do is ask the machine to generate an image, usually conditioned by a phrase or prompt that we provide, starting from a frame composed exclusively of noise (this part remains hidden from the user).
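To make the idea concrete, here is a minimal sketch of that hidden denoising loop. Everything in it is an assumption for illustration: `denoiser` stands in for a learned network, the `prompt` conditioning is just passed through, and the update rule is a toy simplification of the schedulers real systems use.

```python
import numpy as np

def generate_from_noise(denoiser, prompt, steps=50, shape=(64, 64, 3), seed=0):
    """Toy reverse-diffusion loop: start from a frame of pure noise and
    repeatedly subtract the noise the model predicts, conditioned on the
    prompt, until an image remains. Not any vendor's real API."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(shape)            # the hidden, all-noise starting frame
    for t in reversed(range(steps)):
        predicted_noise = denoiser(x, t, prompt)
        x = x - predicted_noise / steps       # toy update; real samplers follow a schedule
    return x
```

In a real model the `denoiser` is a trained neural network and the per-step update is governed by a carefully tuned noise schedule, but the overall shape of the loop is the same: noise in, image out.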

To do this, we must have previously trained the machine by "showing" it countless images to which we have progressively added digital noise, allowing it to "learn" at each step the relationship between the original and altered versions. Something like seeing known objects through a fogged glass. With some practice, we can perfectly imagine the appearance of an unknown object seen through the same glass.

With each iteration of this training process, the accumulated noise grows, to the extreme that the original image is no longer distinguishable and we have... just noise. At this point, the machine is capable of constructing something, based on its previous experiences (the example images), from the most absolute absence of information.
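The forward half of that process, progressively drowning an image in noise, can be sketched in a few lines. The linear blending used here is an assumption for clarity; real diffusion models use a variance-preserving noise schedule, but the intuition is identical: at the last step, no signal survives.

```python
import numpy as np

def progressively_noised(image, steps=10, seed=0):
    """Toy forward-diffusion process: return `steps` copies of an image,
    each noisier than the last. At the final step alpha is zero, so the
    result is pure noise."""
    rng = np.random.default_rng(seed)
    x = np.asarray(image, dtype=float)
    versions = []
    for t in range(1, steps + 1):
        alpha = 1.0 - t / steps               # fraction of the original that survives
        noise = rng.standard_normal(x.shape)
        versions.append(alpha * x + (1.0 - alpha) * noise)
    return versions
```

Training then amounts to showing the network adjacent pairs from this sequence so it learns to step back from the noisier version toward the cleaner one.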

Now let's imagine for a moment that, instead of images, we apply this same concept to text. Or let's take it a step further: what if we carry out the training process using abstract representations of ideas, concepts, or plans to solve problems? What would this approach add over current technology?

In the case of text generation, latest-generation AIs like ChatGPT, Gemini, Claude, or Llama approach the problem iteratively: one word (more exactly, a word fragment) after another, with the choice of each fragment depending on all previous choices in a sequential process. In other words (if you'll pardon the expression), at the moment of starting to write, the machine doesn't yet know exactly what it's going to say, much less how it will conclude. This is known as an autoregressive model. And in my opinion (which I often defend but is debatable), this is an approach that moves away from the intuition of what could be the basis of a creative process in any living being endowed with minimal problem-solving capacity.
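The autoregressive loop described above can be sketched as follows. The `next_token` callable is a placeholder for a hypothetical language model; the point is purely structural: each token is chosen from everything generated so far, and the model never sees the end of its own sentence in advance.

```python
def autoregressive_generate(next_token, prompt, max_tokens=20, stop="<eos>"):
    """Sketch of autoregressive decoding: one token (fragment) at a time,
    each choice conditioned on all previous choices."""
    tokens = list(prompt)
    for _ in range(max_tokens):
        token = next_token(tokens)        # hypothetical model picks the next fragment
        if token == stop:
            break
        tokens.append(token)
    return tokens
```

Note that there is no global plan anywhere in this loop: the sequence is committed to left to right, which is exactly the property the article contrasts with diffusion.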

The concept behind diffusion models, on the contrary, seems much more aligned with this intuition: the idea, the plan, gradually materializes from a tiny seed (that light that turns on somewhere) and, perhaps in parts, perhaps completely, ends up being defined with sufficient detail, just like the images we ask our preferred AI to generate. I think this could make sense for programming assistants as well (surely developers don't build their code instruction by instruction, without knowing what they're going to write after each one).
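A discrete-diffusion decoder for text, in the spirit of the idea above, might look like this toy mask-and-reveal loop. The `fill_in` callable and the reveal policy are assumptions for illustration (real discrete diffusion models are considerably more sophisticated), but they capture the key difference: the text takes shape in parallel across positions rather than strictly left to right.

```python
import random

MASK = "<mask>"

def diffusion_decode(fill_in, length=8, rounds=4, seed=0):
    """Toy discrete-diffusion decoding: start from an all-masked sequence
    and reveal roughly half of the remaining positions each round, so the
    whole text materializes gradually, not token by token."""
    rng = random.Random(seed)
    seq = [MASK] * length
    for _ in range(rounds):
        masked = [i for i, tok in enumerate(seq) if tok == MASK]
        if not masked:
            break
        for i in rng.sample(masked, max(1, len(masked) // 2)):
            seq[i] = fill_in(seq, i)      # hypothetical model proposes a token here
    return seq
```

Because every position can see the partially revealed global context, the "plan" is refined as a whole, which is the intuition the article associates with creative processes.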

But undoubtedly what seems even more interesting to me is the possibility of constructing reasoning or plans using this approach, because it opens the door to modeling the creative process itself, instead of its effects (like language or images).

Although these ideas are not new (a very relative concept in this sector), they have typically faced technical difficulties that make it hard to reach the state of the art achieved with more widespread techniques. Recently, however, some publications have appeared with very encouraging results, suggesting that this could be a viable path with some very significant advantages:

https://arxiv.org/abs/2410.21357 (Energy-Based Diffusion Language Models for Text Generation)

https://arxiv.org/abs/2410.14157 (Beyond Autoregression: Discrete Diffusion for Complex Reasoning and Planning)

If you've made it this far, thank you truly for reading: I hope I haven't provoked an overwhelming urge to flee into a sensory deprivation tank. See you in the next one!
