What a 2016 viral phenomenon can tell us about creativity and LLMs

What do the 2016 viral YouTube song "PPAP" by the Japanese artist Pikotaro and Arthur Koestler's tome on creativity, The Act of Creation, have in common? Perhaps nothing. But in my head, they apparently do, because almost as soon as I started reading Koestler's book, I was reminded of the weirdness that is this 45-second piece (the shortest song at the time to hit the Billboard Hot 100), the title short for "Pen-Pineapple-Apple-Pen." Did I say weird? Then I got duly sucked into a rabbit hole of replays and research about its origins and creator (an embarrassing number and an embarrassing amount, respectively). But once I emerged—daze lifted—I could see the link. The song was a perfect metaphor for what Koestler calls the essence of creativity: the interaction of seemingly incompatible contexts or frames of reference. To justify my claim, however, I must attempt to explain the song's premise (sorry to subject you to this; I promise there's a payoff). In the video, we first see Pikotaro singing about and miming the smushing together of a pen and an apple to create the "novel" artifact Apple-Pen. Then one more: Pineapple-pen. And then the coup de grace: Pen-Pineapple-Apple-Pen. All set to suitably weird techno music. (I can't resist sharing the origin story, too. Kazuhito Kosaka, aka Pikotaro, was sitting with his pen when it occurred to him that he is from the Aomori prefecture, famous for its apples; he also noticed a can of pineapple slices on the table. And then, literally, the song was born.)

The on-the-nose quality of Pikotaro's antics had aided my almost instant recall; the physicality seemed to evoke metaphor itself: how we use situations or ideas closer to our physical experience to make sense of more abstract notions. There's another layer, too. Koestler's description of the mechanism of laughter fits this situation perfectly. He notes that the contexts or frames do not so much meet as collide in humor; it is when the narrative of one frame, following one set of rules, is violently upended by another that follows different rules. In Pikotaro's case, as he brings together a pen and an apple, there's some mild suspense, and we expect it to lead to at least a somewhat exalted conclusion, which is abruptly shattered by the literality of it all (Apple-Pen? Seriously?). Applying Koestler's concept, we laugh because the suspense, now redundant, is "flushed out in laughter." But what truly caught my imagination is Koestler's hypothesis (supported by copious evidence) that this basic pattern—some form of reckoning between at least two narratives operating according to different rules—holds across all domains of creativity. In other words, the dynamic at play in Pikotaro's song is a tiny window into the mechanism of all creativity. Specifically, he hypothesizes that anything creative involves not one but at least two frames that either collide for humorous effect, fuse as in discovery or science, or confront and contrast as in art—all of these instances of what Koestler calls bisociation.

Koestler's ideas resonate powerfully in our present age of generative AI, which makes them remarkably prescient for a book published in 1964. Before the advent of this new class of AI tools like ChatGPT, creative jobs or tasks were considered safe from automation. Today, such assumptions are being strongly challenged. However, this contention (that human creativity's role is either already diminished or will be so imminently) often stems from a lack of clarity about what creativity means or a manufactured conflict over its interpretations. Such is the nature of lofty ideas; consider others like beauty, happiness, success, intelligence, consciousness, qualia, and many more that elude easy definitions and stoke passionate debates. As Mel Rhodes points out in a much-cited paper, the multi-faceted nature of creativity and similarly abstract concepts means different people might emphasize different aspects, leading some to be overlooked. Making matters worse, the desire for consensus often results in settling on a lowest-common-denominator dictionary definition that centers on the output (creativity is producing a novel idea or product), which is both simplistic and circular. Here's where Koestler's ideas shine; instead of concentrating on the output, he focuses on what precedes it. Viewed through Rhodes' more holistic lens on creativity, which identifies four facets (person, process, press, and product), the first three P's, usually given short shrift, receive the most attention in Koestler's approach.

To get a little more concrete on what holistic means, let me introduce a criticism by the theoretical physicist David Deutsch, who is also a prominent voice on artificial creativity, regarding what many consider the gold standard for testing machine intelligence: the Turing test. The criterion for passing is that a human judge cannot reliably distinguish between the responses of an AI and those of a human (with creativity being a critical component). But Deutsch contends that this puts the emphasis in the wrong place and that the roots of creativity lie in the machine's specification. To see how a behavioral lens might lead us astray, consider that a mundane output could be produced through means that have none of the trappings (inputs, methods, etc.) of the traditional way to create it—a novel means to a routine end can be creative, too. Koestler understood this well; his 700+ page treatment of this topic testifies to his expansive conceptualization. In this essay, I aim to distill some of his ideas, examine them in light of state-of-the-art AI, and convince you that they provide novel insights into where we are and might be headed. It is poetic that a book written only eight or so years after the Dartmouth workshop organized by John McCarthy, which inaugurated AI, might have something significant to say as the field finds itself, according to many, at a crossroads more than half a century later.

An excellent place to start our exploration is with a companion concept to bisociation central to Koestler's thesis, the trivalence of all creative patterns. As I pondered the idea, a fitting example happily popped into my head: the ninth episode of season four of the TV show Curb Your Enthusiasm, "The Survivor." But first, as a little warm-up, here is a cartoon by the long-time New Yorker cartoonist Bob Mankoff that clearly illustrates what Koestler means by rules when he talks about them conflicting within a bisociative event.


Politeness and rudeness collide in classic New Yorker fashion.

Mankoff, himself a fan of Koestler's work, describes why this joke (the most reprinted by the New Yorker) works in terms that Koestler would surely approve of: it mixes "the syntax of politeness and the message being rude." It exemplifies how behaviors that operate according to opposite rules can be brought together in what Mankoff calls a "cognitive synergy" for comic effect. Now, for some Curb.

In the episode, Solly, a Holocaust survivor and a friend of Larry David's father, is at a social event and is seen eagerly scanning the room for a fellow survivor Larry had promised would be present. To his dismay, Larry found out moments earlier that the guest who'd promised to bring along a survivor had meant a "survivor" of another sort: a contestant and runner-up from the Australian Outback season of the TV show Survivor. In the first of many collisions Larry orchestrates, that guest is a Rabbi on whom the irony of producing the strapping reality-show contestant as a survivor is wholly lost. Larry reluctantly breaks the news to Solly, whose face turns from joyful anticipation to some mixture of confusion and disappointment. The denouement is a dinner confrontation where the "survivor" and Solly get into a game of "who had it worse." At one point, the faux survivor, in all seriousness, complains about his snack situation during his ordeal. Solly snaps back, noting that they often had to go without food for weeks at the camp. From there, the verbal volleys only get quicker, and the whole thing concludes with a delightful punchline.

The jokes in the Curb segment mainly derive from the word pun on "survivor," a single form that, depending on the context, has two entirely different emotional colorings and associated gravitas. The classic clash between the trivial and the profound sustains the laughs. What is also notable is the element that Koestler considers an essential ingredient in all humor, which he calls the self-assertive tendency. When we watch, there's undeniably an undertone of disdain we detect in our perception of the interloper; we might think, look at that guy! I will never be him. Subtler, though, is the property of trivalence that I noted earlier. In defining it, Koestler says any creative activity, regardless of the domain, "can enter the service of humor, discovery, or art." Applied to the situation, if we look slightly askance at it, we might get interested in the man from the outback's pathological self-absorption and narcissism in a spirit of discovery. Or, by shifting our gaze a little further, we might be inspired to create a piece of art or writing on privilege, a lament of sorts. Given that my main focus in this essay will be discovery, here's a characteristically epigrammatic line from the book that captures the shared logic between humor and discovery, and more generally, among all creative patterns: "As we cross the fluid boundary [from humor to discovery], the task of 'seeing the joke' becomes the task of 'solving the problem.'"

Although arguments about whether something is creative can become quite heated, we can all agree that there are slam-dunk cases. There's no creativity in brushing one's teeth (unless that someone is your housecat that woke up one fine morning and chose violence against received wisdom). At the other end, Albert Einstein's famed Gedankenexperiment, in which he imagines riding alongside a beam of light, most certainly is. (This thought experiment led to the ideas of the relativity of time and the development of the Special Theory of Relativity.) Somewhere between the two is a task like making a cup of cappuccino; most would argue here that it doesn't involve creativity. But how about cappuccino with latte art? Reasonable people might disagree about the creative merits of this one. Koestler's package of concepts provides a way to navigate more assuredly through the twilight region where banal shades into the brilliant, more than mere subjective opinions can.

Here's one allied to the idea of a frame of reference but more suited to discussing discoveries: matrix. It is Koestler's preferred term for thought patterns that are more or less flexible but nevertheless controlled by a fixed set of rules or a code. For instance, when we are about to perform a routine task, we call up (without thinking) the relevant code that maps out the skills needed to accomplish it. The more practiced we are in the skills, the less deliberation it involves. The classic example is riding a bicycle (except if you are Phoebe Buffay); the code here is what Koestler calls hidden persuaders, more commonly referred to as tacit knowledge because it is mainly exercised below the level of consciousness. In fact, conscious thought positively impedes performance. A trick I use when I forget where I placed a file on my computer is to imagine organizing a similar file and letting the hidden persuaders guide my motions—it has a high hit rate.

The idea of a matrix helps create the right mental image to talk about bisociation in a problem-solving context. We start with the definition—a matrix is a repertoire of skills manifesting the underlying rules—which should readily evoke the sense that there is a span or scope to it: regions that are accessible and regions that are not. This naturally leads to a problem when our current best approach falls short of finding a viable path to the goal. To put it differently, if we imagine the matrix to be a plane, the goal is off-plane, and we have, in the language of Koestler, a "blocked matrix" on our hands. If we are fortunate, our search for a solution leads to an auxiliary matrix materializing and fusing with the original matrix to unblock it, thus revealing a path to the solution. Inspired by Koestler's sketches, I have tried to reproduce this idea in the following graphic.


Blocked matrix (left), auxiliary matrix to the rescue (right).

Another term—though not something Koestler uses—that captures the essence of what we do is representation. Especially when presented with a problem that requires conscious effort, like a puzzle, we look for a suitable representation that charts a path to the goal. The representation could be an elemental matrix or sometimes a known ensemble of matrices (for the latter, imagine a prefabricated matrix fused at some prior point from simpler matrices). Incidentally, the idea of a prefabricated matrix clarifies the dynamics behind something we often experience: yesterday's a-ha becoming today's duh; the problem's structure suggests a recipe that brings together those simpler matrices so frequently that combining them becomes common knowledge. More importantly, what also emerges is that new-to-the-world discoveries did not have the luxury of pressing into service a preexisting matrix. They all involved searching for an auxiliary matrix and the occurrence of the bisociative event that fused it with the original matrix for the first time.

All of this is abstract. Let me present some examples: two puzzles and a canonical new-to-the-world discovery.

A brief profile of the Fields Medal-winning mathematician June Huh appeared in The New York Times when he won the honor, describing a chess puzzle that had him "flailing" for over a week when he encountered it in middle school. The challenge is to exchange the white and black knights placed on an abridged chessboard with ten squares. You have to go from the starting position in the picture on the left below to the goal position on the right.


Chess puzzle: Start position (left), goal position (right).

A clue to the puzzle's difficulty, if we attempt to solve it using its "native" representation—simply following the rules of chess adapted to the mutilated chessboard and brute-forcing the sequence—is that it takes 52 moves to reach the goal. Wicked hard. But—and this was Huh's a-ha—if we use an alternative graph representation where the nodes are the numbered squares (from top to bottom and left to right) with edges connecting nodes reachable from one another (given how the knight moves), the problem becomes almost mechanical. Look at the corresponding graph representation below. It suggests a nearly muscular approach where the knights are jockeying for position.


Chess puzzle: Graph representation that unlocks your intuition.
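
To make the reframing concrete, here is a minimal Python sketch of how a board turns into a graph. Since the board's exact layout lives in the picture rather than the text, the example uses a stand-in: the 3x3 board of the classic Guarini knight-swap puzzle, not the ten-square board from Huh's puzzle. The function name `knight_graph` and the (row, column) labeling are assumptions of mine, chosen only for illustration.

```python
from itertools import product

# The eight (row, column) offsets a knight can jump.
KNIGHT_JUMPS = [(1, 2), (1, -2), (-1, 2), (-1, -2),
                (2, 1), (2, -1), (-2, 1), (-2, -1)]


def knight_graph(squares):
    """Map each square to the sorted list of squares a knight can reach from it."""
    squares = set(squares)
    return {
        (r, c): sorted((r + dr, c + dc) for dr, dc in KNIGHT_JUMPS
                       if (r + dr, c + dc) in squares)
        for (r, c) in squares
    }


# Stand-in board (an assumption): the 3x3 grid of Guarini's puzzle.
board = list(product(range(3), repeat=2))
graph = knight_graph(board)

for square in sorted(graph):
    print(square, "->", graph[square])
```

On this stand-in board the center square has no neighbors and each of the other eight squares has exactly two, so the tangle of L-shaped jumps collapses into a single ring. Feeding the squares of the actual puzzle to the same function should, in principle, produce the kind of graph shown in the figure.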

Concretely, this alternative way of seeing the problem reduces a welter of options into a crucial abstraction that unlocks it: seeing square nine as a kind of cubbyhole a knight could step back into to let a fellow knight cross over. With this, you should be able to trot out the solution in short order. I was inspired enough when I read this to create a short animation of the steps (visualizing the board along with its graph representation), which I present below.

In the case of this puzzle, the term "blocked" applies only loosely because nothing stops us from persevering in the first frame and taking the long road to the solution. Huh uses this fact to underscore the subjective nature of the "right" way to see a problem, noting, "The two formulations are logically indistinguishable, but our intuition works in only one of them."

Let me present another example that continues on the theme of subjectivity but brings out a slightly different side to it and, more importantly, serves as a nice transition from logical puzzles and everyday problem-solving to ones that do nothing less than reshape how we see the world.

For this puzzle, I will use the mathematician Steven Strogatz's clear and concise description verbatim—I'll comment on the context in which it was presented later.

Two bicyclists start at opposite ends of a road 20 miles long. Each cyclist travels toward the other at 10 miles per hour. When they begin, a fly sitting on the front wheel of one of the bikes takes off and races at 15 miles per hour toward the other bike. As soon as it gets there, it instantly turns around and zips back toward the first bike, then back to the second, and so on. It keeps flying back and forth until it's finally squished between their front tires when the bikes collide. How far did the fly travel, in total, before it was squished?

If you're like me, you probably gravitated towards taking a fly-eye view of the problem since the bicyclists are not doing anything interesting, and it is unnatural to assume the perspective of the inanimate objects, the bicycles, here. In this frame, you'd start by calculating the initial distance traversed by the fly until it meets the oncoming bicycle as 12 miles (ratio of the speeds = 10:15 = 2:3; therefore, the fly would cover three-fifths of the distance and the oncoming bicycle two-fifths, and 3/5 * 20 = 12). If you continue with the same logic, the next leg would require traveling 2.4 miles and the one after that 0.48 miles, with each leg's distance equal to one-fifth of the previous. (Both bicycles keep advancing during a leg, so the gap the fly must cross next shrinks to one-fifth of what it was.) At this point, if your high school calculus is not too rusty, you'd see that the answer involves summing an infinite geometric series (a beautiful idea from calculus and the theme of Strogatz's piece), which has the formula: leading term/(1 – ratio). Since we have the leading term as 12 and the ratio (the factor by which the distance goes down for each switch) as 1/5, the answer is 12/(1 – 1/5) = 15 miles. Whew!

But you can save yourself a lot of agony by adopting the bicycle's perspective. The tires, to be precise. Imagine them wondering: How long before I squash that annoying bug? It has a maddeningly (for those who overlooked the simplicity of this shift in perspective) simple answer. The bicycles meet halfway in one hour, so the insect traveling 15 miles per hour would have traveled 15 miles until its demise. Done! Question poser, super impressed.
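
For readers who like to check arithmetic by machine, here is a small Python sketch of both routes. The function names are my own, and exact fractions are used so that rounding does not muddy the comparison. The first function follows the fly leg by leg, the second applies the geometric-series formula, and the third takes the bicycle's view.

```python
from fractions import Fraction


def fly_distance_leg_by_leg(road=20, bike_speed=10, fly_speed=15, legs=10):
    """Fly-eye view: add up the first `legs` legs of the fly's zigzag."""
    gap = Fraction(road)  # current distance between the two front wheels
    total = Fraction(0)
    leg_lengths = []
    for _ in range(legs):
        # The fly and the oncoming bike close the gap at their combined speed.
        time = gap / (fly_speed + bike_speed)
        leg = fly_speed * time
        leg_lengths.append(leg)
        total += leg
        # Meanwhile both bikes have advanced, shrinking the gap.
        gap -= 2 * bike_speed * time
    return total, leg_lengths


def fly_distance_closed_form(first_leg=Fraction(12), ratio=Fraction(1, 5)):
    """Sum of the infinite geometric series: leading term / (1 - ratio)."""
    return first_leg / (1 - ratio)


def fly_distance_bicycle_view(road=20, bike_speed=10, fly_speed=15):
    """Bicycle-eye view: the fly flies at a constant speed until the bikes meet."""
    time_to_collision = Fraction(road, 2 * bike_speed)
    return fly_speed * time_to_collision


partial, leg_lengths = fly_distance_leg_by_leg()
print([float(x) for x in leg_lengths[:3]])  # lengths of the first three legs
print(float(partial))                       # partial sum after ten legs
print(fly_distance_closed_form())           # infinite-series answer
print(fly_distance_bicycle_view())          # shortcut answer
```

The leg-by-leg total only creeps toward the answer, while the closed form and the shortcut land on it exactly, which is the whole point of the change in perspective.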

To tease out subjectivity, though, I must say more about the setting where this puzzle appears in Strogatz's article. He narrates the story of how when this was posed to John von Neumann, one of the greatest polymaths ever, he took the first route (infinite series) but gave the answer in a fraction of the time it would take most people to compute it via the second route (the bicycle-eye view). I included this problem because, unlike the chess problem earlier with its "homework" flavor, this one carries a distinctive "solve-it-right-now" note, allowing me to foreground a frame's suitability conditioned on the time aspect: one is blocked when one's approach does not deliver a path to the solution within seconds or a few minutes. Seen this way, von Neumann did not need an auxiliary matrix because the conventional frame (infinite series) did not erect any speed bumps, given his mathematical prowess. Whereas we might read the qualification implicit in Huh's comment about intuition as related to humans' cognitive makeup, the fly puzzle highlights individual variability. The broader lesson is that one would be well-advised to acquire multiple ways of seeing since what works for one may not work for others, and what works in one context—which might include helpful cues for navigating a frame—may not work in another.

The above examples are, at best, acts of rediscovering. We are all thankfully blessed with an exploratory drive, so if we let it, we will—and often—experience the thrill of our minds coaxing the bisociative event. As the Owen Wilson character says in Marry Me, "If you sit in a question, the answer will find you" (now, here's one romcom punching way above its weight). But a discovery, without any qualification, is, as Koestler puts it, "the uncovering of something which has always been there but was hidden from the eye by the blinkers of habit." In this telling, the blinkers were a collective affliction, and only a lucky few have had the privilege of making genuine discoveries—correcting the vision impairment, as it were, that had hobbled all humanity.

One such person is Archimedes, who gave us "eureka." It is now synonymous with the joy felt when a solution reveals itself (even if only offering a glimpse). The passivity implied in the previous sentence is deliberate. Often, the arduous search process is like opening an almost endless succession of doors with only a vague sense of what one is looking for and, suddenly, chancing upon a door that holds a clue that gives definite shape to that intimation, illuminating the way to the goal.

Let's briefly revisit the story of Archimedes' search to trace the contours of a canonical discovery. Archimedes was grappling with a problem posed by his king, Hiero II of Syracuse, in the lead-up to his celebrated flash of insight. The king wanted to know how he could determine whether the gold crown he had commissioned, now in his possession, contained any impurities. The problem belonged naturally to a geometrical frame; this is how Archimedes initially thought about it. How do I measure the volume of this highly irregular object? With the volume, he could easily calculate the density and compare it with the equivalent density of pure gold to detect impurities. One day, deeply immersed in the problem, he proceeded to experience an immersion of a different sort: his daily bath. He descended into the bathtub, the act no different this time from the numerous other times in the past. Only this time, his body displacing water triggered the analogy where the submerged part of his body was like the crown, displacing its volume equivalent; it was as if he had found a way to melt the crown without destroying it. This thought (sowing the seed for the principle of buoyancy, which we now know as the Archimedes principle) led him to run through the streets, shouting "eureka" as an expression of unrepressed glee and doing so, as we all know, textile-free.

What stands out in Archimedes' breakthrough is that the properties of novelty and surprise, often associated with the classical notion of creativity, are objectively true here. The reframing that led to the eureka moment would have been new to and surprised everyone then. But it's also true that the original geometrical frame was not inherently wrong for the problem that tormented him, only practically infeasible. This leads us to another class of discoveries where problems eluded existing frames, even in principle.

Consider another analogy perhaps even more iconic than Archimedes': Newton abstracting an apple into a mass object whereby the object "draws the earth, as well as the earth draws [it]." The type of thinking that preceded Newton's to explain the phenomenon of an apple falling not any other way but towards the center of the earth involved ideas about the purposes and natures of objects that drew them to their natural states, a relic of Aristotelian thinking. In this way, like the matrix was blocked only loosely speaking in the puzzle examples, the idea of auxiliary only makes loose sense in Newton's discovery. It wasn't as if he had a somewhat suitable frame to reason about the phenomenon, as there was an overpowering sense that it lacked an explanation. The bisociative event was precipitated by the oppressive weight of the arbitrariness of ideas about the goals of things—teleological thinking—and marked an escape from a no man's land into a frame that was not so much a matter of convenience as necessary to provide a scientific description of reality.

Newton's theory could explain many gravitational phenomena—it had tremendous reach—but not all. A notable example is the gravitational lensing effect, where a massive celestial object between the observer and the observed bends light and distorts the image, sometimes even causing the observer to see double. To explain this effect, we need Einstein's general theory of relativity, which introduces the concept of spacetime—the cosmic fabric that warps around mass, creating gravity. With this framework, it's unsurprising that massless photons bend as they pass near such a massive object, distorting the image we see. This example illustrates what Thomas Kuhn argued about scientific progress: even the best explanations usually have an unknown expiry date. Along comes a problem—observations that don't fit the prevailing theory—triggering a crisis, only solved by a better explanation. Kuhn called this a "paradigm shift," a term now in everyday (mis)use. In sum, even at the rarefied end of the spectrum of discoveries, generating new explanations—Deutsch's definition of creativity—is ceaseless.

Allow me to reframe this in terms that clarify how we should view AI's capacity for creativity (current or future). The fallibilist notion that errors are inevitable is at the core of the incredibly optimistic idea of endless progress. As a result, the "quest for good explanations" is not a zero-sum game, and, in this view, artificial creativity (use it if we have it or generate the knowledge required to create it) is not something to dread.

But would we recognize it if we saw it? Keeping our focus on novel discoveries, we might ask, for instance, What unblocked the matrix for someone like Einstein? Thankfully, Einstein was gracious and described clearly, often poetically, his thought processes that led to his breathtaking insights into reality. As if to confirm Koestler's claim that creative patterns across domains are alike, I came across the relevant quote in the section on discovery in Koestler's book and again in the book on creativity by John Cleese (best known for his work with Monty Python), where it appeared in the context of comedy. Einstein says, "The words or the language, as they are written or spoken, do not seem to play any role in my mechanism of thought. The psychical entities which seem to serve as elements in thought are certain signs and more or less clear images which can be 'voluntarily' reproduced and combined." It is a remarkable quote, for it suggests that at the other end of theories with exquisite mathematical precision lies something as tenuous as "certain signs and more or less clear images." Perhaps even more remarkable is that Einstein was not unique in this regard. In a survey conducted by Jacques Hadamard (who had earlier surveyed Einstein), many top mathematicians described their thinking in ways that echoed Einstein's.

Einstein lends his considerable voice to dismissing the notion that formalisms and symbol manipulation play a significant role in originating creativity. His quote also offers a clue to the elements that do play a role, elements that Koestler explores by parsing it in the light of numerous accounts of groundbreaking discoveries, including Einstein's. In this rich context, the signs and vague images roughly correspond to those our minds conjure during dreams and dreamlike states.

That raises the question: What induces those states? Take, for example, the famous story of August Kekulé, who, in a state of reverie, dreamed of the ouroboros (the snake eating its own tail), which is said to have led to his discovery of benzene's ring structure. If we were to roll the tape on Kekulé's life leading up to that moment, Koestler argues, we would find months of immersion in material relevant to the problem. Again, Kekulé is not an outlier. This shows that the notion a "flash of insight" evokes, of a mysterious, no- or low-agency process that fashions a key to unlock a thorny problem, is misleading. A more complete picture is best captured by Louis Pasteur's quote: "Chance favors the prepared mind." It's no wonder the preparation phase precedes illumination in many scholarly accounts of creativity. In a New Yorker article, Louis Menand expressed it memorably: "You need to have a pretty informed idea of what the box is before you can think outside it."

It is not immediately clear why this is the case. Indeed, the notion of a blocked matrix might suggest that an outsider—a maverick not beholden to established rules—is more likely to spot the crucial idea leading to a novel discovery, especially since these ideas often appear so obvious in hindsight. Daniel Dennett, a rigorous thinker and engaging communicator who recently passed away (a loss deeply felt), repurposes a concept from cognitive scientist Douglas Hofstadter into a thinking tool that helps elucidate this process.

The concept of "Jootsing," or "jumping out of the system," invites deeper reflection on the creative process. Dennett argues that immersion in a given domain is essential for two reasons: first, it is well-established for a good reason, and second, those who uphold its tradition have meticulously documented the assumptions that hold the key to unlocking the problem. In other words, Jootsing directs attention to those underlying assumptions, any one of which might be flawed. In a sense, it expands on an observation often attributed to Mark Twain: "It's not what we don't know that gets us in trouble. It's what we know for sure that just ain't so."

With the requisite emphasis on preparation, Koestler notes that sleep loosens the iron grip of traditional or routine thinking, allowing for a more expansive search for an auxiliary matrix. In other words, sleep, or a sleep-like state, awakens the maverick in us. If we are also experts in the problem that has captured our attention, we may witness, in our sleep, a kaleidoscope of signs and images playing, in an informed fashion, with the rules of the domain in which the problem resides. A subtle point here is that by becoming less constrained, we become more aware of the rules of the game. In this way, sleep acts as an antidote to an affliction delightfully known as "semantic satiation," where repeated exposure to an idea makes it less likely to register in one's consciousness. By playing fast and loose with logic, our "daily dip into the ancient sources of mental life," as Koestler calls dreaming, brings those rules into sharper relief, and noticing them is the first step to challenging them. (More generally, even in the waking state, this may be why so much emphasis is placed on "play" as a conduit for creative discoveries. Play is curiosity in action, fostering deep engagement with ideas without an overriding focus on achieving specific goals. It builds a repertoire of experiences, raw material that a fugue-like state can draw upon to produce novel insights.)

Now, I want to pause and reflect on the striking (in my mind) similarities between what we experience in sleep and the neural architecture known as deep learning, which powers many of today's most advanced AI models, including ChatGPT and its ilk.

Let's consider large language models like ChatGPT. LLMs owe their massive success to a representational masterstroke. They recast the nearly impossible problem of acquiring expertise across a dizzying array of fields (essentially becoming a general-purpose answering machine) into a more manageable one by cleverly reframing the task as predicting the reasonable next thing to say. This shift turns the challenge into a probabilistic question, focusing on computing next-word probabilities rather than requiring specialized domain knowledge. LLMs can accomplish this because they possess, in a compressed form, an astonishing breadth of text drawn from much of the publicly accessible Internet.

The probabilistic framing also addresses the problem of understanding context. Predicting the next word given the question lets the LLM navigate to an appropriate location in a vast meaning space: its internalized data, cast into a representation in which geometric proximity tracks semantic proximity.
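
To make "computing next-word probabilities" tangible, here is a deliberately tiny Python sketch. It counts which word follows which in a toy corpus and turns those counts into probabilities. This is a simple bigram counter and not how an LLM works internally: real models use neural networks over sub-word tokens and condition on far more than one preceding word. The corpus and the function name are made up for illustration.

```python
from collections import Counter, defaultdict


def next_word_probabilities(corpus, context_word):
    """Estimate P(next word | context_word) from raw bigram counts."""
    words = corpus.lower().split()
    followers = defaultdict(Counter)
    # Tally every adjacent (previous word, next word) pair in the corpus.
    for previous, following in zip(words, words[1:]):
        followers[previous][following] += 1
    counts = followers[context_word]
    total = sum(counts.values())
    if total == 0:
        return {}  # the context word never appears with a successor
    return {word: count / total for word, count in counts.items()}


# Toy corpus (an assumption), loosely inspired by the song.
corpus = "i have a pen i have an apple apple pen"

print(next_word_probabilities(corpus, "have"))
print(next_word_probabilities(corpus, "apple"))
```

Even at this scale the reframing is visible: the function knows nothing about pens or fruit, yet it can say which continuation is more or less likely given what came before.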

These similarities exist for a few reasons. Large language models (LLMs) break down text into smaller units called tokens, typically sub-words. This approach is less rigid than focusing on whole words. By fragmenting words into smaller pieces, LLMs are more likely to capture and examine the underlying rules that connect concepts.
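As a rough illustration of sub-word tokenization, the sketch below splits words by greedily matching the longest piece found in a tiny, hand-picked vocabulary. Both the vocabulary and the greedy rule are simplifying assumptions; production tokenizers learn their vocabularies from data with schemes such as byte-pair encoding.

```python
# Hypothetical vocabulary: a few word pieces plus single letters as a fallback.
TOY_VOCAB = {"pen", "pine", "apple", "p", "e", "n", "i", "a", "l"}

def tokenize(word, vocab=TOY_VOCAB):
    """Split a word into the longest vocabulary pieces, left to right."""
    tokens = []
    start = 0
    while start < len(word):
        # Try the longest remaining substring first, then shrink.
        for end in range(len(word), start, -1):
            piece = word[start:end]
            if piece in vocab:
                tokens.append(piece)
                start = end
                break
        else:
            # No piece matched: keep the unknown character as its own token.
            tokens.append(word[start])
            start += 1
    return tokens

print(tokenize("pineapple"))
print(tokenize("applepen"))
```

Because "pineapple" and "applepen" share the piece "apple", the two words overlap at the token level even though they never appear together as whole words, which is one way fragments make recombination easier.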

The "deep" in deep neural networks refers to stacking multiple layers of artificial neurons. Why so many layers? Each layer captures different aspects of the concepts crucial for learning. In other words, these layers correspond to problem representations at various levels of abstraction, which explains why more layers often lead to richer representations.

Another hallmark of deep learning—and machine learning in general—is its commitment to bootstrapping learning from data alone, without relying on explicit rules. This approach, together with key features like its layered architecture and tokenization regime, contributes to a framework that enables what Jonah Lehrer, in Imagine, calls "conceptual blending"—the mixing of ideas that might not traditionally align without strict regard for established conventions or consistent levels of abstraction.

In effect, neural networks seem to possess at least the stirrings of what Koestler calls "intellectual libertinage"—the ability to freely associate and combine elements beyond conventional constraints, much like what occurs in dreams.

But clearly, we don't have AI "scientists" churning out groundbreaking discoveries. So, what's missing? Before addressing what I believe to be the most consequential missing ingredient, let's consider a simpler issue: LLMs play too fast and too loose.

In a solo podcast episode, Sean Carroll described what happened when he posed a chess puzzle to ChatGPT. Although the puzzle sounds complex, it's trivial for anyone familiar with the game. It revolves around determining which side, white or black, has the advantage in "toroidal chess," a variant where the board's edges are connected to form a donut shape. In this version, pieces can move over the edge to the opposite side while otherwise following standard chess rules. Carroll described two behaviors in ChatGPT's answer that should be familiar: buttering you up and filibustering when it doesn't know the answer. It went on and on about how this was an insightful question (haven't we all seen ChatGPT say something like, "I'm privileged to have your deep insights percolate through my deep network"?) and then started weighing the pros and cons. The real answer is simple: the starting position is illegal because, once the edges wrap around, the kings begin the game in check. (Carroll mistakenly claims black wins, but that's beside the point.)
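The toroidal chess puzzle can be checked in a few lines. The sketch below assumes the standard starting squares (white king on e1, black king on e8) and that both the ranks and the files wrap around; the function names are illustrative, not part of any chess library.

```python
BOARD_SIZE = 8

def toroidal_gap(a, b, size=BOARD_SIZE):
    """Shortest distance between two coordinates when the edge wraps around."""
    direct = abs(a - b)
    return min(direct, size - direct)

def king_attacks(king, target, size=BOARD_SIZE):
    """A king attacks any square exactly one step away, wrapping included."""
    file_gap = toroidal_gap(king[0], target[0], size)
    rank_gap = toroidal_gap(king[1], target[1], size)
    return max(file_gap, rank_gap) == 1

# Squares as (file, rank), zero-based: e1 is (4, 0) and e8 is (4, 7).
WHITE_KING = (4, 0)
BLACK_KING = (4, 7)

# On an ordinary board the kings are seven ranks apart.
flat_rank_gap = abs(WHITE_KING[1] - BLACK_KING[1])

# On the torus, rank 1 and rank 8 are neighbors, so the kings touch.
kings_adjacent = king_attacks(WHITE_KING, BLACK_KING)

print(flat_rank_gap, kings_adjacent)
```

Under these assumptions the two kings stand on neighboring squares before a single move is made, which is why the puzzle dissolves once the rules are actually applied instead of discussed.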

So, to put things crudely, fewer constraints are good, but no constraints are bad. As illustrated in the accompanying meme (courtesy of Professor Subbarao Kambhampati), the "edifice" of LLMs depends heavily on the prompter knowing the answer.


LLMs' shaky edifice of reasoning.

The critical deficiency, however, is more nuanced. To draw out the answer, let's briefly reenter the realm of dreams, the biological variety. Koestler notes that Henri Poincaré once pondered the process that sifts through the innumerable collisions between ideas to select the most promising ones. He was asking himself what the nature of this "mysterious sieve" was. His answer: "The aesthetic sensibility of the real creator." In short, the discerning critic in humans, often subsumed under the concept of "intuition," does the job.

The job of this sieve is not an enviable one. I came across a tweet by a mathematician who shared that he woke up in the middle of the night with a foggy sense of elation, believing he'd discovered the proof for a long-standing problem. He quickly scribbled something on a pad by his nightstand, only to find the following morning that he'd written something no more sophisticated than 1 + 1 = 2. As Nietzsche wrote, "In reality, the imagination of the good artist or thinker produces continuously good, mediocre or bad things, but his judgment, trained and sharpened to a fine point, rejects, selects, connects... All great artists and thinkers are great workers, indefatigable not only in inventing but also in rejecting, sifting, transforming, ordering." Considering that the unconscious, dreaming mind, as Koestler observed, is "pullulating with nascent analogies," there's certainly a lot of sifting to do. (Let's take a moment to appreciate "pullulating"; you can almost hear the neurons crackling with activity.)

Turning to LLMs again, two things stand out. First, unlike the human brain, they lack a default mode network: the brain's active state when we appear to be doing nothing, during which there's furious communication between front and back parts of the brain that are usually not in conversation. (More technically, these are the prefrontal cortex, posterior cingulate, medial temporal lobe, and precuneus.) In other words, if we ask what LLMs think about when we are not prompting them, the answer is "nothing."

Second, and perhaps more pertinent to our discussion, it's not a stretch to say that artificial neural networks must also be pullulating with nascent analogies. However, without a discerning audience, it's as if they never occurred. In the spirit of the famous philosophical thought experiment—if a tree falls and no one is there, does it make a sound?—in this case, it's an emphatic "no."

Notice that I'm not talking about the output. In LLMs, the outputs are sanitized by a process known as RLHF (reinforcement learning from human feedback) that, by making them fit with what one would expect, often beats the creativity out of them. I'm referring to the lead-up to output generation, when there is the intellectual libertinage that Koestler talks about. It's interesting to consider whether we could intervene early in the process and introduce a "critic" who observes. However, the framework of LLMs doesn't offer a clear path to instantiate such a module, nor do we yet know how to program one. As Yann LeCun, one of the architects of modern deep learning, has said, LLMs are an off-ramp on the path towards artificial general intelligence and hence also towards artificial creativity.

Although this might be "dispiriting" for LLMs, it's good news for us humans—or more precisely, for the idea of "collective intelligence," where the collective includes AI. By inserting ourselves into the process and not merely delegating but actively engaging in conversation, we may witness nascent analogies in ways previously unimaginable. Without a curator—and for now, that can only be humans—perhaps countless sparks of creativity are being extinguished in the digital subconscious. Put differently, the "minds" of AI are likely smushing together pens and pineapples, but unless we intervene, they can't know (at least not yet) that this blending has the makings of a viral sensation—one that can amuse, infuriate, or even inspire.

Anantha Shankar K R

I can help you translate strategies into tactical plans through IBP | Design Thinker | Agile doer | Problem solver

4 months ago

Thank you for making PPAP an earworm in my head, again! Thanks again for helping me learn about "bisociation". This seems like a much more scientific term than "idea sex" that James Altucher coined.

Godwin Josh

Co-Founder of Altrosyn and Director at CDTECH | Inventor | Manufacturer

4 months ago

You eloquently crafted a framework for understanding AI's creative potential by revisiting Arthur Koestler's "bisociation." Koestler's work resonates deeply with the current debate surrounding AI creativity, as it posits that true innovation stems from connecting seemingly disparate ideas. This echoes the challenges faced by LLMs, which often struggle to synthesize information from diverse sources in novel ways. Historically, breakthroughs in human creativity have often emerged from periods of cultural and intellectual ferment, where individuals were exposed to a wide range of perspectives and disciplines. Could this suggest that fostering environments rich in cross-disciplinary collaboration might be crucial for unlocking AI's creative potential? Furthermore, if "bisociation" is indeed the key to creativity, how can we measure or quantify the degree to which an AI system possesses this capacity?


More articles by Ganesh Sankaran
