The Simple Programs That Beat ChatGPT, Midjourney, and DALL-E
A boyfriend just going through the motions. A spouse worn into the rut of habit. A jetlagged traveler’s message of exhaustion-fraught longing. A suppressed kiss, unwelcome or badly timed. These were some of the interpretations that reverberated in my brain after I viewed a weird digital-art trifle by the Emoji Mashup Bot, a popular but defunct Twitter account that combined the parts of two emoji into new, surprising, and astonishingly resonant compositions. The bot had taken the hand and eyes from the 🥱 yawning emoji and mashed them together with the mouth from the 😘 kissing-heart emoji. That’s it.
Compare that simple method with supposedly more sophisticated machine-learning-based generative tools that have become popular in the past year or so. When I asked Midjourney, an AI-based art generator, to create a new emoji based on those same two, it produced compositions that were certainly emojiform but possessed none of the style or significance of the simple mashup: a series of yellow, heart-shaped bodies with tongues sticking out. One appeared to be eating another tongue. All struck me as the kinds of monstrosities that might be offered as prizes for carnival games, or as stickers delivered with children’s-cancer-fundraising junk mail.
ChatGPT, the darling text-generation bot, didn’t fare much better. I asked it to generate descriptions of new emoji based on parts from existing ones. Its ideas were fine but mundane: a “yawning sun” emoji, with a yellow face and an open mouth, to represent a sleepy or lazy day; a “multi-tasking” emoji, with eyes looking in different directions, to represent the act of juggling multiple tasks at once. I fed these descriptions back into Midjourney and got competent but bland results: a set of screaming suns, a series of eyes on a yellow face dripping from the top with a black, tar-like ooze.
Perhaps I could have drafted better prompts or spent more time refining my results in ChatGPT and Midjourney. But these two programs are the pinnacle of AI-driven generative-creativity research, and when it came to making expressive, novel emoji, they were bested by a dead-simple computer program that picks face parts from a hat and collages them together.
People have dreams for AI creativity. They dream of computers dreaming, for starters: that once fed terabytes of text and image data, software can deploy something like a machine imagination to author works rather than merely output them. But that dream entails a conceit: that AI generators such as ChatGPT, DALL-E, and Midjourney can accomplish any kind of creativity with equal ease and performance. Their creators and advocates cast them as capable of tackling every form of human intelligence—as everything generators.
And not without reason: These tools can generate a version of almost anything. Many of those versions are wrong or misleading or even potentially dangerous. Many are also uninteresting, as the emoji examples show. It turns out that using a software tool that can make a particular thing is quite different from, and a lot more gratifying than, using one that can make anything whatsoever.
Kate Compton, a computer-science professor at Northwestern University who has been making generative-art software for more than a decade, doesn’t think her tools are artificially intelligent—or intelligent at all. “When I make a tool,” Compton told me, “I’ve made a little creature that can make something.” That something is usually more expressive than it is useful: Her bots imagine the inner thoughts of a lost autonomous Tesla and draw pictures of hypothetical alien spacecraft. Similar gizmos offer hipster cocktail recipes or name fake British towns. Whatever their goal, Compton doesn’t aspire for software generators such as these to master their domain. Instead, she hopes they offer “the tiny, somewhat stupid version of it.”
That’s a far cry from the ChatGPT creator OpenAI’s ambition: to build artificial general intelligence, “highly autonomous systems that outperform humans at most economically valuable work.” Microsoft, which has already invested $1 billion in OpenAI, is reportedly in talks to dump another $10 billion into the company. That kind of money assumes that the technology can turn a massive future profit. Which only makes Compton’s claim more shocking. What if all of that money is chasing a bad idea?
One of Compton’s most successful tools is a generator called Tracery, which uses templates and lists of content to generate text. Unlike ChatGPT and its cousins, which are trained on massive data sets, Tracery requires users to create an explicit structure, called a “context-free grammar,” as a model for its output. The tool has been used to make Twitter bots of various forms, including thinkpiece-headline pitches and abstract landscapes.
A context-free grammar works a bit like a nested Mad Lib. You write a set of templates (say, “Sorry I didn’t make it to the [event]. I had [problem].”) and content to fill those templates (problems could be “a hangnail,” “a caprice,” “explosive diarrhea,” “a [conflict] with my [relative]”), and the grammar puts them together. That requires the generative-art author to consider the structure of the thing they want to generate, rather than asking the software for an output, as they might do with ChatGPT or Midjourney. The creator of the Emoji Mashup Bot, a developer named Louan Bengmah, would have had to split up each source emoji into a set of parts before writing a program that would put them back together again in new configurations. That demands a lot more effort, not to mention some technical proficiency.
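To make the nested Mad Lib concrete, here is a minimal sketch in Python of how such a grammar might be expanded. It is not Tracery's actual syntax or API; the square-bracket slots simply mirror the example above. The template and the list of problems come from that example, while the entries for "event," "conflict," and "relative," along with the names GRAMMAR, SLOT, and expand, are assumptions made up for illustration.

```python
import random
import re

# Each symbol maps to a list of possible expansions.
# "origin" and "problem" follow the example in the text; the entries for
# "event", "conflict", and "relative" are hypothetical fill-ins.
GRAMMAR = {
    "origin": ["Sorry I didn't make it to the [event]. I had [problem]."],
    "problem": [
        "a hangnail",
        "a caprice",
        "explosive diarrhea",
        "a [conflict] with my [relative]",  # nested slots
    ],
    "event": ["party", "meeting", "recital"],
    "conflict": ["disagreement", "feud"],
    "relative": ["cousin", "grandmother"],
}

# Matches a slot such as [event] and captures the symbol name inside it.
SLOT = re.compile(r"\[(\w+)\]")


def expand(symbol, grammar, rng):
    """Pick one expansion for `symbol`, then recursively fill its slots."""
    template = rng.choice(grammar[symbol])
    return SLOT.sub(lambda m: expand(m.group(1), grammar, rng), template)


if __name__ == "__main__":
    rng = random.Random()
    for _ in range(3):
        print(expand("origin", GRAMMAR, rng))
```

Each run prints three randomly assembled excuses. The program contributes nothing beyond choosing and substituting; everything that makes the output funny or flat was decided by whoever wrote the lists.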
For Compton, that effort isn’t something to shirk; it’s the point of the exercise. “If I just wanted to make something, I could make something,” she told me. “If I wanted to have something made, I could have something made.” Compton sees generative software’s purpose differently than OpenAI does: The practice of software-tool-making is akin to giving birth to a software creature (“a chibi version of the system,” as she put it to me) that can make something, mostly bad or strange or, in any case, caricatured versions of it, and then spending time communing with that creature, as one might with a toy dog, a young child, or a benevolent alien. The aim isn’t to produce the best or most accurate likeness of a hipster cocktail menu or a daybreak mountain vista, but to capture something more truthful than reality. ChatGPT’s ideas for new emoji are viable, but the Emoji Mashup Bot’s offerings feel fitting; you might use them rather than just post about the fact that a computer generated them.
“This is maybe what we’ve lost in the generate-everything generators,” Compton said: an understanding of what the machine is trying to create in the first place. Looking at the system, seeing the possibilities within it, identifying its patterns, encoding those patterns in software or data, and then watching the thing work over and over again. When you type something into ChatGPT or DALL-E 2, it’s like throwing a coin into a wishing well and pulling the bucket back up to find a pile of kelp, or a puppy, in its place. But Compton’s generators are more like putting a coin into a gachapon machine, knowing in advance the genre of object the thing will dispense. That effort suggests a practice whereby an author hopes to help users seek a rapport with their software rather than derive a result from it. (It also explains why Twitter emerged as such a fruitful host for these bots—the platform natively encourages caricature, brevity, and repetition.)
Much is gained from being shown how a software generator works, and how its creator has understood the patterns that define its topic. The Emoji Mashup Bot does so by displaying the two emoji from which it constructed any given composition. One of the first text generators I remember using was a weird software toy called Kant Generator Pro, made for Macs in the 1990s. It used context-free grammars to compose turgid text reminiscent of the German Enlightenment philosopher Immanuel Kant, although it also included models for less esoteric compositions, such as thank-you notes. The program came with an editor that allowed the user to view or compose grammars, offering a way to look under the hood and understand the software’s truth.
But such transparency is difficult or impossible in machine-learning systems such as ChatGPT. Nobody really knows how or why these AIs produce their results—and the outputs can change from moment to moment in inexplicable ways. When I ask ChatGPT for emoji concepts, I have no sense of its theory of emoji—what patterns or models it construes as important or relevant. I can probe ChatGPT to explain its work, but the result is never explanatory—rather, it’s just more generated text: “To generate the ideas for emojis, I used my knowledge of common concepts and themes that are often represented in emojis, as well as my understanding of human emotions, activities, and interests.”
Perhaps, as creative collaborations with software generators become more widespread, the everything generators will be recast as middleware used by bespoke software with more specific goals. Compton’s work is charming but doesn’t really aspire to utility, and there is certainly plenty of opportunity for generative AI to help people make useful, even beautiful things. Even so, achieving that future will involve a lot more work than just chatting with a computer program that seems, at first blush, to know something about everything. Once that first blush fades, it becomes clear that ChatGPT doesn’t actually know anything—instead, it outputs compositions that simulate knowledge through persuasive structure. And as the novelty of that surprise wears off, it is becoming clear that ChatGPT is less a magical wish-granting machine than an interpretive sparring partner, a tool that’s most interesting when it’s bad rather than good at its job.
Nobody really wants a tool that can make anything, because such a need is a theoretical delusion, a capitalist fantasy, or both. The hope or fear that ChatGPT or Midjourney or any other AI tool might end expertise, craft, and labor ignores an obvious truth: These new gizmos entail whole new regimes of expertise, craft, and labor. We have been playing with tech demos, not finished products. Eventually, the raw materials of these AI tools will be put to use in things people will, alas, pay money for. Some of that new work will be stupid and insulting, as organizations demand value generation around the AI systems in which they have invested (Microsoft is reportedly considering adding ChatGPT to Office). Other work could prove gratifying and even revelatory, if it can convince creators and audiences that the software is making something specific and speaking with intention, offering them an opportunity to enter into a dialogue with it.
For now, that dialogue is more simulated than real. Yes, sure, you can “chat” with ChatGPT, and you can iterate on images with Midjourney. But an empty feeling arises from many of these encounters, because the software is going through the motions. It appears to listen and respond, but it’s merely processing inputs into outputs. AI creativity will need to abandon the silly, hubristic dream of artificial general intelligence in favor of concrete specifics. An infinitely intelligent machine that can make anything is useless.