In 2022 I wrote an article dedicated to “creative” AI applications. This was before ChatGPT, Dall-e, Gemini and others. Back then, discussions were mainly speculative, always centered on the question: if we’re here today, what might the future hold?
To make use of these technologies, one needed technical prowess (technical knowledge in database management and Python, for example), and the achievable results were far from realistic or aesthetically pleasing.
Now the tech is readily accessible and user-friendly, and the outcomes are increasingly better. The evolution of the models aims to enhance the realism of images and minimize the gap between the initial prompt and the final output; between the intention of the prompter and what the machine makes of it. Now the process of constructing prompts has incorporated the use of LLMs to make them as “effective” and comprehensible to machines as possible. The machine “infers” the user’s desires and engineers the prompt on their behalf.
This development follows the general course of technological evolution, aiming to reduce the entry barrier between users and machines. Interfaces strive to be friendly, intimate and frictionless. The black box at the heart of machines becomes more complex, yet its face remains as familiar as ever. Behind a simple box awaiting a prompt lies veiled mathematical abstraction, computational power and the vast, incomprehensible flow of data.
The democratization of tech becomes a reality thanks to the interface. Through it, creativity is just a click away. The machine’s imagination becomes tangible, within reach of a click.
Back then, engineers and programmers often utilized dreamlike metaphors for their models: Deep Dream, BigSleep. However, the outcomes were far from dreamlike; they were rarely realistic, often eerie, macabre, abstract or just strange.
In those models, the results came into being from probability collisions without clear directions. In them, there’s no realism; it's as if the machines were precisely trying to move away from it. Faced with these images, the viewer’s gaze loses its anchors. The textures of these images reveal the process and suggest the existence of an ‘artificial imagination’ moving in an as-yet-unnamed direction.
The techniques and architectures that propelled this generation of machines, known as Generative Adversarial Networks, or GANs for short, are very different from those underlying current platforms, such as diffusion models. This difference has also translated to the level of their interfaces. Like ChatGPT, current image generators are designed to ‘meet our expectations, visually. This agreeableness is right there in the math, in the way these tools distill millions of images into a multidimensional array of the proximities of various styles and shapes. They angle to be familiar,’ as Dan Cohen puts it.
In diffusion models, the language of a prompt becomes a guide through the latent space of machine memory. A language that seeks coincidence. That’s the goal.
Nevertheless, ‘dreaming’ is the verb chosen by Stability.AI, the company behind DreamStudio, when generating an image with its model. Midjourney, on the other hand, opted for the verb ‘imagining’. In both cases, probability is equated to the realm of the unconscious and human imagination, as noted by the researcher and digital artist, Eryk Salvaggio. But here, imagination is directed towards the reproduction, reinterpretation, reorganization, and recombination of the database - of the past and the existing, I might add.
Machines do not dream or imagine the real world; the interface merely presents a hall of mirrors onto which we project these ideas. The fact that the real world aligns with its description by the machine is merely a statistical correlation.
The memory and mind of the machine work in a different way. The machine hallucinates, but where do those images come from? Its outputs do not correspond to reality; they escape from it. But where do they escape to? In other words, what is the truth at the heart of the machine’s imagination?
They escape human-like reality, tumbling aimlessly through latent spaces and venturing beyond human logic, detached from our truth - that of our body, our world and our science. Without the tether of the body, without an understanding of the world. We speak of possibilities as yet unnamed by human language. An open field, lacking human cartographies, where only the machine can chart its own course.
The Mexican artist, poet and programmer Eugenio Tisselli, who has experimented with poetry-writing machines, ventures to call this plane the “tecnema” (techneme):
A new pole that extends the boundaries of language (…) which is not committed to either precision or semantic ambiguity, as it lacks intention. In an utterance produced from the techneme, the human will to specify or juxtapose is not present. It represents, instead, the production of an artificial world from a language algorithmically produced. (…) Human will in the techneme (…) does not fully control what the computers emancipated from us generates. The techneme exceeds human will and therefore also exceeds the will for meaning.
Or as Meghan O’Gieblyn points out:
The models are like the prisoners in Plato’s cave, trying to approximate real-world concepts from the elusive shadow play of language. But it’s precisely this shadow aspect (Jung’s term for the unconscious) that makes its creative output so beautifully surreal. The model exists in an ether of pure signifiers, unhampered by the logical inhibitions that lead to so much deadweight prose. In the dreamworld of its imagination, fires explode underwater, aspens turn silver, and moths are flame colored.
Is this imagination? A machine dream of other worlds?