Finding the Beauty in the Imperfections of Generative Art

September 5, 2022 :: 3 min read

Generative art has taken the internet by storm. It fails in more than one way, but do we really care?

A world view centered on the acceptance of transience and imperfection. The aesthetic is sometimes described as one of appreciating beauty that is “imperfect, impermanent, and incomplete” in nature. This is how Wikipedia describes wabi-sabi, a concept popular in Japanese art.

A city floating in the sky.
Generated with Midjourney using the prompt: "a lost city in the sky 4k artstation". Picture source.

Cool new stuff

In the last couple of weeks, we’ve seen a lot of art generated using text-to-image models, mostly DALL·E, Midjourney, and Stable Diffusion. They aren’t new as a concept, but there has been a lot of progress in recent years.

If you somehow missed all the hype: in essence, these are deep learning models that take a description of a scene, known as the prompt, as input and try to produce an image that best represents it.
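If you want a feel for what this looks like in practice, here is a minimal sketch using Hugging Face's diffusers library to run Stable Diffusion. The checkpoint name and API details are assumptions based on the library as it stood around late 2022 and may differ between versions (the model also requires accepting a license on the Hugging Face Hub first).

```python
# A minimal sketch: text prompt in, image out, via Stable Diffusion.
# Checkpoint ID and API details may vary between diffusers versions.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4",  # public Stable Diffusion checkpoint
    torch_dtype=torch.float16,        # half precision to fit consumer GPUs
)
pipe = pipe.to("cuda")

prompt = "a lost city in the sky 4k artstation"
image = pipe(prompt).images[0]  # the pipeline returns a list of PIL images
image.save("lost_city.png")
```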

They vary a bit in what they’re good at. In my experience, Stable Diffusion is the best at producing a high-quality image, even if it ignores or misinterprets some of the prompt. Midjourney is similar but leans artsier; as far as I know, its authors used more training data from art-sharing websites such as ArtStation and DeviantArt. Lastly, DALL·E is the best at capturing the relationships between the objects in the prompt, even if its images are not as impressive as those from the other two models.

Two corgi dogs in a karate fight
Generated with DALL·E using the prompt: "a cinematic photo of two corgis wearing kimonos in a karate fight". Notice how the partial kimono is actually made of the dog's hair. Picture source.

Output artifacts

Apart from misunderstanding the prompts, generated images can still have many artifacts — flaws in the output caused by the quirks and rough edges of the model. These can include materials melted into each other, hair growing out of fabrics, colours overflowing from one item into another, unintended psychedelic textures stemming from the internal representations, and many more.

We consider them flaws, but is that all they are?

Screenshot from a game with pixel graphics
Pixel-based graphics are a creative choice that can give a sense of nostalgia. Picture source.

New art direction

It’s 2022 and we still make games with pixel graphics even though we can render almost life-like images. We use hand-drawn illustrations as assets, even though it’s incredibly expensive. We do it because we enjoy the style, what it stands for, and the feelings it evokes. It’s a creative choice.

In the same vein, I think the artifacts of generative models are a style, an art direction to pursue.

They are a snapshot of the current era, even if, as time goes by, we get rid of most of those flaws. The fuzzy, texture-clipping generative art might be the vibe we associate with the 20s. The psychedelic AI hallucinations: a perfect match for the global pandemic, war, and looming climate change.

It isn’t just about the style, though. Accessible tools allow more people to participate in the creative world and express themselves. Do you want to make a realistic render of your child’s crayon drawing? Well, now you can. Are you running a Dungeons & Dragons campaign and need some concept art? Just generate it with a prompt. Do you want an edgy edit of your maths teacher’s photo that gets you expelled from school? You got it. A trend with a lot of momentum is what defines an era.

However, this isn’t to say that there aren’t any challenges. Collectively, we need to figure out how to handle the copyright and ownership of generative art (more on that soon). How do we compensate the authors whose work ends up being used to train these models? How do we bring this technology to people who cannot afford to pay for the service or for beefy computers?

To wrap it up, check out this eerie video posted on Reddit recently, made with Disco Diffusion.
