By Sidharth Ramachandran — Dec 31, 2022

Scaling asset generation using Generative AI

As I spend more time in the media industry and observe the popularity of content and streaming platforms like YouTube, TikTok, and Netflix, I increasingly find myself drawing parallels to the manufacturing industry. In a factory, each production line can manufacture different products at scale, which are then sold and used by people all over the world. In the media industry, each creator follows a decentralized production process (scripting, recording, editing) to generate content, which is then consumed by people all over the world. The question that comes to mind is whether we can build the equivalent of a "production line" (the thing that allows factories to scale) for the content generation process, so that creators and platforms can generate their assets (media) at scale.

I came across an application of this production-line idea to creating assets for the gaming industry that I found very interesting. It is illustrated well in the following tweet thread, but I will try to summarize it and present a more general view below.

Thread time 🧵

Here's how to precisely design a small building in a game (such as an isometric bunker) by fine-tuning #StableDiffusion

This example was inspired by #RedAlert, which I spent countless hours on (in 96-97 - pls don't call me old 😅)

CC https://t.co/dlRJIWtD3Y pic.twitter.com/IqEaPRyZAu
— Emm (@emmanuel_2m) November 30, 2022

Let's start with an application most smartphone users are familiar with: WhatsApp. For a while now, it has offered emoji that come in multiple skin-tone options.

While these are just variations of an emoji in different skin tones and colors, imagine being able to generate many more variations, like "Thumbs-Up emoji with full-sleeved shirt and cufflinks" or "Thumbs-Up emoji with rings". This application, while fun, may not be the next big feature for WhatsApp! However, if you extend this example to a strategy game where users build farms, forts, and bunkers on their territory, you start to see how these assets can be created at scale, with a theoretically unlimited number of customizations.
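To make the "unlimited customizations" idea a bit more concrete, here is a minimal sketch of how a list of text prompts could be assembled by combining a few building types with a few modifiers. The subjects, styles, and details are made up for illustration, the function name `build_prompts` is hypothetical, and this is not how the thread's author worked. The sketch only builds the prompt strings; sending them to an image model is left out.

```python
from itertools import product

# Hypothetical vocabulary for a strategy game; none of these lists come from the thread.
SUBJECTS = ["farm", "fort", "bunker"]
STYLES = ["desert", "snow"]
DETAILS = ["with a single flag", "with sandbags"]


def build_prompts(subjects, styles, details):
    """Return one text prompt per (subject, style, detail) combination."""
    return [
        f"isometric {style} {subject} {detail}"
        for subject, style, detail in product(subjects, styles, details)
    ]


prompts = build_prompts(SUBJECTS, STYLES, DETAILS)
print(len(prompts))  # 3 subjects x 2 styles x 2 details
print(prompts[0])
```

Adding one more entry to any of the lists multiplies the number of prompts, which is where the scale comes from.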

The Twitter thread shows this kind of customization later on, where the author creates bunkers in various styles, for example one in a Soviet style and another in an American style:

Soviet-style bunkers (left) and American-style bunkers (right)

One can already see effects that you would not want when presenting these assets to users in a production game. For example, the Soviet-style bunkers on the left have multiple red flags, while the American-style bunkers on the right have a single flag each. Similarly, there are red carpets on the left but no colored carpets on the right. Maintaining consistency is probably one of the biggest challenges of generative models. It also means that the workflow will always need a human in the loop who makes the final decision and updates the assets.
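As a rough sketch of what such a human-in-the-loop step might look like, the snippet below keeps every generated candidate in a pending state until a reviewer has decided on it. Everything here is an assumption for illustration: the `Candidate` fields, the `flag_count` attribute (which someone or something would have to tag), and the `decide` callback, which stands in for a real person looking at the image.

```python
from dataclasses import dataclass
from enum import Enum


class Status(Enum):
    PENDING = "pending"
    APPROVED = "approved"
    REJECTED = "rejected"


@dataclass
class Candidate:
    name: str
    style: str
    flag_count: int  # hypothetical attribute, tagged by hand or by some detector
    status: Status = Status.PENDING


def needs_attention(candidate, expected_flags=1):
    """Flag candidates that break a simple consistency rule (one flag per bunker)."""
    return candidate.flag_count != expected_flags


def review(candidates, decide):
    """Record the reviewer's decision for each candidate and return the approved ones."""
    for candidate in candidates:
        candidate.status = Status.APPROVED if decide(candidate) else Status.REJECTED
    return [c for c in candidates if c.status is Status.APPROVED]


candidates = [
    Candidate("soviet_bunker_01", "Soviet", flag_count=3),
    Candidate("american_bunker_01", "American", flag_count=1),
]

# Stand-in for a human reviewer: here we simply reject anything the rule flags.
approved = review(candidates, decide=lambda c: not needs_attention(c))
print([c.name for c in approved])
```

In a real workflow the `decide` step would be a person in a review tool, and the consistency rule would only help them prioritize what to look at.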

To me, this also addresses one of the criticisms: that scaling an inherently creative process is neither desirable nor possible. I completely agree that each art piece is the result of several conscious and subconscious choices by the artist, and I believe this will remain the case going forward. However, I also believe that such tools can massively scale up the process and even help artists discover and explore more concepts than they previously could.
