Behind the Scenes with Stadia’s Style Transfer ML
Style transfer is the process of combining the content of one image and the style of another to create something new. This is a well-studied problem in machine learning with a number of open-source implementations available. Many techniques can create aesthetically pleasing still images; however, when used frame-by-frame on animations and movies, the results are sometimes inconsistent. Features that appear in one frame, like colors, textures, and brush strokes, might vanish in the next, resulting in an unpleasantly flickering video. Previous techniques to address these problems required computational resources that made real-time interaction infeasible at modern screen resolutions (such as 1080p).
Inspired by the Magenta team’s ‘real-time arbitrary image style transfer’ model [code and paper], we set out to explore what improvements would be necessary to apply this technique in real time to video games. Because we knew we wanted to hit a very high performance target, we started by removing the parts of the model that couldn’t be efficiently computed by the GPU. Surprisingly, we found that the remaining pieces of the model could still perform the stylization. We then increased the number of residual blocks in the middle layers of the model until visual quality stopped improving. Here is the model architecture we ended up with:
separable_conv2d(3x3 kernel, depth 32, stride 2)
separable_conv2d(3x3 kernel, depth 64, stride 2)
15 x residual_block(3x3 kernel, depth 64)
transpose_separable_conv2d(3x3 kernel, depth 32, stride 2)
transpose_separable_conv2d(3x3 kernel, depth 3, stride 2)
Stadia’s style transfer model architecture.
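As a rough illustration, the architecture above can be sketched in Keras. This is only an approximation: Keras has no separable transpose convolution, so plain `Conv2DTranspose` layers stand in for the `transpose_separable_conv2d` layers, and the activations, padding, and residual-block internals are our assumptions rather than details from the post.

```python
import tensorflow as tf
from tensorflow.keras import layers


def residual_block(x, depth):
    # Assumed block structure: two separable 3x3 convs plus a skip connection.
    y = layers.SeparableConv2D(depth, 3, padding="same", activation="relu")(x)
    y = layers.SeparableConv2D(depth, 3, padding="same")(y)
    return layers.add([x, y])


def build_stylizer(num_blocks=15):
    inp = layers.Input(shape=(None, None, 3))
    # Two stride-2 separable convs downsample the frame by 4x in each dimension.
    x = layers.SeparableConv2D(32, 3, strides=2, padding="same",
                               activation="relu")(inp)
    x = layers.SeparableConv2D(64, 3, strides=2, padding="same",
                               activation="relu")(x)
    for _ in range(num_blocks):
        x = residual_block(x, 64)
    # Keras lacks a separable transpose conv, so Conv2DTranspose is a stand-in.
    x = layers.Conv2DTranspose(32, 3, strides=2, padding="same",
                               activation="relu")(x)
    x = layers.Conv2DTranspose(3, 3, strides=2, padding="same",
                               activation="sigmoid")(x)
    return tf.keras.Model(inp, x)
```

Because the stack is fully convolutional, the same model applies to any frame size divisible by four, and the output frame has the same resolution as the input.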
To enhance the stability across consecutive frames, we use an additional term in the model’s loss function based on the predicted optical flow of the pixels [inspired by these papers]. Although this can be computationally intensive, it only needs to be calculated when training the model. The model learns to enforce stability of features even though at runtime it only sees the current frame.
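The post doesn’t spell out the exact form of this loss term, but a common formulation of such a temporal-consistency penalty looks like the following sketch: the previous stylized frame is warped into the current frame using the predicted optical flow, and an occlusion mask discounts pixels where the flow is unreliable. The function names and mask convention here are our assumptions.

```python
import tensorflow as tf


def temporal_consistency_loss(stylized_t, stylized_prev_warped, occlusion_mask):
    # stylized_t:           stylized output for the current frame [B, H, W, 3]
    # stylized_prev_warped: previous stylized frame warped into the current
    #                       frame via the predicted optical flow (assumed
    #                       precomputed during training)
    # occlusion_mask:       [B, H, W, 1], 1 where flow is reliable, 0 where
    #                       pixels are occluded or flow is uncertain
    diff = occlusion_mask * (stylized_t - stylized_prev_warped)
    return tf.reduce_mean(tf.square(diff))
```

Since both the flow prediction and the warp are only needed to compute this loss, all of that cost stays in training; at inference time the model still processes one frame at a time.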
To apply the style transfer to any game without needing access to the game’s source code, we created a set of parameterizable Vulkan postprocessing shaders. The weights from the trained TensorFlow model are exported, loaded into the shaders, and can be changed at runtime. The final model weights are quite small (around 512 KB), and the shaders run in real time on top of the game stream.
We built a greybox level to demonstrate rapidly iterating on a game’s artistic style in real time.
Translating from an illustrated, two-dimensional piece of concept art into a fully realized game environment would ordinarily require custom texture painting, modeling, material crafting, lighting, and tuning. Real-time artistic style transfer potentially allows developers to go straight from looking at a concept to testing it in a live, interactive game environment, enabling rapid iteration of a video game’s art style. Real-time execution of artistic style transfer also opens up new forms of video game interaction: shifting visual styles during gameplay, individually customized artistic styles (personalization by the player), and styles generated from user-generated content (turning a drawing into rendered game art). Note that the 3D assets used to build our greybox demo world were not specially made by us to influence the results, but are available to anyone on Unreal Engine’s Marketplace.
It is our hope that our prototypes might inspire a new world of game mechanics. What new types of experiences could be created by using this technology at runtime? Perhaps a game where the style changes based on the mood of the player’s character, or one where different parts of the game world or universe could have their own styles. We look forward to seeing your ideas about what this rapid iteration of artistic styles unlocks for your games.
It is fun to experiment with your own visual concepts for game look.
Here is an original concept art piece by our very own software engineer Richard Wu and the in-game results.
Imagine a game world styled with a TensorBoard graph or a famous work of art.
What new worlds can we create or experiment with just by feeding the model a single image?
This project was conducted in collaboration with many people. Thanks to Richard Wu, Anna Kipnis, Andeep Singh Toor, Paula Angela Escuadra, Jeff Hardouin, Lee Dotson, Aaron Cammarata, Maggie Oh, Marc Destefano, Erin Hoffman-John, and Colin Boswell. Thanks to everyone who supported many hours of art direction and playtesting feedback throughout this project.
--Ryan Poplin, Stadia Research Scientist & Adam Prins, Stadia Software Engineer