Creating Game AI Using Mostly English

One of the most difficult problems in crafting a believable character AI is accounting for every conceivable interaction the player may have with your character, and then the tedious process of writing the glue between each player action and the content you've created for your game. But what if you could use Semantic ML to skip that glue step entirely?

This article presents some examples of using Semantic ML to quickly prototype and craft an AI and its personality. Best of all, a natural language interface that doesn't require intervention from a programmer lets the developer solve problems at a higher level of abstraction, using the English language instead of simulation variables, and empowers the less technical members of the team to contribute creatively to crafting AI behavior.

Statement of Problem

The work that goes into bringing characters to life in traditional game development is tremendously complex. It requires many disciplines and assets to come together flawlessly: concept art, 3D modeling, animation, narrative, the core engine, AI and gameplay systems, behavior design, and so on. But the biggest challenge is making sure that characters act and respond appropriately and meaningfully when the player interacts with them.

The traditional approach to solving this complex problem features a number of time-consuming tasks. First, it requires hand-crafted, bespoke scripting to map between player interactions and what the game or character response should be. Typically, this task requires someone with programming or scripting knowledge, or at the very least, someone who is savvy with a node-based or other visual scripting system. It requires the implementer to make a translation: from a design direction in plain language into code or another abstract set of instructions.

Another challenge is that if a player input has not been explicitly accounted for by the developer, or the player does something a little different from what was expected, the player may be greeted with a message indicating that the input was not understood, or with no response at all.

Semantic ML could help address these problems, save game developers iteration time, and allow them to focus on more interesting problems of character believability.

So what is Semantic ML? Semantic just means "relating to meaning in language or logic." At its core, Semantic ML is about phrase and word associations. Of course, some words are more closely associated than others. For example, the word "flower" is more closely associated with "tulip" than it is with "funeral".

What Semantic ML can give us is these word distances, or more precisely, word vectors. If you look closely, you'll see that these word vectors are signals of context. For example, a tulip is a flower, and a flower can be put into a vase.
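The idea of word distances can be sketched with cosine similarity over word vectors. The three-dimensional vectors below are made-up toy values purely for illustration; real models learn embeddings with hundreds of dimensions from large text corpora.

```python
import math

# Toy word vectors (hypothetical 3-dimensional embeddings; real models
# use hundreds of dimensions learned from text).
vectors = {
    "flower":  [0.90, 0.80, 0.10],
    "tulip":   [0.85, 0.75, 0.20],
    "funeral": [0.10, 0.40, 0.90],
}

def cosine_similarity(a, b):
    """Standard cosine similarity: close to 1.0 = similar direction/meaning."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

sim_tulip = cosine_similarity(vectors["flower"], vectors["tulip"])
sim_funeral = cosine_similarity(vectors["flower"], vectors["funeral"])
# "flower" lands much closer to "tulip" than to "funeral"
```

With these toy vectors, `sim_tulip` comes out far higher than `sim_funeral`, which is exactly the "flower is closer to tulip than to funeral" intuition expressed as numbers.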

What if you could smuggle this real world context into a game?

We’ll illustrate this idea using a little demo of a Fox. The way the Fox’s AI makes sense of its world is like this:

Certain objects in the scene, that might be interesting for the Fox to interact with, have been labeled in white. And in pink, you can see the sorts of actions the Fox can perform on those objects. This is very familiar to any of us who have set up characters before. So where does the ML model come in?

Brief Dive Into Technical Details

First, a few misconceptions about Machine Learning:

Myth: Requires Training the Model

This is something you hear about a lot in connection with ML, but it is not necessarily true.

Myth: Requires Massive Amounts of Data

Massive amounts of data matter when training or fine-tuning an ML model. With the approach described here, which uses a pre-trained model, none is needed.

Myth: No Control Over Output

Another myth is that an ML model is a black box and you have to just accept whatever it gives you. As described below, the developer retains complete control over what the game does with the model's output.

The Semantic ML model used here is what’s called a dual encoder. It’s trained on billions of lines of human conversation, publicly available from all over the internet. In other words, someone posts something, and there’s a reply to that post. The model is trained on those pieces of text. Based on this training data, it makes predictions about what response will follow a particular input.
For the conversational Input/Response query, the model needs the semantic relationship between words in order to make predictions about what response will come next. So the model has these word vectors built in.
This means that the Semantic ML model has two modes, or two questions you can ask it: Input/Response ("which of these candidates is the most likely reply to this input?") and Semantic Similarity ("which of these candidates is closest in meaning to this input?").

As a user of this model you:
  • Provide the input
  • Provide all the possible candidates
  • Tell it which mode to use
And the model will rank the responses for you.

The model also gives you a score for each candidate, which you can use for thresholds. For example, if the score is too low, the phrase may not be very similar, and your game could choose not to use the content associated with it.
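Putting those pieces together, the usage pattern can be sketched as a small ranking function with a threshold. Everything here is illustrative: the `score` function fakes the model with word overlap purely so the sketch runs on its own, whereas the real dual encoder scores by meaning, not shared words.

```python
# Hypothetical stand-in for the semantic model's scoring call.
# Real scoring is semantic; Jaccard word overlap is used here only
# so this sketch is self-contained and runnable.
def score(input_text, candidate):
    input_words = set(input_text.lower().split())
    cand_words = set(candidate.lower().split())
    return len(input_words & cand_words) / len(input_words | cand_words)

def rank(input_text, candidates, threshold=0.2):
    """Rank candidates by score, dropping any below the threshold."""
    scored = [(score(input_text, c), c) for c in candidates]
    scored.sort(reverse=True)
    return [(s, c) for s, c in scored if s >= threshold]

results = rank("pick up the ball",
               ["I pick up ball", "I sleep", "I look at lamp"])
# Only "I pick up ball" clears the threshold; the rest are discarded
```

The threshold is where the "your game can choose not to use low-scoring content" idea lives: candidates below it simply never reach the character.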

As you can see, using a pre-trained model means that you do not have to do any model training yourself in order to use Machine Learning in your game.

The model is engine-agnostic and does not require any changes to existing game systems to work. Its features can be used as much or as little as you want.

Application to Character AI

Let’s take a look at how the model can be used with AI behavior:

The coolest thing here is that the Fox wasn't explicitly programmed to answer questions, or taught what "coffee" is. The object that looks like a cup was simply labeled with the words "small mug", and the Fox had the actions "pick up" and "give". It was Semantic ML that made the connection between these concepts.

It also does not actually require any freeform input from the player; you can translate player actions into natural language under the hood. For example, when the player is near an object and presses the pick-up button, the game gets the message that the player's intention is to take the object; when the player presses the throw button, the intention is to throw it. This translation to plain language can be made under the hood and given to the Fox:
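A sketch of that under-the-hood translation might look like the mapping below. The button names and phrase templates are illustrative assumptions, not the demo's actual strings.

```python
# Hypothetical mapping from player input events to plain-language
# phrases that get handed to the character's ML query.
BUTTON_TO_PHRASE = {
    "button_pickup": "I take the {object}",
    "button_throw":  "I throw the {object}",
    "button_wave":   "I wave at you",
}

def player_action_to_text(button, nearby_object=None):
    """Turn a button press (plus the nearest labeled object, if any)
    into a plain-English phrase."""
    template = BUTTON_TO_PHRASE[button]
    if "{object}" in template:
        return template.format(object=nearby_object)
    return template

phrase = player_action_to_text("button_pickup", "stick")
# phrase is "I take the stick", ready to feed to the model
```

The player never types anything; the game narrates the player's actions in English on their behalf.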

There can of course be an even more natural way to communicate to a character, by just speaking to the character, with speech to text.

How Does the Fox Work?

Using a simple grammar of the form “I verb noun”, or “I verb”, an expression space is created of everything the Fox can do to whatever objects have been labeled.

For other languages, this grammar may have to be modified to suit the language’s grammatical structure, but for English, it will do the trick.

The Verb is any action you can imagine a character performing. Pick up, drop, throw, etc.

Noun is an object or point of interest in the room. For example, a lamp, a ball, and a “you”:

The expression space for the Fox is the Cartesian product of every Verb (action) it can perform with every Noun (object or point of interest). So we take the grammar from before and fill it out with the actions and objects the Fox has access to at a particular point in time:
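Generating that expression space is a few lines of code. The verb and noun lists below are illustrative labels, not the demo's actual sets; the shape of the computation is just the Cartesian product described above, plus "I verb" forms for verbs that take no object.

```python
from itertools import product

# Illustrative action and object labels for the character.
verbs = ["pick up", "drop", "look at"]
nouns = ["lamp", "ball", "you"]
intransitive = ["sleep", "cheer"]  # verbs with no object

def expression_space(verbs, nouns, intransitive):
    """Build 'I verb noun' for every verb/noun pair,
    plus 'I verb' for each intransitive verb."""
    space = [f"I {v} {n}" for v, n in product(verbs, nouns)]
    space += [f"I {v}" for v in intransitive]
    return space

space = expression_space(verbs, nouns, intransitive)
# 3 verbs x 3 nouns + 2 intransitive verbs = 11 candidate expressions
```

Because this is generated rather than hand-written, adding one verb or one labeled object automatically multiplies through the whole space.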

Now, we use the model. As before, we provide the input, give it all the possible candidates the Fox can respond with (the expression space generated by the grammar), and tell it which mode to use; in this case, we want Input/Response. The model then gives us a ranked list of the responses. The top 8 are shown here:

Relationship Between the Fox AI and ML Model

The Fox uses a Utility AI system, a very common approach in traditional game AI. In Utility AI, a number of modular actions are implemented and registered with the Utility AI manager. At a certain cadence, it asks each modular action how useful it is. Each action returns a number between 0.0 and 1.0, where 0.0 means “not useful” and 1.0 means “extremely useful”, based on the current situation. Utility AI then sets the current action to the action that returns the highest number. That’s the approach in a nutshell:
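That nutshell can be sketched in a few lines. The action names and utility functions below are illustrative; a production Utility AI would add things like evaluation cadence and hysteresis.

```python
# Minimal Utility AI sketch: modular actions report a usefulness
# score in 0.0..1.0, and the manager picks the highest.
class Action:
    def __init__(self, name, utility_fn):
        self.name = name
        self.utility_fn = utility_fn  # game state -> 0.0..1.0

    def utility(self, state):
        return self.utility_fn(state)

class UtilityAI:
    def __init__(self):
        self.actions = []

    def register(self, action):
        self.actions.append(action)

    def choose(self, state):
        # Set the current action to the one reporting the highest utility.
        return max(self.actions, key=lambda a: a.utility(state))

ai = UtilityAI()
ai.register(Action("sleep", lambda s: 0.9 if s["tired"] else 0.1))
ai.register(Action("play",  lambda s: 0.2 if s["tired"] else 0.8))
chosen = ai.choose({"tired": True})
# A tired Fox reports "sleep" as most useful
```

Each action only knows how to score itself; the manager's job is just the argmax.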

For the Fox, the new additional piece is that each modular action can ask whether it maps to the Verb in the top-ranked response, based on the rankings the ML model produced for the last input, and return a utility number with that ranking in mind. So for example, if the #1 ranked response was "I look at ball", the action labeled with the English phrase "look at" can report that it is extremely useful by returning a number close to 1.0 to the Utility AI manager:
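A sketch of how such an action might score itself against the model's rankings is below. The ranked list is illustrative output written by hand, not a real model call, and the game-state check stands in for whatever constraints the game enforces.

```python
# Illustrative ranked output from the ML model for the last input.
ranked_responses = [
    (0.92, "I look at ball"),
    (0.85, "I pick up ball"),
    (0.40, "I sleep"),
]

def action_utility(action_verb, ranked, game_state):
    """Report high utility if this action's verb appears in the
    top-ranked response AND the game state allows acting on it;
    otherwise stay near zero."""
    top_score, top_response = ranked[0]
    if action_verb in top_response and game_state.get("can_act", True):
        return min(top_score, 1.0)
    return 0.05

u = action_utility("look at", ranked_responses, {"can_act": True})
# "look at" matches the #1 response, so this action reports 0.92
```

Note the game-state check runs first in spirit: if the state forbids the action, the model's enthusiasm is simply ignored.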

This means that the model’s suggestions can be considered as part of the AI’s set of tools for deciding what to do, but the AI has the option to ignore the results from the ML model. The AI actions always consult the game state before doing something. For example, if the Fox is already holding something in its mouth, the game won’t let the Fox pick up something else, regardless of how high the ML model ranks picking up another object.

The developer would thus retain complete control over the AI. If the ML model suggests a response that’s not possible to perform, the AI could go down the list to the next best ranked response, or ignore the suggestions from the ML model altogether. If a character is about to be in a cutscene, the Utility AI decision-making can be turned off entirely.

Applying Rules to Behavior

There is also a way you can bias the model’s outcome, as a post-process step. The Reranker tool lets you take the rankings the model provided and boost the scores for certain Fox actions. This can be done completely in plain English:

This allows the game developer to create a set of rules that govern the behavior of a character, without having to change any underlying action or AI implementation. All the developer would do is apply a set of these rules.
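A toy sketch of what such a post-process reranker could look like is below. The rule format, the substring matching, and the boost amounts are all assumptions for illustration; the real Reranker matches rules semantically, so "hello" would trigger a rule written for "hi".

```python
# Hypothetical plain-English reranker rules:
# (when the input resembles..., boost responses containing..., by amount)
rules = [
    ("hi", "mope", 0.5),
    ("do you want to play", "shake head", 0.5),
]

def apply_rules(input_text, ranked, rules):
    """Post-process the model's rankings: boost scores for responses
    matched by a triggered rule, then re-sort."""
    adjusted = []
    for score, response in ranked:
        for trigger, boosted, amount in rules:
            # Real matching would be semantic; substring matching
            # keeps this sketch self-contained.
            if trigger in input_text.lower() and boosted in response:
                score += amount
        adjusted.append((score, response))
    adjusted.sort(reverse=True)
    return adjusted

ranked = [(0.8, "I wave at you"), (0.6, "I mope")]
result = apply_rules("hi there", ranked, rules)
# The sad-Fox rule lifts "I mope" (0.6 + 0.5) above "I wave at you"
```

Because the rules are data, not code, swapping a sad Fox for a cheerful one is just swapping the rule list.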

So, let's say you have a Fox who is sad. When the player says hi, maybe instead of waving, you'd like the Fox to do something sad. If the player asks, "do you want to play?", it says "no." "Can I have some coffee?" It's very unlikely to offer the player some. Instead of throwing the stick back to the player, like it usually does, it's gonna be a total bummer and put it somewhere else. The cool thing here is that the game developer doesn't have to be overly precise with these rules. If the player says "hello" instead of "hi", it will still work.

Because these rules are just text, they can be set up to trigger at runtime, or even generated at runtime, based on player interactions with the character, for example.

Authoring Ambient Behaviors in Plain Language

Another time-consuming task with character AI is authoring ambient or patrol behaviors. This is a task that typically requires knowing a scripting language or how to use a visual scripting system in traditional game development.

A designer may give the implementer a list of things a character should do and then the person implementing will translate these plain language instructions into code the character AI would be able to carry out.

What if instead, you could simply use the plain language instructions and the ML model’s Semantic Similarity mode to carry out these instructions?

As you can see in the recording above, you don't even have to be overly precise with the instructions. The Fox chose to look at the couch when the instruction said to look at the sofa; Semantic ML lets the Fox make the connection that the couch and the sofa are the same object.

The above instructions are pretty close to the Fox's expression space. But would the Fox still do something sensible if the instructions were bizarre (or had typos)?

Of note here is that during the recording, a typo was made: "make some Monet" instead of "make some money". But because Semantic ML is also aware of cultural connections, such as Monet being a painter, the Fox conjured a painting.
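The instruction-following idea can be sketched as a nearest-match search over the expression space. The tiny synonym table below stands in for everything the real model knows about language (which is exactly what makes sofa/couch work without any table at all); the word-overlap matching is likewise only a runnable placeholder.

```python
# Tiny stand-in for the model's semantic knowledge; the real model
# needs no such table.
SYNONYMS = {"sofa": "couch", "nap": "sleep"}

def normalize(phrase):
    return " ".join(SYNONYMS.get(w, w) for w in phrase.lower().split())

def most_similar(instruction, expression_space):
    """Pick the expression sharing the most (normalized) words with
    the instruction -- a crude proxy for Semantic Similarity mode."""
    inst = set(normalize(instruction).split())
    def overlap(expr):
        return len(inst & set(normalize(expr).split()))
    return max(expression_space, key=overlap)

space = ["I look at couch", "I pick up ball", "I sleep"]
chosen = most_similar("look at the sofa", space)
# "sofa" maps to "couch", so the Fox looks at the couch
```

The payoff is that the designer's instruction never has to match the expression space word for word; the model bridges the gap.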

This is one of the coolest aspects of this technology, because it gives game characters an inner life that wasn’t possible before without a ton of work from game developers to anticipate every possible player idea.

Using Plain Language to Experiment

As games are created, they change a lot moment to moment. Features are added and removed and the game developer needs the game systems to adjust as quickly as possible to the new setup. With this approach to using ML, where the game developer doesn’t need to train anything, the model can adjust as quickly as any other part of the game engine to new changes in features.

The example below is an early prototype of the Fox AI, written entirely as text. It uses Google Sheets and the Semantic Reactor, a sandbox to experiment with the Semantic ML model before trying it in your game engine.

You can see the Grammar in column C and the actions and points of interest that the character has available in the columns to the right. If you press “Generate Expression Space!” a script will generate the sentences that make up the character’s expression space and put them in column A, which is what the Semantic Reactor uses for its candidates.

Then we can start "interacting with the character", so to speak, by typing inputs into the Semantic Reactor text box:

Let’s walk through a simple example one might have during game development. You’re experimenting with what your character does in response to things you try. You ask it to “do something amazing”. In response, it reacts by cheering. As you’re testing this, a VFX artist on your team runs over and tells you there’s a way to get your character to create fireworks! “That’s way more amazing than just cheering,” you think. To get the ML model to work with this new feature, you just need to add it to the set of actions and regenerate the expression space:

Maybe you submit your changes and the graphics programmer runs over and says that the latest change has completely tanked the frame rate and there’s a critical demo they have to lock the build for! Can you take the fireworks feature out right away? No problem, you remove the action and regenerate the expression space, and voila, you’re back to where you were before:

In contrast to the Google Sheets prototype, the Fox’s expression space is regenerated every time a new input comes through, because the game state may be changing constantly. That means in practice, you would simply remove the action for “create fireworks” from the character, just as you would in traditional development. Because the expression space that is passed into the ML model is generated from the actions the Fox currently has, everything will adjust automatically for the ML model.

Further Reading and Playing

The Fox is only a small sample of what is possible with Semantic ML. There’s so much more you can do using this technology, beyond even character behavior, like choreographing content: music, weather and lighting, scene generation, dialog — anything that could be tagged with natural language.

The best way to dive into using this technology is to just try it yourself! With Semantic Reactor, you can see what Semantic ML can do, without the need to get the model running in your game engine. You can start experimenting right in Google Sheets with your own ideas for games, in text form.
You can also try a small game that uses this technology for character dialog, the Mystery of the Three Bots!

To inquire about the Semantic ML tools coming to Stadia or to pitch a game based on some of your Semantic ML experiments, sign up to be a Stadia developer here. Here are some tips about making a great game pitch.

We'd love to hear your feedback. Please take this short survey to help us improve on future posts.

--Anna Kipnis, Senior Interaction Designer, Stadia R&D
