Why is hardware setup in the cloud hard? And how Stadia engineers solved it.
In this new series of blog posts, we'll be detailing some of the more technically intricate aspects of working on a platform like Stadia and how we've worked to achieve simplicity in a world of complexities for our end users. In this first set of posts we're tackling some challenges specific to Android and iOS, with more to come in the future.
Part one in our series on building Stadia Controller setup in our mobile app with Flutter.
Hardware setup is an under-appreciated challenge in consumer electronics. It’s simultaneously one of the user’s first experiences with the product and a very difficult feat of engineering, which leads to a careful balancing act of competing priorities. The setup process for the Stadia Controller is a real example of this balancing act. The controller is one of the key things that makes using Stadia a great experience, so getting users ready to play as smoothly as possible is a must. Once a controller is out of the box, getting it set up takes most people less than a minute, but doing so involves a lot more under the hood than one might think.
In this blog post series, I’ll discuss the challenges our team faced when implementing controller setup in the Stadia mobile app, the architecture decisions we made that improved the consistency and understandability of our code, and how using Flutter increased our productivity and the quality of the user experience.
|Stadia Controller next to the Stadia mobile app|
Hardware setup can be a deceptively difficult task
At first glance, setting up the Stadia Controller might appear to be a relatively straightforward process. The user needs to turn the controller on, use the Stadia mobile app to find and connect to the controller via Bluetooth, send it Wi-Fi credentials, and then start playing.
Based on that description alone, it seems like everybody should be able to set up their controller successfully every time. Unfortunately, this kind of universal success is only possible under ideal conditions. Someone who has set up many controllers before, who never mistypes their Wi-Fi password, and who is setting up a brand new, fully charged Stadia Controller with a specific phone and Wi-Fi router inside a Faraday cage would probably have an excellent experience.
However, there are thousands of different phones and Wi-Fi routers on the market, all with their own quirks. (And in fact, we try to support as many of each as possible!) Similarly, the setup process is subject to environmental and human factors – unexpected things can happen and setup may not always go as planned.
To bridge the gap between ideal conditions and real-world conditions, we can use the concept of "user success". For a user to succeed during the setup process, they must end up with a correctly configured controller, know how to use it, and understand what’s happening at every step along the way.
Of course, everyone wants user success to be as high as possible and the experience to be best-in-class. However, the total available effort to implement the setup flow is constrained in two notable ways.
First, there are a finite number of people working on the project, and they can only do so many things at once. Second, there is a fixed amount of time to complete the project. At some point, controllers will start rolling off the production line and into users’ hands. When users get their controllers, they need to be able to set them up.
These factors lead to tradeoffs between different aspects of the setup flow.
One way of thinking about these aspects is grouping the tasks necessary to write the flow into three pillars: correctness, reliability and approachability.
Effort needs to be put into each of these aspects, but the team can’t solely invest in one or two. If any one aspect is lacking, the setup flow will feel incomplete or even broken, and user success will suffer. As each of these aspects is explained in more depth, it should become apparent that each needs to be carefully balanced with the other two.
Correctness is the idea that the setup flow can configure a device. On some level, this is table stakes. If it’s at all possible to make it through the flow and have properly functioning hardware at the end, even with a bunch of assumptions, the flow can be considered ‘correct’, at least to a degree. With this bare minimum, the flow will only work under very specific ideal conditions. While the person sitting in that Faraday cage mentioned earlier may be content with this state of affairs, other folks may not.
To make the setup flow functional for all of our potential users, the app needs to make as few assumptions as possible. Removing each assumption increases the surface over which the setup flow needs to be correct, increasing the complexity of the flow. Once the app stops assuming that the user only has a specific Wi-Fi configuration, the setup flow needs to be able to handle several configurations. Once the flow stops assuming that the app has already prompted the user to grant system permissions, we need to start checking for each permission and prompt the user when one is needed.
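To make the permissions example concrete, here is a minimal sketch of checking each permission instead of assuming it was already granted. It is written in Python for brevity and is not the Stadia app's code: `PermissionGate`, `is_granted`, `request`, and the permission names in the usage note are all hypothetical stand-ins for whatever the platform provides.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class PermissionGate:
    """Checks each required permission and prompts only for the missing ones."""

    # Both callables are hypothetical stand-ins for platform APIs.
    is_granted: Callable[[str], bool]  # query: is this permission already granted?
    request: Callable[[str], bool]     # prompt the user; True if they grant it

    def ensure(self, required: List[str]) -> List[str]:
        """Return the permissions that are still missing after prompting."""
        still_missing = []
        for name in required:
            if self.is_granted(name):
                continue  # already granted, so no prompt is shown
            if not self.request(name):
                still_missing.append(name)  # user declined; the flow must handle it
        return still_missing
```

A caller might run `gate.ensure(["bluetooth", "location"])` before scanning and only proceed when the returned list is empty. Each dropped assumption adds a branch like this one, which is where the extra complexity comes from.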
By a strict definition of correctness, the flow doesn’t need to be graceful or even particularly helpful. All that’s required is that the device can be configured correctly, with a given set of constraints and assumptions, and that the user has the opportunity to restart the process if something goes wrong.
Similarly, if a device can go through the setup flow and end up misconfigured or inoperable, the flow can hardly be called correct. This can be extremely frustrating, so identifying when misconfiguration might occur, and doing the minimum to prevent it, is part of that same definition of correctness.
An additional hurdle that’s worth noting is that the definition of “correct” for a hardware setup flow can be a moving target, especially when the hardware is being developed alongside the flow. For example, the setup protocol can be amended or expanded, or there can be differences between hardware revisions that require changes. Consequently, even when shooting for the bare minimum, reaching correctness can be difficult.
Reliability is making the setup flow resilient against technical failures, and correcting for them if possible. Rather than always restarting if there is an issue, the setup flow should attempt to fix it first, or should be designed to proactively avoid the issue in the first place. Reliability builds upon correctness and can be much more challenging.
There are a number of places where technical failures can arise. First, there can be issues with wireless communication. The Stadia app communicates over Bluetooth Low Energy (BLE) to the controller, and the controller connects over Wi-Fi to transmit inputs during gameplay. Both of these can be affected by noise (if someone lives in an apartment building with lots of Wi-Fi networks, for example) or signal strength, both of which might cause data loss during transmission. In other words, wireless communication can randomly fail.
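One common way to absorb this kind of random failure is to retry the operation a few times before giving up. The sketch below shows the general technique (retry with exponential backoff) in Python; it is an illustration under assumptions, not a description of what the Stadia app does. `TransientLinkError`, `with_retries`, and the default attempt count and delay are hypothetical.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class TransientLinkError(Exception):
    """Hypothetical error for a dropped or noisy wireless exchange."""


def with_retries(
    operation: Callable[[], T],
    attempts: int = 3,
    base_delay_s: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run `operation`, retrying on transient failures with exponential backoff."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return operation()
        except TransientLinkError:
            if attempt == attempts - 1:
                raise  # out of retries: let the flow surface the failure
            # Wait longer after each failure: 0.5 s, 1 s, 2 s, ...
            sleep(base_delay_s * 2 ** attempt)
    raise AssertionError("unreachable")
```

Only errors known to be transient are retried here; anything else propagates immediately, so a genuine bug or a wrong password is not hidden behind repeated attempts.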
Another source of pitfalls is the sheer number of combinations of device make, model and operating system version that we support. The Stadia app is available on both Android and iOS, with support for old versions of each, and on multiple device form factors. Consequently, there’s a big matrix of potential software differences that need to be accounted for. There are nuances in how BLE scanning works on each operating system (OS), for instance, and these need to be baked into the app to ensure compatibility. Devices running the same OS version can also have differing requirements. For example, a specific make of phone might require additional permissions to perform a BLE scan. For hardware setup to be successful with that type of phone, this needs to be surfaced to the user.
There are innumerable examples of failure points, and it can be difficult to wrap your head around everything that could possibly go wrong. Designing the setup flow to catch or prevent entire categories of failures can prevent things from spiraling out of control, but there are certain failures that must be caught explicitly. Doing so makes implementation and maintenance of the flow costlier, but it also makes the setup flow more robust, and will certainly boost user success.
Approachability takes all of the wrinkles introduced by the correctness and reliability pillars and shapes the flow into something that is pleasant and comprehensible. A correct, reliable setup flow may still be daunting to new users, and may be too burdensome for some to complete. Keep in mind, the setup flow is often a user’s first impression of the hardware, and one of their first impressions of the app, so it’s important to get right. Putting an emphasis on approachability means designing the setup flow with an eye toward masking complexity, and actively guiding users in the right direction.
You might already have an intuitive understanding of approachability if you’ve ever tried out a new recipe. The objective of the recipe is clear (make food), but the ease with which you reach that objective can depend a lot on how the instructions are laid out. Baking cookies can be frustrating if you have to scramble from one task to the next. Conversely, it’s possible to prepare a multicourse meal without much stress if the process is explained well enough. In both of these examples, the order and quality of the recipe’s steps can make or break the cooking experience. If a recipe is too much of a hassle, you may not want to follow it again, even if the product is delicious. (Aside: for a good gluten-free cookie recipe, check out this one from Google Brain.)
This same principle applies to setting up a piece of hardware. The setup flow has a known goal, and has steps that must take place. The challenge of approachability is arranging those steps in the easiest way possible for the user.
As effort is applied towards this pillar, the setup flow’s error messages might be made more friendly and explanatory, and the flow might become more forgiving for common mistakes. Similarly, steps to help users troubleshoot issues might be added. And as feedback is gathered from users and UX research, the flow might be adjusted for speed or clarity.
Approachability can also go beyond hardware setup itself. Showing the user how to use their device toward the end of the setup process is important, because it’s a chance to anticipate and answer questions. Doing this can make users more confident and comfortable using the device once it’s set up. This was especially true for the Stadia Controller. It functions quite differently from other controllers, and connects to other hardware in several ways (namely BLE, Wi-Fi and Linking Codes). The team placed an emphasis on user education, but had to ensure that we did not overwhelm users in the process.
As the correctness and reliability of the setup flow increase, it will inevitably become more and more complex. A setup flow that starts simply
|Simplified flow diagram|
might look much more complicated after some of the engineering challenges have been solved.
|Complex flow diagram|
It is very easy for implementation complexity to manifest as cumbersome, often extraneous steps for users.
Masking the complexity in the correctness and reliability pillars is the key to maintaining the approachability pillar and ensuring sustained user success. This means showing the user only what is truly necessary, asking them only the most pertinent questions, and hiding things that might be confusing, unnecessary or duplicative. Often, this compounds the complexity of the first two pillars, because it requires developers to assume more responsibility for the flow. That added complexity must in turn be masked to promote user success.
User success depends on finding the right balance between these pillars
This is why hardware setup is so difficult. There is a near-infinite amount of work that could be done to handle every possible failure condition, or to iterate on the user experience. Maximizing any particular pillar will not lead to a high level of user success, and in fact doing so might be actively detrimental to the others.
As stated previously, reaching universal success is not feasible. Instead, the goal is to get as close as possible to universal success with the time and effort available. The challenge in reaching this goal is how that effort is applied. Making judgement calls about the most valuable places to apply effort will allow the team to find the ideal path toward universal success.
Similarly, making smart choices about how these concepts, diagrams and considerations are translated into code, and how that code is organized, can greatly add or remove complexity from the setup flow.
Translating the screens themselves into code is only the first of many steps. Layered on top of that is navigating between the screens, and as previously discussed, the possible paths between screens in the flow can add up to an extremely complicated web. Next, the app must have some way of communicating with the controller. Now add the fact that the app must store the current progress in the setup flow, and make determinations about what path to take, based on what the user and controller are doing. Finally, consider all the edge conditions and possible errors or failures that can happen at any point in the flow, all of which require deviations from the ideal path.
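To give a feel for how quickly those paths multiply, here is a deliberately tiny sketch of a setup flow as a table of transitions, written in Python. It is not the architecture the team used; every step name, event name, and the fallback to a troubleshooting step are assumptions made up for illustration.

```python
from enum import Enum, auto
from typing import Iterable


class Step(Enum):
    SCAN = auto()          # looking for the controller over BLE
    CONNECT = auto()       # establishing the BLE connection
    SEND_WIFI = auto()     # sending Wi-Fi credentials
    DONE = auto()
    TROUBLESHOOT = auto()  # fallback for anything unexpected


class Event(Enum):
    CONTROLLER_FOUND = auto()
    CONNECTED = auto()
    WIFI_ACCEPTED = auto()
    WIFI_REJECTED = auto()
    LINK_LOST = auto()
    RETRY = auto()


# Hypothetical transition table: (current step, event) -> next step.
TRANSITIONS = {
    (Step.SCAN, Event.CONTROLLER_FOUND): Step.CONNECT,
    (Step.CONNECT, Event.CONNECTED): Step.SEND_WIFI,
    (Step.CONNECT, Event.LINK_LOST): Step.SCAN,
    (Step.SEND_WIFI, Event.WIFI_ACCEPTED): Step.DONE,
    (Step.SEND_WIFI, Event.WIFI_REJECTED): Step.SEND_WIFI,  # e.g. mistyped password
    (Step.SEND_WIFI, Event.LINK_LOST): Step.SCAN,
    (Step.TROUBLESHOOT, Event.RETRY): Step.SCAN,
}


def next_step(current: Step, event: Event) -> Step:
    """Pick the next step; unplanned combinations fall back to troubleshooting."""
    return TRANSITIONS.get((current, event), Step.TROUBLESHOOT)


def run(events: Iterable[Event], start: Step = Step.SCAN) -> Step:
    """Replay a sequence of events and return the step the flow ends on."""
    step = start
    for event in events:
        step = next_step(step, event)
    return step
```

Even this toy version needs seven transitions for three real steps, and every new edge case adds rows. Keeping the rules in one place, rather than scattered across screens, is one way to make that growth visible and testable.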
Good design decisions can drastically reduce the amount of effort required to implement improvements to the setup flow. Over the course of the project, this directly results in more success for users.
When writing the setup flow for the Stadia Controller, the team made choices that allowed us to do exactly this. We managed the complexity of the flow as it grew in scope, got massive productivity gains by using Flutter, and invested these benefits into a better experience for our users. In the rest of this series, we’ll discuss all of these topics. Come back next week for the next post to learn how our architecture decisions helped us manage the setup flow’s complexity.
--Nick Sparks, Software Engineer