Architecting for user (and engineer) happiness during Stadia Controller setup
In this new series of blog posts, we'll be detailing some of the more technically intricate aspects of working on a platform like Stadia and how we've worked to achieve simplicity in a world of complexities for our end users. In this first set of posts we're tackling some challenges specific to Android and iOS, with more to come in the future.
And now, here’s part two in our series on building Stadia Controller setup in our mobile app with Flutter.
We made architecture decisions that paid off
Setting up the Stadia Controller has all of the same considerations, plus several more. In addition to events that can come from the UI, controller setup has events such as:
- Messages or status updates received from the controller
- Bluetooth events, like the controller connecting or disconnecting
- OS-level events, like the Bluetooth antenna turning off unexpectedly or the user losing connectivity
- Asynchronous network calls, like checking if security requirements are satisfied, or if software updates are available for the controller
Workflow and Hierarchical State Machine (HSM)
To manage all these event sources, we extended the Stadia app’s architecture by combining it with a Hierarchical State Machine (HSM). These came together in a single Flutter StatefulWidget that we called a Workflow. This is the backbone of the controller setup flow.
We first broke down the flow into a set of states. Each state represents a position in the flow, and it may or may not be associated with UI on screen. Using a HSM affords us the ability to nest states within one another, and we used this to have superstates for large chunks of the flow.
The Workflow allows us to tie user-visible UI changes to the HSM. It maintains a map between HSM states, navigation routes, and UI screens. During every state transition, the Workflow’s route table is checked. If navigation to a new screen is required, the new screen is rendered with the appropriate animation, such as a forward or backward transition. It’s important to emphasize that a state transition inside the HSM triggers a UI transition, not the other way around.
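The relationship between the HSM and the route table can be sketched in a few lines. This is a minimal illustration in Python rather than the Dart used in the app, and it is not the team's implementation: the class names, event names, and screen names are all hypothetical, and the navigation stack is a plain list standing in for a real navigator (so it only ever pushes, where the real Workflow also picks forward or backward animations).

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass(eq=False)
class State:
    name: str
    parent: Optional["State"] = None
    initial: Optional["State"] = None  # initial child state, if any
    handlers: Dict[str, str] = field(default_factory=dict)  # event -> target state name


class Workflow:
    """Owns the state machine and turns its transitions into navigation."""

    def __init__(self, states: List[State], root: State, routes: Dict[str, str]):
        self.states = {s.name: s for s in states}
        self.routes = routes            # state name -> screen name (hypothetical)
        self.nav_stack: List[str] = []  # stand-in for a real navigator
        self.current = root
        self._enter(root)

    def _enter(self, state: State) -> None:
        # Descend through initial children until a leaf state is active,
        # consulting the route table at every step. States without a route
        # (non-UI states) leave the screen untouched.
        while True:
            self.current = state
            screen = self.routes.get(state.name)
            if screen is not None:
                self.nav_stack.append(screen)
            if state.initial is None:
                return
            state = state.initial

    def fire(self, event: str) -> bool:
        # Offer the event to the active state, then to each ancestor in turn.
        state: Optional[State] = self.current
        while state is not None:
            target = state.handlers.get(event)
            if target is not None:
                self._enter(self.states[target])
                return True
            state = state.parent
        return False  # nobody handles it, so nothing changes


def demo_workflow() -> Workflow:
    permissions = State("permissions")  # superstate with no UI of its own
    wifi = State("wifi", parent=permissions, handlers={"wifi_connected": "bluetooth"})
    bluetooth = State("bluetooth", parent=permissions)
    cancelled = State("cancelled")
    permissions.initial = wifi
    permissions.handlers["cancel"] = "cancelled"  # declared once, covers every child
    routes = {
        "wifi": "ConnectToWifiScreen",
        "bluetooth": "EnableBluetoothScreen",
        "cancelled": "SetupCancelledScreen",
    }
    return Workflow([permissions, wifi, bluetooth, cancelled], permissions, routes)
```

Note that screens never call the navigator themselves in this sketch. They would only fire events, and the navigation stack changes as a side effect of the resulting state transition, which mirrors the direction of control described above.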
When the state machine first enters the superstate, the state machine checks to see whether it has an initial child state. It does, and it’s the Wi-Fi state, which then becomes active.
Inside the Wi-Fi state, a similar check for child states occurs. The Wi-Fi state follows a pattern that the team has found useful: a state that might need action from the user has two children, an initial ‘checking’ state and a main state. To briefly explain, we adopted this pattern because it allowed for better separation of concerns, and decluttered the branching logic in our state machine. The Wi-Fi state contains these checking and main child states, and because the checking state is the initial one, it becomes active. When this state transition happens, the Workflow checks the route table to determine whether anything needs to be added to the navigation stack. The Wi-Fi checking state has no associated UI, so there is no user-visible change.
Upon entering the Wi-Fi checking state, a piece of business logic invokes our connectivity plugin to determine whether the device is connected to a Wi-Fi network. Depending on the outcome, this business logic will fire one of two events. When either of these events is received by the state machine, it inspects the Wi-Fi checking state’s event handlers to see whether it handles the event. The checking state handles the Wi-Fi disconnected event, so if that’s what was fired, the state machine will execute the handler, transitioning to the main Wi-Fi state. During the transition, the Workflow’s route table is referenced, and the UI navigates to the appropriate screen.
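The checking / main pattern can be illustrated with a small sketch. It is written in Python for brevity and makes several assumptions: the event and screen names are invented, the hierarchy is flattened into dotted state names, the connectivity plugin is replaced by a plain callable, and the target of the "connected" event (here, the location check) is a guess rather than something taken from the real flow.

```python
from typing import Callable, Dict, List

# Hypothetical event names.
WIFI_CONNECTED = "wifi_connected"
WIFI_DISCONNECTED = "wifi_disconnected"


class SetupMachine:
    def __init__(self, is_on_wifi: Callable[[], bool]):
        self.is_on_wifi = is_on_wifi   # stand-in for the connectivity plugin
        self.screens: List[str] = []   # stand-in for the navigation stack
        self.active = ""
        # Each state declares the events it handles and where they lead.
        self.handlers: Dict[str, Dict[str, str]] = {
            "wifi.checking": {
                WIFI_DISCONNECTED: "wifi.main",
                WIFI_CONNECTED: "location.checking",  # assumed next step
            },
            "wifi.main": {WIFI_CONNECTED: "location.checking"},
        }
        # Only the main state has UI; the checking state has no route.
        self.routes: Dict[str, str] = {"wifi.main": "ConnectToWifiScreen"}
        # Business logic that runs when a state is entered.
        self.on_entry: Dict[str, Callable[[], None]] = {
            "wifi.checking": self._check_wifi,
        }
        self.transition_to("wifi.checking")

    def transition_to(self, state: str) -> None:
        self.active = state
        screen = self.routes.get(state)
        if screen is not None:
            self.screens.append(screen)
        entry = self.on_entry.get(state)
        if entry is not None:
            entry()  # runs last, so it may fire a follow-up event

    def fire(self, event: str) -> None:
        target = self.handlers.get(self.active, {}).get(event)
        if target is not None:
            self.transition_to(target)

    def _check_wifi(self) -> None:
        # The check ends by firing exactly one of two events.
        self.fire(WIFI_CONNECTED if self.is_on_wifi() else WIFI_DISCONNECTED)
```

With `SetupMachine(lambda: False)` the machine lands in `wifi.main` and one screen is pushed. With `SetupMachine(lambda: True)` it passes straight through the checking state and no Wi-Fi screen is ever shown.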
The Location Access state follows the same checking / main state pattern as the Wi-Fi state. Upon entering the location checking state, some business logic is called to check the status of the location access permission. Again, two possible events can be fired. If the user has already granted the permission, the checking state’s handler for that event causes the state machine to transition to the Bluetooth state. If the user does need to grant it, a different event is fired and the location access permission screen is pushed onto the navigation stack.
This screen explains why the permission is needed. When the ‘Next’ button is tapped, the permission dialog is presented. If the user grants the permission, the app is notified in the same manner as in the checking state, an event is triggered, and the state machine transitions to the Bluetooth state. If the user doesn’t grant the permission, the screen updates to explain that they need to grant permission in their device’s system settings to proceed.
On this version of the screen, if the ‘Next’ button is tapped, the app opens an appropriate OS-level settings page. If the user updates the permission and goes back to the Stadia app, they are able to proceed to the Bluetooth state.
Consider a counterexample
Imagine if the setup flow did not use a HSM or an event-driven design. The implementation becomes more complicated from the very first line of code. Some of the first logic in the setup flow determines what permission screen to initially show to the user. In our real implementation, this is in a non-UI checking state. Without a state machine, there isn’t an easy way of representing a state without associated UI, and deciding where this logic should live becomes surprisingly difficult. The logic might hang off of the button handler that launches the setup flow, could exist inside of a new Welcome screen, or could be called as the “Connect to Wi-Fi” page is presented to the user. These all have downsides, ranging from leaking setup flow logic into the rest of the app, to presenting extraneous information to the user. (Recall the “masking complexity” concept from the first post.)
Similarly, the logic for traversing the flow might be decentralized, with each screen handling navigation on its own. This approach would make it more difficult for someone not familiar with the flow to get a complete picture of how the screens fit together (contrast this with the HSM, where connections between states are centrally, explicitly declared). This increases the complexity of every screen’s business logic, adds to the flow's maintenance cost, and multiplies the surface over which bugs can be introduced.
With the navigation logic spread out, moving screens around becomes much more challenging as well. Swapping two screens in the non-HSM flow means touching the two screens in question, plus every screen that can navigate to or from those screens, whereas with a HSM, in many cases, the swap can occur without touching the business logic of any screen.
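To make the contrast concrete, here is a deliberately simplified Python sketch of centrally declared connections. It assumes a purely linear flow and invented event names ("done" and "back"). The real HSM declares richer connections than a single list, so treat this only as an illustration of why a central declaration makes reordering cheap.

```python
from typing import Dict, List


def build_transitions(order: List[str]) -> Dict[str, Dict[str, str]]:
    """Derive every state's forward/back handlers from one central ordering."""
    table: Dict[str, Dict[str, str]] = {name: {} for name in order}
    for prev, nxt in zip(order, order[1:]):
        table[prev]["done"] = nxt   # hypothetical "step finished" event
        table[nxt]["back"] = prev   # hypothetical "go back" event
    return table


original = build_transitions(["wifi", "location", "bluetooth"])
# Swapping two steps is a one-line change to the declaration;
# no screen's own logic is edited.
swapped = build_transitions(["location", "wifi", "bluetooth"])
```

In the decentralized alternative, the same swap would mean editing the navigation code inside each affected screen and each of its neighbors.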
Lastly, without a HSM, it can become harder to gracefully handle interruptions. In the setup flow there are multiple background services running at all times, and occasionally one may need to stop user progression, or otherwise show a message to the user. In many cases this can mean that other tasks need to stop running. Without the ability to fire an event when this happens, some logic would have to maintain references to all the services that need to stop for every interruption, and declare how each of them reacts. There is almost certainly some overlap between interruptions, but the need to explicitly declare each of these means that there’s a high risk of introducing bugs. When using a state machine instead, these reactions can be executed when states are entered and exited, which means that they can be handled much more consistently, and with less explicit declaration for each scenario.
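A small sketch shows how entry and exit actions keep interruption handling consistent. It is written in Python rather than Dart, and the states, events, and background task names ("ble_scan", "firmware_download") are hypothetical, not taken from the real flow.

```python
from typing import Callable, Dict, List, Optional, Set, Tuple


def _noop() -> None:
    pass


class State:
    def __init__(self, name: str, parent: Optional["State"] = None,
                 on_enter: Callable[[], None] = _noop,
                 on_exit: Callable[[], None] = _noop):
        self.name = name
        self.parent = parent
        self.on_enter = on_enter
        self.on_exit = on_exit
        self.handlers: Dict[str, "State"] = {}

    def path(self) -> List["State"]:
        """This state followed by its ancestors, innermost first."""
        chain: List["State"] = []
        node: Optional["State"] = self
        while node is not None:
            chain.append(node)
            node = node.parent
        return chain


class Machine:
    def __init__(self, initial: State):
        self.current: Optional[State] = None
        self._transition(initial)

    def _transition(self, target: State) -> None:
        old = self.current.path() if self.current else []
        new = target.path()
        for state in old:            # exit innermost first
            if state not in new:
                state.on_exit()
        for state in reversed(new):  # enter outermost first
            if state not in old:
                state.on_enter()
        self.current = target

    def fire(self, event: str) -> bool:
        state = self.current
        while state is not None:     # bubble up through superstates
            target = state.handlers.get(event)
            if target is not None:
                self._transition(target)
                return True
            state = state.parent
        return False


def demo() -> Tuple[Machine, Set[str]]:
    running: Set[str] = set()  # names of background tasks currently active
    setup = State("setup")
    scanning = State("scanning", setup,
                     on_enter=lambda: running.add("ble_scan"),
                     on_exit=lambda: running.discard("ble_scan"))
    updating = State("updating", setup,
                     on_enter=lambda: running.add("firmware_download"),
                     on_exit=lambda: running.discard("firmware_download"))
    interrupted = State("interrupted")
    scanning.handlers["controller_found"] = updating
    setup.handlers["bluetooth_off"] = interrupted  # one handler covers all children
    return Machine(scanning), running
```

The interruption handler never names the tasks it has to stop. Whichever child of `setup` is active when `bluetooth_off` arrives, that state's own exit action cleans up after it, so adding a new state with a new background task does not require revisiting every interruption.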
Further benefits of the HSM approach
As a coda to the architecture discussion, it’s worth noting that our use of a HSM and the Workflow pattern makes the setup flow extremely testable and debuggable.
All of our event sources are piped through the HSM, and all handling of those events happens inside the HSM. Testing the sending side of the event stream involves making sure that the correct event is fired at the end of whatever logic is running. When testing on the receiving side, it’s trivial to inject events to simulate them being sent from elsewhere.
Additionally, there are a finite number of events, and each state explicitly declares what events it handles, meaning states have known inputs and outputs. All of this isolates potential problems, and makes fixing bugs often a matter of determining what input or output event is incorrect, and addressing the issue at the source.
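The two sides of testing can be sketched using the location permission step described earlier. This is an illustrative Python example, not the team's test code: the event names, the state names, and the `check_location_permission` helper are assumptions.

```python
from typing import Dict, Iterable, Tuple

# Every (state, event) pair the flow handles is declared explicitly.
TRANSITIONS: Dict[Tuple[str, str], str] = {
    ("location.checking", "permission_granted"): "bluetooth",
    ("location.checking", "permission_needed"): "location.main",
    ("location.main", "permission_granted"): "bluetooth",
}


def run(initial: str, events: Iterable[str]) -> str:
    """Receiving side: feed events in and report the resulting state."""
    state = initial
    for event in events:
        # Events a state does not declare are ignored.
        state = TRANSITIONS.get((state, event), state)
    return state


def check_location_permission(status: str) -> str:
    """Sending side: business logic that ends by choosing one event to fire."""
    return "permission_granted" if status == "granted" else "permission_needed"


# Sending side: assert on the event produced by the logic.
assert check_location_permission("denied") == "permission_needed"

# Receiving side: inject events as if they came from elsewhere.
assert run("location.checking", ["permission_needed"]) == "location.main"
assert run("location.checking", ["permission_needed", "permission_granted"]) == "bluetooth"
```

Because neither test needs a real permission dialog or a real device, a failure points directly at either the logic that chose the event or the handler that received it.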
Lastly, using the HSM as an extension of our existing architecture meant that multiple team members would all naturally write and structure their code in a standardized way that’s similar to that of the rest of the app. One specific benefit of this fact is that the setup flow’s UI uses our app-wide patterns. Because of this, it’s much easier for engineers to jump in and help on any part of the setup flow, and it means that the flow can benefit more from app-wide refactors and improvements.
There’s much more to say about the Stadia mobile app’s architecture, but we’ll pause here for the time being. Using a HSM and the Workflow pattern has given the team a great set of tools to work with when building and improving the flow, and has cleared up the mystique within the team around working with hardware. In the final post in this series, we’ll discuss how our use of Flutter greatly boosted our development speed, and some of the ways the team improved the reliability and approachability of the setup flow.