In September 2020, Psyonix remodelled its multiplayer smash hit Rocket League in a big way, pivoting from a premium game into a free-to-play, live service title. The update brought a huge uptick in new and returning players, and pushed the game past one million concurrent players for the first time ever.
Matthew Sanders, the company's lead online services engineer, explained the main steps that Psyonix followed in order to prepare for the switch. In his GDC talk 'Rocket League: Scaling for free-to-play', he discussed the timeline that the studio followed to carry out the transition, from scaling to accommodate more players and forging new partnerships to carry new software burdens, through to the eventual relaunch.
"The plan was for Rocket League to drop its price tag. This lower barrier to entry was expected to bring in lots of players that didn't want to or couldn't pay the sticker price," Sanders explained.
"Our analysis tasks revolved mostly around what to expect. We spent some time with load projections to figure out what player population levels we might see and we also spent time analysing our architecture to set up a backlog of potential issues."
Sanders moved on to explain load testing, which is essentially testing the capabilities of the backend by simulating the game's expected usage after its relaunch. Writing load tests was a unique task that required its own code; Sanders said that the team got almost zero reuse from their existing code base. He also said it was expensive, in terms of both engineering hours used and the power needed to run the load test environment.
"This environment contained our full set of services and was necessarily scaled up beyond our production environment in order to handle our times five load projection," Sanders explained. "Lastly, for the load tests, they were hard to get right because it took both research and iteration to get the client requests simulated realistically. We wanted to get the request patterns to match, which involves calling them in the right sequence and with the right timing."
While a simulated version of traffic cannot be completely accurate, Sanders said that the research from the evaluation helped the team figure out which features of Rocket League have the highest scaling risks, meaning the team could focus on them as a priority.
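As a rough illustration of the sequencing and timing problem Sanders describes, the sketch below builds a schedule of simulated client requests in Python. It is not Psyonix's load test code: the request names, wait ranges and ramp-up window are all hypothetical, and a real test would derive them from production traffic and send actual requests rather than just producing a timeline.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    name: str        # hypothetical request name, not a real Rocket League endpoint
    min_wait: float  # shortest pause (seconds) before sending this request
    max_wait: float  # longest pause (seconds) before sending this request


# Hypothetical session script: the order and pauses a real client might follow.
SESSION = [
    Step("login", 0.0, 0.0),
    Step("get_inventory", 0.5, 1.5),
    Step("start_matchmaking", 5.0, 30.0),
    Step("report_match_result", 300.0, 420.0),
]


def simulate_client(client_id, start_time, rng):
    """Return (timestamp, client_id, request) events for one simulated client."""
    t = start_time
    events = []
    for step in SESSION:
        # Requests go out in script order, separated by a randomised pause.
        t += rng.uniform(step.min_wait, step.max_wait)
        events.append((t, client_id, step.name))
    return events


def build_schedule(num_clients, ramp_seconds, seed=0):
    """Spread client start times over a ramp-up window and merge all events."""
    rng = random.Random(seed)
    events = []
    for client_id in range(num_clients):
        start = rng.uniform(0.0, ramp_seconds)
        events.extend(simulate_client(client_id, start, rng))
    events.sort()  # chronological order across all clients
    return events
```

A load generator could then walk the schedule and fire each request at its timestamp. Raising the hypothetical `num_clients` is what would scale the test towards a projected load target.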
Sanders also added that a good way to predict what the player increase may look like is to compare it to another similar free-to-play title that's already on the market. Unfortunately, Psyonix struggled to do that with its -- as Sanders put it -- "multi-console sort-of-sport, sort-of-action game, with rockets."
The team settled on an increase of three to five times its scale at the time, and used the upper end of that projection as its load testing target.
"We designed and adjusted the best we could, we couldn't know any more until the actual launch," he said. "Still, these tests were definitely guiding us to many improvements. We ran into probably dozens of scaling issues, and had to fix each one to scale up higher to uncover the next issue. To do this under pressure of live traffic would have been disastrous."
A rise in players comes with an increased need for technical and live support. With that in mind, Psyonix reached out to several companies that could provide external support in multiple areas, including customer service and technical issues.
"We don't have a huge engineering staff so we wanted to augment with dedicated experts wherever possible," Sanders explained. "The primary intent of signing support contracts was to be able to summon expert specialists at a moment's notice. We quickly noticed that having all specialists on call like this is actually pretty expensive. But our thought process was if the experts helped reduce our downtime at all, it'd quickly pay for itself by restoring our revenue stream sooner."
Given how much effort had gone into developing and running load tests for the new Rocket League, the engineering team opted to keep running them up until the release.
"It's always better to know the limits of your services. Even if it's too late to make improvements, you can have mitigation ready instead," Sanders said. "Looking even beyond the release, we realise that load testing can and should be more fully integrated into our SDLC [systems development life cycle]. Once running at a higher scale, we can expect all of our services to be more sensitive to performance issues. Changes will be more likely to have unexpected impacts, and continued load testing is going to be the easiest way to maintain confidence in the performance of our services."
Once the planning and preparation were completed and the team had an informed idea of what the game would need to accommodate, they set about making the necessary improvements in time for Rocket League's relaunch. This included moving the game's core set of services from Google App Engine to Kubernetes, migrating servers between services, adding a rate limiting system to shield the game's servers from overwhelming traffic numbers, and completely overhauling the matchmaking system.
"There were several reasons for us to move to GKE [Kubernetes]," Sanders said. "We wanted to increase control over our PHP runtime, since we were missing out on some significant performance improvements, among other things. We also needed more control over our resource scaling. Another goal was more deployment consistency by running more components of our architecture in Kubernetes.
"Completing this migration to Kubernetes was a fundamental part of getting our core services modernised and tuned for a higher free-to-play scale."
The second major update Sanders discussed was the overhaul of the game's matchmaking system, as the previous build had some "insurmountable performance problems." The team needed to find a way to increase how much load the system could take while preserving the original functionality.
After much deliberation, the team settled on Open Match, an open source game matchmaking framework. "Our matchmaking service was a single-threaded .NET application, pretty close to being maxed out and definitely not going to hold up anywhere near our load testing target," Sanders said.
"It [Open Match] seemed like it would let us skip some framework development, but still allow us the freedom to implement all of our functional requirements. It's also open source. In contrast with our matchmaking service, Open Match was designed for scaling, which is where the MapReduce algorithm comes in.
"Having load tests that incorporated the matchmaking path was essential. We were able to tune Open Match in GKE for running at higher load levels, just like with our core services. More importantly, we learned the dynamics of how even subtle configuration changes affect the matchmaking system as a whole.
"This allowed us to react more quickly and correctly during our free-to-play launch, as well as know where to go with further development. Overhauling our entire matchmaking system was another huge undertaking lasting more than a year, but it was desperately needed and directly contributed to the success of our free-to-play release."
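The MapReduce comparison can be pictured with a small sketch: candidate matches are proposed independently for each pool of queued players (the map step), then the proposals are merged and any that reuse a player are dropped (the reduce step). This is a simplified illustration in Python, not Open Match's API or Psyonix's implementation; the ticket fields, the pairing rule and the function names are all assumptions made for the example.

```python
from concurrent.futures import ThreadPoolExecutor


def propose_matches(pool, match_size=2):
    """Map step: pair up players of similar skill within a single pool."""
    ordered = sorted(pool, key=lambda t: (t["skill"], t["id"]))
    proposals = []
    for i in range(0, len(ordered) - match_size + 1, match_size):
        group = ordered[i:i + match_size]
        spread = group[-1]["skill"] - group[0]["skill"]  # lower is a closer match
        proposals.append((spread, tuple(t["id"] for t in group)))
    return proposals


def resolve(proposal_lists):
    """Reduce step: accept the closest matches first, skipping reused players."""
    used = set()
    accepted = []
    for _spread, players in sorted(p for lst in proposal_lists for p in lst):
        if used.isdisjoint(players):
            used.update(players)
            accepted.append(players)
    return accepted


def run_cycle(pools):
    """Fan proposals out across pools in parallel, then merge the results."""
    with ThreadPoolExecutor() as executor:
        proposal_lists = list(executor.map(propose_matches, pools))
    return resolve(proposal_lists)
```

Because each pool is processed independently, the map step can be spread across more workers as the player population grows, which is the property a single-threaded matchmaker lacks.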
Sanders went on to explain the importance of setting intermediate deadlines throughout the duration of a project. A far-off launch day should not be the only deadline that your team is aiming for. He highlighted the importance of treating these milestones seriously, making sure the team sees them as legitimate deadlines, and calling out risks if the team is falling behind.
"Most of our major features actually slipped far past our intermediate deadlines," Sanders said. "We were almost in a situation where our major features were all trying to roll out at the same time, right before the free-to-play release. That was incredibly high risk.
"From our post mortem, we realised that our feature deadline slippage was also partially the result of not properly coordinating the teams. As I mentioned earlier, some of our features were already underway and simply folded in as prerequisites to our free-to-play release. This weak association allowed those features' deadlines to slip without considering the wider free-to-play release schedule."
Releasing the new game build a week before the switch to free-to-play was an important part of the relaunch, according to Sanders. He noted that Rocket League generally sees a significant upswing of users after every new update, and launching it a week early allowed that user bump to subside prior to going free-to-play.
The free-to-play version launched on a Wednesday, which is generally a lot less busy than weekends. This gave the team a few days to react to any issues as the player population rose in the lead up to the weekend. Sanders also said that the game exceeded one million concurrent players for the first time that week.
He also highlighted a few outages between the Wednesday of the launch and the following Friday, but said that the team was proactive enough to have everything in order so that the game did not experience outages over the first free-to-play weekend.
"Amazingly, our times five projection turned out to be extremely accurate," he said. "The entire team joined the war room Zoom call every day up through this first weekend. We were more prepared than any other release prior, but it was still tense to monitor. It was so difficult to try to load test everything we've ever written.
"Lastly, it's worth saying that just because there were no other big outages doesn't mean that we weren't actively identifying and fixing issues before they became problems. We spun up a simple spreadsheet for lightweight tracking of potential issues. It ended up with over 50 entries. Many of these were fixed quickly and the rest were put into the backlog to fix once things calmed down."
"From the planning phase, seek expert advice and support, stay organised and get working on the long tent poles first, but beware over-planning: set a deadline for it," Sanders said.
He also stressed the importance of the initial load testing prior to Rocket League's relaunch, and how it helped the team keep live services stable during the game's biggest update yet.
"It was far more work than we expected," he added. "But it was unquestionably worth the investment for the free-to-play release and beyond.
"Finally, you should always have some versatile load-shedding controls like rate limiting; they're low effort and will get a lot of reuse. Even if your load testing demonstrates you can handle your load target all day long, if you actually get ten times or 100 times that target, rate limiting might just keep your services stable rather than have them go down entirely."
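One common way to build the kind of rate limiting Sanders recommends is a token bucket, sketched below in Python. The talk does not say which algorithm Psyonix used, so this is an assumption chosen for illustration, and the class name and parameters are hypothetical.

```python
import time


class TokenBucket:
    """Allow short bursts up to `capacity`, then a steady `rate` per second."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = rate          # tokens added per second
        self.capacity = capacity  # maximum burst size
        self.clock = clock        # injectable so the limiter can be tested
        self.tokens = float(capacity)
        self.last = clock()

    def allow(self, cost=1.0):
        """Return True if the request may proceed, False if it should be shed."""
        now = self.clock()
        elapsed = now - self.last
        self.last = now
        # Refill for the time that has passed, never exceeding capacity.
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False
```

A service would call `allow()` before handling each request and return a "try again later" response when it comes back False, so excess traffic is rejected cheaply instead of overwhelming the backend.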