A Rundown of Cloud Game Infrastructure
MUHAMMAD AATIF BASHIR Choudhary
Founder, MD & Chief Executive Officer
This article provides an overview of common components and design patterns used to host game infrastructure on cloud platforms.
Video games have evolved over the last several decades into a thriving entertainment business. With broadband internet becoming widespread, one of the key factors in the growth of games has been online play.
Online play comes in several forms, such as session-based multiplayer matches, massively multiplayer virtual worlds, and intertwined single-player experiences.
In the past, games using a client-server model required the purchase and maintenance of dedicated on-premises or co-located servers to run the online infrastructure, something only large studios and publishers could afford. In addition, extensive projections and capacity planning were required to meet customer demand without overspending on fixed hardware. With today’s cloud-based compute resources, game developers and publishers of any size can request and receive resources on demand, avoiding costly up-front outlays and the dangers of over- or under-provisioning hardware.
High-level components
The following diagram illustrates the online portion of a gaming architecture.
The frontend components of the gaming architecture include:
The backend components of the gaming architecture include:
These components can be hosted in a variety of environments: on-premises, private or public cloud, or even a fully managed solution. As long as the system meets your latency requirements for communication between the components and end users, any of these can work.
Frontend
The front end provides interfaces that clients can interact with, either directly or through a load-balancing layer.
For example, in a session-based first-person shooter, the front end typically includes a matchmaking service like Open Match. This service distributes connection information for dedicated game server instances to clients:
Frontend services don’t have to be used exclusively by external clients. It is common for frontend services to communicate with each other and with the backend.
Since frontend services are available over the internet, however, they can have additional exposure to attacks. You should consider hardening your frontend service against denial-of-service attacks and malformed packets to help address these security and reliability concerns. In comparison, backend services are generally only accessible to trusted code, and might therefore be harder to attack.
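As a rough illustration of hardening against malformed packets, the following Python sketch rejects any datagram that doesn't match an expected layout before the game logic sees it. The wire format (magic number, message type, payload length) and the names `parse_packet`, `MAGIC`, `KNOWN_TYPES`, and `MAX_PAYLOAD` are assumptions made up for this example; they don't come from any particular game or library.

```python
import struct

# Hypothetical wire format, assumed for illustration only:
# 2-byte magic, 1-byte message type, 2-byte payload length, then the payload.
HEADER = struct.Struct("!HBH")
MAGIC = 0x4747
MAX_PAYLOAD = 1200        # assumed cap, chosen to stay under a typical MTU
KNOWN_TYPES = {1, 2, 3}   # assumed set of valid message types


def parse_packet(data: bytes):
    """Return (msg_type, payload) for a well-formed packet, else None."""
    if len(data) < HEADER.size:
        return None  # too short to contain a header
    magic, msg_type, length = HEADER.unpack_from(data)
    if magic != MAGIC or msg_type not in KNOWN_TYPES:
        return None  # not our protocol, or an unknown message type
    if length > MAX_PAYLOAD or len(data) != HEADER.size + length:
        return None  # declared length must match what actually arrived
    return msg_type, data[HEADER.size:]
```

Validation like this is only one layer. Rate limiting and network-level denial-of-service protection would sit in front of it in a real deployment.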
Game platform services
Common names for this component are platform services or online services. Platform services provide interfaces for essential meta-game functions, such as allowing players to join the same dedicated game server instance, or holding the “friend list” social graph for your game. It’s common for the platform your game is running on, such as Steam, Xbox Live, or Google Play Games, to provide these services:
Game platform services evolved in much the same way as web services:
Backend platform services
Although most platform services are accessed by external clients, sometimes it makes sense for a platform service to be accessed only by other portions of your online infrastructure, such as an internal competitive player ranking service that isn’t exposed to the public internet. Although such backend platform services typically lack an external network route and IP address, they follow the same design practices as frontend platform services.
Google Cloud game platform service resources
The following resources provide more information about how to build platform services on Google Cloud.
Dedicated game server
Dedicated game servers provide the game logic. To minimize latency perceived by the user, client game apps typically communicate directly with the dedicated game servers. This makes them part of the frontend service architecture.
Note: As frontend services, game servers can be targets for many kinds of exploits and attacks. Detection of bad actors and cheating players is an important topic but is outside the scope of this article.
The industry doesn’t have standard terminology, so for the purposes of this article:
Types of dedicated game servers
The term dedicated can be misleading for today’s backend game servers. In its original context, dedicated referred to game servers that ran on dedicated hardware in a 1:1 ratio. Today, most game publishers manage multiple game-server processes running concurrently on a machine. Despite the fact that these processes now rarely have entire machines dedicated to them, the term dedicated game server is still in frequent use in the gaming industry.
Dedicated game servers are as varied as the types of games they run. A few high-level game server categories are discussed in the following section.
Real-time simulations
Until recently, almost every dedicated game server for a commercially shipped product was part of the front end for a real-time simulation game. Real-time simulation game servers have historically pushed the limits of vertical scaling. The most demanding games have moved to manual horizontal-scaling tactics such as running multiple server processes per machine or geographically sharding the world. UDP communication with custom flow control, reliability, and congestion avoidance is the dominant networking paradigm.
Most real-time simulation game servers are implemented as an endless loop, where the goal is to finish each iteration of the loop within a given time budget. Typical time budgets are 16 or 33 milliseconds, which yield state update rates of 60 or 30 times per second, respectively. The update rate is also referred to as the frame rate or tick rate. Although the server is updating its simulation at a high frequency, it is not uncommon for the server to communicate state updates to clients only after multiple updates have passed. This keeps the network bandwidth requirements reasonable. The effects of updating less frequently can be mitigated using strategies such as lag compensation, interpolation, and extrapolation.
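The loop described above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not how any particular engine does it: `run_server`, `SEND_EVERY`, and the `update` and `send_snapshot` callbacks are hypothetical names, and `max_ticks` exists only so the example terminates, whereas a real server would loop until shut down.

```python
import time

TICK_RATE = 60                  # simulation updates per second
TICK_BUDGET = 1.0 / TICK_RATE   # about 16.7 ms per update
SEND_EVERY = 3                  # assumed: send state to clients every 3rd tick (20 Hz)


def run_server(update, send_snapshot, max_ticks,
               clock=time.monotonic, sleep=time.sleep):
    """Run a fixed-rate loop; return how many ticks overran the budget."""
    overruns = 0
    next_deadline = clock() + TICK_BUDGET
    for tick in range(1, max_ticks + 1):
        update(TICK_BUDGET)              # advance the simulation by one step
        if tick % SEND_EVERY == 0:
            send_snapshot(tick)          # network updates are less frequent
        remaining = next_deadline - clock()
        if remaining > 0:
            sleep(remaining)             # finished early: wait out the budget
        else:
            overruns += 1                # missed the budget for this tick
        next_deadline += TICK_BUDGET
    return overruns


def interpolate(a, b, t):
    """Client-side linear interpolation between two snapshot values, 0 <= t <= 1."""
    return a + (b - a) * t
```

Because clients in this sketch receive a snapshot only every third tick, they can use something like `interpolate` to render smooth motion between the two most recent snapshots.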
All of this means that real-time simulation game servers run latency-sensitive, compute- and bandwidth-intensive workloads requiring careful consideration of the game server design and the compute platforms it runs on. Google Cloud founded the Agones open-source project to help simplify running dedicated game servers on Kubernetes clusters such as Google Kubernetes Engine (GKE).
Session- or match-based games
Games whose servers are designed to run discrete sessions are very common today. Typical examples are the multiplayer sessions of first-person shooter (FPS) games such as Call of Duty, Fortnite, or Titanfall, or multiplayer online battle arena (MOBA) games such as Dota 2 or League of Legends. These games have servers that require twitch-fast gameplay and detailed game-state calculations, frequently with threads devoted to AI or physics simulation.
Massively multiplayer persistent worlds
Almost two decades ago, Ultima Online paved the way for a huge explosion of massively multiplayer online (MMO) games. Today’s most popular MMOs, such as World of Warcraft and Final Fantasy XIV, are characterized by complicated server designs with an ever-evolving set of features.
Complex issues are common in MMO game servers, such as passing game entities between server instances, sharding or phasing the game world, and physically co-locating the instances simulating adjacent game world areas. The compute and memory requirements to calculate state updates for a persistent world containing hundreds or thousands of players can lead to solutions such as the time dilation of Eve Online.
Request/response-based servers
Technically, all dedicated game servers are based on a series of requests and responses. In particular, however, mobile game servers, without a critical demand for real-time communication, have adopted HTTP request and response semantics like those used in web hosting.
The challenges for request/response game servers are the same as those for any web service, including:
The strengths of request/response game servers, such as compact communication semantics and ease of retries after an application or network failure, work well for turn-based and mobile games. We recommend that servers of this type use a serverless API on a platform such as App Engine or Cloud Run.
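One way to make retries safe is for the client to attach a unique ID to each request so that the server can recognize a repeat. The Python sketch below shows the idea. The `TurnServer` class, the `request_id` field, and the in-memory dictionaries are assumptions for illustration; a production service would keep this data in a shared store rather than in process memory.

```python
class TurnServer:
    """Handles turn submissions; a retried request returns the stored response."""

    def __init__(self):
        self.scores = {}       # player_id -> score
        self.responses = {}    # request_id -> response already sent

    def submit_turn(self, request_id, player_id, points):
        if request_id in self.responses:
            # Duplicate of a request we already handled: do not apply it twice.
            return self.responses[request_id]
        self.scores[player_id] = self.scores.get(player_id, 0) + points
        response = {"player": player_id, "score": self.scores[player_id]}
        self.responses[request_id] = response
        return response
```

With this approach, a client that times out can resend the same request without risking a double-counted turn.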
Externalizing the game world state
Increasingly, players expect zero game downtime. This means you need to protect their experience from issues affecting individual server instances. To help do so, a game should persist the player state outside of a single game-server process. The advantages are many, such as resilience against crashed server processes and the ability to effectively load-balance.
Unfortunately, simply using externalized state patterns popular in web services can be problematic for a number of reasons, including:
However, solving these problems has several beneficial side effects. Successfully externalized state available to many processes with proper access management in place can greatly simplify the ability to calculate portions of the game state update in parallel. It is similarly advantageous for migrating entities between instances.
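As one possible way to manage access when several processes share externalized state, the following Python sketch uses version-checked writes (optimistic concurrency). This is an assumed technique chosen for illustration, not one the article prescribes. `StateStore` is an in-memory stand-in for an external store, and `add_gold` is a hypothetical game operation.

```python
class StateStore:
    """In-memory stand-in for an external store shared by game-server processes."""

    def __init__(self):
        self._data = {}   # key -> (version, state)

    def load(self, key):
        """Return (version, state); unknown keys start at version 0."""
        return self._data.get(key, (0, {}))

    def save(self, key, expected_version, state):
        """Write only if nobody else has written since we loaded."""
        current_version, _ = self._data.get(key, (0, {}))
        if current_version != expected_version:
            return False                     # stale write: caller must reload
        self._data[key] = (current_version + 1, dict(state))
        return True


def add_gold(store, player_id, amount, max_attempts=3):
    """Read-modify-write with retries when another writer got there first."""
    for _ in range(max_attempts):
        version, state = store.load(player_id)
        updated = {**state, "gold": state.get("gold", 0) + amount}
        if store.save(player_id, version, updated):
            return updated
    raise RuntimeError("too much contention on " + player_id)
```

Because no process holds the only copy of the player state, a crashed server instance loses at most the update it was in the middle of making.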
Google Cloud dedicated game server resources
The following articles describe how to run dedicated game servers on Google Cloud.
Backend
Backend services present interfaces only to other frontend and backend services. External clients can’t directly communicate with a backend service. Backend services typically provide a way for you to store and access data, such as game state data in a database, or logging and analytics events in a data warehouse.
Game database
Among the scenarios that can cause players to quit playing your game and never return are non-working servers and the loss of player progress. Unfortunately, both are possible if you have a poorly designed database layer.
The database that holds the game-world state and player progression data could be considered the most critical piece of your game’s infrastructure.
You should evaluate the ability of the database to handle not only your expected workload but also the workload required if your game becomes a massive success. A backend that was designed and tested for an estimated player base, but that suddenly receives an order of magnitude more load, is unlikely to be able to serve anyone reliably. Failure to plan for unexpected success can cause your game to fail, as players may abandon your game when it becomes unplayable due to database issues.
Games are particularly vulnerable to this issue. Most businesses with a successful product can expect gradual, organic growth. But a typical game will see a large spike of initial interest followed by a quick fall-off to a much lower level of usage. If your game is a hit, an overtaxed database may have massive delays before saving user progress, or even fail to save the progress altogether. No game developer wants to be forced to decide which features of the game will no longer support real-time updates, so plan your database resources carefully.
When designing a game database:
Relational databases
Many game development teams begin with a single relational database. When the data and traffic grow to the point where the database performance is no longer acceptable, a common first approach is to scale the database. Once scaling is no longer feasible, many developers implement a custom database service layer. In this layer, you can prioritize queries and cache results, both of which limit database access. By adding scaling and a database service layer you can produce a game backend that can handle huge numbers of players, but these methods can have some common issues:
Google offers Cloud Spanner, which is a managed relational database that can help you to mitigate these issues. Spanner is a scalable, enterprise-grade, globally distributed, and strongly consistent database service that is built for the cloud. It combines the benefits of a relational database structure with a non-relational horizontal scale. Many game companies find Spanner to be well-suited to replace both game state and authentication databases in production-scale systems. You can scale for additional performance or storage by using the Google Cloud console to add nodes. Spanner can transparently handle global replication with strong consistency so that you don’t have to manage regional replicas. For more information, see Best practices for using Spanner as a gaming database.
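To illustrate the custom database service layer described earlier in this section, the following Python sketch shows a read-through cache that limits database access by serving repeat reads from memory. It is a simplified sketch: `CachedReader`, the `fetch` callback, and the default 30-second expiry are assumptions for this example, and query prioritization is not shown.

```python
import time


class CachedReader:
    """Read-through cache in front of a slower database lookup."""

    def __init__(self, fetch, ttl_seconds=30.0, clock=time.monotonic):
        self._fetch = fetch          # hypothetical function that queries the database
        self._ttl = ttl_seconds
        self._clock = clock
        self._cache = {}             # key -> (expires_at, value)

    def get(self, key):
        entry = self._cache.get(key)
        now = self._clock()
        if entry is not None and entry[0] > now:
            return entry[1]                      # fresh hit: no database access
        value = self._fetch(key)                 # miss or expired: query the database
        self._cache[key] = (now + self._ttl, value)
        return value
```

A cache like this trades freshness for load reduction, so it suits data that can tolerate being a few seconds stale, such as leaderboards, better than data that must always be current.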
NoSQL databases
Non-relational databases can provide the solution to operating at scale, especially with write-heavy workloads. However, they require that you understand NoSQL data models, access patterns, and transactional guarantees.
There are many types of NoSQL databases, and those well-suited for storing game world states have the following features:
Google Cloud game database resources
Analytics
Analytics has grown into an important component of modern games. Both online services and game clients can send analytics and telemetry events to a common collection point, where the events are stored in a database. They can then be queried by everyone from gameplay programmers and designers to business intelligence analysts and customer service representatives. As the complexity of the analytics that is being collected grows, so does the need to keep these events in a format that can be easily and quickly queried.
The last decade has seen a massive rise in the popularity of Apache Hadoop, the open-source framework based on published work from Google. The expansion of the Hadoop ecosystem has increased the use of complex batch extract, transform, and load (ETL) operations to format and insert analytics events into a data warehouse. The use of MapReduce sped up the rate at which actionable results were delivered, and this speed in turn helped enable new, more compute-intensive analytics.
Meanwhile, the technologies available in the cloud have continued to evolve. Many of them are available as managed services that are quick to learn and require no dedicated operations staff. Google’s latest streaming ETL paradigm provides a unified approach to both batch and stream processing and is available both as a managed cloud service and as the open-source project Apache Beam. Continued improvements in cloud data storage prices now make it possible to keep huge amounts of logs and analytics events in massive, managed, cloud databases that optimize the way that data is written and read. The latest query engines for these databases are capable of aggregating terabytes of data in seconds. For an example of this, see analyzing 50 billion Wikipedia pageviews in 5 seconds.
We recommend that you format your analytics for the future. When you decide which events and metrics your game writes to your analytics backend, consider what format is easiest for you to data mine for insights. Although you can use ETL to copy the data your app writes into a format that works well for analytics queries, it can take time and money to do so. Investing in the design of your analytics output format can lead to significant cost savings and the possibility of real-time analytics insights.
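As a sketch of what an analytics-friendly output format might look like, the following Python example writes each event as one flat JSON object per line (newline-delimited JSON). The field names, the `schema_version` value, and the helper functions `make_event` and `to_ndjson` are assumptions for illustration, not a required schema.

```python
import json
from datetime import datetime, timezone


def make_event(player_id, event_name, attributes, now=None):
    """Build one flat, self-describing analytics event."""
    now = now or datetime.now(timezone.utc)
    return {
        "schema_version": 1,              # assumed: lets queries handle format changes
        "event_time": now.isoformat(),    # UTC timestamp in ISO 8601 format
        "player_id": player_id,
        "event_name": event_name,
        "attributes": attributes,         # event-specific details
    }


def to_ndjson(events):
    """Serialize events as newline-delimited JSON, one event per line."""
    return "\n".join(json.dumps(e, sort_keys=True) for e in events)
```

Keeping a consistent set of top-level fields across all events means that most queries can filter on the player, event name, and time without a separate ETL step.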
Use batch processing for existing formats
If you want to analyze metrics data that’s in an output format that you don’t control (for example, data from a third-party integration or service), we recommend that you start by saving the metrics data to Cloud Storage. If the data format is supported, you can query it directly from the BigQuery interface using BigQuery federated queries. If the data format isn’t supported, you can use ETL to copy the data from Cloud Storage using Dataflow or other tools, and then store the resulting formatted data in BigQuery alongside data from other sources. We recommend that you set up a regular batch job to save costs instead of streaming unless you have an urgent need for the data in real time. For more information about this approach, see Optimizing large-scale ingestion of analytics events and logs.
Predict churn and spending with proven models
You might already be using Firebase for your mobile game for one of its many other features like remote config, in-app messaging, or Firestore client libraries. Firebase also offers built-in churn and spend prediction machine learning (ML) models. You can integrate Remote Config personalization to apply ML to your analytics data, which can create dynamic user segments based on your users’ predicted behavior. This data can be used to trigger other Firebase features, or exported to BigQuery for more flexibility. Firebase also offers Unity and C++ clients, and its use isn’t limited to mobile games.
Normalize data for AutoML Tables custom-model training
Generating an effective ML model typically requires extensive ML expertise to select relevant features and tune hyperparameters. However, following data preparation guidelines improves the ability of the latest automated tools to perform these tasks for you and generate a useful model on your behalf. After a model is generated, it can be hosted on Google Cloud to do online or batch predictions — for example, predicting if a player will make a purchase in the game, or if they will quit playing.
Although analytics events and player data are useful for traditional analytics queries and business intelligence metrics, you need a different format to train an ML model. A common use case for ML in games is to make a custom model to predict when players will first spend money in the game. AutoML Tables can help to simplify the training process. For more information about AutoML Tables, see Preparing your training data and Best practices for creating training data.
Multiple game studios and publishers have achieved results by using a daily-rollup format as the basis for training. A daily rollup is a normalized row format that has one field for each significant analytics event. The row format contains a cumulative count of the number of times that the player triggered the event to date. This row provides a daily snapshot of all the potentially important events that a player triggered to date, along with a true or false has made a purchase flag.
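The daily-rollup format described above can be sketched in Python as follows. The event names in `EVENT_FIELDS`, the `purchase` event name, and the `(day, player_id, event_name)` input shape are hypothetical; a real game would substitute its own significant events and data source.

```python
from collections import Counter

# Hypothetical event names; a real game would list its own significant events.
EVENT_FIELDS = ["session_start", "level_complete", "store_visit"]


def daily_rollups(events):
    """events: iterable of (day, player_id, event_name), with sortable days.

    Returns one row per player per active day holding cumulative counts to date.
    """
    totals = {}   # player_id -> Counter of all events seen so far
    rows = {}     # (day, player_id) -> row
    for day, player_id, name in sorted(events):
        counts = totals.setdefault(player_id, Counter())
        counts[name] += 1
        row = {"day": day, "player_id": player_id}
        for field in EVENT_FIELDS:
            row[field] = counts[field]           # cumulative count to date
        row["has_made_purchase"] = counts["purchase"] > 0
        rows[(day, player_id)] = row   # later events on the same day overwrite
    return list(rows.values())
```

This sketch emits a row only for days on which a player triggered at least one event. If the training process expects a row for every calendar day, the gaps would need to be filled by carrying the previous day's counts forward.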
The process described in the AutoML Tables quickstart documentation can result in high-quality models when training with data formatted in this way. You can then give the model a daily rollup row and provide predictions of how likely it is that the player will make a purchase. You can also use similar approaches to formatting data alongside different flags to train models to make different predictions, including churn or other player behaviors. Making an up-front investment in building normalized data formats can help you rapidly try out models to predict any player action that you want. This modeling can potentially help you monetize your game or prioritize features that result in desirable player outcomes.