Minimizing downtime is life and death for video game studios. Here’s how they can scale while maintaining reliability.
By Rob Newell
Most software companies follow a similar cycle.
They start small—a few engineers working on a single product or point solution. As they grow, they need to be able to keep track of what’s happening in their systems and address problems. The tech stack in these early days is fairly small and straightforward, so they build their own tools to monitor for anomalies.
As the company continues to grow, it eventually reaches a tipping point: its DIY tools can no longer keep pace with the rapidly expanding architecture. The engineers will have to invest in an external solution for monitoring or observability; otherwise, the threat of errors and downtime will seriously limit their ability to scale.
The video game industry, which reached an estimated market size of $217 billion in 2022 and is expected to exceed $500 billion by 2030, follows the same cycle as these other software companies. But for video game studios, the threat is even more pronounced; any issues with downtime or reliability will have an immediate impact on customer retention.
New Relic employees aren’t immune to the pain of video game downtime:
“After a hectic day, all I want to do is chill and dive into one of my favorite games,” says Aron Marden, principal solutions consultant at New Relic. “It’s frustrating when you’ve only got a small amount of time to fully immerse and enjoy the game, and you end up dealing with constant lag or crashes during the peak times when everyone in my region is online. As a player, I expect seamless and responsive gameplay, especially when it’s a game I’ve invested a lot of time and money in.”
A recent analysis of mobile app use found abysmal user retention rates for mobile games across iOS and Android. According to the data, game studios can expect just 3.8% of iOS users and 1.7% of Android users to be playing a mobile game after 30 days. These game designers are operating on a knife’s edge when it comes to uptime and reliability—they can’t afford to compromise on observability.
Why game studios struggle to scale
For game designers, the barrier to growth can arrive surprisingly quickly. Video game architectures tend to be highly fragmented, as designers and engineering teams piece together a variety of microservices, cloud-based tools, and serverless functions. Level designers and artists work together on a range of effects and animations, all of which add up to an intricate web of moving parts.
The end result is a sprawling technology estate with silos that no engineer can stay on top of. At the most extreme end of the spectrum, some engineering teams will have no monitoring or alerts set up. When a game or a level goes down, the team has no visibility into what might be causing the problem. The end user doesn’t know and doesn’t care about the complexity on the other end of their experience. All they know is they can’t access the level they need, so they churn to a different game.
Some studios will struggle at the other end of the spectrum with too many alerts. Developers need to know when something is breaking, but they take a one-size-fits-all approach and end up with chaos—it’s impossible to determine what’s a real issue and what’s noise.
Many engineering teams, particularly those at small- and mid-sized game studios, lack the manual resources required to carefully calibrate their alerts. An all-in-one observability platform gives them the cheat code they need to establish baselines for their most important metrics and bring their alerts under control.
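To make the idea of a baseline concrete, here is a minimal sketch of a baseline-driven alert in Python. It is an illustration only, not how any particular observability platform works: the function names, the sample latency values, and the rule of alerting above the mean plus three standard deviations are all assumptions chosen for the example.

```python
from statistics import mean, stdev


def baseline_threshold(history, k=3.0):
    """Return an alert threshold of mean + k sample standard deviations.

    `history` is a list of past metric samples (hypothetical data here).
    The choice of k=3.0 is an assumption, not a recommended setting.
    """
    if len(history) < 2:
        raise ValueError("need at least two samples to estimate a baseline")
    return mean(history) + k * stdev(history)


def find_anomalies(history, recent, k=3.0):
    """Return the recent samples that exceed the baseline threshold."""
    threshold = baseline_threshold(history, k)
    return [value for value in recent if value > threshold]


# Hypothetical service latency samples, in milliseconds.
history_ms = [100, 102, 98, 101, 99, 100, 103, 97]
recent_ms = [101, 99, 180, 102]

# Only samples far above the usual range are reported, so ordinary
# fluctuation does not trigger an alert.
anomalies = find_anomalies(history_ms, recent_ms)
print(anomalies)
```

Because the threshold is derived from the metric's own history, each metric gets a limit that matches its normal behavior, which is the opposite of the one-size-fits-all approach that produces noisy alerts.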
The value of observability for game studios
Observability isn’t just a matter of gaining visibility into a fragmented tech stack—it’s a tool to maximize the return on technology investments while maintaining the best possible user experience. An all-in-one observability platform allows gaming companies to gain full visibility into their user experience across every environment, revealing the connections between different microservices and providing a straightforward roadmap to detect and resolve problems.
Video game studios feel the need for observability most acutely when they introduce new features, levels, or games, leading to a spike in user load. DevOps and site reliability teams must monitor service latency to stay on top of shifting user demand, which requires massive logs of performance data and real-time analysis. Observability can automate these processes, which in turn shortens mean time to detection (MTTD) and mean time to resolution (MTTR) when an issue could cause increased service latency. Observability also helps engineers improve database performance, which has a direct impact on the speed and performance of the game for the end user.
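For readers unfamiliar with the two metrics, the following Python sketch shows one way MTTD and MTTR could be computed from incident records. The `Incident` type, its field names, and the timestamps are hypothetical. The sketch also assumes both metrics are measured from the moment the issue started; some teams measure MTTR from the moment of detection instead.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Incident:
    """Hypothetical incident record with three timestamps."""
    started: datetime   # when the issue began affecting players
    detected: datetime  # when the team was alerted
    resolved: datetime  # when service returned to normal


def mean_duration(durations):
    """Average a non-empty list of timedelta values."""
    if not durations:
        raise ValueError("no incidents to average")
    return sum(durations, timedelta()) / len(durations)


def mttd(incidents):
    """Mean time to detection: average of (detected - started)."""
    return mean_duration([i.detected - i.started for i in incidents])


def mttr(incidents):
    """Mean time to resolution: average of (resolved - started)."""
    return mean_duration([i.resolved - i.started for i in incidents])


# Hypothetical incidents during a launch weekend.
incidents = [
    Incident(datetime(2024, 1, 6, 20, 0), datetime(2024, 1, 6, 20, 10),
             datetime(2024, 1, 6, 21, 0)),
    Incident(datetime(2024, 1, 7, 18, 0), datetime(2024, 1, 7, 18, 4),
             datetime(2024, 1, 7, 18, 30)),
]

print("MTTD:", mttd(incidents))
print("MTTR:", mttr(incidents))
```

Tracking these two averages over time gives a team a simple way to check whether changes to its monitoring and alerting are actually helping.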
Between logs, automation, and alerts, an all-in-one observability platform makes it possible for engineers to slow down the situation and gain control of their tech stack. Over time, that control adds up to steady improvements in performance and reliability. In the highly competitive video game industry, those improvements can make the difference between success and failure.
What observability means for game companies and gamers
Over the long term, observability provides game developers with a much deeper understanding of their systems. As these companies increase their user base and add new features, observability provides them with valuable insights to help guide short-term decision-making and long-term planning.
For the gamers themselves, the impact of observability might be invisible. We notice when something breaks or is unavailable, but we overlook the complexity and quality involved in keeping something running smoothly. But this reliability and performance will also add up over time for a more seamless user experience. Without being given any technical reason to churn, users will judge the game based on creativity and enjoyment. Observability makes it possible for everyone to focus on what actually matters: the game itself.