The fundamentals of Delivery Infrastructure

From the perspective of a release engineer in games production

This article is for you if you would like to get into delivery pipelines for software and assets, and have just started to scratch the surface.

You might also be here because you want to know what I mean by “Delivery Infrastructure”, since it isn’t actually an official term, but I feel like it should be.


If you are new to CI/CD and DevOps, then you are like me back in 2007. At that time, CI/CD wasn't mainstream, and DevOps was in its infancy.

DevOps is often described as a framework or culture focused on fostering collaboration between development, security, and operations teams. In that sense a DevOps Engineer is in a way a Collaboration Engineer, which is pretty neat to think about.

CI/CD, a subset of DevOps, stands for Continuous Integration and Continuous Delivery. It is a methodology that seeks to automate code integrations, builds, and deployments, emphasizing the "continuous" aspect to streamline development workflows.

While researching for this article, I found it difficult to pin down the terminology for the underlying structure of a delivery pipeline in software and games. The closest descriptors I could find were “Continuous Delivery Infrastructure” and “Service Delivery Infrastructure”, but I decided to stick with just “Delivery Infrastructure”. It is more broadly applicable, and has fewer words, which is nice.

Delivery Infrastructure is essentially what you need to enable collaboration, maintain consistency, and provide transparency and insight.

Think of it as a three-layered foundation: Versioning, Automation and Tracking.

By identifying and integrating the tools that support these layers, you can create a robust delivery infrastructure for your production.

Versioning

Versioning is at the heart of any delivery infrastructure. Even if you aren’t automating anything, versioning still enables collaboration, the cornerstone of DevOps.

In simple terms, versioning means maintaining a history of changes to the codebase and assets, while allowing you to navigate that history. With a Version Control System (VCS), you can branch out from a given point in time, merge changes, and, most importantly, enable multiple contributors to work on the same product, even the same code files, simultaneously.

Probably the best-known VCS for this is Git, very commonly used with services such as GitHub, Bitbucket, or GitLab.

Another well-known tool for versioning, in particular in game productions, is Perforce. While the differences between Git and Perforce are a topic for another article, I will point out here that the latter is known in the games industry for being the better choice. One of the reasons is its ability to handle very large depots of very large files out of the box. Large quantities of massive files are a fact of life for studios that do mid to large scale productions, or AAA.

Git, however, is known for being easier to work with, especially if you mostly have code to worry about. There are plenty more versioning tools out there, e.g. Plastic and SVN, but Git and Perforce are the most popular in the games industry. All of them, however, solve the task of versioning well enough to enable collaboration and automation.
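Just to make the branch-and-merge flow described above a bit more concrete, here is a purely illustrative Kotlin sketch that drives a local Git repository through the JGit library. The repository path, branch name, and commit message are made up, and in day-to-day work you would typically do the same thing with the git command line or a GUI client rather than code.

```kotlin
// Illustrative only: the branch/merge flow, using the JGit library
// (org.eclipse.jgit:org.eclipse.jgit) from Kotlin.
import org.eclipse.jgit.api.Git
import java.io.File

fun main() {
    Git.open(File("/path/to/your/repo")).use { git ->   // hypothetical local repository
        // Branch off from the current point in history and switch to the new branch
        git.branchCreate().setName("feature/tweak-lighting").call()
        git.checkout().setName("feature/tweak-lighting").call()

        // ... edit files, then record the change on the branch ...
        git.add().addFilepattern(".").call()
        git.commit().setMessage("Tweak lighting settings").call()

        // Switch back to the main line and merge the branch into it
        git.checkout().setName("main").call()
        git.repository.resolve("feature/tweak-lighting")?.let { branchHead ->
            git.merge().include(branchHead).call()
        }
    }
}
```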

Automation

When it comes to CI/CD, Automation is the defining factor, and in terms of Delivery Infrastructure it is the life of the party.

The list of automation environments is as long as the list of opinions on the matter, but the top three that I’m familiar with from game productions are TeamCity, Jenkins and GitLab. Some productions even roll their own.

No matter what tool you choose or how you implement it, a well-functioning automation solution is able to react to various triggers, such as updates in your version control history. It can pull the relevant versioned files to a designated computer, also known as a build agent, and run built-in logic or scripts you wrote yourself on those files. It can then provide you with some kind of output, such as test results, build artifacts, or even updates back into version control.

Most automation environments come with a web-UI for configuration and reporting, which allows you to configure even complex pipelines fairly easily.

I would like to give you one piece of advice in this regard: pick an environment that allows you to write the actual configurations as code, not just the jobs they run. This will allow you to version your configurations alongside your product code, run unit tests on them, and collaborate more easily on the automation itself.

For example, TeamCity by JetBrains allows you to create configurations in its web-UI and convert them to Kotlin, providing a helpful bridge if you are new to coding configurations. Regardless of the tool, adopting Infrastructure as Code (IaC) early on is highly recommended.
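To make configuration as code a little more concrete, here is a minimal sketch of what a TeamCity settings.kts could look like once expressed in Kotlin. The project layout, build name, script, and artifact paths are invented for illustration, and the exact import paths and version string depend on your TeamCity release, so treat this as a shape rather than a copy-paste recipe.

```kotlin
// Minimal, illustrative TeamCity Kotlin DSL configuration (.teamcity/settings.kts).
// Exact package names and the version string vary between TeamCity releases.
import jetbrains.buildServer.configs.kotlin.*
import jetbrains.buildServer.configs.kotlin.buildSteps.script
import jetbrains.buildServer.configs.kotlin.triggers.vcs

version = "2023.11"

project {
    buildType(NightlyBuild)
}

object NightlyBuild : BuildType({
    name = "Nightly Build"

    vcs {
        // Build from the same VCS root that holds these settings
        root(DslContext.settingsRoot)
    }

    steps {
        script {
            name = "Compile and package"
            scriptContent = "./build.sh --config Shipping"   // hypothetical build script
        }
    }

    triggers {
        // React to new changes in version control
        vcs { }
    }

    // Publish the build output as an artifact
    artifactRules = "output/** => game-build.zip"
})
```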

With automation in place you get to focus more on developing your product, and you are able to implement a standard for your code and assets, verified in your automation environment. Immediate failures can be dealt with via notifications, whether email or some other channel, but if you want the ability to analyze historical outcomes, measure improvements, and in general just learn from the data, you need tracking.

Tracking

Most automation environments come with the ability to do real-time monitoring, and often the ability to notify you in some way. The aim here is to let you react when, for example, a test fails or your build pipeline breaks. This is important, but it does not help you learn from trends.

Data tracking is the act of selecting, storing, and analyzing metrics and data points. Tools like TeamCity do store data, potentially a lot of data, but they aren’t really designed for tracking. They are meant to give you enough history to run a smooth automation environment with some level of reporting, not to support in-depth historical analysis.

If your production is small, you can likely get away with tweaking the capabilities of the automation environment and using the built-in reporting and visualization tools. These can often be a great way to get started, and might be all you need if your production stays small.

Beyond this, you’ll need to look for other tools that are capable of ingesting large quantities of data, potentially from various sources, and modelling it.

At the risk of sounding like a broken record, there are many tools out there that do this, a few of them being Power BI, Kibana, and Grafana.

Roughly speaking, there are two sides to these tools: the datastore and the visualization. The datastore is often not actually part of the visualization tool, but sometimes, as in the case of Kibana and Elasticsearch, the tool is designed to work with a specific datastore. Grafana, which started as a fork of Kibana, works with many datastores and is open source.
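As a small sketch of what “ingesting” a data point can look like in practice, here is a Kotlin example that posts a single build result as a JSON document to an Elasticsearch index over HTTP, which Kibana or Grafana could then visualize. The host, index name, and fields are all hypothetical, and a real pipeline would batch documents, serialize JSON properly, and handle authentication.

```kotlin
// Illustrative sketch: send one build-result document to an Elasticsearch index.
// Host, index name, and fields are made up for the example.
import java.net.URI
import java.net.http.HttpClient
import java.net.http.HttpRequest
import java.net.http.HttpResponse
import java.time.Instant

fun main() {
    // One data point describing a finished pipeline run
    val document = """
        {
          "timestamp": "${Instant.now()}",
          "pipeline": "nightly-build",
          "durationSeconds": 5400,
          "result": "success"
        }
    """.trimIndent()

    // POST the document to a (hypothetical) local Elasticsearch index
    val request = HttpRequest.newBuilder(URI.create("http://localhost:9200/build-metrics/_doc"))
        .header("Content-Type", "application/json")
        .POST(HttpRequest.BodyPublishers.ofString(document))
        .build()

    val response = HttpClient.newHttpClient()
        .send(request, HttpResponse.BodyHandlers.ofString())

    println("Datastore responded with HTTP ${response.statusCode()}")
}
```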

They are all built for slightly different purposes, however, and comparing them is a topic so big that many articles have already been written about it, by people much more knowledgeable than I am.

Whether you use one of these tools or are content with creating pie charts in a spreadsheet, the important takeaway here is that you gain the ability to understand the data you generate and make it tangible, so you can make decisions that improve your production.

The glue

The last bit of the equation is the interconnectedness of these foundation layers, and the systems they depend on. This is perhaps also the biggest reason you’d want a dedicated person, or even a team, in your production to maintain these layers. Each tool you invest in for your delivery infrastructure needs maintenance, especially if you host it locally. And while updating a server can be its own headache sometimes, the real challenge lies in the connective tissue between the services.

If you stick to known tools, you stand a good chance that they support each other out of the box. But even then, you are likely going to need some level of configuration to make them connect efficiently. If you have a larger setup, you might even have other services in between, e.g. a VCS file cache for your build farm, or a dedicated datastore for all the tracking, which means you now have more connections to maintain.

Then there is the network and authentication infrastructure your delivery pipelines likely depend on. A build engineer in a larger production typically doesn’t have direct access to the IT team’s platforms. This means that an update to, say, a firewall could bring down the automation, break access to a datastore, or some third thing. The connection that needs maintenance here is human in nature first and foremost, as it requires a relationship with the people whose systems you depend on.

Besides all the maintenance, you also want your pipeline to do useful things, like creating builds, deploying artifacts, running tests, compiling reports, and so on. These efforts are often a collaboration between departments, which is probably also why “DevOps Engineer” is an actual job title. The scripts and other logic you have running in your automation environment will to some extent also depend on the layers of the delivery infrastructure, and will benefit not only from maintenance, but from coding standards during development.

It is a good idea to have a dedicated team that owns the quality of these routines, and perhaps it isn’t a surprise that the real glue of this infrastructure is in fact the people who maintain it.
