Be like one of the FAANG+
Image by Ronald Carreño

In the past, while working on software reliability and the maintenance of software applications, I was repeatedly confronted with the following statement: "We don't need that - we are not Google" (or one of the other big companies such as Microsoft, Alphabet,…; for simplicity, we put all the well-known companies under the umbrella term FAANG+).

I know that everyone who develops software applications usually faces several pressures, such as cost pressure or a kind of feature frenzy. But we have a responsibility to prepare for the unexpected, that is, for incidents that can (and to a certain degree will) occur:

"Anything that can go wrong will go wrong" – Murphy's Law.

You may not be one of the FAANG+. However, in my opinion, the underlying principles/patterns/strategies, as well as the lessons learned from building and operating large systems, can (and should) be applied by every company in the digital environment. You become aware of this when the unexpected happens… but it is much harder to handle such a situation with no strategies in place.

The only question one should ask oneself is: to what degree should I adopt the principles/patterns/strategies to achieve my target goal? And exactly that is the crucial point. The target goals represent characteristics of the system that our users can rely on. Each goal has to be measurable so that we can determine our current state and track our progress toward it. One target goal could be an availability of 99.999%; for another system, 99.9% is fine (it really depends on how critical the system is to the business as a whole).
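To make the difference between such targets tangible, here is a minimal sketch that converts an availability target into the downtime it permits. The function name `allowed_downtime_minutes` and the default 365-day window are assumptions chosen for illustration; they are not taken from any particular tool.

```python
def allowed_downtime_minutes(availability_percent: float, period_days: float = 365.0) -> float:
    """Return the downtime (in minutes) that an availability target permits.

    availability_percent: the target, e.g. 99.9 for "three nines".
    period_days: length of the measurement window (assumed: a 365-day year).
    """
    if not 0.0 <= availability_percent <= 100.0:
        raise ValueError("availability_percent must be between 0 and 100")
    if period_days <= 0:
        raise ValueError("period_days must be positive")
    total_minutes = period_days * 24 * 60
    # The share of time that may be lost is (100 - target) percent.
    return total_minutes * (100.0 - availability_percent) / 100.0


if __name__ == "__main__":
    for target in (99.9, 99.999):
        minutes = allowed_downtime_minutes(target)
        print(f"{target}% availability allows about {minutes:.1f} minutes of downtime per year")
```

Under these assumptions, 99.9% leaves roughly 526 minutes (under nine hours) of downtime per year, while 99.999% leaves only about five minutes. That gap is why the right target depends so strongly on how critical the system is.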

When we abstract the idea of target goals, we can draw a comparison with Objectives and Key Results (OKRs). In the field of site reliability, there is a more appropriate term for our target goals: Service Level Objectives (SLOs). Put simply, one can say that the SLOs, our target goals, represent the OKRs of modern software systems. For completeness, here are some important terms from the area of Site Reliability Engineering (SRE):

  1. Service Level Indicators (SLIs) are metrics that deliver insights into your running software system.
  2. Service Level Objectives (SLOs) are based on SLIs and represent the target goals that you want to achieve with your software system.
  3. Service Level Agreements (SLAs) are contracts between a service provider and one or more service consumers. Usually, they include a collection of SLOs to outline what service consumers can expect from you.
  4. Error Budgets represent the acceptable gap between perfect reliability and your defined SLOs, that is, how much failure (as measured by your SLIs) you can tolerate before an SLO is violated.
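The relationship between the first, second, and fourth term can be sketched in a few lines. This is an illustrative example only: the names `AvailabilityReport` and `evaluate_slo` are hypothetical, and the SLI is assumed to be the fraction of successful requests, which is just one of many possible indicators.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AvailabilityReport:
    sli: float              # measured fraction of good requests
    slo: float              # target fraction of good requests
    allowed_bad: float      # error budget, expressed as a number of requests
    budget_consumed: float  # share of the error budget already used (1.0 = exhausted)
    slo_met: bool


def evaluate_slo(good_requests: int, total_requests: int, slo_target: float) -> AvailabilityReport:
    """Compare a request-based SLI against an SLO and derive the error budget."""
    if total_requests <= 0:
        raise ValueError("total_requests must be positive")
    if not 0 <= good_requests <= total_requests:
        raise ValueError("good_requests must be between 0 and total_requests")
    if not 0.0 < slo_target < 1.0:
        # A target of exactly 1.0 would leave no error budget at all.
        raise ValueError("slo_target must be strictly between 0 and 1")

    sli = good_requests / total_requests
    allowed_bad = (1.0 - slo_target) * total_requests
    actual_bad = total_requests - good_requests
    return AvailabilityReport(
        sli=sli,
        slo=slo_target,
        allowed_bad=allowed_bad,
        budget_consumed=actual_bad / allowed_bad,
        slo_met=sli >= slo_target,
    )


if __name__ == "__main__":
    report = evaluate_slo(good_requests=999_500, total_requests=1_000_000, slo_target=0.999)
    print(report)
```

With these made-up numbers, a 99.9% SLO over one million requests allows about 1,000 failures. 500 failures have occurred, so roughly half of the error budget is used and the SLO is still met.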

Be like one of the FAANG+ and incorporate their principles/patterns/strategies into your own. Do not simply say: "We are not Google". That is too easy, right? So start with your first SLOs today. Not sure what to start with?

You could, for example, say that the error rate should not increase with a new release of an application, or that the availability should be at least 99.9%. Then you need to introduce the SLIs that give you the necessary insights, since you have to measure these goals. Is that all? Probably not... A cultural change is certainly necessary to integrate the principles into the everyday life of an engineering/product team.
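As a rough sketch of the first example, the check "the error rate should not increase with a new release" could be expressed like this. The function names and the optional `tolerance` parameter are assumptions made for illustration, and the request and error counts are expected to come from whatever monitoring you already have.

```python
def error_rate(errors: int, requests: int) -> float:
    """SLI: fraction of requests that failed."""
    if requests <= 0:
        raise ValueError("requests must be positive")
    if not 0 <= errors <= requests:
        raise ValueError("errors must be between 0 and requests")
    return errors / requests


def release_regressed(
    baseline_errors: int,
    baseline_requests: int,
    candidate_errors: int,
    candidate_requests: int,
    tolerance: float = 0.0,
) -> bool:
    """Return True if the new release's error rate exceeds the previous one.

    tolerance is an assumed allowance (an absolute difference in error rate)
    so that tiny fluctuations do not count as a regression.
    """
    baseline = error_rate(baseline_errors, baseline_requests)
    candidate = error_rate(candidate_errors, candidate_requests)
    return candidate > baseline + tolerance


if __name__ == "__main__":
    # Made-up numbers: 10 errors in 10,000 requests before, 12 in 10,000 after.
    print(release_regressed(10, 10_000, 12, 10_000))
```

A plain comparison like this is sensitive to noise when request counts are small, so in practice a team would probably want a tolerance or a longer observation window before treating a release as a regression.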

Stay tuned to learn more about SLOs and how engineering teams can incorporate them into their daily work. In the meantime, never stop learning, and keep up with the best to succeed in your digital space.

Abdelfettah Latrache

software engineer at SYNDIKAT7

2 years ago

Very interesting insights, thank you for sharing! Looking forward to reading more about Site Reliability Engineering.
