Be like one of the FAANG+
Dr. Pascal Giessler
Chief Technical Officer at SYNDIKAT7 | Software Engineering
In the past, I was repeatedly confronted with the following statement when discussing the reliability and maintenance of software applications: "We don't need that - we are not Google." (The same is said of the other big companies such as Microsoft, Alphabet,…; for simplicity, I put all these well-known companies under the umbrella term FAANG+.)
I know that everyone who develops software applications faces several pressures, such as cost pressure or a kind of feature fury. But we have the responsibility to prepare for the unexpected, that is, for incidents that can and will occur to a certain degree:
"Anything that can go wrong will go wrong" – Murphy's Law.
It may be that you are not one of the FAANG+. However, in my opinion, the underlying principles/patterns/strategies and the lessons learned from building and operating large systems can (and should) be applied to all companies in the digital environment. You become aware of this when the unexpected happens… but it is much harder to handle such a situation with no strategies in place.
The only question one should ask oneself is: To what degree should I adopt the principles/patterns/strategies to achieve my target goal? And exactly that is the crucial point. The target goals represent characteristics on which the users of our system can rely. Each goal has to be measurable so that we can determine our current state and track our progress in reaching it. One target goal could be an availability of 99.999%; for some other system, 99.9% is fine (it really depends on how critical the system is for the whole business).
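To make the difference between such targets tangible, an availability percentage can be translated into the downtime it permits. The following is a minimal sketch, not a definitive method: the function name and the 30-day measurement window are assumptions chosen for illustration, and it treats availability purely as uptime over wall-clock time.

```python
def allowed_downtime_minutes(availability_percent: float, period_days: float = 30) -> float:
    """Return the minutes of downtime a target permits within the period.

    The 30-day default window is an illustrative assumption; use whatever
    window your own target is defined over.
    """
    if not 0 <= availability_percent <= 100:
        raise ValueError("availability_percent must be between 0 and 100")
    total_minutes = period_days * 24 * 60
    # The unavailable fraction of the period is the permitted downtime.
    return total_minutes * (1 - availability_percent / 100)


if __name__ == "__main__":
    for target in (99.9, 99.999):
        minutes = allowed_downtime_minutes(target)
        print(f"{target}% availability allows {minutes:.2f} minutes of downtime per 30 days")
```

Under these assumptions, 99.9% leaves roughly 43 minutes of downtime per 30 days, while 99.999% leaves less than half a minute, which is why the right target depends so much on how critical the system is.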
Abstracting the idea of target goals, we can draw comparisons with Objectives and Key Results (OKRs). The field of site reliability has a more appropriate term for our target goals: SLOs (Service Level Objectives). Put simply, one can say that the SLOs, our target goals, represent the OKRs of modern software systems. To be complete, you will find some important terms from the area of Site Reliability Engineering (SRE) below:
Be like one of the FAANG+ and incorporate their principles/patterns/strategies into your own, and do not simply say: “We are not Google”. That is too easy, right? So start with your first SLOs today. Not sure what to start with?
You could, for example, say that the error rate should not increase with a new release of an application, or that the availability should be at least 99.9%. Then you need to introduce the SLIs (Service Level Indicators) that give you the needed insight, since you have to measure it. Is that all? Possibly not... A cultural change is certainly necessary to integrate the principles into the everyday life of an engineering/product team.
Stay tuned to learn more about SLOs and how engineering teams can incorporate them into their daily work. In the meantime, never stop learning and keep up with the best to succeed in your digital space.
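The two starter SLOs mentioned above can be sketched in a few lines. This is only an illustration under stated assumptions: the SLI is taken to be a simple ratio of successful to total requests, and all names (`RequestCounts`, `availability_sli`, `meets_availability_slo`, `release_keeps_error_rate`) are hypothetical, not part of any particular monitoring tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RequestCounts:
    """Hypothetical raw measurements for one time window or one release."""
    total: int
    failed: int


def error_rate(counts: RequestCounts) -> float:
    """SLI: fraction of requests that failed (0.0 when there was no traffic)."""
    return 0.0 if counts.total == 0 else counts.failed / counts.total


def availability_sli(counts: RequestCounts) -> float:
    """SLI: percentage of requests that succeeded."""
    return 100.0 * (1.0 - error_rate(counts))


def meets_availability_slo(counts: RequestCounts, target_percent: float = 99.9) -> bool:
    """SLO check: the measured availability is at least the target."""
    return availability_sli(counts) >= target_percent


def release_keeps_error_rate(before: RequestCounts, after: RequestCounts,
                             tolerance: float = 0.0) -> bool:
    """SLO check: the new release does not raise the error rate beyond the tolerance."""
    return error_rate(after) <= error_rate(before) + tolerance


if __name__ == "__main__":
    old_release = RequestCounts(total=100_000, failed=50)
    new_release = RequestCounts(total=100_000, failed=80)
    print("Availability SLO met:", meets_availability_slo(new_release))
    print("Error rate not increased:", release_keeps_error_rate(old_release, new_release))
```

In practice the counts would come from your monitoring system rather than hard-coded values, and a small non-zero `tolerance` may be sensible so that normal noise between releases does not count as a violation.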
software engineer at SYNDIKAT7
2 years ago: Very interesting insights, thank you for sharing! Looking forward to reading more about Site Reliability Engineering.