Accelerating cloud journeys with 24x7 Cloud Reliability Engineers
Martijn van Dongen
Cloud Evangelist at Schuberg Philis | AWS Hero | Eternal AWS Community Leader
Managed Services Providers have done a great job managing virtual datacenters, VMs and Infrastructure as a Service. But as cloud providers offer more and more so-called “cloud native services”, the traditional SLA, pricing model and limited agility of Managed Services are becoming more and more of a problem. Together with partners and customers, we are working on an alternative approach. Although it’s still under development, it certainly makes sense to start thinking differently. This article addresses some of the problems and provides insight into the near future.
Legacy VMs
Lots of customers are still using VMs. Disruptors, accelerators and innovators consider VMs to be “legacy”, just as on-premises hardware was a few years ago. In their view, VMs are just there to host particular “legacy software” and to enable lift & shift migrations to the cloud. Many new services have been introduced over the last months that run workloads more efficiently, more reliably and faster, without having to manage any servers. Even for running containers there are native cloud services available now, so it’s no longer necessary to build and maintain your own orchestration. Recently a blog post was published about Hybris in Containers. Applications of this kind, called ‘monoliths’, used to be extremely difficult to run in containers. That post shows that even vendor applications are making the move to containers.
Fixed Fees
A fixed monthly fee for managed services is either overpriced, which means extra profit for the provider, or underpriced, which results in a lack of improvement. It’s impossible to predict how much effort you will have to spend on rising technologies. For example, serverless and containers are very hot at the moment, but they will have a high impact on your architecture. Just spending a few hours on R&D won't be sufficient to embrace these innovations and stay ahead of the competition.
Stability
Managed Services Providers are under high pressure to deliver stability. The only effort they put into product development is to ensure the software won’t break anything written in the SLA. Things like product improvements, cycle time reductions and user experience improvements get only a ‘reasonable effort’, or are not even on the agenda.
Agility
There are several ways to compete: either a low price with poor quality, or a high price with premium quality. But high quality doesn’t mean the service is also flexible. In the growing number of agile, high-performance organisations, requirements change over time, depending on the maturity of the product or service. A just-launched MVP has different needs and KPIs than a mature, iteratively developed product, and those needs depend on feedback from end users. Lowering errors, improving performance, reducing waste and introducing new features quickly should be prioritised, not once upfront in an SLA, but continuously and in real time.
Talent
Managed services providers and their customers seem to be happy with a 6 out of 10, as if “good is good enough”. That’s not true. Competitors of your IT partner, as well as your own competitors, put far more effort into innovation. Innovation helps reduce costs, but it also gives you access to talent. Do not forget that IT specialists are in high demand these days, which means they become more expensive, more demanding or, in the worst case, even ‘sold out’.
Do it yourself
You could invest in knowledge, train your team(s) and hire experts. It will probably cost you more or less the same as outsourcing, but you will introduce some risks. It’s hard to find or develop good engineers these days. Training them takes a long time; we know first-hand how much it took us to become ‘cloud gurus’. And with this run on highly skilled cloud engineers, there’s the risk of losing them too, as mentioned in the previous section. There are many pitfalls when going to the cloud without proper investment in knowledge, or without a good cloud strategy. One of the major pitfalls is trying to transform a whole organisation at once: the project becomes too big to succeed. Particular platforms and teams cannot wait for the whole organisation, and should accelerate on their own.
Alternative approach
As promised in the intro: what could the alternative approach be? It is certainly not outsourcing, nor hiring contractors for a long time; rather, it is a combination of both, extended with modern reliability practices. Because of all the fine-grained cloud technologies, you’re not only developing software, but the infrastructure as well. And in line with the saying “you build it, you run it”, the product team should have the knowledge and the end-to-end responsibility. To create such a team, find a partner who:
- can re-architect or design the platform using cloud native services, and fully benefit from all the cloud's potential. (See the AWS Well-Architected Framework and the ThoughtWorks Technology Radar 2016.)
- provides ‘T’-shaped cloud specialists, where the vertical bar stands for deep expertise in a particular provider like AWS, Azure or GCP.
- embraces practices like Customer Reliability Engineering and those described in Google's earlier published book, Site Reliability Engineering.
- defines general Service Level Objectives (SLOs), with no penalties or blaming when SLOs are not met; instead, improvements are implemented together as one team.
- lets you pass a strict Production Readiness Review, after which Cloud Reliability Engineers are on call 24x7 in case of emergencies.
- shares knowledge and guides your own employees to become Cloud Reliability Engineers themselves.
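
To make the SLO point in the list above a bit more concrete, here is a minimal sketch in Python of how a team might track an SLO together with its "error budget", a concept from the Site Reliability Engineering book. It is only an illustration: the class name `SloReport`, its fields and the example numbers are assumptions of this sketch, not part of any provider's tooling or of the approach described in this article.

```python
from dataclasses import dataclass


@dataclass
class SloReport:
    """Hypothetical summary of one measurement window for a single SLO."""

    slo_target: float      # assumed target, e.g. 0.999 = 99.9% of requests succeed
    total_requests: int    # requests served in the window
    failed_requests: int   # requests that violated the objective

    @property
    def allowed_failures(self) -> float:
        # The error budget: how many failures the SLO tolerates in this window.
        return (1.0 - self.slo_target) * self.total_requests

    @property
    def budget_remaining(self) -> float:
        # Fraction of the error budget still unspent; negative when overspent.
        if self.allowed_failures == 0:
            return 0.0 if self.failed_requests == 0 else float("-inf")
        return 1.0 - self.failed_requests / self.allowed_failures

    @property
    def slo_met(self) -> bool:
        return self.failed_requests <= self.allowed_failures


if __name__ == "__main__":
    # Illustrative numbers only.
    report = SloReport(slo_target=0.999, total_requests=1_000_000, failed_requests=250)
    print(f"SLO met: {report.slo_met}, budget remaining: {report.budget_remaining:.0%}")
```

A report like this is meant as a shared input for prioritising work: when the budget runs low, the combined team can shift effort from new features to reliability improvements, rather than using the number to assign blame or penalties.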