Mythbusters: SRE Edition

Let’s debunk the most popular Site Reliability Engineering (SRE) myths.

Companies should prioritize SRE transformation in their operations, because it lets teams make improvements based on the issues that surface. However, there are numerous misconceptions, so let’s debunk some of the more common ones.

SRE is the new & improved version of DevOps

Organizations should not look at SRE as the new and improved version of DevOps. These should be viewed as two sides of a coin as both concepts complement one another.

SRE is about injecting software engineering practices, and a new mindset, into IT operations to create highly reliable and highly scalable systems. DevOps was conceptualized to help dev and IT operations teams collaborate more and shift toward an agile methodology. While the dev team aims to release as many new features as possible, the IT operations team’s goal is to manage and prioritize those releases so that the system stays stable. These conflicting priorities often hamper both progress and stability, and push the two teams into silos. This is where SRE comes in to improve the situation and drive DevOps’ success.

SREs work to ensure 100% uptime

Customer experience is the goal. Decide what counts as good enough versus too broken, rather than chasing 100% uptime.

SRE is not about reaching a goal of zero outages. It is about achieving a sustainable and appropriate level of availability while keeping up the velocity of new feature releases. While uptime is important, the customer experience needs to be understood and constantly monitored. An error budget can therefore be used to understand and continuously improve customer experience.
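To make the error budget idea concrete, here is a minimal sketch of the arithmetic. It assumes a request-based availability target; the class name, field names, and the numbers are all hypothetical and only there for illustration.

```python
from dataclasses import dataclass


@dataclass
class ErrorBudget:
    """Request-based error budget for one measurement window (illustrative only)."""

    slo_target: float      # assumed target, e.g. 0.999 means 99.9% of requests succeed
    total_requests: int    # requests served in the window
    failed_requests: int   # requests that failed in the window

    @property
    def allowed_failures(self) -> float:
        # The budget is the share of requests the target lets you fail.
        return (1 - self.slo_target) * self.total_requests

    @property
    def remaining_fraction(self) -> float:
        # 1.0 means the budget is untouched, 0.0 means it is spent,
        # and a negative value means the target was missed.
        if self.allowed_failures == 0:
            return 0.0
        return 1 - self.failed_requests / self.allowed_failures


# Hypothetical numbers: a 99.9% target over one million requests.
budget = ErrorBudget(slo_target=0.999, total_requests=1_000_000, failed_requests=400)
print(f"Allowed failures: {budget.allowed_failures:.0f}")
print(f"Budget remaining: {budget.remaining_fraction:.0%}")
```

With these made-up numbers the target allows roughly 1,000 failed requests, so 400 failures leave about 60% of the budget. One common way to use this is to keep shipping features while budget remains and shift to reliability work when it runs out.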

Experimentation and innovation are utopian & unreachable ideals

You will be chaos engineering in production sooner than you think. You just need to establish trust first.

It took over a year to establish trust with the application teams and with leadership, but we did perform chaos engineering in production on systems generating tens of millions of dollars. With good CI/CD pipelines and automation, you can always do a quick rollback when something goes wrong.

An organization with a strong culture of experimentation and innovation tends to have a better chance of succeeding when implementing SRE. Start small and work your way up. Chaos engineering has come a long way over the last 5 years, and there are now plenty of tools to get you started safely.

SRE only treats software problems

Yeah right! SREs are Swiss Army knives.

The issues we worked on were vast. Every challenge was unique, and that is the beauty of a strong, diverse SRE team. The team handles everything from the hardware all the way to the edge and beyond. We’ve dealt with DNS issues, networking issues, scaling issues, load balancing issues, even cutting cloud costs.

Observability & Monitoring are only achievable on cloud-native applications

Contrary to popular belief, a lot of enterprises are still running things outside of the cloud and SRE can also be applied to legacy apps.

Regardless of where the app lives, teams can start by identifying and measuring the essential metrics that show what happens inside the system. You can use commercial solutions, open-source tools, or home-grown tools to capture the data that is useful to you. You want to uncover weaknesses within the legacy app, which will help you find the root cause and ultimately solve the problem.
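As a rough illustration of the home-grown route, the sketch below pulls two basic metrics, error rate and median latency, out of plain log lines. The log format (a status code followed by a latency in milliseconds), the sample data, and the function name are assumptions made up for this example; a real legacy app will need its own parsing.

```python
import statistics

# Hypothetical log format, one request per line: "<http_status> <latency_ms>"
SAMPLE_LOG = """200 120
200 95
500 1800
200 110
"""


def summarize(log_text: str) -> dict:
    """Compute request count, error rate, and median latency from log text."""
    statuses = []
    latencies = []
    for line in log_text.splitlines():
        parts = line.split()
        if len(parts) != 2:
            continue  # skip blank or malformed lines
        statuses.append(int(parts[0]))
        latencies.append(float(parts[1]))

    total = len(statuses)
    errors = sum(1 for status in statuses if status >= 500)  # count server errors only
    return {
        "requests": total,
        "error_rate": errors / total if total else 0.0,
        "median_latency_ms": statistics.median(latencies) if latencies else 0.0,
    }


summary = summarize(SAMPLE_LOG)
print(summary)
```

The point is not the script itself but the habit: pick a few signals the customer would feel, measure them wherever the app runs, and look for the weak spots.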

Full-stack developers and SREs do the same job

The developer’s job is to build. The SRE’s job is to engage the developer, guide them on architecture and implementation, and drive the availability and agility of the app at a sustainable pace.

Think of the SRE as a consultant in this relationship.

A team working together is more sustainable and truer to life, especially when dealing with a large, complex, distributed system.

Everyone can become an SRE

It is more than a title.

You must have a natural inclination toward approaching problems as a generalist and a systems thinker. SREs work like detectives who are always seeking improvements and solving problems. They possess software development skills, can work with automation tools, and have experience as a sysadmin or in an IT operations role.

Does this help? What did I miss? What other myths are out there that we need to debunk?
