What is SRE?
https://www.dragonspears.com/blog/the-sre-model-and-its-business-implications

In the traditional model, a sysadmin (systems administrator) fixes broken systems and keeps working through incidents and events to keep the system reliable. Even so, most project teams fail to achieve the desired SLA. Hence, many companies are moving toward a different model and approach, one that goes hand in hand with emerging technologies and support models. The new approach creates fewer conflicts between teams and builds systems that aim for maximum reliability and durability.

So, what is the new approach? #SRE (Site Reliability Engineering): a systematic and automated approach to enhancing IT service delivery using standardized tools and practices.

Benjamin Treynor Sloss (VP of Engineering at Google Cloud) described SRE this way: “SRE is what happens when you ask a software engineer to design an operations function.” In SRE, a team of software developers builds software systems to solve the problems of complex systems, such as capacity and performance planning, disaster management, and quality monitoring.

The SRE team is responsible for the availability, latency, performance, efficiency, change management, monitoring, emergency response, and capacity planning of their service(s). For an SRE team, there is a 50% cap on aggregate "ops" work: tickets, on-call, manual tasks, etc. The more ops work you eliminate through code, the more time you have for serious engineering.
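The 50% cap is simple arithmetic over how a team spends its time. Here is a minimal sketch, assuming time is tracked in weekly hours; the `WeeklyHours` class, its field names, and the sample numbers are hypothetical and only illustrate the calculation.

```python
from dataclasses import dataclass

OPS_CAP = 0.50  # the 50% cap on aggregate "ops" work


@dataclass
class WeeklyHours:
    """Hypothetical breakdown of one team's week, in hours."""
    tickets: float
    on_call: float
    manual_tasks: float
    engineering: float  # project and coding work

    def ops_fraction(self) -> float:
        """Share of total time spent on ops work (0.0 if nothing was logged)."""
        ops = self.tickets + self.on_call + self.manual_tasks
        total = ops + self.engineering
        return ops / total if total else 0.0


def over_cap(hours: WeeklyHours, cap: float = OPS_CAP) -> bool:
    """True when ops work exceeds the cap."""
    return hours.ops_fraction() > cap


week = WeeklyHours(tickets=10, on_call=8, manual_tasks=6, engineering=16)
print(week.ops_fraction())  # 0.6
print(over_cap(week))       # True
```

In this made-up week, 24 of 40 hours went to ops work, so the team is over the cap and should automate or hand back some of that load.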

SRE Team Responsibilities –

  • Software Engineering - Build dedicated tools to mitigate risks, manage incidents, and support services such as production, code change, alerting, and monitoring.
  • On-Call Process Optimization - Implement runbook tools and other automation techniques to prepare incident response teams, enhance their real-time collaborative responses, and review documentation.
  • Fixing Support Escalations - Work in collaboration with the relevant teams to remediate issues.
  • Documenting Knowledge - Documenting information is crucial to ensuring a smooth flow of operations among teams.

Tools Used by SRE Team –

Selecting the right tool is very important for managing a client's challenging environment. There is a variety of tools for each aspect of SRE: monitoring, SLOs and error budgeting, incident management, incident retrospectives, alerting, chaos engineering, and more.

Some Important Definitions –

1) SLO – Service Level Objective –

· As per Gartner - SLOs are the objectives that must be achieved, for each service activity, function and process, to provide the best opportunity for service recipient success.

· As per Wikipedia - SLOs are specific measurable characteristics of the SLA such as availability, throughput, frequency, response time, or quality. These SLOs together are meant to define the expected service between the provider and the customer and vary depending on the service's urgency, resources, and budget. SLOs provide a quantitative means to define the level of service a customer can expect from a provider.

2) Error Budget – An error budget is the amount of error that your service can accumulate over a certain period before your users start being unhappy. You can think of it as the pain tolerance for your users, but applied to a certain dimension of your service: availability, latency, and so forth. Error budgets are the tool SRE uses to balance service reliability with the pace of innovation. An error budget is 1 minus the SLO of the service. A 99.9% SLO service has a 0.1% error budget. If our service receives 1,000,000 requests in four weeks, a 99.9% availability SLO gives us a budget of 1,000 errors over that period.
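The error budget arithmetic above can be sketched in a few lines of Python. This is only an illustration of the "1 minus the SLO" rule for a request-based availability SLO; the function names and the sample count of 250 failed requests are assumptions, not part of any particular SRE tool.

```python
def error_budget(slo: float, total_requests: int) -> int:
    """Number of failed requests allowed under an availability SLO."""
    if not 0.0 < slo <= 1.0:
        raise ValueError("slo must be in the range (0, 1]")
    # Budget fraction is 1 - SLO; round() removes floating-point residue.
    return round((1.0 - slo) * total_requests)


def budget_remaining(slo: float, total_requests: int, failed_requests: int) -> float:
    """Fraction of the error budget still unspent (negative if overspent)."""
    budget = error_budget(slo, total_requests)
    if budget == 0:
        # A 100% SLO leaves no budget: any failure overspends it.
        return 0.0 if failed_requests == 0 else float("-inf")
    return 1.0 - failed_requests / budget


# The example from the text: 99.9% SLO, 1,000,000 requests in four weeks.
print(error_budget(0.999, 1_000_000))           # 1000
# Hypothetical: 250 requests have failed so far in the window.
print(budget_remaining(0.999, 1_000_000, 250))  # 0.75
```

With three quarters of the budget still unspent, a team would typically feel free to keep shipping changes; as the remaining fraction approaches zero, the balance shifts toward reliability work.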

Conclusion –

SRE establishes a healthy and productive relationship between development and operations. SRE is an enabler for maintaining massive infrastructure in an intelligent, efficient, and scalable way.

References –

https://www.blameless.com/blog/choosing-sre-tools

https://medium.com/memory-leak/introducing-redpoints-sre-landscape-b9c363708f26

https://sre.google/sre-book/table-of-contents/
