Six Elements for Effective SRE Adoption
Khwaja Shaik
IBM CTO | Digitally-savvy and Cyber-savvy Board Director | CEO Advisor | Competent Boards Faculty | Making Purpose Real Through Board Excellence | Global Perspective, Digital Transformation, AI, Cybersecurity, ESG Expert
Every business is a technology business. It is imperative to drive business value so that customers love our products and services. Organizations often struggle to establish a robust technology operating model that is fit for the future.
Business models constantly shift, and user demands increase exponentially. Ensuring business resiliency is paramount as you grow your business. Site Reliability Engineering (SRE) practices are one of the core tenets of business resiliency.
SRE brings software engineering focus to infrastructure and operations through automation and deliberate practice to create ultra-scalable and highly reliable business software systems. -Khwaja Shaik, IBM Thought Leader
Here is my blueprint for effective SRE adoption:
1. Balance change velocity with reliability.
- If you are not doing DevOps today, then you do not need SRE. Use SRE as a strategic tool to manage technical debt with a "recovery from failure" mindset. Apply "shift-left" practices to build manageable applications.
- Take an iterative approach: split long efforts into smaller ones and increase change velocity with extreme programming practices.
- Apply design thinking to build and operate products with reliability. Improve service levels by applying software engineering concepts with automation, monitoring, and alerting services.
User experience, automation, and error budgets are like a three-leg stool to meet the business objectives of SRE. -Khwaja Shaik, IBM Thought Leader
- Traditional performance testing is not enough. Adopt fault tolerance testing, resource optimization testing, client-side performance testing, and continuous performance testing practices.
If you are a high-velocity custom software development shop, then SRE is the foundation to drive customer value and innovations faster. -Khwaja Shaik, IBM Thought Leader
- Identify single points of failure and infuse fault injection testing to uncover vulnerabilities as part of chaos engineering practices.
- Reliability does not apply only to performance and resilience testing. It is needed across the entire SDLC.
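The fault injection idea above can be illustrated with a small experiment. This is a minimal sketch, not a chaos engineering tool: `FlakyDependency`, `fetch_with_retry`, and `run_experiment` are hypothetical names, and the failure rates are arbitrary. It injects failures into a stand-in dependency and measures how often the caller still returns a full answer instead of a degraded one.

```python
import random


class FlakyDependency:
    """Hypothetical stand-in for a downstream service with injected faults."""

    def __init__(self, failure_rate, seed=0):
        self.failure_rate = failure_rate
        self._rng = random.Random(seed)  # seeded so experiment runs are repeatable
        self.calls = 0

    def fetch(self):
        self.calls += 1
        # Inject a fault with the configured probability.
        if self._rng.random() < self.failure_rate:
            raise ConnectionError("injected fault")
        return "ok"


def fetch_with_retry(dependency, max_attempts=3, fallback="degraded"):
    """Retry a failing call, then fall back instead of crashing."""
    for _ in range(max_attempts):
        try:
            return dependency.fetch()
        except ConnectionError:
            continue
    return fallback


def run_experiment(failure_rate, requests=1000):
    """Return the fraction of requests that received a full (non-degraded) answer."""
    dependency = FlakyDependency(failure_rate)
    full = sum(fetch_with_retry(dependency) == "ok" for _ in range(requests))
    return full / requests
```

Running `run_experiment` at several failure rates shows where retries stop being enough, which is one way to surface a single point of failure before customers do.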
2. Focus on business-centric objectives, shared objectives, and reducing operational risks.
- The old model of IT service management is over. Balance delivery velocity with reliability that centers on user experience and customer value. Shift focus from risk aversion to risk management.
- Co-create objectives by engaging the operations team and development team. This will build cohesion and teamwork for meeting business outcomes.
- Don't stop at automation. Your ultimate goal is to eliminate the problem and reduce technical debt by applying software engineering practices and deliberate practices.
- Invest in flexible infrastructure that enables scaling. Apply software engineering principles with a focus on resilience.
- Measure user experience with better feedback loops and faster iteration.
As governance gets decentralized, SRE focuses on recovery from failure. It prevents issues from recurring. -Khwaja Shaik, IBM Thought Leader
3. Instill a collaborative and customer-centric culture.
- Break down organizational silos by forming fusion teams that combine development and operations staff.
- Inject product teams into operations by emphasizing collaborative, transparent, risk-receptive, integrated, and automated approaches to support cloud-native architectures.
- Don't blame individuals for technical problems. Instead, focus on the pursuit of root cause analysis to prevent problem recurrence.
- Without trust, your SRE practice is bound to fail. Do not tolerate a finger-pointing environment. A collaborative culture is paramount to SRE's success.
DevOps and SRE roles complement each other. Neither replaces the other. Leverage ChatOps tools like chatbots to automate day-to-day tasks. -Khwaja Shaik, IBM Thought Leader
- Enable experiment-based learning, instill autonomy, and reward teams for postmortem outcomes and for embracing risks. Treat failures as learning opportunities.
- SREs must be equipped with skills to read code and resolve problems if necessary.
The more SREs collaborate on re-architecture, re-design, and refactoring decisions, the more resilient your systems become. -Khwaja Shaik, IBM Thought Leader
- Modernize change management by embracing shift-left, quality due diligence, error budgets, and fail-forward practices.
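One way to read the error budget practice mentioned above is as a release gate for change management. The sketch below is an assumption about how such a gate could look, not a prescribed method: the function names, the request-based availability measure, and the 10% reserve are all hypothetical choices.

```python
def error_budget_remaining(slo_target, total_requests, failed_requests):
    """Fraction of the error budget still unspent (negative if overspent)."""
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be between 0 and 1, exclusive")
    if total_requests <= 0:
        return 1.0  # no traffic in the window, so no budget has been spent
    # The SLO permits this many failed requests in the measurement window.
    allowed_failures = (1.0 - slo_target) * total_requests
    return 1.0 - failed_requests / allowed_failures


def release_allowed(slo_target, total_requests, failed_requests, reserve=0.1):
    """Allow a release only while more than `reserve` of the budget is left."""
    remaining = error_budget_remaining(slo_target, total_requests, failed_requests)
    return remaining > reserve


# Example: a 99.9% SLO over 1,000,000 requests permits about 1,000 failures.
# With 250 failures, roughly 75% of the budget is left, so releases continue.
print(release_allowed(0.999, 1_000_000, 250))
```

When the budget runs low, the gate closes and the team spends its time on reliability work rather than new releases, which is how error budgets balance velocity against stability.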
4. Form an interdisciplinary SRE team and hybrid support model.
- You need a combination of business analysts, software engineers, systems engineers, and vendor subject matter experts. Collaborative skills are as important as automation skills. Without teamwork, SRE will not succeed.
- Align an SRE support team with each product/business area. This is vital as you move from a project-centric culture to a product-centric culture. Adopt RACI models as you continue your modernization journey.
Leverage the best of process-driven ITSM and agile-driven DevOps processes as you transform to a cloud service management operations model.
- Modern architectures such as microservices, containers, serverless, and service mesh increase architecture complexity. It is important the SRE team has a systems thinking mindset and a complete understanding of the application and its dependencies.
Don't overlook performance and reliability goals as you increase development velocity. Customer obsession and risk management are two sides of the same coin.
- Adopt a blended support model based on the nature of the application landscape. Your monolithic applications may still need a tiered support model because they have a low velocity of change. If your development velocity is high, with weekly or daily releases, then you need to instill SRE practices so that your fusion teams understand the microservices-based mesh architecture and its dependencies.
- Rotate SREs between the development team and operations team.
5. Adopt SRE holistically.
- The SRE function must focus on these seven areas: DevOps, automation, service level indicators/objectives, performance engineering, change management, incident management, and monitoring and observability.
- Train your operations team in coding skills. Ensure feedback loops so that bugs can be correlated and product enhancements can be made based on root cause analysis.
SRE success starts with people. Recognize and infuse SRE evangelists to contribute learnings and experiences to the high resiliency engineering community. -Khwaja Shaik, IBM Thought Leader
- The most common skills in an SRE success profile include Linux, software development, and Python. The fastest-growing skills include Terraform, Kubernetes, and Microsoft Azure.
- Validate signals such as latency, traffic, errors, and saturation values every quarter. This is the era of commodity hardware, cloud-native architectures, and horizontal scaling to support graceful failures.
- Include the entire infrastructure landscape in monitoring: dedicated servers, virtual machines, multi-cloud containers, and SaaS applications.
Observability and AIOps are fundamental to reducing the operations team's workload and improving incident management metrics. This is crucial to identifying the cause of an anomaly or malfunction. -Khwaja Shaik, IBM Thought Leader
- Provide automation tools that make tasks easier as you move from a reactive model to a proactive model. These tasks include infrastructure provisioning, incident response, monitoring, and performance engineering.
No single tool will address all monitoring use cases. Enforce monitoring as a platform operating model. -Khwaja Shaik, IBM Thought Leader
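The four signals named above (latency, traffic, errors, and saturation) can be validated against agreed limits with very little code. The following is a minimal sketch under stated assumptions: the `GoldenSignals` fields, the threshold values, and the `breached_signals` helper are hypothetical and would be replaced by whatever your monitoring platform exposes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GoldenSignals:
    latency_p99_ms: float  # 99th-percentile request latency in milliseconds
    traffic_rps: float     # requests per second
    error_rate: float      # failed / total requests, 0.0 to 1.0
    saturation: float      # utilization of the busiest resource, 0.0 to 1.0


# Hypothetical upper limits; real values come from your own SLOs and capacity plans.
THRESHOLDS = GoldenSignals(
    latency_p99_ms=500.0,
    traffic_rps=2000.0,
    error_rate=0.01,
    saturation=0.8,
)


def breached_signals(observed, limits=THRESHOLDS):
    """Return the names of signals whose observed value exceeds its limit."""
    names = ("latency_p99_ms", "traffic_rps", "error_rate", "saturation")
    return [name for name in names if getattr(observed, name) > getattr(limits, name)]


# Example: latency and saturation are over their limits, traffic and errors are not.
observed = GoldenSignals(latency_p99_ms=620.0, traffic_rps=900.0,
                         error_rate=0.002, saturation=0.91)
print(breached_signals(observed))
```

Keeping the limits in one reviewed structure makes the periodic validation concrete: the review becomes a question of whether each number still reflects what customers need.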
6. Accelerate SRE skills through a learning culture.
- Infuse a growth mindset to enable a wide variety of skills: application architecture, infrastructure architecture, network architecture, security architecture, minor coding automation, troubleshooting, etc.
- Invest in SRE skills to shift operations from passive ticket-taking tasks to a continuous improvement mindset.
- Open and collaborative skills are key differentiators.
Don't adopt the big bang approach. Instead, start small with one application support and gradually increase the SRE team with each successful rollout. -Khwaja Shaik, IBM Thought Leader
- Institutionalize SRE communities of practice that cater to Full-service SREs, Application SREs, Platform SREs, Incident Management SREs, AIOps, and chaos engineering.
Conclusion
Embrace change and think holistically as you start your SRE journey. SRE is more of a mindset change from your old ways of ensuring business resiliency through ITSM. Treat this as an opportunity to upskill and reskill your operations team as part of a hybrid cloud operating model.
Question
What strategic actions are you taking to achieve business resiliency? Have you assessed the maturity model of your service management operations? Where are you in establishing cohesion between DevOps and SRE teams?
Please share your thoughts in the comments section below.
For professional insights into complex issues, join the conversation by tweeting Khwaja at @Khwaja_Shaik or connecting with him on LinkedIn.
ABOUT KHWAJA SHAIK
Khwaja Shaik is an award-winning global IT Executive with 25+ years of business technology leadership with IBM, Bank of America, PwC, and GE. He has a worldwide reputation and a proven track record in driving digital transformation and the newest innovations.
As IBM’s Thought Leader, Khwaja’s role is to help clients stay ahead of the digital disruption curve by leveraging Design Thinking, Cloud, IoT, Blockchain, Artificial Intelligence, Cybersecurity, and Quantum Computing. Khwaja is among the most exceptional IBMers appointed with the rare distinction of IBM Academy of Technology member, one of the top 100 technical leaders providing the direction of IBM with innovation that matters.
As a strong proponent of talent development, Khwaja serves as IBM’s Design Thinking Coach for IBM’s Developer Jumpstart Program, IBM’s BlueHack Mentor driving innovation, and IBM’s Blockchain Mentor to spur the blockchain ecosystem.
Khwaja also serves as McKinsey Global Institute’s Executive Panel Member, MIT Sloan CIO Forum Member, Gartner’s Research Circle Member, MarketsANDMarkets Advisor, and HBR’s Advisory Council Member driving global thought leadership.
As a global influencer, Khwaja frequently blogs on exponential technologies at IBM, LinkedIn, and Twitter. With his passion for interfaith and nurturing global talent in STEM, he serves on the Advisory Boards of Interfaith Center of Northeast Florida and Museum of Science & History, and the University of North Florida’s Computing Advisory Board.
He is a recipient of outstanding service awards from the University of North Florida, Bank of America, IBM, and Indo US Chamber of Commerce of Northeast Florida. He is frequently interviewed for industry insights or cited in the news, Thought Leadership POVs, and blogs on disruptive technologies.
Khwaja holds an MBA and Engineering degree. He is a frequent speaker on exponential technologies at various forums, including the CIO IT & Security Forum, MHI Supply Chain Conference, IIT Hyderabad, and Indo US Chamber of Commerce of Northeast Florida.