Operational Readiness - A Platform Engineer's Perspective
Pratik Daga
Principal Engineer | Ex Tech Lead-Asana & Staff Engineer-LinkedIn | Multi Family Real Estate
Operational readiness is the capability to efficiently deploy, operate, and maintain systems and procedures. Its main purpose is to reduce operational risk during changes. An Operational Readiness Assessment checks that the system/platform is prepared to support and accept the changes a project introduces. The assessment determines the current readiness state of the system and shows how close the system/platform is to the desired state. The life cycle of the project can be divided into four phases: project initiation, requirement/design, implementation/test, and post implementation.
The key action items in each phase:
Phase 1: Identify key areas of change and their potential impact
Phase 2: Define operational needs, assess readiness and prepare test plans
Phase 3: Test for readiness, make adjustments wherever required, set up a ramping strategy
Phase 4: Share knowledge for continuous improvement and transfer responsibilities to on-call
Project Initiation Phase:
The goal is to define your project at a high level and tie it into the business case you wish to solve. You should be able to answer two questions: why are you doing this project and what is the business value you expect to deliver? Consider the feasibility of your project and all of the stakeholders that may be affected or require involvement.
Things to consider:
- Evaluate the impact on upstream and downstream systems
- Identify new dependencies
- Inform stakeholders
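One way to make the impact evaluation above concrete is to walk a service dependency map in both directions. The sketch below is only an illustration: the `CALLS` map, the service names, and the `impact` helper are all hypothetical, and it assumes the dependency data is already available as a simple dictionary.

```python
from collections import deque

# Hypothetical dependency map: each key calls the services in its list.
CALLS = {
    "web": ["api"],
    "api": ["auth", "orders"],
    "orders": ["payments", "inventory"],
    "auth": [],
    "payments": [],
    "inventory": [],
}


def reachable(graph, start):
    """Return every node reachable from start (excluding start) using BFS."""
    seen = set()
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in graph.get(node, []):
            if neighbor != start and neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return seen


def invert(graph):
    """Flip edge direction so callers can be looked up from callees."""
    inverted = {node: [] for node in graph}
    for caller, callees in graph.items():
        for callee in callees:
            inverted.setdefault(callee, []).append(caller)
    return inverted


def impact(graph, changed):
    """List what the changed service depends on and what depends on it."""
    return {
        "dependencies": sorted(reachable(graph, changed)),
        "dependents": sorted(reachable(invert(graph), changed)),
    }


if __name__ == "__main__":
    print(impact(CALLS, "orders"))
```

The owners of every service in either list are natural candidates for the stakeholders to inform.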
Requirement/Design Phase:
The goal of the requirement/design phase is to transform the needs and high-level requirements specified in the earlier phase into measurable, testable, traceable, complete, consistent, and stakeholder-approved requirements.
Things to consider:
- Define SLAs, provisioning, scaling and monitoring considerations
- Involve stakeholders
- Evaluate integration points
- Prepare for testing strategy
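An SLA becomes measurable once its availability target is translated into an allowed amount of downtime. The sketch below shows that arithmetic under stated assumptions: the 30-day window, the function names, and the example target are illustrative choices, not values from this article.

```python
def allowed_downtime_minutes(availability_target, window_days=30):
    """Minutes of downtime permitted by a target such as 0.999 over the window."""
    if not 0 < availability_target <= 1:
        raise ValueError("availability_target must be in (0, 1]")
    total_minutes = window_days * 24 * 60
    return total_minutes * (1 - availability_target)


def downtime_remaining(availability_target, downtime_so_far, window_days=30):
    """Minutes left before the target is breached; negative means a breach."""
    allowed = allowed_downtime_minutes(availability_target, window_days)
    return allowed - downtime_so_far


if __name__ == "__main__":
    # A 99.9% target over 30 days allows roughly 43.2 minutes of downtime.
    print(round(allowed_downtime_minutes(0.999), 1))
    print(round(downtime_remaining(0.999, downtime_so_far=30), 1))
```

Putting a number on the target this way also gives the monitoring and testing plans a concrete threshold to check against.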
Implementation/Test Phase:
The implementation phase is when the system is actually built, based on the plan created during phase two. Another key task of this phase is to test the system thoroughly and make changes and improvements as needed. The changes to the system should be sustainable in every aspect. Setting up runbooks and documentation is critical to the smooth long-term maintenance of the project.
Things to consider:
- Test the critical scenarios
- Set up monitoring, a ramping strategy and alerts on critical metrics
- Set up runbooks and documentation
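A ramping strategy can be tied directly to an alert on a critical metric: traffic moves to the next step only while the metric stays healthy. The following is a minimal sketch of that idea. The step percentages, the error-rate threshold, and the `next_ramp_step` helper are hypothetical, and a real rollout would read the error rate from the monitoring system rather than take it as an argument.

```python
RAMP_STEPS = [1, 5, 25, 50, 100]  # percent of traffic; illustrative values
MAX_ERROR_RATE = 0.01             # hypothetical alert threshold (1% of requests)


def next_ramp_step(current_percent, observed_error_rate,
                   steps=RAMP_STEPS, max_error_rate=MAX_ERROR_RATE):
    """Return the next traffic percentage, or 0 to signal a rollback."""
    if observed_error_rate > max_error_rate:
        return 0  # critical metric breached: stop ramping and roll back
    for step in steps:
        if step > current_percent:
            return step
    return current_percent  # already at the final step


if __name__ == "__main__":
    print(next_ramp_step(5, observed_error_rate=0.002))   # healthy: advance
    print(next_ramp_step(25, observed_error_rate=0.05))   # unhealthy: roll back
```

Keeping the steps and the threshold as plain data makes them easy to record in the runbook alongside the rollback procedure.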
Post Implementation:
Document the best practices and procedures that led to project success and make recommendations for applying them to similar future projects. Hand over runbooks, documentation and key metric monitoring to on-call for seamless support.
Things to consider:
- Hand over support to on-call
- Knowledge transfer