Self-Healing Automation
Vishal Patel
Engineering leader | DevOps | DevSecOps | Platform Engineering | Product Management | Product Delivery | Automation
Automation has completely changed the landscape of application development, whether through DevOps or test automation. Automation reduces human error and makes the work more efficient. Almost every phase of the software development and delivery process can now be automated: building and deploying the code, as well as unit, functional, integration, performance and acceptance testing. This has significantly reduced time to market.
Automation is not a one-time activity; don't expect the same automation to keep working forever. As business requirements change and the architecture of the software application evolves, the automation solution needs to be upgraded and maintained. IT companies today invest heavily in separate teams that develop and maintain automation solutions for their IT needs. Automation solutions should therefore follow a development life cycle, on par with any other software application.
A smart automation solution is key to delivering quality software on time. Following are the characteristics of a smart automation solution:
- Functionality
- Usability (ease of use)
- Efficiency
- Reliability
- Repeatability
- Maintainability
- Portability
- Security
3 S's of Automation: the following 3 pillars guide the design and development of an automation solution for your application. An automation solution should be Self-Service, Self-Diagnosing and Self-Healing.
Self-Service Automation
A self-service automation system triggers itself when a specific event occurs, without any manual intervention. It is a flow-based solution: it keeps running on its own according to the criteria defined for the flow. You define the states in the flow and, for each move from one state to another, the criterion that must be met before the automation makes that move.
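A minimal sketch of such a flow, assuming a simple in-process state machine: each transition is a (from state, to state, criterion) triple, and the flow keeps advancing until no criterion is met. The state names and context keys (`new_commit`, `build_ok`, `tests_ok`) are hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

# A criterion inspects the current context and returns True when the
# flow is allowed to move to the next state.
Criterion = Callable[[dict], bool]


@dataclass
class Flow:
    state: str
    transitions: List[Tuple[str, str, Criterion]] = field(default_factory=list)
    history: List[str] = field(default_factory=list)

    def add(self, src: str, dst: str, criterion: Criterion) -> None:
        """Register a move from `src` to `dst` guarded by `criterion`."""
        self.transitions.append((src, dst, criterion))

    def step(self, context: dict) -> bool:
        """Take the first transition out of the current state whose
        criterion is met; return False if none is met."""
        for src, dst, criterion in self.transitions:
            if src == self.state and criterion(context):
                self.history.append(self.state)
                self.state = dst
                return True
        return False

    def run(self, context: dict, max_steps: int = 100) -> str:
        """Keep moving through the flow with no manual input until no
        criterion is met (max_steps is only a loop guard)."""
        for _ in range(max_steps):
            if not self.step(context):
                break
        return self.state


# Hypothetical states and context keys, for illustration only.
flow = Flow(state="idle")
flow.add("idle", "building", lambda ctx: ctx.get("new_commit", False))
flow.add("building", "testing", lambda ctx: ctx.get("build_ok", False))
flow.add("testing", "ready", lambda ctx: ctx.get("tests_ok", False))

final = flow.run({"new_commit": True, "build_ok": True, "tests_ok": False})
print(final)  # testing
```

The flow stops in `testing` because the criterion for leaving that state (`tests_ok`) is not met; nobody had to push it from `idle` to `building` or from `building` to `testing`.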
A simple example is triggering the code build process after every code commit. Another is stopping the CI pipeline whenever the build breaks. A third is running environment verification test cases whenever you build out a new test or production environment, to check that it meets the required criteria. As these examples show, no human is needed to trigger the automation.
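The three examples can be sketched as an event-to-action registry. This is an illustration under assumptions, not any CI server's API: the event names (`code.committed`, `build.failed`, `environment.provisioned`) and the handler functions are hypothetical stand-ins for real build, pipeline and verification jobs, which in practice would typically be wired to version-control webhooks or CI server events.

```python
from collections import defaultdict
from typing import Callable, Dict, List

# A handler receives the event payload and returns a short status message.
Handler = Callable[[dict], str]


class TriggerRegistry:
    """Maps each event type to the automation it starts, so that no one
    has to kick the automation off by hand."""

    def __init__(self) -> None:
        self._handlers: Dict[str, List[Handler]] = defaultdict(list)
        self.log: List[str] = []

    def on(self, event_type: str, handler: Handler) -> None:
        self._handlers[event_type].append(handler)

    def emit(self, event_type: str, payload: dict) -> None:
        # .get() does not create an entry for unknown event types.
        for handler in self._handlers.get(event_type, []):
            self.log.append(handler(payload))


# Hypothetical handlers standing in for real build / CI / test jobs.
def start_build(event: dict) -> str:
    return f"build started for commit {event['commit']}"


def stop_pipeline(event: dict) -> str:
    return f"pipeline stopped: build {event['build_id']} broke"


def verify_environment(event: dict) -> str:
    missing = [p for p in event["required"] if p not in event["installed"]]
    return "environment OK" if not missing else f"environment missing {missing}"


registry = TriggerRegistry()
registry.on("code.committed", start_build)
registry.on("build.failed", stop_pipeline)
registry.on("environment.provisioned", verify_environment)

registry.emit("code.committed", {"commit": "a1b2c3"})
registry.emit("build.failed", {"build_id": 42})
registry.emit("environment.provisioned",
              {"required": ["java", "nginx"], "installed": ["java"]})
for line in registry.log:
    print(line)
```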
Self-Diagnosing Automation
An automation system should have a monitoring and discovery mechanism that continuously monitors the system to ensure that it is working normally. A self-diagnosing system is expected to report any deviation from its expected behavior. To be self-diagnosing, a system should be self-aware (more concretely, it should always be in a deterministic state), should be able to recognize that an error has occurred, and should have enough knowledge to stabilize itself.
It knows the ideal or permissible range of each metric of the IT system it monitors, covering server monitoring, network monitoring, database monitoring, log monitoring and application performance monitoring. It should also contain a restore protocol that takes the necessary steps to bring the system back to normal functionality without any external assistance.
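A minimal sketch of this idea, assuming each monitored metric has a single numeric permissible range and the restore protocol is a simple lookup from metric to corrective step. All metric names, limits and restore steps below are hypothetical. Note that a metric that stops reporting is treated as a deviation too.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Range:
    low: float
    high: float

    def contains(self, value: float) -> bool:
        return self.low <= value <= self.high


# Hypothetical permissible ranges; real limits depend on your system.
PERMISSIBLE: Dict[str, Range] = {
    "cpu_percent": Range(0, 85),               # server monitoring
    "packet_loss_percent": Range(0, 1),        # network monitoring
    "db_connections": Range(1, 200),           # database monitoring
    "error_log_lines_per_min": Range(0, 50),   # log monitoring
    "p95_latency_ms": Range(0, 500),           # application performance
}

# Hypothetical restore protocol: the corrective step for each metric.
RESTORE_PROTOCOL: Dict[str, str] = {
    "cpu_percent": "scale out by one instance",
    "packet_loss_percent": "fail over to standby network path",
    "db_connections": "recycle the connection pool",
    "error_log_lines_per_min": "roll back the last deployment",
    "p95_latency_ms": "restart the application service",
}


def diagnose(observed: Dict[str, float]) -> List[str]:
    """Report WHAT deviated: every metric that is missing or outside its
    permissible range. Finding out WHY is left to a later RCA."""
    deviations = []
    for name, allowed in PERMISSIBLE.items():
        value = observed.get(name)
        if value is None or not allowed.contains(value):
            deviations.append(name)
    return deviations


def restore_plan(deviations: List[str]) -> List[str]:
    """Look up the restore step for each deviation."""
    return [RESTORE_PROTOCOL[name] for name in deviations]


observed = {"cpu_percent": 92.0, "packet_loss_percent": 0.2,
            "db_connections": 40, "p95_latency_ms": 310.0}
deviations = diagnose(observed)
print(deviations)  # ['cpu_percent', 'error_log_lines_per_min']
print(restore_plan(deviations))
```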
Note: a self-diagnosing automation solution always focuses on the "what" of a deviation from the normal flow rather than the "why". What matters is which deviation occurred and how to bring the system back to the normal flow. Finding out why the deviation occurred can be part of a detailed root cause analysis (RCA) at a later point.
Self Healing automation
A self-healing system is expected to stay up and running: it should not break down, and it should behave normally without any external assistance. It can recognize that it is not working as expected (because it is not generating the expected results) and, without any manual intervention, restore itself to normal operation. In short, self-healing is the ability of a system or environment to detect and resolve problems automatically, eliminating the need for manual human intervention.
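The detect-and-resolve loop can be sketched as follows, assuming the system exposes a health check and an ordered list of remediation actions. `FakeService` and its `restart` and `clear_cache` methods are hypothetical, used only to simulate a service that needs two fixes before it is healthy again. A human is alerted only when every remediation has failed.

```python
from typing import Callable, List


def self_heal(check: Callable[[], bool],
              remediations: List[Callable[[], None]],
              escalate: Callable[[str], None]) -> bool:
    """Detect a problem with `check`, then apply remediations one at a
    time, re-checking after each. A human is involved only if every
    remediation has been tried and the system is still unhealthy."""
    if check():
        return True  # healthy: nothing to heal
    for remedy in remediations:
        remedy()
        # A real system would usually wait for the component to settle
        # before re-checking; omitted here to keep the sketch short.
        if check():
            return True
    escalate("all remediations tried; system still unhealthy")
    return False


class FakeService:
    """Hypothetical service that is healthy only when it is running and
    its cache is valid."""

    def __init__(self) -> None:
        self.running = False
        self.cache_ok = False

    def healthy(self) -> bool:
        return self.running and self.cache_ok

    def restart(self) -> None:
        self.running = True

    def clear_cache(self) -> None:
        self.cache_ok = True


svc = FakeService()
alerts: List[str] = []
healed = self_heal(svc.healthy, [svc.restart, svc.clear_cache], alerts.append)
print(healed, alerts)  # True []
```

Ordering the remediations from least to most disruptive is one reasonable design choice; the escalation hook keeps a human in the loop for the cases the automation cannot fix.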
The most straightforward way to make any piece of automation self-healing is to introduce the concept of state. A robust automation program should always be in a specific state while it is executing. The different tasks the automation solution performs are the objects of that state. Each object has a set of properties, and each property has a set of permissible configurations. When a property is tweaked beyond its permissible configurations, the automation fails.
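This state / object / property model can be sketched as plain data, assuming each property's permissible configurations can be expressed as a container of allowed values (a set or a `range`). The task names, properties and limits are hypothetical. Any property outside its permissible configuration is reported as a violation, which is exactly the condition that would make the automation fail.

```python
from collections.abc import Container
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class TaskObject:
    """One task the automation performs, with its current property values."""
    name: str
    properties: Dict[str, Any]


@dataclass
class AutomationState:
    """A named state holding the task objects active in it, plus the
    permissible configuration for each (task, property) pair."""
    name: str
    objects: List[TaskObject]
    permissible: Dict[str, Dict[str, Container]] = field(default_factory=dict)

    def violations(self) -> List[str]:
        """List every property set outside its permissible configuration."""
        found = []
        for obj in self.objects:
            allowed = self.permissible.get(obj.name, {})
            for prop, value in obj.properties.items():
                if prop in allowed and value not in allowed[prop]:
                    found.append(f"{obj.name}.{prop}={value!r}")
        return found


# Hypothetical tasks, properties and limits, for illustration only.
state = AutomationState(
    name="deploying",
    objects=[
        TaskObject("build", {"jdk": "17", "retries": 2}),
        TaskObject("deploy", {"region": "eu-west", "replicas": 12}),
    ],
    permissible={
        "build": {"jdk": {"11", "17"}, "retries": range(0, 4)},
        "deploy": {"region": {"eu-west", "us-east"}, "replicas": range(1, 11)},
    },
)
print(state.violations())  # ['deploy.replicas=12']
```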
Steps towards a self-healing automation solution:
- Use Infrastructure as Code
- Cover all the system states and software code with automated tests
- Deploy logging and monitoring systems, and leverage alerts, triggers and prescriptive analytics
- Oversee the performance of self-healing automation
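For the last step, overseeing the self-healing automation, one possible sketch is to record the outcome of every healing action and derive a few indicators from those records: success rate, mean time to recover, and the actions that keep failing. The `HealingEvent` fields and the 0.8 review threshold are hypothetical choices, not a standard.

```python
from dataclasses import dataclass
from statistics import mean
from typing import List, Optional


@dataclass
class HealingEvent:
    incident: str
    action: str
    succeeded: bool
    seconds_to_recover: Optional[float] = None  # None if it did not recover


class HealingReport:
    """Collects the outcome of every self-healing action so that people can
    check whether the automation is actually healing the system."""

    def __init__(self, min_success_rate: float = 0.8) -> None:
        self.events: List[HealingEvent] = []
        self.min_success_rate = min_success_rate  # hypothetical threshold

    def record(self, event: HealingEvent) -> None:
        self.events.append(event)

    def success_rate(self) -> float:
        if not self.events:
            return 1.0  # nothing has needed healing yet
        return sum(e.succeeded for e in self.events) / len(self.events)

    def mean_time_to_recover(self) -> Optional[float]:
        times = [e.seconds_to_recover for e in self.events
                 if e.succeeded and e.seconds_to_recover is not None]
        return mean(times) if times else None

    def failing_actions(self) -> List[str]:
        return sorted({e.action for e in self.events if not e.succeeded})

    def needs_review(self) -> bool:
        """True when healing succeeds less often than the threshold."""
        return self.success_rate() < self.min_success_rate


report = HealingReport(min_success_rate=0.8)
report.record(HealingEvent("disk full", "rotate logs", True, 30.0))
report.record(HealingEvent("service down", "restart service", True, 90.0))
report.record(HealingEvent("db pool exhausted", "recycle pool", False))

print(round(report.success_rate(), 2))  # 0.67
print(report.mean_time_to_recover())    # 60.0
print(report.failing_actions())         # ['recycle pool']
print(report.needs_review())            # True
```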
Self-healing automation is about adding intelligence to an automation solution so that it can heal itself by re-configuring its properties, within their permissible configurations, to fix the issue.
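A minimal sketch of healing by re-configuration, assuming a property has a known list of permissible values and the task reports success or failure. The example simulates, in a very simplified form, one common idea in self-healing UI test automation: when an element locator breaks after an application upgrade, fall back to other permissible locators. The locator strings and `click_submit` are hypothetical and not tied to any real test tool.

```python
from typing import Any, Callable, Iterable, Optional, Tuple


def heal_by_reconfiguring(task: Callable[[Any], bool],
                          current: Any,
                          permissible: Iterable[Any]) -> Tuple[bool, Optional[Any]]:
    """Run `task` with its current setting. If that fails, try each other
    permissible value in order and return the first one that works, so the
    caller can persist it as the new configuration."""
    if task(current):
        return True, current
    for candidate in permissible:
        if candidate == current:
            continue  # already tried
        if task(candidate):
            return True, candidate
    return False, None  # nothing permissible works: escalate to a human


# Simulated page after an application upgrade: the button's id changed.
# Locator strings are hypothetical and not tied to any real test tool.
page_elements = {"css:#submit-v2", "text:Submit"}


def click_submit(locator: str) -> bool:
    return locator in page_elements


ok, locator = heal_by_reconfiguring(
    click_submit,
    current="css:#submit",
    permissible=["css:#submit", "css:#submit-v2", "text:Submit"],
)
print(ok, locator)  # True css:#submit-v2
```

The search stays inside the permissible values on purpose: the automation may adjust itself, but never into a configuration nobody approved.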
The vast differences that self-healing automation tools bring to application development will, without a doubt, change the overall approach to automation. Beyond dramatically reducing downtime and speeding up development, they let engineers create generic automation scripts that can be deployed across various applications, resulting in greater efficiency. In addition, because the same script can be used across different IT environments, the solution reduces the time and effort needed to update automation scripts after application upgrades.