How validating production deployments could save millions
It is no secret that a bad deployment can upset a large number of customers. It can cost a fair amount of business and, most importantly, damage the brand. Moreover, with agile, continuous delivery, and an ever-growing desire to push features and updates continuously, the frequency of deployments has increased drastically. This significantly increases the risk of bad code getting into production and making it unstable.
While small and frequent deployments are both necessary and good, making sure that customer delight stays the same or goes higher after each deployment is equally important.
WHY IS TESTING IN PRODUCTION A MUST?
QA, UAT, or equivalent environments often aren’t comparable to the production environment. That leaves pre-production or staging environments, which are designed to be similar to production. However, keeping such an environment up to date and running with proper data all the time has a running cost and would slow us down too.
This leaves us with only the production environment.
“This means we need a precise and concise deployment validation mechanism for every deployment to identify issues faster. This mechanism should be repeatable and reliable.”
WHAT COULD GO WRONG?
- Deployment issues - problems with deployment scripts, migration scripts, etc.
- Human errors - missing a critical step in the deployment, permissions, firewall rules and required ports not opened, deploying dependent artifacts in the wrong order, missed configurations, etc.
- Unanticipated issues
- Gaps in testing strategy causing critical functional defects
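Some of the human errors listed above, such as unopened ports and missed configurations, lend themselves to a quick automated pre-flight check. The following is a minimal sketch, not a definitive implementation; the function names, the configuration keys, and the endpoint list are all hypothetical and would need to reflect your own environment.

```python
import socket


def port_is_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False


def missing_config_keys(config, required_keys):
    """Return the required keys that are absent or empty in the config mapping."""
    return [key for key in required_keys if not config.get(key)]


def preflight(config, required_keys, endpoints):
    """Collect human-readable problems; an empty list means every check passed.

    `endpoints` is a list of (host, port) pairs the deployment depends on
    (hypothetical examples: a database, a message broker, a partner API).
    """
    problems = [f"missing config: {key}"
                for key in missing_config_keys(config, required_keys)]
    problems += [f"unreachable: {host}:{port}"
                 for host, port in endpoints
                 if not port_is_open(host, port)]
    return problems
```

A deployment script could call `preflight(...)` right after the rollout and stop if the returned list is not empty.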
How do we identify these issues? Some of them can be identified by monitoring tools, logs, and so on. Others can be identified by running automated tests.
WHAT SHOULD WE TEST?
Below are some directions for testing. These tests must be executed after every deployment and before we let end customers start consuming the new code.
- Features or fixes released
- A few end-to-end flows
- All integration end-points with internal and external systems
- Automated API/service tests covering the breadth of the application and its various features
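As an illustration of the integration end-point and API checks in the list above, here is a small smoke-check sketch using only the Python standard library. It assumes each end-point exposes a URL that answers a plain GET with a known status code; the check names and URLs are hypothetical placeholders, and a real suite would likely assert on response bodies as well.

```python
import http.client
import urllib.error
import urllib.request


def http_status(url, timeout=5.0):
    """Return the HTTP status code of a GET request, or None if no response arrives."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # 4xx/5xx responses are raised as HTTPError; the code is still useful.
        return err.code
    except (urllib.error.URLError, http.client.HTTPException, OSError):
        return None


def run_smoke_checks(checks, fetch=http_status):
    """Run every check and return the failures.

    `checks` maps a check name to (url, expected_status).
    The result maps each failing name to (expected_status, actual_status).
    `fetch` is injectable so the runner can be exercised without a network.
    """
    failures = {}
    for name, (url, expected) in checks.items():
        actual = fetch(url)
        if actual != expected:
            failures[name] = (expected, actual)
    return failures
```

An empty result from `run_smoke_checks` means every end-point answered as expected; anything else is a reason to hold the release and investigate.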
While validation is going on, monitor the application’s performance, services, hardware, and anything else that would help assess its health. Overall, this should not take more than 20 minutes. If the results are satisfactory, we should let customers start consuming the application and continue monitoring.
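The 20-minute limit and the go/no-go decision can be expressed as a simple time-boxed gate. The sketch below is one possible shape under stated assumptions: each check is a hypothetical zero-argument callable that returns a truthy value on success, and the gate stops at the first failure or as soon as the time budget is spent.

```python
import time


def validate_deployment(checks, budget_seconds=20 * 60, clock=time.monotonic):
    """Run named checks in order and decide whether to open traffic.

    `checks` is a list of (name, callable) pairs. Returns (ok, report), where
    `report` lists (name, outcome) for every check that was reached.
    """
    start = clock()
    report = []
    for name, check in checks:
        # The budget is tested before each check; a running check is not interrupted.
        if clock() - start > budget_seconds:
            report.append((name, "skipped: time budget exceeded"))
            return False, report
        try:
            passed = bool(check())
        except Exception as exc:
            # A crashing check counts as a failed validation.
            report.append((name, f"error: {exc}"))
            return False, report
        report.append((name, "passed" if passed else "failed"))
        if not passed:
            return False, report
    return True, report
```

Only when `ok` is true would customers be allowed onto the new code; the report gives the team a starting point for diagnosis otherwise.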
INPUTS FOR TESTING
- Know your production environment and deployment process
- Consider existing production data in your test strategy
- Analyse production issues and factor them into your test strategy
- Make testing in production a habit
- For every failed production deployment, factor the learnings into your deployment and test strategy