DevOps - Part 3 - A proper QA (SDLC) process

This is part 3 of a series describing a minimum setup for a modern DevOps culture, for mature organizations that have not yet embraced it.

The Software Development Life Cycle (SDLC) process is so tainted by its association with “the waterfall process” that no one speaks of it in polite agile company. We prefer other terms, but SDLC simply means we have a process for getting software from idea to implementation, and on to changes, updates and maintenance. You must have one; the name is not important.

This part 3 is about a proper SDLC, but we will focus on the part that deals with releasing code changes, so we will really be talking about the “Change Management” process. And since we will also skip a bunch of steps in that process and focus on getting changed code from the developer’s machine to production, I am calling this part “Proper Quality Assurance (QA) Process”. It is critical to recognize, though, that it is nested in the bigger picture of change management and the overall SDLC. Wow, what a trip down a rabbit hole … anyway!

One last thing before we get down to business. Please remember, this is NOT a recipe for a modern CI/CD. This is a process for a predictable DevOps of a mature (aka legacy) system in a mature organization, as the introduction explains.

Proper QA Process

Getting code from a developer’s machine to production in a safe and predictable way is not difficult, as long as you follow a simple but strict plan. For this, let’s call on everyone’s best friend: the flow chart. That term too got a bad name, as it reminds one too much of flowing water and, well, that reminds us of waterfalls! But we shall rescue this term and insist that flow charts are not evil, and that they allow us to create very nice diagrams that make people like me look way smarter than I really am. But enough about me. Here is a very simplified flow chart that describes how to get code from Dev to Prod.

[Figure: simplified flow chart of the path from Dev to Prod]

A more detailed flow chart, expanding on this high-level process, will soon be in my GitHub.

If you follow this simple process religiously (which does not mean you cannot sin; you just have to admit your sin later and repent), you can be as confident as you can be that the code you release to production is as well tested as humanly possible (1).

We start when developers have completed all their work and, using the branching strategy and process we have already discussed, have merged all code to their main branch.

The first critical step, one that is rarely done, is “Refresh QA [environment]”. This means that the QA environment must be made to be like production as much as possible. Any code from previous testing must be wiped off, as well as any setting changes, changed data, etc. This has been described in part 2. Here is a diagram from that article for reference:

[Figure: QA environment refresh diagram from part 2]

I stress, this is a critical part of the process! If you doubt that this can be done, I assure you, it is very much achievable. It is ugly and boring to implement, that is true, but not difficult. Perhaps I will write an article about this in detail one day; for now I will just mention a few things.

The most important parts of the refresh are Data, App Config and Application Code. If you already have automated code releases via something like Jenkins or Azure DevOps, you might be fully overwriting the code with every release, and hence a refresh of the code may not be necessary (but it is still recommended). The same goes for App Config, although it is more often left to be edited manually rather than automated. Simply put, the closer you are to being able to refresh the QA env from Prod, the less likely you are to experience issues in production after a release. Here is a case study that illustrates how a seemingly benign difference can cause havoc.
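On the App Config side of the refresh, one cheap way to see how far QA has wandered from Prod is to diff the two sets of settings. The following is only a minimal sketch in Python: it assumes both configs have already been loaded into flat dictionaries, and the function name `config_drift`, the `ignore` parameter and the `MISSING` marker are all hypothetical names made up for illustration.

```python
from typing import Any

# Hypothetical marker for a setting that exists in only one environment.
MISSING = "<missing>"


def config_drift(
    prod: dict[str, Any],
    qa: dict[str, Any],
    ignore: frozenset[str] = frozenset(),
) -> dict[str, tuple[Any, Any]]:
    """Return every setting whose value differs between Prod and QA.

    Each entry maps the setting name to (prod_value, qa_value).
    Settings listed in `ignore` are expected to differ (for example,
    connection strings) and are skipped.
    """
    drift: dict[str, tuple[Any, Any]] = {}
    for key in sorted(prod.keys() | qa.keys()):
        if key in ignore:
            continue
        prod_value = prod.get(key, MISSING)
        qa_value = qa.get(key, MISSING)
        if prod_value != qa_value:
            drift[key] = (prod_value, qa_value)
    return drift
```

An empty result means the two configs agree on everything you did not explicitly choose to ignore. Anything else is a leftover from earlier testing that the refresh should wipe.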

“Release code to QA”

This means ‘release all your changes to the QA environment’, following whatever method of releasing you have now. Well, sort of …

If you release code totally manually, you must at least have a step-by-step release how-to, so that releasing is a repeatable and consistent process. Manually following even the best laid out and documented steps still leaves a chance for human error, but we will have to deal with that in future parts of the series. For now, you must at least have a well-documented process that is free of complex decisions and judgement calls.
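If you want to take a first small step beyond a written how-to, the same idea can be expressed as an ordered list of steps that is always executed the same way. This is a minimal sketch, not a recommendation of any particular tool: the `Step` class, the `run_release` function and the example commands are hypothetical, and the real steps would be whatever your own how-to already says.

```python
import subprocess
import sys
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    """One line of the release how-to: what it does and the command to run."""
    description: str
    command: list[str]


def run_release(steps: list[Step]) -> list[str]:
    """Run the steps strictly in order and stop at the first failure.

    Returns a log with one line per attempted step, so that every
    release leaves the same kind of record behind.
    """
    log: list[str] = []
    for number, step in enumerate(steps, start=1):
        result = subprocess.run(step.command, capture_output=True, text=True)
        status = "OK" if result.returncode == 0 else "FAILED"
        log.append(f"{number}. {step.description}: {status}")
        if result.returncode != 0:
            break  # no judgement calls: a failed step ends the release
    return log


# Hypothetical placeholder steps; they only invoke the Python interpreter
# so that the sketch runs anywhere.
EXAMPLE_STEPS = [
    Step("Back up current build", [sys.executable, "-c", "pass"]),
    Step("Copy new build", [sys.executable, "-c", "pass"]),
]
```

The point is not the tooling. The order and the stop-on-failure rule live in one place instead of in someone's head.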

Testing Cycle

At this point, the Quality Assurance (QA) team can test the system on the QA environment. The type of testing to do, the creation of the test plan, etc. are out of the scope of this article. I assume that the QA team can create a test plan, decide what and how much needs to be tested, and execute on that.

The cycle of testing, bug fixing and code releases to QA continues until the release / changes are deemed civilized enough to be let loose in the real world.

There is an optional step (Refresh QA) between each release to QA. If your refresh process is easy, then the more often you do it, the better your testing. If you want to dismiss me as an overzealous puritan, please consider that multiple releases, one on top of the other, may work differently than one release of all the changes. Defending this assertion would be too long of a diversion; please comment if you’d like an article about this.

It is not in the diagram, but once the changes are deemed “tested”, I always try to do one last refresh-QA + release + quick test. But that may indeed be overzealous of me….

Staging

As with QA, Staging must be refreshed from production. This is even more critical here, because Staging is your last step before production. In Staging, we do not test functionality or the changes we are releasing as much; we are mostly testing the release process. As mentioned before, in QA we often do many code releases for the one release we will be making to production. This means that we have never really tested the one, full “release bundle”. This we can do on Staging, and that is the primary purpose of this stage.

You may choose to use Staging for more than that, depending on your situation. For example, if your QA environment is lacking some integration with external systems that Staging has, you will need to test that on Staging. Perhaps your QA environment has a really scaled-down dataset (due to performance, cost or governance) and you are unable to test your changes on some old data. Such situations warrant more testing on Staging.

However, no matter what kind of testing you do on Staging, there is one cardinal rule: we never, ever, do multiple releases to Staging (2). For example, we never release bug fixes to Staging. If a bug is discovered, we go back to QA, as our flow chart shows. Why? Because the test on Staging has to be a test of the actual release, exactly as it will be done on production (3). You will be tempted to sin here. Understandable. And it is OK to sin a little bit, as long as you acknowledge the increase in risk that it entails. I recommend you ensure your boss acknowledges that as well. Note that you will be tempted to sin more if the process of refreshing the environments and releasing the code is lengthy and/or cumbersome. As you start on this journey, you will sin more than when you complete it. Automation is good. Automation releases you from the shackles of sin. More about that in later parts of this series.
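The cardinal rule is simple enough to state as code. The sketch below is only an illustration of the rule, under the assumption that you track releases per environment somewhere; the `Environment` class, its field names and `ReleaseRuleViolation` are hypothetical and not part of any real tool.

```python
from dataclasses import dataclass, field
from typing import Optional


class ReleaseRuleViolation(Exception):
    """Raised when a release would break the environment's rule."""


@dataclass
class Environment:
    name: str
    # None means unlimited releases between refreshes (the QA case).
    max_releases_per_refresh: Optional[int] = None
    releases_since_refresh: list[str] = field(default_factory=list)

    def refresh_from_production(self) -> None:
        """Wipe the record of earlier releases, as a refresh wipes the code."""
        self.releases_since_refresh.clear()

    def release(self, bundle: str) -> None:
        """Record a release, refusing it if the limit is already used up."""
        limit = self.max_releases_per_refresh
        if limit is not None and len(self.releases_since_refresh) >= limit:
            raise ReleaseRuleViolation(
                f"{self.name} already received {self.releases_since_refresh}; "
                "go back to QA, then refresh and release one full bundle."
            )
        self.releases_since_refresh.append(bundle)


qa = Environment("QA")                                     # many releases allowed
staging = Environment("Staging", max_releases_per_refresh=1)  # exactly one
```

With this setup, a second `staging.release(...)` without a refresh in between raises `ReleaseRuleViolation`, while QA accepts as many releases as the testing cycle needs. If you do decide to sin, at least the exception makes the sin visible.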

Production Release

Once testing has shown that the one single release to Staging works, you are ready to release to Prod, either immediately or at a scheduled time in the future.

Post-Production Release

There is one last thing that I will mention briefly, although it really warrants its own article or two. No matter with what zeal you follow the process I have outlined, no matter with what heroic effort you test the changes, there will be occasional problems. If you feel betrayed by this admission, please note that the quest to test and release only perfect software is, like UML, dead. The world moved on from this fool’s quest, as it did from the desire to capture “the business” in one giant, beautiful UML diagram. All organizations that lead the “DevOps culture” initiative decided to release faster, with a slightly higher risk of bugs, but with post-release checks and readiness to deal with possible issues immediately. Basically, post release, you monitor the systems for any possible issues and investigate immediately. I know this may not be the best approach for nuclear reactor software, but it works in the vast majority of industries, even the financial industry.

For this, you need two basic things: very good error tracking and very good performance tracking. You must be able to quickly compare the errors and performance to historical trends to spot issues. You must be able to quickly categorize errors and easily dig deeper into performance metrics if you need to. I will not say more about it now; perhaps I will elaborate one day in the future.
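To give a rough idea of what “compare to historical trends” can mean in practice, here is a minimal sketch that flags a post-release error count as suspicious when it sits well above the historical average. It assumes you can already pull a list of comparable historical counts (say, errors per hour) from your tracking tool. The function name `is_anomalous` and the default threshold of 3 standard deviations are assumptions for illustration, not a tuned recommendation.

```python
from statistics import mean, stdev


def is_anomalous(history: list[float], current: float, threshold: float = 3.0) -> bool:
    """Return True if `current` is unusually high compared to `history`.

    "Unusually high" here means more than `threshold` sample standard
    deviations above the historical mean.
    """
    if len(history) < 2:
        raise ValueError("need at least two historical samples")
    baseline = mean(history)
    spread = stdev(history)
    if spread == 0:
        # Perfectly flat history: any increase at all is worth a look.
        return current > baseline
    return (current - baseline) / spread > threshold
```

For example, with a history of `[10, 12, 11, 9, 10, 11]` errors per hour, a post-release hour with 11 errors is not flagged, while an hour with 25 errors is. Real monitoring tools do far more than this, but the comparison against a baseline is the core of it.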

Merry Christmas everyone!

Footnotes:

(1) You can always improve your testing using automated testing; I am speaking here only about adding a strict process to the testing you are already doing.

(2) If there is some testing you can only do on Staging, you may have to do multiple releases as you try to fix bugs. In this case, simply re-do the final test on Staging (with a full refresh and one bundle release) once you are done.

(3) By “exactly the same” we mean “as close to ‘exactly’ the same as possible”. The closer you get to “exactly the same”, the better.

More articles by Greg Bala
