Post Mortem - Death of a test-suite

In most QA departments there is a test-suite containing test-cases that the QA team runs to test a myriad of things, such as game features, content or 100% playthroughs. On the live MMO game I worked on, these test-cases were regularly used on new content alongside the standard tests performed with each update. This promoted quality and confidence in our product, and the tests had been honed and tweaked over the years. Your team will no doubt have a similar test-suite, but the question I'd like to pose is:

What would happen if all the test-cases in your test-suite disappeared?

This was a question I asked at a QA conference in response to a talk about handling large test-suites. The speaker's response centered on their company's IT policy for protecting and restoring the data, but it left me wondering how their department would function should it all hypothetically disappear. I asked this pointed question because this is exactly what happened on a project I worked on. What follows is a post-mortem of what happened and the team's response to the situation.

Background

Our test-suite consisted of several hundred individual test-cases which were grouped together by a common theme. For instance, if a new weapon was being tested then we'd summon the "New Weapon" template, which would populate common test-cases such as its name, stats and tradeability, as well as ones unique to that asset type (Equipping checks, Damage checks). Furthermore, whatever made that weapon unique would be reviewed and specific test-cases would be added to the test-script. These templates were detailed and built up over many years of testing, having originally grown from an Excel "master" document.

Over time, we translated the Excel doc into a database and created a bespoke web front end that used that database to auto-generate test-plans for us. Test-plans created for new systems were integrated into the test-suite for ongoing testing. When a bug went out into the wild, a test-case was typically added to the relevant template to check for it in future. As such, the test-suite ballooned in size.
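
To make the template mechanism concrete, here is a minimal sketch of how a template-driven test-plan generator of this kind might look. It is not the actual tool: every name in it (TEMPLATES, Plan, generate_test_plan, add_regression_case, the example weapon) is hypothetical, and the real system sat on a database behind a web front end rather than an in-memory dictionary.

```python
from dataclasses import dataclass, field

# Hypothetical data: the article does not describe the real schema.
# Test-cases shared by every in-game item.
COMMON_ITEM_CASES = ["Check name", "Check stats", "Check tradeability"]

# Each template is the common cases plus the ones unique to that asset type.
TEMPLATES = {
    "New Weapon": COMMON_ITEM_CASES + ["Equipping checks", "Damage checks"],
}


@dataclass
class Plan:
    """A generated test-plan for a single asset."""
    asset: str
    template: str
    cases: list = field(default_factory=list)


def generate_test_plan(asset, template, extra_cases=()):
    """Build a plan from a template plus cases specific to this asset."""
    cases = list(TEMPLATES[template])  # copy so the template is not mutated
    cases.extend(extra_cases)
    return Plan(asset=asset, template=template, cases=cases)


def add_regression_case(template, case):
    """Add a check to a template after a live bug, skipping duplicates."""
    if case not in TEMPLATES[template]:
        TEMPLATES[template].append(case)


# Usage: a new weapon gets the template cases plus whatever makes it unique.
plan = generate_test_plan(
    "Flaming Sword", "New Weapon", ["Fire effect applies on hit"]
)
for case in plan.cases:
    print(f"[ ] {case}")
```

The sketch also shows why the suite ballooned: each call to add_regression_case permanently lengthens every future plan generated from that template.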

This test-case-focused culture resulted in the QA team spending a lot of time on the Pass/Fail nature of everything. This benefited new starters, as we could sit them down with a test-script that was detailed enough for them to hit the ground running. Longer term, however, it resulted in a tick-box culture, with performance driven by how many test-cases were completed and exploratory and destructive testing taking a back seat.

Day 0

Testers reported trouble accessing our bespoke centralized test-plan software. The tools developer responsible for its maintenance (who was embedded in our QA team) was informed, and it was clear that the error was on the hosting/networking side, so it was escalated to the sysadmin team. Testers were advised to carry on testing from memory where possible, switching to destructive and non-scripted testing in the meantime.

Day 0 - 6 hours

It became clear that something significant had happened. The machines that hosted both the software and the database had become corrupted, and everything needed to be relocated to a new machine. Our developer was given a new server and requested a copy of the backup. Worst case scenario, we thought, we would have lost that day's worth of data.

Day 1

It turned out that our test-suite and its database were not considered a "critical system", so there was no backup. Our developer had a local copy of the software on his machine but not the database. The sysadmin team tried to recover the corrupted data. The QA team were advised to carry on testing, using their best judgement. They also got together to brainstorm and identified areas to focus testing around. Morale was dropping as rumours grew that the whole database was gone.

Day 2-3

The attempt to recover the database failed and it was confirmed that all the data was lost. The QA leadership group got together to form a plan, and it was decided that we'd try to recreate the test-suite from memory. We dug out the old Excel "master sheet" to help us reference specific tests to include. The software was back online but, without the database, the team was still working blind. Some of the newer members struggled, whereas the more experienced testers were able to pick up ad-hoc testing quite easily. A lot of them enjoyed the freedom and the break from the admin-heavy approach.

Day 4

After an all-day meeting to map the high-level test-cases we had lost (we agreed to put the detail in later), one of the game's updates had gone live. The update itself was stable, testers seemed happier with the admin-light approach and, in general, the world hadn't stopped spinning due to the loss of the test-suite.

This led us to a radical thought: "Let's literally use this as a clean slate, re-design it around our ideals and strengths, and modernize it for today's challenges". This paradigm shift was welcomed, and within a few hours we had a lightweight vision of a test-suite that was minimalist in nature, promoted coverage based on risk and inspired testers to explore their craft.

Day 5

Before the data loss, an average test-script for an in-game item would be ~80 test-cases. Now it was down to ~12. The key to this was the change in language. Rather than having a separate test-case to check each stat of a weapon, there would be a single one: "Evaluate the stats of the weapon, determine if they fit with the asset's purpose and ensure that they reflect the design principles".

This shift ensured that our testers looked at the stats collectively as well as individually, and forced them to think about the context of the numbers rather than just checking that a value matched something stated in a Design Doc. The instruction to "Evaluate" gave team members a real sense of ownership and a real stake in the quality of the area that they were testing.

Day 6

The majority of the revamped test-cases were now entered into the newly secured database, which was now being backed up. The new philosophy was documented, and some training was given on the approach and ethos that we were promoting. We monitored updates as they were released and found that live bugs actually went down as a result of this initiative.

Learnings

We learned a lot during this process and, looking back, it was a blessing. Effecting change on entrenched processes is very difficult to pull off, and transitional periods are awkward to navigate on their own. The loss of the database made the idea seem less risky than the alternative of a long-winded restoration from memory that would never quite be the same as before. The loss also gave us some grace to try this new approach, as we could weather any reduction in quality because we had the perfect scapegoat.

The incident also showed that QA artifacts such as test-plans are important assets that require the same safeguarding as other business-critical services, and that we should have stronger plans in place for worst case scenarios. The most important learning was how well the team performed and adapted to the shift in testing philosophy. Another outcome was that we now knew the reasoning behind executing the tests, as we are now more likely to fix the root cause of issues rather than having to check for them forevermore, "just in case".

Daniel Harvey

Manual QA Tester at WorldPay

6 years ago

Exciting times to be sure, and I was impressed at how well-handled the situation was by both management and the team. I feel like it was a strong turning point in our move into a more progressive team and it's odd to think that without the curve-ball of the data failure we may have never had the initiative or courage to make such a radical change in our QA ethos.

Tom Sharman

Senior Software Test Engineer at VNC Automotive

6 years ago

I remember this happening. It was a crazy time but like you mentioned, we all benefitted from the whole ordeal :)
