The Importance of Negative Acceptance Criteria

Story time

Senior dev Sanjay worked for Agicorp. Agicorp was great: their development practices were modern and up to date. They had automated tests connected to their build pipeline, everything was verified as it went into the main branch, and they had set up daily pushes of releasable builds based on their full automated test suite. Their technical and product functions were well integrated, so the devs got a good feedback loop with stakeholders and could deliver value quickly and iteratively.

(Before we get too far into this, a disclaimer: Agicorp is made up and this story is fictional.)

Agicorp offered a back end solution as a web service. One day, they decided it needed a user-facing SaaS offering: a web UI for their service. Since it was a well-run company on board with current trends, they wanted to build it in a product-led, agile manner with good test coverage from the outset. It went well: they delivered value, got users on board, and the system was active with no problems.

Until one day, a customer support ticket arrived: some data was going missing. Then, the customer found themselves locked out of their account entirely. Everyone relevant was brought in to investigate, including Sanjay.

What happened?

That was the first question, of course. Ollie, over in operations with access to the real system, sat down with Sanjay to look at the system's audit records. They showed that the data removal and account deletion had been done by a user "mike_119". A quick lookup against the user data store found a mike_119 as the owner of a recently created free tier account, apparently unrelated to the customer who raised the ticket. This user had logged in at the same time as the deletions. When Ollie brought up the web server logs, all the delete requests to their back end API were clearly visible. It seemed like someone was using the admin pages, but of course it's hard to tell exactly who or what from such coarse-grained logs.

"Oh no, that's not good", said Sanjay with classic developer understatement.

They put the system into maintenance mode while Sanjay got together with developer David to work out how this could be possible. It turned out that although their interface and APIs were protected by authentication and session management, the API endpoints didn't properly check the role or account. If you were logged in to your own account and you knew information about another, you could swap your session to point to it and keep your roles. As the admin on his own free account, mike_119 could therefore assume admin permissions on any other account whose ID he knew.

It turned out that a Mike had recently left the customer on bad terms. He had been senior enough to use Agicorp's service before leaving, and he took the account ID away with him. He made a free tier account so he could assume those permissions on his old company's account and mess with them.

Post-incident

Once they discovered the problem, the engineering team had to do a high pressure, quick turnaround patch to their entire API to make this kind of attack impossible. It took Agicorp's service down for days, losing them significant reputation and business, and caused a lot of stress for everyone involved.

After the fire died down, people began to ask: how did we get here? Don't we have full test coverage? And that's where we rejoin them.

What did they do wrong?

Agicorp was product and test driven. They engaged with the stakeholders, constructed stories with BDD acceptance criteria, and then wrote BDD-style system tests and unit and integration tests at the code level. Let's take a look at some simple stories for the first few pages in their application.

User can log in

As a normal user I want to be able to log in and browse the inventory.

Technical:
Create a login page that accepts a user name and password. If they are filled in correctly, the user should be able to log in and be redirected to the inventory page.

Acceptance:

- Given I have a valid user account
- When I go to the app root
- Then I am presented with a login page with user, password and a log in button

- Given I am on the login page
- When I enter my correct credentials
- Then I am redirected to the inventory page

Admin user can manage account

As an administrative user I want to be able to log in and manage the account.

Technical:
Update the login process to include user roles. If the logged in user is an admin then show an option on the nav menu to go to the management dashboard.

Acceptance:

- Given I have a valid admin level user account
- And I am on the inventory page after logging in
- When I click the "management dashboard" item on the menu
- Then I am shown the account management dashboard        

Those acceptance criteria were written up as executable BDD tests that were run automatically. We can imagine that they wrote some integration tests against their authentication provider as well:

@Test
void user_can_log_in() {
  testAuthenticator.seedUser("user1", "password1");
  UserResponse userResponse = authService.logIn("user1", "password1");
  verifyValid(userResponse);
}

@Test
void admin_can_log_in() {
  testAuthenticator.seedUser("admin", "password1", List.of(Role.Admin));
  UserResponse userResponse = authService.logIn("admin", "password1");
  verifyValid(userResponse);
  verifyHasRoles(userResponse, List.of(Role.Admin));
}

All of these tests were green on every commit and every day, and the pages looked great and worked as intended during the sprint review. So what's wrong with this?

Well, let's imagine the simplest solution we can write that passes all of these tests.

class AuthService {
  UserResponse logIn(String user, String password) {
    return new UserResponse(true, List.of(Role.User, Role.Admin));
  }
}

Now the problem should be obvious!

Negative Criteria

Agicorp's problem is that all of their test criteria are positive, i.e. they are testing that the behaviour that you want does happen when you are in the appropriate state. They have no negative criteria, i.e. they are not testing that the behaviour you specified doesn't happen when you aren't in an appropriate state.

In the worked example above, they didn't test that you don't get the admin role when you shouldn't. They didn't even test that you can't log in if your credentials are wrong!
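To make that concrete, here is a minimal sketch of the negative checks that were missing. It is plain Java with no test framework so that it runs on its own; in a real project these would be @Test methods like the ones above. AuthService, UserResponse, Role and NegativeAuthTests are hypothetical stand-ins for Agicorp's code, not a real library API.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical types for illustration only.
enum Role { User, Admin }

record UserResponse(boolean valid, Set<Role> roles) {
    static UserResponse rejected() { return new UserResponse(false, Set.of()); }
}

class AuthService {
    private record Account(String password, Set<Role> roles) {}
    private final Map<String, Account> accounts = new HashMap<>();

    // Test seeding only: a real service would store salted password hashes, not plain text.
    void seedUser(String user, String password, Set<Role> roles) {
        accounts.put(user, new Account(password, roles));
    }

    UserResponse logIn(String user, String password) {
        Account account = accounts.get(user);
        if (account == null || !account.password().equals(password)) {
            return UserResponse.rejected(); // unknown user or wrong password
        }
        return new UserResponse(true, account.roles());
    }
}

class NegativeAuthTests {
    static void check(boolean condition, String message) {
        if (!condition) throw new AssertionError(message);
    }

    // Negative: wrong credentials must not log you in.
    static void wrongPasswordIsRejected(AuthService auth) {
        auth.seedUser("user1", "password1", Set.of(Role.User));
        check(!auth.logIn("user1", "wrong").valid(), "wrong password accepted");
    }

    // Negative: a user that was never seeded must not log in.
    static void unknownUserIsRejected(AuthService auth) {
        check(!auth.logIn("nobody", "password1").valid(), "unknown user accepted");
    }

    // Negative: a normal user must not be granted the admin role.
    static void normalUserIsNotAdmin(AuthService auth) {
        auth.seedUser("user1", "password1", Set.of(Role.User));
        UserResponse response = auth.logIn("user1", "password1");
        check(response.valid(), "valid user rejected");
        check(!response.roles().contains(Role.Admin), "normal user given admin role");
    }

    static void runAll() {
        wrongPasswordIsRejected(new AuthService());
        unknownUserIsRejected(new AuthService());
        normalUserIsNotAdmin(new AuthService());
    }
}
```

Calling NegativeAuthTests.runAll() throws an AssertionError on the first failing check. A logIn that always succeeds with every role, like the stub above, would fail all three.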

I would hope that our fictional Agicorp wouldn't have been dumb enough to make this exact mistake. But they did make a similar mistake in their management pages and the APIs behind them: they didn't check that users can't access resources belonging to a different tenant.
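A tenant check of the kind Agicorp was missing might look something like the sketch below. It assumes that each session and each record carries a tenant ID, and that the session is looked up on the server rather than taken from the request. Session, DataRecord, RecordService and the exception types are all hypothetical. The negative test plays the part of mike_119: a genuine admin, but on the wrong tenant.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical model: every session and every record belongs to exactly one tenant.
enum Role { User, Admin }

record Session(String userId, String tenantId, Set<Role> roles) {}

record DataRecord(String id, String tenantId, String content) {}

class ForbiddenException extends RuntimeException {
    ForbiddenException(String message) { super(message); }
}

class NotFoundException extends RuntimeException {
    NotFoundException(String message) { super(message); }
}

class RecordService {
    private final Map<String, DataRecord> records = new HashMap<>();

    void save(DataRecord record) { records.put(record.id(), record); }

    boolean exists(String id) { return records.containsKey(id); }

    // Assumes the Session was resolved server-side from the session token, so roles
    // only ever apply to the tenant they were granted in.
    void delete(Session session, String recordId) {
        if (!session.roles().contains(Role.Admin)) {
            throw new ForbiddenException("admin role required");
        }
        DataRecord record = records.get(recordId);
        // A record in another tenant is reported exactly like a missing one,
        // so the response doesn't confirm that the ID exists elsewhere.
        if (record == null || !record.tenantId().equals(session.tenantId())) {
            throw new NotFoundException("record " + recordId);
        }
        records.remove(recordId);
    }
}

class TenantIsolationTests {
    // Negative: an admin of one tenant cannot delete another tenant's data.
    static void adminCannotDeleteOtherTenantsRecord() {
        RecordService service = new RecordService();
        service.save(new DataRecord("r1", "tenantA", "customer data"));
        Session mike = new Session("mike_119", "tenantB", Set.of(Role.User, Role.Admin));
        try {
            service.delete(mike, "r1");
            throw new AssertionError("cross-tenant delete was allowed");
        } catch (NotFoundException expected) {
            // The request failed as required; now check that nothing changed.
        }
        if (!service.exists("r1")) throw new AssertionError("record was deleted anyway");
    }
}
```

Reporting a cross-tenant record as not found, rather than forbidden, is one common design choice: it stops an attacker using the error code to confirm that an ID belongs to someone else. Returning 403 is also defensible, as long as the choice is deliberate and tested.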

Important Types of Negative Criterion

There are a few different types of thing Agicorp should have thought of:

  • Authentication and authorisation: as in the simple example above, check that users are correctly refused access to pages or content they're not supposed to see if they aren't logged in or don't have the relevant functional role or permission. In a single page web app, this includes both clicking around and going directly to the URL.
  • Is the resource in the correct account or tenant, or associated with the right user or other resource? This is the one Agicorp forgot. There are many varieties of this, and not all are as obvious as checking a tenant or account field on a user: Is a user trying to attach a cloud resource to a locally hosted environment? Is the owner of the resource you're trying to edit in a group which has had its edit permissions removed?
  • Invalid data. Some injection-type attacks happen because validation isn't applied, and invalid data is then fed into internal services which assume it's valid. You should test that invalid data gives you the appropriate error handling. Depending on what you want to do with injection attempts, they may not really be a negative criterion (if you want to escape and store the text), but they are related to validation scenarios.
  • Unavailability of external resources. If you rely on a third party service (and this includes things like cloud or even network storage), what will your application do if that service is down?
  • Direct access to internal services. In Agicorp's case, they had a web front end that calls REST APIs, so they need negative tests against those APIs, not just their UI. This applies to anything which is accessible to the user. Are you posting messages directly to Amazon SQS? Can the user upload content to an Azure storage container? Does your system accept push notifications? Then you had better have negative acceptance criteria against those things, to check that messages or content don't get pushed when they aren't supposed to.
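Two of these, invalid data and an unavailable dependency, can be sketched together for a single endpoint. This is a minimal illustration rather than a prescribed design: the SKU format, InventoryClient, InventoryHandler and the message keys are assumptions made up for the example, and the status codes are the usual HTTP choices (400 for bad input, 503 for a dependency outage).

```java
import java.io.IOException;
import java.util.regex.Pattern;

// Hypothetical response type: HTTP-style status plus a body or message key.
record ApiResponse(int status, String body) {}

// Hypothetical third-party dependency that the handler calls.
interface InventoryClient {
    int stockLevel(String sku) throws IOException;
}

class InventoryHandler {
    // Assumed SKU format, for illustration only: three capital letters, a dash, four digits.
    private static final Pattern SKU = Pattern.compile("[A-Z]{3}-\\d{4}");
    private final InventoryClient client;

    InventoryHandler(InventoryClient client) {
        this.client = client;
    }

    ApiResponse getStock(String sku) {
        // Invalid data: reject at the boundary rather than trusting internal services to cope.
        if (sku == null || !SKU.matcher(sku).matches()) {
            return new ApiResponse(400, "error.sku.invalid");
        }
        try {
            return new ApiResponse(200, Integer.toString(client.stockLevel(sku)));
        } catch (IOException e) {
            // Unavailable dependency: a controlled 503 instead of an unhandled exception.
            return new ApiResponse(503, "error.inventory.unavailable");
        }
    }
}

class InventoryNegativeTests {
    static void check(boolean condition, String message) {
        if (!condition) throw new AssertionError(message);
    }

    static void runAll() {
        // Negative: malformed input is rejected and never reaches the downstream service.
        InventoryClient mustNotBeCalled = sku -> {
            throw new AssertionError("client called with " + sku);
        };
        InventoryHandler strict = new InventoryHandler(mustNotBeCalled);
        check(strict.getStock("ABC-1234'; DROP TABLE stock;--").status() == 400, "injection attempt accepted");
        check(strict.getStock(null).status() == 400, "null SKU accepted");

        // Negative: the dependency is down.
        InventoryClient down = sku -> {
            throw new IOException("connection refused");
        };
        check(new InventoryHandler(down).getStock("ABC-1234").status() == 503, "outage not handled");
    }
}
```

The mustNotBeCalled client is the important part of the first test: it checks not only that the caller gets a 400, but that the invalid value never reaches the internal service.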

Aspects of a Negative Test

The essence of a negative test is "the system fails under control when you put it in an invalid state". But there's more to it than that: what does failing under control mean? You should think about:

  • Response codes. If it's a REST API, that means using the appropriate HTTP error codes. If it's a component library, that means throwing well documented exceptions. In a desktop or mobile app maybe it's a message box or alert with the right icon and button choice.
  • What the user should see. Do you need to return an internationalisable message key? Is the internal message helpful? Does it contain information you want to hide from the user (because it could show internal server state)? How should the user be informed of the failure? (An alert or popup if they initiated it? A status line notification? A change of icon? Something on a system control panel? It depends on what the error condition is.)
  • What to log. Good application logs are critical for working out what happened on a live system. Do you want an exception trace? (Probably not for validation or authentication conditions, but maybe for attempts to process invalid input. For unexpected failures, definitely.) How much context about the user, or the request chain that led to this point, should be logged?
  • What to record in permanent audit. Is it a failure condition that is potentially malicious that you want to record? Should future similar actions be restricted (e.g. account lockout for failed authentication or for repeatedly trying to access admin resources if you aren't one)?
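One way to make these decisions explicit, and therefore testable, is to record them in one place for each kind of failure. The sketch below assumes a small set of failure kinds, made-up message keys and a lockout threshold of five audited failures; all of those names and numbers are illustrative, and the two lists stand in for a real application log and audit store.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical response type: HTTP-style status plus a message key the UI can translate.
record ApiResponse(int status, String messageKey) {}

// Assumed failure categories; each constant records how this sketch handles that kind of failure.
enum FailureKind {
    // status, message key for the user, include exception in log?, write to audit?
    VALIDATION(400, "error.validation", false, false),
    UNAUTHENTICATED(401, "error.auth.required", false, true),
    FORBIDDEN(403, "error.auth.forbidden", false, true),
    DEPENDENCY_DOWN(503, "error.service.unavailable", true, false),
    UNEXPECTED(500, "error.internal", true, false);

    final int status;
    final String messageKey;
    final boolean logException;
    final boolean audit;

    FailureKind(int status, String messageKey, boolean logException, boolean audit) {
        this.status = status;
        this.messageKey = messageKey;
        this.logException = logException;
        this.audit = audit;
    }
}

class FailureHandler {
    static final int LOCKOUT_THRESHOLD = 5; // assumed policy, not a recommendation
    final List<String> appLog = new ArrayList<>();   // stands in for the application log
    final List<String> auditLog = new ArrayList<>(); // stands in for the permanent audit store
    private final Map<String, Integer> auditedFailures = new HashMap<>();

    ApiResponse handle(FailureKind kind, String userId, String requestId, Exception cause) {
        // Context for whoever investigates later: who, which request, what kind of failure.
        String entry = kind + " user=" + userId + " request=" + requestId;
        // Only some kinds include the exception; a real service would log the full stack trace.
        appLog.add(kind.logException && cause != null ? entry + " cause=" + cause : entry);
        if (kind.audit) {
            auditLog.add(entry);
            auditedFailures.merge(userId, 1, Integer::sum);
        }
        // The caller only ever sees the status and message key, never internal details.
        return new ApiResponse(kind.status, kind.messageKey);
    }

    boolean isLockedOut(String userId) {
        return auditedFailures.getOrDefault(userId, 0) >= LOCKOUT_THRESHOLD;
    }
}
```

A negative test can then assert on all four aspects at once: the status code, the message key the user sees, what was logged, and whether an audit entry was written.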

Do I Always Need Negative Criteria?

Anything but the simplest of user stories will have a way the user could do it wrong. So yes, you should always consider negative criteria. Almost every story will have at least "the user tries to access this without permissions", and anything where the user can provide content (in a text box, uploading a file or referencing a URL) needs to consider invalid user input.

Don't be Agicorp

I know some of you will read the top and then skip to the bottom :-) so here are the key actions you can take to make sure you include negative criteria:

  • Include it in your Definition of Ready: stories must consider common negative criteria in breakdown or story definition, and must define what permissions are needed for the functionality they describe
  • Always try to think "how would I break this" and make sure you have those attempts covered
  • Run your positive tests, especially manual verification, with the least privilege you can - i.e. use a normal user, not an admin user, if you aren't trying to show admin things. If you're only modifying system config then use a user that only has system config permission, not permission to view content. This way you will see things and raise the question "am I supposed to be able to see that?"
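Defining the permissions each story needs also gives you a cheap way to generate negative tests: write the expected permissions down as a table and check every role against every action, so each allowed case automatically comes with the refused cases for every other role. The roles, actions and AccessPolicy rules below are hypothetical; ConfigAdmin models the config-only user from the last point.

```java
import java.util.EnumSet;
import java.util.Map;
import java.util.Set;

// Hypothetical roles and actions for illustration.
enum Role { Anonymous, User, ConfigAdmin, Admin }

enum Action { ViewInventory, EditSystemConfig, ManageAccount }

// The system under test: a hypothetical access policy.
class AccessPolicy {
    static boolean isAllowed(Role role, Action action) {
        return switch (action) {
            case ViewInventory -> role == Role.User || role == Role.Admin;
            case EditSystemConfig -> role == Role.ConfigAdmin || role == Role.Admin;
            case ManageAccount -> role == Role.Admin;
        };
    }
}

class PermissionMatrixTest {
    // Written from the stories: the roles each action should be allowed for.
    static final Map<Action, Set<Role>> EXPECTED = Map.of(
            Action.ViewInventory, EnumSet.of(Role.User, Role.Admin),
            Action.EditSystemConfig, EnumSet.of(Role.ConfigAdmin, Role.Admin),
            Action.ManageAccount, EnumSet.of(Role.Admin));

    // Checks every role against every action, so the refused combinations
    // are tested just as thoroughly as the permitted ones.
    static void run() {
        for (Action action : Action.values()) {
            Set<Role> allowed = EXPECTED.get(action);
            if (allowed == null) {
                throw new AssertionError("no permissions defined for " + action);
            }
            for (Role role : Role.values()) {
                boolean actual = AccessPolicy.isAllowed(role, action);
                if (actual != allowed.contains(role)) {
                    throw new AssertionError(role + " on " + action + ": expected "
                            + allowed.contains(role) + " but was " + actual);
                }
            }
        }
    }
}
```

Adding a new Action without adding it to EXPECTED makes the test fail, which is a small automated nudge towards the Definition of Ready rule above.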

More articles by Richard Smith

  • How I Do Agile - 6. Code Review
  • How I Do Agile - 5. Branch and Merge Approaches
  • Digital Preservation - What is it and why do you need it?
  • Isn't hexagonal architecture just 3 tier in a new dress?
  • How I Do Agile - 4. Testing
  • How to Have a Meeting
  • How I Do Agile - 3. Project Tracking
  • How I Do Agile - 2. Story Creation
  • How I Do Agile - 1. People and Process
  • How I Do Agile - Series Introduction
