Quality and Agile - Why and How we Should Compromise and do it Safely

Quality is a funny beast: we all think we want it, but we often don't know what it is or, more importantly, how to get it. We sure do feel its absence, though, in every piece of equipment, bite of food or downloaded application we encounter that has somehow ended up "low-quality". In a world where everybody seemingly has everything, we've often allowed low quality to be the lever that lowers the entry bar for that experience.

Introduction

“The bitterness of poor quality remains long after the sweetness of low price is forgotten.” – Benjamin Franklin

In this article, I attempt to expand upon some recent discussions with colleagues and my own thoughts on quality in our industry in general. However, I’m not the pope of QualityTown, these are my opinions, which are there to be challenged, as challenging my stance on quality is what precipitated this article in the first place.

But, before we start, if there's one thing I hope we can agree on, it's that we should be talking about quality. Good quality rarely happens by accident; sadly, I suspect poor quality often does (well, I have experience that confirms my bias towards that statement).

Broken Windows

There’s a criminology theory, known as "The Broken Windows Theory" (https://www.tutor2u.net/sociology/reference/broken-windows-theory-explained) that reasons “as soon as there’s a single broken window that goes un-repaired, other minor infractions will start to happen”. This is based on the idea that if people see an area is in disrepair or has lax enforcement of low-level crime, people will stop caring about it and eventually, start making things worse through laziness or malice.

In our industry, as a technical lead or architect your name is on the shop-front for many projects, and even as a regular developer your name is against the code, yet this problem is pervasive. That is almost counter-intuitive, as people generally don't like doing "a bad job"; it's demoralising. Worse, people might see those broken windows, *deep sigh*, and break a window themselves in order to leave the office on time.

We Must NEVER Compromise on Quality!

One of my core mantras has long been “All code is production code” (so we include test and platform code in our definition of quality, which is no bad thing), however it’s a little simplistic, so I’ve revised it to a less punchy “All code intended for use on a system that will be used in production should be the same quality as production code”. If you’re writing a Proof of Concept (PoC), you don’t necessarily need so much polish, because PoC code never makes it into production (yes, yes, I know your sides are hurting... I’m here all night, try the fish).

The classic triad included in many Project Management textbooks of a certain vintage was "Time, Cost, Quality", where, as with all such triads, you had to choose two of the three, and the centre of the triangle was scope. In my opinion, software delivery should really use the more modern version, "Time, Cost, Scope", where the centre of the triangle is quality.

So, with the above in mind, I was fairly fixed in my mind's eye on what good looked like and that we shouldn't be deviating from that level of quality. Fast forward to a few days ago, a few months into working with some colleagues on the topic of quality: during one illuminating meeting, in my hubris, I exclaimed "we must never compromise on quality!", which set feathers flying as we discussed various scenarios. One pertinent example was "well, not everybody wants a Rolls Royce solution". My colleague was quite right: not everybody wants or needs that level of solution, but is a VW Golf low quality? And if so, which bits of it make it low or high quality, good or bad?

At that point, feeling the collective tension in the room, one of my other colleagues then piped up gnomically “but what is quality?”… Good question. That one stopped me in my tracks.

But What is Quality?

Therein lies the reason we were getting all excited: we hadn't agreed a frame of reference and were talking at cross-purposes. Going back to the car analogy, Rolls Royce are quality through and through, but Uncle Gertrude just wants his VW Golf to take him to and from the fetish club on a Sunday. So, when we say we don't want a Rolls Royce solution, which bits don't we want? This is where the parallel is particularly useful…

  • We might not want such a slick finish (luxury, UEX, value-add features)
  • We might not need its engine power (performance, response time)
  • We would really like its reliability (availability, reliability)
  • We need most of its functionality (functional requirements)
  • We need some of its seat space (capacity, scalability)
  • We need some of its boot space, foldable seats, tow-bar (flexibility, extensibility)
  • But, we definitely need its safety (internal and external safety)

So, off the back of that, we all but shook out a broad set of Functional and Non-Functional requirements that won't look too unfamiliar if you've worked in software delivery for a few trips around the sun.

But the analogy above is still over-simplified; even a Rolls Royce doesn't have the functionality of a pick-up truck or a motor-home, so the Rolls Royce solution is only the optimal one in certain circumstances. Further, nobody wants a £300,000 pickup truck, so there's no point adding useless luxury to one. For example, if you have a back-office system that isn't exposed to the public, there's no point making it look and feel like the latest and sexiest public website; it's just there to let you record your paperclip usage for the week.

Also, some cars are favoured because they're really nice to work on, so the maintainability of your solution is also a consideration. If it is expensive in time and/or cost to maintain, it should be by choice, not by accident (I'm looking at you, Apple). You might deliberately build something that is slow to change (including compliance/regulatory changes), or never going to change, in a way that isn't very extensible, and that's calculated restraint shown in your engineering.

How we Judge Quality

Martin Fowler has much to say about how quality is perceived and thus judged https://martinfowler.com/bliki/TradableQualityHypothesis.html

In the above, Martin distinguishes between two broad "quality types", namely internal and external quality. Your users see the external quality; your developers, testers and operations see the internal quality.

Interestingly, the medical world also makes similar distinctions with "functional" versus "technical" quality in surgery, which is discussed here: https://academic.oup.com/asj/article/32/6/751/2802432

Further to that, we showed above that there are many types of quality (availability, scalability, functionality etc.), and in order to show we're exhibiting each of those "qualities", we need some kind of measure. "Safe" is not testable as a statement on its own, and since you can't prove a negative, you can't logically prove a 100% absence of safety issues; you can only check against a scale of known measures and techniques.

This is why we have different types and layers of testing. It sounds obvious, but when you're planning your technical quality and testing approach it really helps to pin one or more qualities to each of your testing methods/types/layers, because if you blur them, you can end up not getting the right value, or cluttering up your test space.
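
As a sketch of what that "pinning" could look like in practice (the quality names and layer names here are illustrative, not a standard taxonomy), a simple explicit mapping makes clear which layer provides the primary evidence for each quality:

```python
# A sketch of pinning each quality to the test layer that provides its
# primary evidence. Quality and layer names are illustrative only.
QUALITY_TO_LAYER = {
    "functional correctness": "unit",
    "component behaviour": "integration",
    "performance": "load",
    "availability": "chaos/resilience",
    "security": "penetration",
    "usability": "exploratory/manual",
}

def layers_for(qualities):
    """Return the distinct test layers needed to evidence the given qualities."""
    return sorted({QUALITY_TO_LAYER[q] for q in qualities})

print(layers_for(["functional correctness", "performance"]))  # ['load', 'unit']
```

If a quality has no layer, or several layers all claim the same quality as their primary evidence, that's a sign your test space is starting to blur.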

How do we End up with Low Quality?

There are broadly two types of quality problem, intentional and unintentional. If it’s unintentional, it’s an education/haste/laziness issue, but if it’s intentional, you’re being forced to choose, but what’s forcing you to choose?

There are always gating factors for any development project, but what they are helps determine the best course of action. “Very early in a project plan, you identify if the project is constrained by time or by resources,” says Doug White, a New Jersey-based business intelligence and solutions architect. “For time-constrained projects, you continue the decision tree based on risk of failure, penalty of missing the deadline, and priority of this project. For resource-constrained projects, explore the cause of the constraint (builders’ availability, buyers’ ability to fund – to build and later own), and also, the priority and opportunity cost of significant delay.”

However, much of poor quality comes from neglect, and that isn't a compromise or a choice. Compromise is being talked down from dinner at the two-starred L'Enclume and consciously choosing the nice Thai restaurant around the corner instead. Neglect is sticking a pin in the Yellow Pages and hoping you didn't just commit to dinner at B&Q.

“Be a yardstick of quality. Some people aren’t used to an environment where excellence is expected.” – Steve Jobs 

What is Sufficient Quality?

“It’s more about good enough than it is about right or wrong.” – James Bach

Sufficient quality is getting the VW Golf you paid for, at the agreed price. Timing is less important in the case of a car, because you wouldn't release a car that was knowingly broken; recalls are expensive.

As a car buyer, you’d be upset if VW came back before delivery asking for another £5k because they took the engine into the lab for sports tuning and reduced the weight by 150kg using carbon fibre parts. You’re then paying for something you didn’t ask for, or indeed want. When an engineer starts gold-plating a solution, they’re fitting carbon fibre parts, but what’s worse, with deadlines they might be unwittingly doing that at the expense of working seatbelts.

The old Road Fund Licence (tax disc) system in England was a good case. It was ugly, 1995 ugly; it had a face only a mother could love. However, it was fast and very simple: you got the job done and didn't have to look at it for another year. I once renewed in about three minutes flat, so what it lacked in looks, it made up for in the important aspects of quality. Being so simple, it was accessible, even to those on very old browsers or screen-readers.

Also, as engineers, we fetishise building ultimately modular and reusable systems, but just because something isn't infinitely configurable and extensible doesn't make it poor quality. You cannot fit a pickup flatbed to a Rolls Royce without significant rework, and it would probably look really weird (software isn't "Pimp My Ride", after all). This means, as much as it pains us, we should also think of the YAGNI aspects the next time a client wants to compromise on "quality" yet wants a full workflow-engine designer GUI. Having a good idea of where you're very likely to go with the platform is much more helpful than hopes and dreams. Build the stuff you need with the immediate future in mind and be prepared to dispose of bits if you substantially change tack.

As a project team, you need a Technical Quality Plan that describes what sufficient quality means, and that needs to be something measurable, not subjective. But crucially, at some point you should be more granular than that: an admin interface is probably less important to you than your public-facing website, so the availability and usability quality bars may differ.

But, as discussed earlier, there's one quality we can't compromise on, the one that will kill careers, the industry, bank accounts and sometimes lives... safety.

What we Really Care About is Safety

“If you think safety is expensive, try an accident. Accidents cost a lot of money. And, not only in damage to plant and in claims for injury, but also in the loss of the company’s reputation.” – Dr Trevor Kletz

Safety is the one quality we shouldn't compromise on, and in that respect there are two types of safety: safety features that protect your assets and safety features that protect your customers'/users' assets. The parallel here is that having a good airbag is internal safety, while having a design that doesn't impale people when it crashes (like the old Jaguar ornaments, before they folded away) is external safety. However, as we're in business, we should value external safety highly, and I would hope that the value chain of your organization looks something like:

[Image: the organization's value chain]

By the way, if you put your pay-rise ahead of a human life, I don’t think there’s any help for you, and you should definitely consider a career in politics.

That sounds quite complex, but when you break it down, most of the time, safety is largely about data, broadly encapsulating:

  • Integrity of Data
  • Security/Confidentiality of Data
  • Availability of Data

The "Data" part can be extended to "Service" if that service's timely availability means the preservation of life and/or assets. For more on this, I'd suggest you read up on the InfoSec CIA triad, which is a very similar concept: https://resources.infosecinstitute.com/cia-triad/#gref

Being a small or cheap service used to be an excuse, but the old adage of "we're just developing website X for a service cheaply" stopped working when the GDPR came into effect. It doesn't really matter if it's a small, cheap website: if you lose 1,000 users' personal information, you are in the doo-doo. If you don't believe me, have a look at the fine structure for GDPR violations.

Why we Should REALLY Care About Quality, Especially Safety

We've mentioned fines already; these should be reason enough for companies to care about quality. Another example is food… You can choose to serve all kinds of food of varying "quality", but you don't play fast and loose with ingredient contamination, especially allergens, or, as was seen in the news recently, you go to jail.

It's too easy to capitulate to demands for "no unit tests" or "just get it out of the door", but when that involves a safety concern, you are either acting unwittingly negligently or, if you have really thought about it, downright unethically.

In no other industry would that kind of compromise be countenanced, let alone passed through into live usage. If somebody built a new luxury office block and was running tight on budget or time, the nice trim in the executive bathrooms might be in question, but nobody would be allowed to save money by painting on fake fire exits! This is why we have building regulations.

As the industry has expanded geometrically, we find ourselves in a weird position: we're building things in the digital domain that are among the most complex of all human endeavours, yet we're subject to the least regulation, less than the people who make your 3am kebab.

Scary, isn’t it? Scary not just because things could go wrong, but that the splashback damage from a major incident in another organisation could stop our entire industry until we’ve picked up the pieces and put controls in place to prevent it from happening again.

When is Something Safe or Unsafe?

It would be easy to make blanket statements like "it can't be safe if the testing isn't automated", but the matter is more subtle than how you test.

Just because something is unmaintainable doesn't make it unsafe by default; if it has the right level of automated assurance around it, it can be incredibly safe. Equally, just because something has no automated assurance around it doesn't make it unsafe by default, if you're willing to spend the time and money on retesting it (the ethics of cyclical, expensive manual testing is a topic for another time).

However, if the timeliness and availability of the service is safety critical, it cannot be unmaintainable or slow to retest, or by definition it becomes unsafe; you're just rolling the dice and hoping you don't have to change it. That's like taking a really hard-to-maintain snowmobile to the South Pole without enough resources to fix it and without a radio: you're hoping you don't break down, because if you do, you're in trouble.

However, that doesn't mean we should build something that's "too safe", because that's a waste of resources and can harm usability. A computer sealed in six metres of reinforced concrete and lobbed into the Mariana Trench is likely very safe from cyber attack; down there in the SpongeBob department there's no 4G or broadband. It's not particularly usable, though, so you have to know when something is "safe enough".

The NHS has quite a robust approach to safety, with separate assessments for Clinical Safety matters, covering any system that involves a patient's access to care or their data. All issues are examined and rated; if there is no potential for clinical impact, they pass it by, no matter what the bug is. They have a very solid frame of reference and they stick to it, meaning that minor or very intermittent bugs might not get past review if they could result in an "absence of care". I like that way of working a lot, even if it initially feels very cumbersome.

How to Build Sufficient Quality into Your Programme

In my opinion, in many cases when we say we're going to compromise on quality, it's much healthier to think of it as a compromise on scope: strip back the luxury, performance, availability or functionality in a documented and deliberate way. You just don't touch the sacred one, safety.

I've harped on a lot so far, but what can you do about it on your programme to protect the qualities of the system that are important to you, your client and your users? Here are a few ideas...

Choose Where to Apply Quality and its Measures

Firstly, let's be clear: a metric isn't "quality". Numbers in themselves don't represent quality or convey much meaning; it's the context surrounding them that's useful. So slapping a blanket 80% test-coverage rule on your project is not only naive, it can be dangerous, as people can game the system.

If you've done enough analysis, you should be able to break your system down into functional areas, each with some idea of priority when it comes to safety. Where you have a separate area with its own quality levels, you should have a set of measures for it. Being able to automate much of this using tools like SonarQube reduces the cost-overhead of monitoring quality levels.

Do not just apply blanket quality measures across the whole programme; it makes no logical sense and makes it harder to prioritise remedial action later. Call out your quality issues at the component/service level. There will be hotspots, trust me, and they're often around complexity or poor team dynamics.
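
As an illustrative sketch of component-level rather than blanket measures (the area names and coverage bars are invented for the example, not recommendations), a per-area quality gate might look like this:

```python
# A sketch of per-area quality gates instead of one blanket number.
# Area names and coverage bars are invented for the example.
AREA_COVERAGE_BARS = {
    "payments": 90.0,          # financially critical
    "clinical-records": 95.0,  # clinical safety critical
    "admin-ui": 50.0,          # internal, low risk
    "reporting": 60.0,
}

def failing_areas(measured):
    """Return the areas whose measured coverage falls below their agreed bar."""
    return [area for area, bar in AREA_COVERAGE_BARS.items()
            if measured.get(area, 0.0) < bar]

measured = {"payments": 92.1, "clinical-records": 88.0,
            "admin-ui": 55.0, "reporting": 61.0}
print(failing_areas(measured))  # ['clinical-records']
```

Notice that a blanket 80% bar would have passed this build while the most safety-critical area was under-tested; per-area bars surface exactly where remedial action is needed.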

Track Your Quality and its Compromises

“Quality is not an act, it is a habit.”— Aristotle

In addition to measuring quality, which we discussed earlier, we need to track it and any decisions we’ve made.

You should workshop your quality and the levels you're agreeing to; also consider what might happen if your quality compromises or agreed levels result in a problem:

  • Reduction in UI quality -> User goes elsewhere
  • Reduction in availability -> Business user has to enter details in later
  • Reduction in maintainability -> New feature takes a long time to market, reducing potential turnover
  • Reduction in data security -> Massive fines, possible closure of business
  • Reduction in clinical safety -> Somebody dies, somebody else might go to jail

And by deliberately segregating the areas of the system by safety level, we can be smarter about our quality measures. We could have 40% coverage over 10m lines of stinky code, much of which is unused or unimportant, or we could have 90% coverage over the really important bits and let the rest languish.

Any significant technical choices should also be recorded in a Key Design Decision document (which is effectively just a problem statement, the available options, the chosen option and the rationale).

Record all of your compromises in your project’s Technical RAID log and ensure that it’s curated frequently, ensuring transparency around those risks maturing and materialising as issues. A forgotten RAID log is useless and potentially dangerous as bad information is misleading.
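
A compromise entry in such a log could be modelled minimally like this (the field names are hypothetical; a real RAID entry would also carry an owner, likelihood and impact):

```python
from dataclasses import dataclass
from datetime import date

# A minimal sketch of one tracked compromise in a Technical RAID log.
# Field names are hypothetical and illustrative only.
@dataclass
class QualityCompromise:
    description: str
    quality_affected: str         # e.g. "availability", "maintainability"
    consequence_if_realised: str
    review_by: date               # a compromise with no review date is neglect
    safety_impact: bool = False   # safety compromises should never be accepted

    def is_overdue(self, today):
        """An un-reviewed compromise past its date is a risk maturing unseen."""
        return today > self.review_by

risk = QualityCompromise(
    description="Admin UI shipped without automated browser tests",
    quality_affected="maintainability",
    consequence_if_realised="Regressions reach internal users undetected",
    review_by=date(2020, 6, 1),
)
print(risk.is_overdue(date(2020, 7, 1)))  # True
```

The point of the `review_by` field is the curation discussed above: an overdue, unreviewed compromise is exactly the "forgotten RAID log" failure mode.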

Design and Build with Testability in Mind

An automated suite of tests is repeatable proof that the items in your test pack haven't changed in outcome since you last executed them (whether the outcome is "correct" is another matter). This means you really do want to think about how you can:

  • Test the individual units of your system
  • Test parts/components of your system (vertical)
  • Test layers of your system (horizontal)
  • Test the non-functionals
  • Test without your networked dependencies (networked components add a lot of time to test execution)
  • Test on multiple environments without excessive build times or licensing implications

Many Commercial Off-The-Shelf (COTS) products are difficult to test around, so choose wisely. Baking testability into your solution is one of the best ways of avoiding quality compromises, because when stakeholders see a struggling automation team, the knee-jerk reaction is to move back to manual testing.

Of course, then you have to ensure the quality of your testing artefacts and frameworks, which can feel very "spendy" to a client. This is where being a good technical salesperson comes in very handy.

Manage Your Technical Debt

There's a strong correlation between a project with high levels of Technical Debt and other functional or non-functional quality issues; the reverse is also true. Why? In my opinion it's fairly obvious: a system that is resistant to change due to a rotting codebase or design debt is harder to code around, more prone to side-effects and less likely to have been thoroughly unit tested. If the system were in pristine technical shape, with great test coverage, bugs would naturally be scarce, and also quicker and easier to fix.

With that in mind, encourage a mindset of noticing, recording, triaging, curating and fixing technical debt. Building a technical-debt budget into your sprints is a smart idea; otherwise you might have to create a business case to fix every piece.
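
As a rough sketch of what such a sprint-level budget could look like (the capacity, ratio and backlog items are invented for illustration):

```python
# A sketch of reserving part of each sprint for technical debt.
# Capacity, ratio and backlog items are invented for illustration.
SPRINT_CAPACITY_POINTS = 30
DEBT_BUDGET_RATIO = 0.2  # ~20% of each sprint reserved for debt work

debt_backlog = [
    ("Untangle billing module", 5),
    ("Delete dead feature flags", 2),
    ("Upgrade logging library", 3),
]

def plan_debt_work(backlog, capacity, ratio):
    """Greedily fill the debt budget from an already-triaged backlog."""
    budget = capacity * ratio
    chosen, used = [], 0
    for item, points in backlog:
        if used + points <= budget:
            chosen.append(item)
            used += points
    return chosen

print(plan_debt_work(debt_backlog, SPRINT_CAPACITY_POINTS, DEBT_BUDGET_RATIO))
# ['Untangle billing module']
```

The mechanics matter less than the habit: because the budget exists every sprint, the backlog keeps draining without anyone writing a business case per item.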

Avoid Compromising on Quality in the First Place

The smartest way to ensure that you don't have to make painful compromises is by avoiding having to compromise.

  • Do a better job of estimating in the first place from the bid level onwards - Shooting yourself in the foot by forgetting to include enough time and cost for adequate quality is an own-goal
  • Better still, don't try to do a "big a** estimate" of stuff you can't possibly guess that far into the future. Listen to the Scrum overlords, deliver against a prioritised backlog, not to an end-date
  • Use off-the-shelf libraries or solutions - Don't reinvent the wheel, other people have tested existing software for you
  • Let it be ugly/utilitarian - If that's the requirement, you can pretty things up later, but poor design is much harder to fix
  • Avoid unnecessary quality - By putting a Technical Quality Plan in place to state your intent on where you’ll be spending your efforts
  • Reduce scope - This could be features, non-functional requirements or having a longer phased-rollout to minimise impact

Build your VW Golf using Rolls Royce Tools

Just because the client doesn't want a "Rolls Royce solution", it doesn't mean it can't be built and tested using top-of-the-line production tools and techniques. Having an agreed set of tools and ways of working cuts out "frameworking" time, where you write a lot of build and test code that never sees production.

This doesn't mean you need a standard technology stack or product for the live service, as most testing frameworks and tools don't really care what the target architecture is. Just be sure not to let teams hand-roll stuff where other people have already solved the problem.

Test and framework code should be the same quality as production code, so why not reuse these enabling functions between projects? It means you’ll have more time for production quality later.

Know When to Say “No More”

Sometimes, imprudent compromises result in further compromises due to design debt. At some point you might need to pull the plug and say "no more", and you will need an advocate for that: sometimes we've got to save people from themselves, and you need evidence and strength in numbers to have those difficult conversations.

However, quality, especially safety, shouldn't be an emotive argument; it should be backed up with concrete reasoning. There is a related Japanese manufacturing concept, Jidoka, often summarised as "stop the line", which is applicable here (https://kanbanize.com/blog/stop-the-line/).

Work to Current Industry Standards and Best Practices

Where "Current" is the most important word in the title above. Software quality and safety practices and frameworks move almost as quickly as IT in general, so advice you may have been given five years ago might actually be damaging in the current climate.

There are several Secure Coding and Secure Software Delivery Lifecycle (SSDLC) frameworks that are mature, have assessment criteria and also offer training. Some of these (such as Cyber Essentials) have been adopted by the public sector, so it's worth keeping abreast of the latest practices.

Ensure you and your team have enough knowledge and training between you (with no single point of failure) to ensure that when you start to talk about quality, safety and security, you're well informed.

Encourage a Quality Mindset in your Team

“Quality means doing it right even when no one is looking.” – Henry Ford

The Broken Windows Theory was pretty brutal when it came to dealing with infractions:

The idea was that low-level crime should not be tolerated and severe penalties needed to be meted out for anti-social behaviour and minor incivilities in order to deter more serious crime.

I'm much more of a carrot than a stick person; however, the aims were sensible and translate well into the world of quality:

  • Collective conscience – shared views and values on quality
  • Social solidarity – Cohesiveness of the team, agreed standards and rules
  • Maintenance of boundaries – Understanding of when to say no, and where the team's concerns lie

We should build a mindset, a belief-system, a religion around:

  • Being good boy-scouts by leaving the campground in a better state than we found it
  • Showing restraint by knowing what good enough is for each area of the system
  • Knowing where to compromise and where not to when we need to make a decision
  • Feeling empowered that we can whistleblow if we’re not sticking to (or being forced by external parties to deviate from) our Technical Quality Plan
  • Confidence that we can escalate with the assurance that we can “stop the line” if it is required (this is a last-resort though)

Build these into your team through roadshowing your Technical Quality Plan, expectations, practices and toolsets, because most engineers like to feel proud of their work.

There is an argument that Static Code Analysis (SCA) should be picking up much of this for you, but I agree with my esteemed colleague when he said:

“[It's] important that we use tooling to reinforce rather than replace personal responsibility for doing the right thing” – Mark Pullan

Yes, use tooling such as SonarQube; it's great, but it doesn't replace a high-quality team who are bought into the idea of what good looks like. Intrinsic quality in your code-base transcends any quality metrics you could ever dream up, but as systems grow more complex in both function and scale, holding on to intrinsic quality by running a tight ship becomes more difficult.

Closing Thoughts

If there's one thing I've learned, it's that actions speak louder than words. If we want quality, we have to really fight for the quality that matters and be prepared to challenge people when they cross the line or pay lip-service to the idea of quality. In rethinking "quality" and putting the focus on safety, the argument "this system is unsafe for the following reasons…" might have more traction than "the quality of this system is low because of code coverage".

“Well done is better than well said.” – Benjamin Franklin

Even if the quality compromise is not safety-critical, it's always worth asking "will this do our reputation harm as implementers?" and "is it something I'm happy to put my name against?". If the answers are yes and no respectively, it might be time to go back for another round of fighting.

So we should:

  • Understand what your quality types and values are
  • Understand your measures, and the context around them, not just a raw number
  • Log your compromises as risks and track them
  • Choose where to compromise first
  • Never allow your system to become unsafe, whether by choice or by neglect
  • Use good tools, techniques and processes to avoid making compromises in the first place
  • Learn and disseminate current industry best practices and ways of working
  • Ensure you're sending the right kind of message to your team and clients

Finally, if we do not start thinking about this more deliberately, Uncle Bob’s prediction that we will be regulated as an industry and many people will go to jail will come true. In fact, it’ll likely come true anyway, but don’t be the cautionary tale that made it happen, because they don’t have Fortnite and Domino’s pizza in jail or at the dole office.
