It is in the nature of software to be broken

‘Why don’t they just write the code properly? Why can’t they just get it right first time?’

I’ve heard these sentences spoken several times in a professional context by non-technical project managers wondering why the testing phase of a project (or a sprint, or a release) is taking so long.

Even if you’re not a project manager working on a development project, you may have similar questions when you encounter software problems in your normal daily life. Why does your laptop always need updates? Why is the web site you want to use down yet again? Why are big budget games full of bugs when they are first released?

If you have ever written code for a living you may be simultaneously exasperated and embarrassed by these reactions. Embarrassed because you recognise that too much code which makes it to production contains too many bugs which should have been detected and eliminated in testing. Exasperated because comments such as ‘Why don’t they just get it right first time?’ fundamentally misrepresent the experience of writing software.

(You may also be incredulous: surely there is no one managing a technology project who doesn't understand the basic nature of coding. But those sentences above are real words heard on real projects - and not just from project managers, but sometimes from senior leaders. As usual, though, our reaction should not be to blame people for ignorance, but to share knowledge and experience.)

Software which works first time is a rare and unexpected experience - so rare that, when it happens, you suspect that you’ve missed something. The much more common experience is to fight through a litany of errors which seems never-ending. Whenever I write more than a few lines of code, my experience when I try to run it is: watch the code fail due to obvious typos; watch the code fail due to basic syntax errors; watch the code struggle into life, only to collapse under some fatal runtime error; get the code running and observe basic functional and data errors; get the code to an apparently working state, only to encounter some odd and unexpected behaviour; stare at the code in frustration; become convinced that the framework or interpreter is misbehaving; scan the Internet for common problems; stare at the code again; debug, set flags, add print statements to report variables; discover an error so obvious that I can’t believe that I never saw it; fix it. Then find the next error.

The reason for this experience is that code is a complex and delicate construction of logic which it is ridiculously easy to get wrong. A misplaced letter or punctuation mark, a misremembered variable name, an indent in the wrong place can result in unintended consequences. If you are lucky then those consequences are an immediate and obvious error; if you are unlucky then they will go unnoticed until they cause more serious problems.
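To make the point about indents concrete, here is a minimal sketch in Python. The function names and the payment figures are hypothetical, invented purely for illustration. The two functions differ only in how far one line is indented, yet the second returns the wrong answer without raising any error at all.

```python
def total_correct(amounts):
    """Add up all the amounts in the list."""
    total = 0
    for amount in amounts:
        total += amount
    return total  # runs once, after the loop has finished


def total_broken(amounts):
    """Meant to add up all the amounts, but one line is indented too far."""
    total = 0
    for amount in amounts:
        total += amount
        return total  # indented too far: returns during the first pass
    return total  # only reached when the list is empty


payments = [10, 20, 30]
print(total_correct(payments))  # 60
print(total_broken(payments))   # 10
```

Nothing crashes and nothing is flagged: the broken version simply reports 10 instead of 60. This is the unlucky case, where the mistake goes unnoticed until someone checks the numbers.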

This means that good software development does not mean good typing: it is effectively impossible to write code that always works. Software in its natural state is broken. It also means that good software development recognises this natural state: it includes automated tests to help detect what has broken this time, and it favours simplicity and readability to help the next developer figure out how to fix it (bearing in mind that the next developer may be the current developer suffering the rapid amnesia that seems to follow all code production).
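In its simplest form, an automated test is just more code which states what the answer should be and complains when it isn't. The sketch below is in Python and is only an illustration: the `average` function, its deliberate bug and the `check_average` helper are all hypothetical, and real projects would normally use a testing framework rather than a hand-rolled check like this.

```python
def average(values):
    """Meant to return the mean of the values (contains a deliberate bug)."""
    total = 0
    for value in values:
        total += value
    return total / (len(values) - 1)  # bug: should divide by len(values)


def check_average():
    """Return a list of failure messages; an empty list means every check passed."""
    failures = []
    # Each case pairs some input values with the answer we expect.
    cases = [([2, 4, 6], 4), ([10, 20], 15)]
    for values, expected in cases:
        actual = average(values)
        if actual != expected:
            failures.append(
                f"average({values}) returned {actual}, expected {expected}"
            )
    return failures


for message in check_average():
    print(message)
```

Both checks fail here, which is the point: the test reports the problem as soon as it is run, rather than leaving it to be discovered in production. Once the division is corrected, the same checks pass silently and stay in place to catch the next breakage.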

The point of this attempt to explain the natural fragility of software is not just to help non-technical project managers show greater appreciation for the work of development teams (although it is partly that). Rather, it is to help people who don’t have a technical background appreciate what it means to live in a world of software. It does not mean living in a glittering world of perfectly working bits and bytes assembled by robots: it means living in a world of messy, fragile logic crafted by fallible human beings, and subject to the faults of every human endeavour. And that’s why it’s important to explain how all this stuff works.

The Round Trip Question: Journey Map

This series of articles is driven by a conviction that computers are increasingly important to our lives, but many people don’t understand how they work, and that those of us working in the industry therefore have a duty to explain. It attempts to answer The Round Trip Question: what happens when you press ‘send’ on the mobile banking app on your phone?

I’m using this section at the bottom to capture the list of questions which arise as I write each article. If I go wrong, or if you have other questions, please tell me in the comments.

To-do:

Who are all these humans who write code? How do they work?

How does my mobile phone know that I am me?

How does my bank’s computer know that I am me?

Why do action heroes ‘break into the computer room to hack the mainframe’? How realistic is that?

What’s a mainframe?

What’s a computer room?

[From Bradley Safer] Who else can see my data? What are they allowed to do with it?

[From Prakash Sethuraman] What is data? Why is it important to protect it?

There will be plenty more questions. For now, though, here’s the very rough picture of what we have covered so far:

[Image: rough journey map of what has been covered so far]

(Views in this article are my own.)
