Thinking differently about . . . data
One of the first tools I used in my programming career was something that we would barely recognise as a tool today: it wasn’t a testing tool, or a deployment tool, or even a compiler. It was a physical rubber stamp used to create a variant of something called a Bachman diagram: a representation of an ICL IDMSX database structure. You used the stamp to create boxes that represented record types, and then filled in the name and other characteristics in ink.
I share this story not just out of nostalgia (although I am sure that it will bring back memories for many people), but to illustrate just how differently we must think about data today compared with thirty years ago - and to remind us that many of us in senior technology leadership positions acquired our skills, beliefs and habits in a world very different to that of today.
At the beginning of my career, I learnt a very data-oriented approach to systems design and software engineering, resulting in a deeply ingrained set of beliefs:
You should start with the data: if you are going to describe, design or build a system, you should start by understanding the data that the system is going to handle.
You should get the data structure right: the Bachman diagram tool was just one way of imposing a strict schema on data.
Designing data is a specialist job: the DBA was a powerful figure who had authority over the database and a set of skills the rest of us could only aspire to.
Storage is precious: when designing programs and data structures, we should minimise the use of disc (or tape) space, even at the cost of losing detail or meaning.
Managing data infrastructure is hard: data infrastructure contains moving parts, uses media that degrades, is temperamental . . . and is always running out.
These beliefs served me and many others well for years. But we don’t live in a world dominated by mainframes and tapes any more (even though they are still around). In that world, only a tiny fraction of what happened was captured in electronic records: there was still more data on paper than on disc. Today, more and more of the world is described in electronic form - and that form is large, messy and impossible for humans to manage manually.
This change means that the beliefs that served me early in my career have mostly been replaced by new beliefs:
You should start with the data: this one survived! I still find that the best way to understand a problem is to understand the data associated with it. However, understanding that data today may involve analysis of billions of data points, rather than deriving a record structure from a paper form.
You should focus on meaning over structure: data structure is still important for precise interactions, such as API calls, but most of the data in the world today is unstructured. Figuring out the meaning of data is typically more useful than fitting it to a precise schema.
Meaning can be determined by machines: we might think that meaning is one of the last provinces of human reasoning - and that is true for more elevated and profound categories of meaning. But if we want to determine the basic content of billions or trillions of unstructured expressions of data, such as web pages or documents, then we have to get help from machines.
Storage is less valuable than data: the cost of storage continues to drop, and the value of data continues to rise. We made mistakes when we favoured the cost of storage over the value of data.
Managing data infrastructure is hard: this one survived too! Much storage infrastructure still has moving parts and still fails. And storage infrastructure that has to handle increasingly large data volumes is increasingly complex. The good news, of course, is that you don’t have to do it yourself: Cloud platforms will do it for you. So, this belief should be: managing data infrastructure is hard - but only if you choose to do it.
These new beliefs may seem incredibly obvious. If we don’t realise that we live in a world of increasing data volumes, value and processing power, then we haven't been paying attention. But it is important from time to time to step back, to remember where we acquired our earliest beliefs and habits, and to ask whether those beliefs and habits still show up in our behaviour. Recognising and challenging their vestiges is an important step in thinking differently.
(Views in this article are my own.)
Director @ Deutsche Bank | Cloud-Native Architecture, Analytics and Data science
4 years ago: Very insightful and most practical view on data. The comparison aspect is very interesting.
Helping organisations translate great ideas into business change in a digital world.
4 years ago: Interesting thoughts David Knott. You had me concerned for a few paragraphs... I thought "start with data" was going to die a death in the article. Super relieved to see it alive and kicking. Machines inferring meaning, absolutely and why not. They can process vast quantities of information in real time at pace. They can get to an understanding and surface consumable insights a lot quicker. Having machines make the decisions based on this meaning... let's take this one a little slower. Self-driving cars have come a long way, for example, but they still get the wrong end of the stick. The future is bright but with every answer comes a thousand questions... I love questions!
Semiconductor Test Engineering, Technical Project Management , Business Intelligence through Data Analytics
4 years ago: Interesting comparison on the aspect of data handling then and now.
Business Programme Manager, Enterprise Contract Lifecycle Management (eCLM) IBM F&O Q2C Transformation
4 years ago: Hi David, thank you, this was a really interesting article - took me back to when I joined and our rather large PCs ran a bright green NOSS system on a Tiger screen. Good memories, but boy I am glad we have progressed so far.