The Democratisation of Data

Unless you’re living under a rock, it is nearly impossible to miss signs of the data revolution that is happening all around us. Whether it’s information about your electricity consumption from the monthly bill in your mailbox, statistics on the TV screen as you watch the Formula One or operational reports that you review at work – the presentation of information in the form of data is alive, well and growing at an ever-increasing rate.

I’ve recently had the good fortune to lead a work project with a humble ambition – to drive the ‘Democratisation of Data’ for a large bureaucratic organisation. The program was to connect some of the organisation’s on-premises data storage technologies to a contemporary, cloud-based reporting solution, replace some legacy reports in the new technology and do so in a scalable and secure fashion. The experience has led me to consider all things ‘data’ over the last 10 months and has awoken a question I’ve pondered for some time now - why is it so bloody hard for organisations to harness the power of data?

As I pointed out previously, often the best place to begin answering such a question is to go back to the beginning.

The evolution of approaches used to collect, translate, and present data is really a story of evolving business needs. Therefore, by understanding how the use and application of data has evolved over the years, one gains insights into how to best position data today to maximise uptake – to become a ‘data-driven’ organisation and to ‘democratise data’ for the masses.

The Original ‘Physical’ Data Storage Device – Punched Cards

In other pieces I’ve written in the past, including here, I’ve argued that the history of computing is really the history of information processing. From the earliest commercial computing born of Mauchly and Eckert’s ENIAC and the UNIVAC computing machines in the late forties through to today’s advances in quantum computing, the history of computing is the history of storage, manipulation, and consumption of data.

Back in the good ole’ days, data was stored in the form of punched cards. Punched cards encoded the information (‘information’ or ‘data’ – those terms will be used interchangeably) that was used to facilitate the flow of instructions in the computing machines of the day. Since the flow of instructions and data was via the punched cards, one could reasonably argue that the punched cards were the ‘databases’ of computers back then.

As computing methods and computing machines rapidly advanced over the subsequent decades, the advances in data input and storage were slowly dragged along. Without getting too deep, it was the pivot to the ‘von Neumann’ architecture of computing (in the von Neumann paradigm, the units that process information within a computer are separate from those that store the information) which drove the requirements for better storage methods. This need for mass storage was accompanied by the need to quickly read and write the information. And it was this facet of the UNIVAC that was so revolutionary in early computing – the development of magnetic tape devices in place of punched cards.

With magnetic tape came the ability to scan through the tape to find the desired record (a.k.a. - read data), perform a calculation on that record or records, and return the result to the tape (a.k.a. - write data). And voilà, the democratisation of data had begun. The only caveat in the early 1950s was that to participate in this revolution of data democratisation, one had to have a Ph.D. in physics as well as access to one of only a few dozen UNIVAC computers built in the 1950s.

From here, the story accelerates quickly through the evolution of data architectures. We’ll ignore the hardware aspects of the data storage evolutions for now and focus solely on the advances in the organisation of the data on the physical storage devices.

A (very) Brief History of Modern Data Architectures

At the risk of grossly oversimplifying the evolution of data architectures, there are relatively few pivot points where major shifts have occurred. For example, the early punched card approach to storage was eventually replaced by drum storage which itself was replaced by solid-state technologies (integrated circuits, or ‘ICs’). These advances led first to the ‘flat file’ and ultimately to relational database management systems (RDBMS) along with Structured Query Language (SQL), the programming language used to interact with an RDBMS.

The ‘Flat File’

As described above, the magnetic tape storage devices stored information one record after another in a ‘serial’ fashion on the tapes. It took time to skip through the records to find the one you were looking for (imagine having to listen to four songs on Spotify to get to the fifth song you actually want to hear). This was a key challenge of early methods for reading and writing data. Over time, as memory evolved and became significantly cheaper (i.e. more abundant), magnetic tape gave way to disk drives which in turn gave rise to the ‘flat file’ for storage.

Most reading this now are familiar with the .csv file (csv – comma-separated values). This is essentially a flat file with records in sequence, not unlike the magnetic tape, containing stored information. Large flat files such as these could be considered a ‘database’ back in the day. Although relational databases have replaced flat files for most enterprise applications, flat files are still in use today for small-scale applications. In fact, there are still examples (fewer and fewer now) where legacy database methods nearer to a flat file are still used. For example, Services Australia (formerly known as DHS) still uses mainframe technology and a 1960s ‘flat file’-like database architecture called Model 204 in its Income Security Integrated System (ISIS) calculation engine to process payments to this day.
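To make the ‘records in sequence’ idea concrete, here is a minimal Python sketch of looking up one record in a flat file. The file contents, field names and the `find_record` helper are all made up for illustration; the point is only that, without an index, the reader has to pass over every earlier record to reach the one it wants.

```python
import csv
import io

# Hypothetical flat file: one record per line, fields separated by commas.
FLAT_FILE = """surname,given_name,favourite_colour
Adams,Ann,blue
Baker,Bob,green
Clark,Cy,blue
"""


def find_record(flat_file_text, surname):
    """Scan records in order until the surname matches.

    Returns (record, records_read) so the cost of the scan is visible.
    """
    reader = csv.DictReader(io.StringIO(flat_file_text))
    records_read = 0
    for record in reader:
        records_read += 1
        if record["surname"] == surname:
            return record, records_read
    return None, records_read  # not found: every record was read


if __name__ == "__main__":
    record, records_read = find_record(FLAT_FILE, "Clark")
    print(record, "found after reading", records_read, "records")
```

With three records the scan is trivial; with millions of records on tape or disk, the same approach is what made early reads so slow.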

Relational Databases

One could pick any number of dates to anchor as the advent of ‘modern’ when it comes to data architectures but since I’m fond of the early seventies (my first car was a 1971 Nova), I’ll pick Edgar F. Codd’s paper from around that same time as the start of the first big evolution in data storage. Codd’s paper laid out a theory of storage built on the relationships between the data elements – a relational database. Sounds fancy, yes, but actually it is easy to understand conceptually.

As an example, imagine an individual record comprised of one’s surname, given name, address and favourite colour. Now further imagine a collection of these individual records (representing multiple people) organised by alphabetical order of surname. You now have a database. There are, however, some complications associated with the above. For example – multiple people may have the same favourite colour. If we have hundreds of thousands of records, do we need to store the “colour blue” tens of thousands of times? Or could we instead store it once and have all records ‘point’ to that single entry – “colour blue”? What about data accuracy – what if we spelled “color” instead of “colour” in some of the records? Finally, what about querying the information to draw insights from the data – what’s the best way to optimise these queries (speed)? The principles in Codd’s seminal paper and subsequent research helped to sort through these challenges via ground-breaking concepts such as database ‘normalisation’. All these served to optimise the RDBMS further.
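The ‘store it once and point to it’ idea can be sketched with Python’s built-in `sqlite3` module. The table names, column names and sample people below are illustrative assumptions, not anything from a real system; the sketch simply shows each colour stored once in its own table, with every person record pointing to it by an identifier.

```python
import sqlite3


def build_database():
    """Create an in-memory database with a normalised person/colour design."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE colour (
            colour_id INTEGER PRIMARY KEY,
            name      TEXT NOT NULL UNIQUE   -- each colour is stored exactly once
        );
        CREATE TABLE person (
            person_id  INTEGER PRIMARY KEY,
            surname    TEXT NOT NULL,
            given_name TEXT NOT NULL,
            address    TEXT NOT NULL,
            colour_id  INTEGER NOT NULL REFERENCES colour(colour_id)
        );
        """
    )
    conn.executemany(
        "INSERT INTO colour (colour_id, name) VALUES (?, ?)",
        [(1, "blue"), (2, "green")],
    )
    conn.executemany(
        "INSERT INTO person (surname, given_name, address, colour_id) "
        "VALUES (?, ?, ?, ?)",
        [
            ("Adams", "Ann", "1 First St", 1),
            ("Baker", "Bob", "2 Second St", 2),
            ("Clark", "Cy", "3 Third St", 1),
        ],
    )
    return conn


def surnames_by_colour(conn, colour_name):
    """Join person to colour to find everyone who points at a given colour."""
    rows = conn.execute(
        """
        SELECT p.surname
        FROM person AS p
        JOIN colour AS c ON c.colour_id = p.colour_id
        WHERE c.name = ?
        ORDER BY p.surname
        """,
        (colour_name,),
    ).fetchall()
    return [surname for (surname,) in rows]


if __name__ == "__main__":
    conn = build_database()
    print(surnames_by_colour(conn, "blue"))
```

Because the word “blue” lives in one row only, a spelling correction is a single update, and the `UNIQUE` constraint stops a second, differently spelled copy of the same entry from creeping in unnoticed.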

One last distinction worth noting. One often hears of a relational database (or simply database) interchangeably with a data warehouse. Are the two the same? My simple answer is yes and no. Yes, in the sense that a data warehouse has a relational structure to its data. No, in the sense that a data warehouse is typically organised differently (often by column instead of by row, as in a conventional relational database) to optimise analytical processing. Essentially, the data warehouse is a further evolution of the RDBMS. For the purposes of this (very) brief history, think of them as the same.
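For readers who want to see the row-versus-column distinction, here is a rough Python sketch. The records and the `spend` field are hypothetical, and real warehouses do far more than this; the sketch only shows the same data laid out both ways and why a column layout suits analytical questions that touch one field across many records.

```python
# Row-oriented layout: each record keeps all of its fields together,
# which suits looking up or updating one person at a time.
ROWS = [
    {"surname": "Adams", "colour": "blue", "spend": 120},
    {"surname": "Baker", "colour": "green", "spend": 80},
    {"surname": "Clark", "colour": "blue", "spend": 200},
]


def to_columns(rows):
    """Pivot a list of row records into one list of values per field."""
    columns = {}
    for row in rows:
        for field_name, value in row.items():
            columns.setdefault(field_name, []).append(value)
    return columns


def total_spend_from_rows(rows):
    """Analytical query on the row layout: every whole record is visited."""
    return sum(row["spend"] for row in rows)


def total_spend_from_columns(columns):
    """Same query on the column layout: only the 'spend' values are read."""
    return sum(columns["spend"])


if __name__ == "__main__":
    columns = to_columns(ROWS)
    print(columns["spend"])
    print(total_spend_from_rows(ROWS), total_spend_from_columns(columns))
```

Both functions give the same answer; the difference is how much data has to be read to get there, which matters once the table has many fields and many millions of rows.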

Data Lakes

As storage technology continued to advance from the mid-80s through to the early 00’s, storage densities continued to increase and cost per unit of storage continued to decrease. This and the proliferation of networked computing (a.k.a. – the internet) facilitated a massive growth in content (content = data) in both business and personal contexts. Put differently, it led to an explosion in both structured and unstructured data (see this for the distinction between the two). Sure, historical paradigms such as the relational database continued to scale to handle the structured data (cubes, data marts, etc.); however, a new paradigm was needed for the unstructured data and for use cases where the amount of structured data was prohibitive for conventional applications.

Enter the data lake.

Conceptually, a data lake is simply a data repository that does not require up-front knowledge of the relationship between data elements in structured data. Equally, data lakes can store unstructured data (or data in its original format such as pictures or video). A data lake can be on-premises or can be in the ‘cloud’, with examples of data lake technologies and platforms including Hadoop, Microsoft Azure and AWS. Also note that an enterprise-grade data lake of the kind one typically finds in large organisations can store structured (row, column), semi-structured (flat file .csv) and unstructured data (emails, video, etc.). Typically, you won’t find data lakes in small organisations as they often lack the volume of data and/or maturity to deploy and manage the technology. So, think of a data lake as the most current, industrial-strength paradigm for data storage.
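A toy version of this idea can be sketched in Python using nothing more than a folder. The file names, contents and helper functions are assumptions made up for illustration (a real data lake sits on distributed or cloud storage); what the sketch shows is that files land in their original format with no structure declared up front, and structure is only applied when someone reads them, an approach often described as ‘schema on read’.

```python
import csv
import json
import tempfile
from pathlib import Path


def land_raw_files(lake_dir):
    """Drop files into the 'lake' as-is, with no schema declared up front."""
    (lake_dir / "customers.csv").write_text("surname,colour\nAdams,blue\nBaker,green\n")
    (lake_dir / "events.json").write_text(
        json.dumps([{"surname": "Adams", "action": "login"}])
    )
    # Placeholder bytes standing in for an unstructured file such as a photo.
    (lake_dir / "photo.jpg").write_bytes(b"\xff\xd8\xff")


def read_from_lake(path):
    """Interpret a file only at read time, based on its type."""
    if path.suffix == ".csv":
        with path.open(newline="") as handle:
            return list(csv.DictReader(handle))
    if path.suffix == ".json":
        return json.loads(path.read_text())
    return path.read_bytes()  # unstructured data is handed back untouched


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        lake = Path(tmp)
        land_raw_files(lake)
        for path in sorted(lake.iterdir()):
            print(path.name, "->", type(read_from_lake(path)).__name__)
```

The trade-off is that the work of making sense of the data moves from the moment of storage to the moment of use, which is part of why the approach demands a degree of organisational maturity.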

Before moving on to the last concept – how organisations are beginning to use the data for decision making - one key point is in order. It’s important to note that older paradigms such as the database/data warehouse approach described above are very much alive and well today, as is the use of flat files. Sure, drum storage has gone the way of the dodo bird, but it often takes a long time for paradigms to be fully supplanted by newer technologies. A great example is free-to-air television. That concept was first commercialised in the 1930s yet every TV today still has a free-to-air tuner despite the widespread adoption of streaming video such as Netflix, Disney+, etc.

Data Products - The Latest Paradigm

The above discussion was largely centred on the technology; however, the evolution of the technology paradigms described above is born of the changing use or application of the data being retrieved, manipulated, and stored.

A more recent paradigm shift gaining popularity today, one that prioritises the data itself as opposed to its storage and retrieval, is the notion of ‘Data Products’.

Using a strict definition, a data product is defined as “a product that facilitates an end goal through the use of data” (credit - DJ Patil, former US Chief Data Scientist). In plain English, this means treating data (and data sets) more like a product. This is best illustrated via example.

The iPhone is a product. When you purchase the iPhone, it comes with everything you need to operate it as a product: a charger, headphones and instruction manual. Similarly, a data product comes with the data, code to manipulate the data, some metadata (think instruction manual) and all the infrastructure necessary to support the data. Other components necessary for the distribution and consumption of data products include a digital portal which creates a marketplace where data consumers can source data products relevant to their business function. Continuing with the iPhone metaphor, one can think of the AppStore as a ‘data portal’ that serves up the applications (applications analogous to data products) and the marketplace is the ecosystem of people who shop the AppStore for products.
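As a rough illustration of that bundle, the Python sketch below packages data, the code that manipulates it and some metadata into a single object, and adds a very small ‘portal’ where consumers can browse by business domain. Every name here (`DataProduct`, `DataPortal`, the sample policies) is hypothetical, and the supporting infrastructure is left out entirely.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class DataProduct:
    """Hypothetical bundle: the data, the code to shape it, and its 'manual'."""
    name: str
    domain: str                                        # business domain that owns it
    data: List[dict]
    transform: Callable[[List[dict]], List[dict]]      # code shipped with the data
    metadata: Dict[str, str] = field(default_factory=dict)

    def serve(self):
        """Return the data as consumers are meant to see it."""
        return self.transform(self.data)


class DataPortal:
    """Hypothetical catalogue where consumers find products for their domain."""

    def __init__(self):
        self._catalogue = {}

    def publish(self, product):
        self._catalogue[product.name] = product

    def browse(self, domain):
        return sorted(
            name for name, product in self._catalogue.items()
            if product.domain == domain
        )

    def get(self, name):
        return self._catalogue[name]


def active_only(records):
    """Example transform: keep only active policies."""
    return [record for record in records if record["active"]]


if __name__ == "__main__":
    portal = DataPortal()
    portal.publish(
        DataProduct(
            name="active_policies",
            domain="underwriting",
            data=[{"policy": "P1", "active": True}, {"policy": "P2", "active": False}],
            transform=active_only,
            metadata={"owner": "underwriting team", "refresh": "daily"},
        )
    )
    print(portal.browse("underwriting"))
    print(portal.get("active_policies").serve())
```

The consumer never touches the raw records or the storage behind them; they pick a named product from the portal and use what it serves, much as one installs an app without caring how it was built.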

The above description represents the ‘tip of the iceberg’ of data products, omitting many key concepts that sit ‘below the waterline’ of the iceberg. Key concepts excluded include the systems and processes that manage the engineering of the data products into re-usable assets, and the governance that determines what products should be on offer and who should have access to which data products. Finally, the alignment of data products to the business domains (marketing, operations, underwriting) is also an underpinning capability necessary to support the data product approach described above.

Two points are worth additional emphasis. First, and as was stated previously, the data product concept is a deliberate shift from focusing on the storage/technology paradigm to a focus on the data itself and its use in the business. This shift is in line with the growing trend of using the data for decision making beyond simple operational reporting (for example, predictive analytics). Secondly, this shift requires significant technology and process capabilities to be truly effective for an organisation. Coming back to the iceberg metaphor, focusing on only the tip (i.e. the data products) and ignoring the ice below the waterline (digital portal, definition of business domains, governance) is a recipe for the immediate sinking of the good ship ‘S.S. Data Products’.

Putting It All Together

This piece started with an evocative title (The Democratisation of Data) and a naïve aim of spelling out, in 1000 words or fewer, what hinders the democratisation of data in organisations. The title could equally have been “how long is a piece of string?” because defining why organisations/people (spoiler alert – organisations are just people) are resistant to embracing the use of data is just as open-ended as the “string” question.

A few thoughts though….

Businesses currently use data in the form of reports all the time. Democratisation of data is just getting more and more information (data) to more and more people within businesses – not just to the leaders/managers. To do so, democratisation will require awareness training (there are many who still recoil every time a fancy bar chart is put in front of them), change initiatives and learning development along with improved processes, models and technologies.

The evolution is underway though. With new paradigms such as data mesh and powerful reporting tools such as Microsoft’s Power BI, organisations are slowly pivoting to be more data driven. Microsoft’s Power BI is particularly intriguing because of its simplicity and wide availability to everyday knowledge workers who use Word, PowerPoint and Excel. As I’ve written before here, if Microsoft plays their cards right, they can leverage their dominant market share to exploit the opportunities. Noting that Power BI is on the visualisation end of the spectrum, there are clearly other technologies in the back end that offer similar opportunity.

There is no silver bullet though. Like any organisational change such as becoming ‘digital’ or ‘customer-centric’, transitioning to a data-driven mindset requires support from senior executives and a commitment to the change of people, process and technology. And it needs to be led by business need – not by Information Technology (IT).

In a recent discussion with an architect, we discussed democratisation of data and data mesh’s role. The architect concluded the discussion, in a rather jaded tone, by describing data mesh as “self-licking ice cream”.

Hopefully data mesh falls short of that lofty accomplishment because data democratisation is achievable with the right mindset. I also really like ice cream; I’m not looking forward to the day when ice cream licks itself.

Jeffery Eberwein is a senior partner at EY in the Consulting practice focused on digital technology and its implications for business. He can be contacted at [email protected]






