What is Data Entropy?
Image courtesy of https://pxhere.com/en/photo/742228

There is a common meme that LinkedIn regulars will know well. It shows a series of pictures of Lego, one with lots of bricks all mixed up, another with the bricks separated out by colour, and perhaps more with the bricks assembled into shapes.

Captions under each image will read something like 'Data, Information, Knowledge, Wisdom', or 'Data, Sorted, Arranged, Explained'.

As a parent of small kids, I can attest that taking a big tub of random Lego pieces and sorting them into groups of similar colour or type takes a lot of effort. I can also tell you that trying to build a large Lego model without doing this sorting first is going to take many times longer.

And yet, no matter how many times Lego gets organised to make it easier for building with, within a week or two it is all mixed up again. It's like Lego has its own kind of Entropy.

Entropy

Entropy in nature is the tendency of things to become disordered over time. Objects that are finely sculpted lose definition, things that are separate become mixed, things that are extreme become moderate.

Natural forces and processes, from the thermodynamic behaviour of gas molecules to weathering and erosion, cause ordered matter to become disordered over time.

https://pxhere.com/en/photo/659632

Hot objects give up heat to cooler things around them until all are the same temperature. Your latte doesn't go 'cold' and your frappe doesn't get 'warm'; they both just align with room temperature. Physics suggests that, given enough time, everything in the universe will tend towards the same temperature.

https://pxhere.com/en/photo/158383

Tectonic forces push the earth upwards to create mountains. However, once the 'organising' tectonic forces subside, wind and water will eat away at that mountain, like a sandcastle on the beach, slowly turning rock to dust and eventually leaving the earth as flat as it was before the mountain first rose.

We can apply energy in a directed way to slow, or even reverse, entropy in controlled circumstances, e.g. by constructing erosion barriers on shorelines, or by microwaving that latte. However, without regular housekeeping, 'order' inevitably becomes 'disorder', 'sorted' becomes 'random', 'different' becomes 'same'. Entropy will ultimately have its way.

Data Entropy

Data in an organisation has its own kind of entropy too. We might start out with highly planned and structured enterprise data systems, but gradually, little by little, disorder can creep in. In the cut and thrust of the working world, solutions often need to be tactical, rather than strategic.

  • Strategic actions are like oil tankers: slow to get going and hard to turn around, but capable of moving enormous loads over vast distances.
  • Tactics are like jet-skis: short-range, fast moving, and well suited to carrying a small load a short distance, albeit often leaving a messy wake.

The attraction of tactical action over strategic action to solve problems quickly is undeniable, and the longer-term consequences are often ignored. Examples of tactical behaviours that can lead to data disorder include:

  • making copies of core reports or databases ('data silos'), in order to make changes or customisations without the management overhead and delay that changing the master copy would require;
  • re-creating common business logic in local data silos and reports rather than the central data model, resulting in multiple separate implementations of things that should be standard;
  • creating on-going data solutions using data elements that need to be created and/or updated manually in every iteration, creating an on-going management overhead and dramatically increasing the likelihood of error;
  • failing to assess data quality, and/or to address data quality issues as they arise, meaning that over time DQ issues accumulate until people lose faith in the data;
  • failing to create and/or maintain documentation describing data systems and explaining key design features, meaning that new users have a very steep learning curve, and your systems risk becoming 'black boxes' that people are afraid to amend or enhance;
  • failing to have and/or apply consistent standards for naming, development, testing and delivery of data solutions, meaning users cannot easily understand all systems and systems cannot talk to one another easily;
  • employing SaaS solutions that do not allow full access to the data created there, meaning important organisational data cannot be interrogated independently of the SaaS application, or integrated with other enterprise data.
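To make the data quality point in the list above a little more concrete, here is a minimal sketch of the kind of routine check that can stop DQ issues accumulating unnoticed. It assumes records arrive as simple Python dictionaries. The function name `profile_quality`, the field names and the sample customers are all made up for illustration; a real organisation would more likely use a dedicated profiling tool.

```python
from collections import Counter


def profile_quality(rows, required_fields, key_field):
    """Return simple data quality measures for a list of dict records.

    Hypothetical helper for illustration only, not a standard API.
    """
    total = len(rows)
    # Count records where a required field is absent, None or blank.
    missing = {
        field: sum(1 for r in rows if r.get(field) in (None, ""))
        for field in required_fields
    }
    # Find key values that appear on more than one record.
    key_counts = Counter(r.get(key_field) for r in rows)
    duplicate_keys = [k for k, n in key_counts.items() if n > 1]
    return {
        "total_rows": total,
        "missing_rate": {
            f: (m / total if total else 0.0) for f, m in missing.items()
        },
        "duplicate_keys": duplicate_keys,
    }


# Made-up sample data, for illustration only.
customers = [
    {"customer_id": 1, "email": "a@example.com", "country": "IE"},
    {"customer_id": 2, "email": "", "country": "IE"},
    {"customer_id": 2, "email": "b@example.com", "country": None},
    {"customer_id": 3, "email": "c@example.com", "country": "UK"},
]

report = profile_quality(customers, ["email", "country"], "customer_id")
print(report)
```

Running a check like this on a schedule and keeping the results gives a simple trend line: if the missing rates or the number of duplicate keys creep upwards from one run to the next, entropy is winning.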

Some of these behaviours happen covertly within the business, away from the gaze of IT or the data management team. This is usually inadvertent, but sometimes deliberate. The business must be facilitated and enabled by IT and Data Management to do their work, and a sure sign that things are going wrong is when the business starts implementing solutions by themselves to avoid dealing with these functions.

Other behaviours may be facilitated by IT and Data Management teams, under varying levels of pressure from the business to address high-priority requirements so quickly that they decide or agree to cut corners.

To mitigate the effects of 'data entropy', active management and governance are essential:

  • Organisations should establish and communicate clear codes of conduct and best practices to ensure staff can identify, and are aware of the impacts of, poor data management.
  • The business should be given access to appropriate tools, training and information to make effective use of data.
  • IT and Data Management must be given sufficient support and funding that lack of resources does not become a bottleneck, limiting business use of data and forcing business users to invent their own ways of doing things.
  • When urgent changes require 'cutting corners', the business must commit to implementing the changes properly at the first opportunity after the emergency changes are applied.

Conclusion

For Lego, the natural force that tends towards disorder is children. All a parent can do is encourage their little ones to look after their toys and periodically help them tidy up and do some sorting. Hopefully we can prevent tears by ensuring precious pieces are not lost under the couch or in the belly of the vacuum cleaner.

Similarly, without on-going effort to manage data and enforce good practices, data entropy will gradually erode organisational data capability and capacity. However, if IT and business work together with common cause to 'apply energy' in the right way, data entropy can be minimised and reversed.

Questions on Data Warehousing, Data Integration, Data Quality, Business Intelligence, Data Management or Data Governance? Click Here to begin a conversation.

John Thompson is a Director with EY's Technology Consulting practice. His primary focus for many years has been the effective design, management and optimal utilisation of large analytic data systems.

