From Chaos to Order: Introducing Data Entropy
Photo by Darius Bashar

Despite the ethereal nature of our digital worlds, the laws of physics still apply. We have matter-dependent things happening all around us, always. 

My work on information systems and data ecosystems has inspired me to establish the theory of data entropy. Data entropy is an analogy for the nature of data and its tendency to decay over time without proper care and management. The second law of thermodynamics states: “as one goes forward in time, the net entropy (degree of disorder) of any isolated or closed system will always increase (or at least stay the same).”[1]
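In symbols, the second law quoted above is commonly written as follows, where S is the entropy of the isolated system and ΔS is its change over time (standard textbook notation, not notation taken from this article):

```latex
\Delta S \geq 0
```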

I am coining the term data entropy to describe the phenomenon of thermodynamic forces applied to bodies of data throughout their lifecycle (data at rest, data in motion, data at work). Entropy, in this context, is manifested in the way people, processes and technology work to maintain order and slow the rate of decay or disorder. Like the notion of Information Entropy, where physicists seek to create hyper-efficient and lossless computer networks, data entropy focuses on what is required to maintain and enhance the insight value of data.

Entropic effects move things from a state of order to disorder. Thanks to Georgia State University for the information. https://hyperphysics.phy-astr.gsu.edu/hbase/Therm/entrop.html

Our information systems have an advantage over the physical world: we can slow and even reverse the effects of entropy (thanks to Farnam Street for the inspiration). The ability to slow the rate of entropy within a data ecosystem helps maintain the asset value of data and keeps the ecology in balance. How do we address these phenomena in a practical manner?

Let’s ask and answer the following questions:

  • What are some examples of data entropy? Where are things degrading? 

Data quality is a classic example of entropic effects. When data is created without proper stewardship and governance, its quality degrades, and energy must be spent to restore or delete the record. Extract, transform and load (ETL) is another area that can produce significant amounts of entropy. If the human and machine processes that connect to, move and store data in an organized manner are not highly tuned and managed, friction is created and processes slow. It is interesting to think of data gravity in this context as well. In his blog, Dave McCrory describes how data becomes immovable when it possesses massive volume and force. In this context, the harder data becomes to move, the more entropy there is in the system. Interestingly, this is the inverse of the idea of entropic gravity, where the less gravity that exists, the more entropy there is in the system. If we can keep our systems balanced, we can control the force of data gravity and stave off entropy.
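The data quality example can be made a little more concrete. The sketch below is only one possible way to put numbers on "disorder" in a single field, and it is not a definition taken from this article: it assumes that the share of blank values and the Shannon entropy of the observed values are reasonable proxies. The function names (missing_rate, shannon_entropy) and the sample records are hypothetical.

```python
from collections import Counter
from math import log2


def missing_rate(records, field):
    """Fraction of records where `field` is absent, None, or blank."""
    if not records:
        return 0.0
    missing = 0
    for record in records:
        value = record.get(field)
        if value is None or str(value).strip() == "":
            missing += 1
    return missing / len(records)


def shannon_entropy(values):
    """Shannon entropy, in bits, of the observed value distribution."""
    counts = Counter(values)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    # Sum p * log2(1/p) over each distinct value.
    return sum((c / total) * log2(total / c) for c in counts.values())


# Hypothetical records: the same state entered four different ways,
# compared with the same records after a steward normalizes them.
unmanaged = [{"state": "CA"}, {"state": "ca"}, {"state": "Calif."}, {"state": "California"}]
stewarded = [{"state": "CA"}, {"state": "CA"}, {"state": "CA"}, {"state": "CA"}]

entropy_before = shannon_entropy(r["state"] for r in unmanaged)
entropy_after = shannon_entropy(r["state"] for r in stewarded)
print(entropy_before, entropy_after)
```

Under these assumptions, a field whose entropy or missing rate climbs between snapshots is a candidate for stewardship attention. A field that is legitimately diverse, such as a customer name, will also score high, so the numbers only mean something when compared against an expected baseline.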

Data security is another major potential entropic force on data. If data is vulnerable, it inherently possesses more entropic force. Hacks and data breaches move the ecosystem from stability to complete chaos rapidly! 

  • What are the design requirements for controlling data entropy? How do I take control and slow the rate of decay in my ecosystem?

No easy question to answer. I have defined eight (8) major capability areas within the data ecosystem that firms must understand, assess and invest in to decrease entropy within their environments. See the image below:

[Image: the eight major capability areas within the data ecosystem]

Frameworks and business processes are important to apply in this context. Understanding where the most entropy exists in your ecosystem will help you focus on where to invest and what problems to tackle first. As you move forward on your digital transformation journey, notice where entropy is wreaking havoc and target resources to restore balance.



Eliot Arnold

Data wonk, entrepreneur and executive experienced in developing novel solutions to complex problems. TechStars ‘21, StartupHealth Transformer, Researcher, Pickleball Fanatic!

5 years ago

Dr Tony Burns, the primary tenet of this article is the presumption that if the volume, velocity, variety and veracity of data increase unchecked, this creates more entropy in the environment and ultimately less utility. Poor accessibility is a symptom of higher data entropy, in my opinion.

Dr Tony Burns

Q-Skills3D Interactive learning in Continual Improvement for all employees

5 years ago

Are you talking about database design for the degree of data accessibility? My database design friend Charles Richter might want to comment. https://www.dhirubhai.net/in/charles-meyer-richter-1734a19/
