Data Entropy
Mustafa Qizilbash
Data & AI Practitioner | Author | CDMP Certified | Innovator of DAC Architecture & PVP Approach | 50k Followers
Data Entropy is a term most Data Scientists know, but many general data folks do not.
Let’s decode it...
The term Entropy is normally used in the Data Science domain:
· Where finding unexpected outcomes is the aim
· Where surprises are welcomed
· Where results are analyzed by probability: the lower the probability of an outcome, the more it tells us
· Where information counts as informative only if it wasn’t already known
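The "lower probability, more informative" idea in the bullets above can be put into numbers. A minimal sketch, assuming the usual information-theory measure called surprisal, log2(1/p), measured in bits; the function name and the sample probabilities are made up for illustration:

```python
import math

def surprisal(p: float) -> float:
    """Information content, in bits, of an outcome that has probability p."""
    if not 0.0 < p <= 1.0:
        raise ValueError("p must be in (0, 1]")
    return math.log2(1.0 / p)

# Illustrative probabilities only: a certain outcome carries no surprise,
# and the rarer the outcome, the more bits of information it carries.
for p in (1.0, 0.5, 0.1, 0.01):
    print(f"p = {p:<5} -> {surprisal(p):.2f} bits")
```

An outcome that always happens (p = 1) scores 0 bits, which matches the point that already-known information is not informative.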
Types
· High Entropy means more surprises: more unexpected values, which are considered more informative
· Low Entropy means fewer surprises: fewer unexpected values, which are considered less informative
‘Entropy is also called the Extreme Disorder of values.’
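To make high versus low Entropy concrete, here is a rough sketch that scores a list of values. It assumes the Entropy meant here is Shannon entropy (the article does not name a formula), and the function name and sample lists are hypothetical:

```python
import math
from collections import Counter

def shannon_entropy(values) -> float:
    """Shannon entropy, in bits, of the observed values (assumed definition)."""
    counts = Counter(values)
    total = sum(counts.values())
    # Each distinct value contributes p * log2(1/p), where p is its share of the list.
    return sum(((c / total) * math.log2(total / c) for c in counts.values()), 0.0)

ordered = ["A"] * 8                                      # one value repeated: no disorder
disordered = ["A", "B", "C", "D", "A", "B", "C", "D"]    # four values, evenly mixed

print(shannon_entropy(ordered))      # low Entropy
print(shannon_entropy(disordered))   # high Entropy
```

The repeated list scores 0 bits, while the evenly mixed list scores 2 bits, the maximum possible for four distinct values.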
Image: https://towardsdatascience.com/entropy-how-decision-trees-make-decisions-2946b9c18c8
Referring to the image, at the starting point all the signs are MINUS, in the middle the signs are split 50/50 between PLUS and MINUS, and right at the end all the signs are PLUS. At the left and right extremes, Entropy is at its lowest: every sign is the same, so no surprises are expected and there is nothing much for Data Scientists to predict. In the middle, with the 50/50 split, Entropy is at its highest: if the values are not visible, a guess at any single sign is no better than a coin flip. In the circles in between, where one sign outnumbers the other, Entropy sits between those two levels.
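The PLUS and MINUS picture can be checked with a few lines of code. This is a sketch under the assumption that Entropy here means Shannon entropy for two outcomes; the circle counts below are invented for illustration and are not read off the image:

```python
import math

def binary_entropy(p_plus: float) -> float:
    """Entropy, in bits, of a PLUS/MINUS mix in which a fraction p_plus is PLUS."""
    total = 0.0
    for p in (p_plus, 1.0 - p_plus):
        if p > 0.0:  # a sign that never occurs contributes nothing
            total += p * math.log2(1.0 / p)
    return total

# Hypothetical circles as (PLUS count, MINUS count), from all MINUS to all PLUS.
for plus, minus in [(0, 8), (2, 6), (4, 4), (6, 2), (8, 0)]:
    h = binary_entropy(plus / (plus + minus))
    print(f"{plus} PLUS / {minus} MINUS -> {h:.3f} bits")
```

Under this definition the all-MINUS and all-PLUS circles score 0 bits, the 50/50 circle scores 1 bit, and the uneven mixes land in between.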
Yes, it’s a confusing topic, but this is how Data Scientists reason about unknown values and try to predict them.
Cheers.
Data Architect at Tata Steel BV, designer of data constructs.
1 year ago
I tend to use data entropy in a different way: the loss of validity over time, relative to the real world as it is now. A company has to react in the real world, and because of that both the company and the real world change, so older data declines in value for predicting what happens now. This aspect of data is rarely, if ever, included in models or in their maintenance.