Role of data in healthcare - Information gain & data entropy
Areg Kocharian
Business Operation | Digital Transformation | Data Integration | Business Intelligence | Data Oriented Marketing Ecosystem Implementation
Chapter 6 – Role of data in healthcare
Decision Trees in Data Modelling
At the end of the day, the main question about applying data in healthcare concerns its practicality and the actual impact it can have on the community, such as improved diagnostics and disease management. Because enormous effort goes into collecting, storing, preparing and communicating such databases, the benefits of the generated insights should outweigh the resources allocated to providing valid databases.
In this chapter, information gain in decision trees designed for data models is discussed, followed by a video that works through the calculations as an example.
Hence, before continuing with knowledge graphs and their potential applications in healthcare, I intend to pause and consider how a healthcare database (regardless of management or storage type: relational, OLAP, knowledge graph, etc.) may be insightful for a crucial question such as “predicting the events of MI (myocardial infarction) or cardiac arrest, for instance, for a given population”.
One of the stimulating stages of data analytics (as mentioned previously) is predictive analytics and data models.
These models use machine learning to estimate the probability that certain events will occur in the future, given certain variables and the connections between them.
In fact, when the topic of predictive data models is brought up, most people start imagining complex programming principles and knowledge, and yes, it is indeed complex.
But the functionality of a data model mainly revolves around the initial steps, where the data are grouped based on specific shared attributes, especially in the development of supervised learning algorithms.
Information Gain
Each raw data set starts with a certain entropy (entropy is an information-theory indicator that measures the impurity or uncertainty in a group of observations). Based on the entropy formula and concept, the lower the entropy, the better (purer) the data set.
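For reference, the entropy formula mentioned here is usually written in the standard Shannon form shown below. The symbols are notation chosen for this illustration and are not taken from the original text: S is the group of observations, k is the number of classes, and p_i is the proportion of observations in S that belong to class i.

```latex
H(S) = -\sum_{i=1}^{k} p_i \log_2 p_i
```

With two classes, H(S) is 0 when every observation belongs to the same class and reaches its maximum of 1 bit when the two classes are evenly split.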
Thus, when the data set is classified and a decision tree is developed, the entropy is expected to decrease compared with the initial data or the previous classification step. This reduction is called information gain.
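To make the idea of entropy reduction concrete, here is a minimal Python sketch of how information gain could be computed for a single split. It assumes the standard Shannon entropy in bits; the function names, the toy labels (1 = event occurred, 0 = no event) and the split itself are hypothetical and only serve as an illustration, not as the calculation used in the video.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    counts = Counter(labels)
    return -sum((n / total) * log2(n / total) for n in counts.values())


def information_gain(parent, children):
    """Parent entropy minus the size-weighted entropy of the child groups."""
    total = len(parent)
    weighted = sum(len(child) / total * entropy(child) for child in children)
    return entropy(parent) - weighted


# Hypothetical toy data: 1 = event occurred, 0 = no event.
parent = [1, 1, 1, 1, 0, 0, 0, 0]

# Hypothetical split of the same eight observations on some binary attribute.
left = [1, 1, 1, 0]
right = [1, 0, 0, 0]

gain = information_gain(parent, [left, right])
print(f"parent entropy: {entropy(parent):.3f} bits")
print(f"information gain of the split: {gain:.3f} bits")
```

In this toy case the parent group is evenly mixed, so its entropy is 1 bit, and the split leaves each child group less mixed than the parent, which gives a positive information gain. A split that separated the two classes perfectly would reduce the weighted child entropy to 0 and yield the maximum gain of 1 bit.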
Before calculating the information gain of a certain clustering or decision tree, it is vital to understand the concept of entropy in a data set.
Example (Informative Video):