DATA MODELLING SIMPLIFIED

On a lazy Sunday evening, Samantha is sitting by the window at her favourite table in café Magnolia with a cup of coffee, a slice of red velvet cake and the book 'Telling Your Data Story' by Scott Taylor. The rain has distracted her. She is staring out of the window and reflecting on her day. At weekends, Samantha works in a grocery store. Behind the store there is a large storage room where all the food articles, flowers, magazines and so on are kept. Today she started her morning by checking the stock and filling the freezer and the other shelves of the store with items from storage. She loves cataloguing the items in such a way that customers can find them conveniently.

At the end of her shift, on her way home, she bought some groceries for herself: a few vegetables, a bottle of milk, cheese, a bouquet of flowers, a magazine and some bread. She has a recyclable grocery bag that she always carries with her to bring her groceries home. Today was no exception.

At home, Samantha unpacked the groceries. She put the milk, cheese and vegetables in the fridge. The weather is not that warm today, so she left the bread on the kitchen counter. She put the flowers in water and placed the magazine on her bedside table.

same article in different contexts

In the afternoon she attended the 'Summer school for Data Leaders'. After taking in all the nuggets of wisdom that Caroline Carruthers and Peter Jackson shared with the aspiring data leaders, she now has a little time to herself.

On weekdays, Samantha works in an insurance company. In this job, as a data management expert, she catalogues all the data of the organization. In her other job, she organizes and catalogues the grocery items in the store. She reflects that if the articles in the store are compared to the data in the organization, there are many similarities between the work she does in the grocery store and the work she does in the insurance company.

She recalls the day she handled Dr. Emily Carter’s case. Dr. Emily Carter, aged 65, is one of the customers of the insurance company. One of her medical reports indicates a possibility of high blood pressure. The expiry date is possibly the end of the current month.

Looking at this data, Samantha decided to calculate this customer's risk of a heart attack. Depending on the risk, she would suggest appropriate health insurance renewal policies to Dr. Emily Carter. There is a risk calculator to calculate survival risk, and Samantha needs to feed it the right information. It calculates the risk differently for men and women. Besides age, the model also expects the systolic and diastolic blood pressures to be provided explicitly. Given these data requirements, it was clear that Samantha had to procure more data. The expiry date was also a bit confusing: what exactly expires on that date? Samantha called Dr. Emily Carter.

Dr. Emily Carter checked her medical files at home and confirmed that 120 is her systolic blood pressure. The diastolic pressure on the same date was 80. She had been advised to check her blood pressure every month, so on the expiry date the previous blood pressure reading becomes obsolete. Her current health insurance expires on 12-12-2027. Last but not least, Dr. Emily Carter is female. Samantha also asked for some extra information, such as the insurance number and the issue date, for record-keeping purposes.

Now Samantha has a grocery bag full of new data elements. Of course, she wants to persist this information. Will adding a few more columns to the same customer table suffice?

There is a better approach: create a data model to organize the data in its proper context. In this case, there are three contexts for the data: the customer's personal information, the customer's health information and the customer's insurance information.

Data modelling is like creating a treasure map of this data. It helps you draw a picture of how all the information is connected and how it should be stored in the system, so understanding and searching the data becomes easier. In simple words, data modelling makes the treasure hunt convenient. In this context, the data model for storing the data in the insurance company may look as follows.

In data modelling, each of these boxes is called an entity. The header of the box is the entity name. The other fields in the box are called attributes. An attribute describes the entity. The connector between two entities is called a Relationship. The Relationship describes how one entity is related to another. The symbol at each end of the Relationship connector is the cardinality. It describes the data rule.

The Customer entity in the middle represents the actual customer. The Customer Id attribute in this entity uniquely identifies a Customer within the system. It is called the Primary Key (PK). The other attributes of the Customer entity, such as First Name, Age and Gender, are the information that describes the Customer.

The Customer Blood Pressure Report entity represents the blood pressure reports of the customers of the insurance company. The Report Id attribute in this entity uniquely identifies a blood pressure report, whereas the Customer Id attribute in this entity ties a Customer to its Blood Pressure Report. This is called a Foreign Key (FK).
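To make the two entities a little more concrete, here is a minimal sketch of how they might be written down as tables, using Python's built-in sqlite3 module. The table and column names are assumptions made for illustration (the diagram may name things differently), and the expiry date is left empty because the story does not state it.

```python
import sqlite3

# Illustrative schema only: table and column names are assumptions,
# not taken from the article's diagram.
SCHEMA = """
CREATE TABLE customer (
    customer_id INTEGER PRIMARY KEY,   -- PK: uniquely identifies a customer
    first_name  TEXT NOT NULL,
    age         INTEGER,
    gender      TEXT
);
CREATE TABLE customer_blood_pressure_report (
    report_id   INTEGER PRIMARY KEY,   -- PK: uniquely identifies a report
    customer_id INTEGER NOT NULL
                REFERENCES customer(customer_id),  -- FK: ties the report to a customer
    systolic    INTEGER NOT NULL,
    diastolic   INTEGER NOT NULL,
    expiry_date TEXT                   -- date after which the reading is obsolete
);
"""


def build_database() -> sqlite3.Connection:
    """Create an in-memory database holding the two example tables."""
    conn = sqlite3.connect(":memory:")
    # SQLite only enforces foreign keys when this setting is switched on.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    return conn


conn = build_database()
# Values from the story; the expiry date is unknown, so it is stored as NULL.
conn.execute("INSERT INTO customer VALUES (1, 'Emily', 65, 'Female')")
conn.execute(
    "INSERT INTO customer_blood_pressure_report VALUES (10, 1, 120, 80, NULL)"
)
conn.commit()
```

With the foreign key in place, an attempt to store a report for a customer id that does not exist in the customer table is rejected by the database, which is exactly the kind of data rule the model is meant to express.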

The symbols at the ends of the Relationship describe how many of one entity can connect with how many of another entity. In an actual model, the Relationship needs to be read explicitly in both directions to understand how the entities relate.

In the above diagram, the symbol at the right end of the Relationship shows One or Many. This means that one Customer may have one or many Customer Blood Pressure Reports. Dr. Emily Carter was advised to check her blood pressure monthly. If the insurance company intends to store this information on a monthly basis, the symbol reflects that.

In the above diagram, the symbol at the left end of the Relationship shows One. This means that one Blood Pressure Report can belong to only one Customer.
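The two readings of the Relationship can also be sketched as a small check in Python. This is only an illustration under assumed names: the class, the helper functions and the second reading are hypothetical, and only the 120/80 values come from the story.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class BloodPressureReport:
    report_id: int
    customer_id: int  # a single value: one report belongs to exactly one customer
    systolic: int
    diastolic: int


def reports_per_customer(reports):
    """Group reports by customer: one customer can have many reports."""
    grouped = defaultdict(list)
    for report in reports:
        grouped[report.customer_id].append(report)
    return dict(grouped)


def customers_without_reports(customer_ids, reports):
    """Return customers that break the 'one or many' rule by having no report."""
    grouped = reports_per_customer(reports)
    return [cid for cid in customer_ids if cid not in grouped]


reports = [
    BloodPressureReport(10, 1, 120, 80),  # the reading from the story
    BloodPressureReport(11, 1, 118, 79),  # hypothetical later reading
]
```

Read from left to right, customer 1 has two reports, which "One or Many" allows. Read from right to left, each report carries a single customer id, which is the "One" end of the Relationship.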

It's worth noting that the grocery bag approach can be a valid choice for an integration data model. One grocery bag is one entity. When procuring data from an external system or from another internal system, a flat data structure can be created, depending on the nature of the data being procured. The Customer Information entity can be useful at the time of collecting data from the customer.

However, this structure will not serve every purpose. If employees of the insurance company need to collect and store their customers' blood pressure information on a monthly basis, this structure will create a lot of redundancy. For example, a complete record needs to hold the customer's name and gender, values which are not going to change every month. When storing monthly blood pressure information, these values need to be repeated in every row just for the sake of completeness of the record. This is redundant.

This flat (denormalized) structure may not be adequate even during integration, depending on the data being exchanged. Imagine that Samantha purchased 12 eggs along with all the other items she bought. Of course, she needs to put the eggs into an egg carrier first and then put the egg carrier into the grocery bag. Otherwise the eggs may break before reaching home. The same logic applies here. In data modelling nomenclature, storing items in sub-folders like this is called normalizing.
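The redundancy in the flat structure, and what normalizing does about it, can be shown with a few rows of made-up data. In this sketch the field names, the month labels and the second and third readings are all hypothetical, and `normalize` is an illustrative helper rather than a standard function.

```python
# Flat "grocery bag" rows: one row per monthly reading.
# Only the first reading (120/80) comes from the story; the rest is made up.
flat_rows = [
    {"customer_id": 1, "first_name": "Emily", "gender": "Female",
     "month": "month 1", "systolic": 120, "diastolic": 80},
    {"customer_id": 1, "first_name": "Emily", "gender": "Female",
     "month": "month 2", "systolic": 118, "diastolic": 79},
    {"customer_id": 1, "first_name": "Emily", "gender": "Female",
     "month": "month 3", "systolic": 122, "diastolic": 81},
]


def normalize(rows):
    """Split flat rows into customer details (stored once) and readings."""
    customers = {}
    reports = []
    for row in rows:
        # Name and gender do not change monthly, so keep one copy per customer.
        customers[row["customer_id"]] = {
            "first_name": row["first_name"],
            "gender": row["gender"],
        }
        # Each reading keeps only the customer id as a link back to the customer.
        reports.append({
            "customer_id": row["customer_id"],
            "month": row["month"],
            "systolic": row["systolic"],
            "diastolic": row["diastolic"],
        })
    return customers, reports


customers, reports = normalize(flat_rows)
```

In the flat rows the name and gender appear three times. After the split they are stored once, and each of the three readings points back to the customer through the customer id alone.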

Furthermore, while modelling, one must keep in mind that information on all customers of the insurance organization is stored in the company database, whereas Dr. Emily Carter stores only her own records at home. Depending on the purpose, different types of data model can be created. Basic data modelling principles and data cataloguing best practices apply in all situations, though. This can be compared to how Samantha organizes grocery items at the grocery store and how she does so at home.

One may wonder which role in the organization is responsible for creating such data models. We can talk about that some other time. For now, the good news is that, as per the data, Dr. Emily Carter is in perfect health. Her health insurance is not expiring any time soon either. Let’s end this story on this good note.



P.S. The conversation begins where the story ends. If you have read this far, you have my open invitation to Coffee and Cake sessions at café Magnolia (in person or virtually). Looking forward to meeting you.

