Data Modeling for Mere Mortals – Part 1: What is Data Modeling?!
Nikola Ilic
I make music from the data | Data Mozart | MVP Data Platform | Pluralsight Author | O'Reilly Instructor | MCT
In recent years, I’ve delivered dozens of training sessions on various data platform topics, for all kinds of audiences. When teaching these concepts and techniques, I find that one concept is particularly intimidating for many business analysts, especially those who are just starting their journey. That concept is data modeling.
Why is data modeling sometimes intimidating?
Maybe because you feel lost when you see all these diagrams that look so complex and complicated…
But, data modeling is not about diagrams. It’s about creating trust, a shared understanding between the business and data professionals, with the final goal of providing increased business value with data.
If we agree that data modeling is about creating trust, I believe we can also agree that trust can’t be easily built – a certain amount of time and effort should be taken into consideration. And, time and effort are not something that you take for granted – it’s something that you need to INVEST!
So, we can rightly assume that data modeling is a kind of investment. An investment that should bring more stability and adaptability to your business.
Think of it like investing in building a house.
Obviously, you can choose to go the quick and easy way, by simply putting building blocks directly on the ground. That can possibly work just fine for some time, until new circumstances occur: think of an earthquake or a thunderstorm, for example. Then your house will probably be damaged. But it’s not only the house that’s going to be damaged. Your trust will also be damaged: the people who live with you, your neighbors, and your friends will realize that you didn’t invest the proper amount of time and effort in advance to prevent such a bad scenario.
Now, let’s assume that you decided to take another path – more demanding at the very beginning, which will require more time and effort from your side. You established a proper foundation for the house, secured things under the ground, and then built a house on top of it. Now, your house will be more stable and can adapt to future challenges.
Since we explained why it is of paramount importance to invest time and effort in building a data model, let’s now examine various types of data models and how they fit into the big data modeling picture.
Conceptual Data Model
Usually, the starting point is creating a conceptual data model. This is a high-level, let’s say 10,000-foot, perspective on the business needs for data. Because we are talking about a high-level perspective, the main goal of the conceptual data model is to give a simplified view of the business processes and entities that matter in the day-to-day business workflow.
In this stage, we are compiling a big picture: what are the key entities in our business workflow? How do they relate to each other? The key characteristic of the conceptual data model is that it should communicate in easy-to-understand terms. Simply said, it should use a common language that business users and non-technical individuals can easily understand.
I know I told you that data modeling is not about diagrams, but still, we need to visualize the process of creating a data model. I’ll first give you a basic example of the conceptual data model.
In this illustration, you can identify various entities: Stadium, Event, Customer, Attendee, and Ticket. You may also notice how these entities are interconnected. This high-level overview provides a simplified picture of the business workflow within the organization.
Now, let’s move on and explain in common language what we see in this illustration.
Our first entity is Stadium. A stadium has a name and is located in a specific country and city, and this combination uniquely identifies the stadium. A stadium may host many events, and there can be many attendees coming to these events.
Next, we have an Event. A specific event cannot exist outside of the stadium where it is scheduled to be held. An event can be attended by an attendee, and there can be many attendees for one event.
Attendee is the entity that attends the event. An attendee can also be a customer of the stadium, for example if they have visited a stadium shop. The key difference between Customer and Attendee is that a customer doesn’t necessarily need to attend a specific event at the stadium.
Customer may have a relation to Stadium, as mentioned, for example by visiting a stadium museum or buying at a stadium fan shop, but that doesn’t make them an attendee of an event.
Finally, a Ticket is an entity that represents confirmation that the attendee will attend a specific event. Each ticket has a unique identifier, as it would be really awkward if two or more attendees got a ticket with the same number. Although each ticket is uniquely identified, one attendee can purchase multiple tickets.
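To make the description above concrete, here is a minimal sketch that writes the conceptual model down as plain data, using Python purely for illustration. The entity names come from the text. The relationship phrases are my paraphrases of the prose, and the helper functions `describe` and `undefined_entities` are hypothetical names, not part of any modeling tool.

```python
# Conceptual model as plain data: entity names plus the relationships
# described in the prose. The verb phrases are paraphrases, not a formal notation.
ENTITIES = ["Stadium", "Event", "Customer", "Attendee", "Ticket"]

# Each relationship is (subject, relationship phrase, object).
RELATIONSHIPS = [
    ("Stadium", "hosts many", "Event"),
    ("Event", "is attended by many", "Attendee"),
    ("Attendee", "can purchase many", "Ticket"),
    ("Ticket", "confirms attendance at one", "Event"),
    ("Customer", "may visit or buy from", "Stadium"),
]


def describe(relationships):
    """Render each relationship as a sentence a business user can review."""
    return [f"{subject} {verb} {obj}." for subject, verb, obj in relationships]


def undefined_entities(entities, relationships):
    """Return names used in relationships that are missing from the entity list."""
    known = set(entities)
    used = {name for subject, _, obj in relationships for name in (subject, obj)}
    return sorted(used - known)


for sentence in describe(RELATIONSHIPS):
    print(sentence)
```

The point of the sketch is the review loop, not the code: every line it prints is a statement that a business stakeholder can confirm or reject, and `undefined_entities` flags a relationship that mentions something nobody has defined yet.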
Why do we need a conceptual data model?
Now that we’ve explained the core components of conceptual data modeling, you might be wondering: Why is this important? Why should someone spend time and effort describing all the entities and relations between them?
Remember when we were talking about building trust between business and data personas? That’s what the conceptual data model is all about. Ensuring that business stakeholders will get what they need, explained in a common language so that they can easily understand the entire workflow. Setting up a conceptual data model also provides business stakeholders with the possibility to identify a whole range of business questions that need to be answered before building a physical data model.
Some of the questions the business may ask are: Are the customer and the attendee the same entity (and if not, why not)? Can one attendee buy multiple tickets? What uniquely identifies a specific event? And many more, of course…
Additionally, the conceptual data model depicts sometimes very complex business processes in an easier-to-consume way. Instead of going through pages and pages of written documentation, one can take a look at the illustration of entities and relationships, all explained in a user-friendly way, and quickly understand the core elements of the business process.
Logical Data Model
Once business and data teams align on the conceptual data model, the next step in the data modeling process is designing a logical data model. In this stage, we are building upon the previous step, by identifying the exact structure of the entities and providing more details about the relationships between these entities. In this stage, you should identify all the attributes of interest for the specific entity, as well as relationship cardinality.
Note that, just as in the conceptual data modeling phase, we still don’t talk about a specific platform or solution. As in the previous stage, our focus is on understanding business requirements and how these requirements can be efficiently translated into a data model.
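As a rough illustration of what a logical model adds, the sketch below gives each entity explicit attributes and records the relationship cardinality, still without committing to any database. Python dataclasses are used only as a neutral notation. The Stadium attributes and the unique ticket identifier come from the text, while `title`, `held_on`, `full_name`, the `CARDINALITY` table, and the helper `duplicate_ticket_numbers` are assumptions made for this example.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Stadium:
    # Name, country, and city together identify a stadium (from the text).
    name: str
    country: str
    city: str


@dataclass(frozen=True)
class Event:
    title: str        # assumed attribute
    held_on: date     # assumed attribute
    stadium: Stadium  # an event belongs to exactly one stadium


@dataclass(frozen=True)
class Attendee:
    full_name: str    # assumed attribute


@dataclass(frozen=True)
class Ticket:
    ticket_number: str  # unique identifier of the ticket
    event: Event        # a ticket is for exactly one event
    attendee: Attendee  # one attendee may hold many tickets


# Relationship cardinality, written as (parent, child): "one-to-many".
CARDINALITY = {
    ("Stadium", "Event"): "1:N",
    ("Event", "Ticket"): "1:N",
    ("Attendee", "Ticket"): "1:N",
}


def duplicate_ticket_numbers(tickets):
    """Return ticket numbers that appear more than once, sorted."""
    seen, duplicates = set(), set()
    for ticket in tickets:
        if ticket.ticket_number in seen:
            duplicates.add(ticket.ticket_number)
        seen.add(ticket.ticket_number)
    return sorted(duplicates)


# Hypothetical sample data: one attendee buys two tickets for the same event.
arena = Stadium("Example Arena", "Exampleland", "Sample City")
final = Event("Cup Final", date(2024, 6, 1), arena)
alex = Attendee("Alex Doe")
tickets = [Ticket("T-001", final, alex), Ticket("T-002", final, alex)]
```

Writing the rules down this way forces the open questions into the light: the sample data shows one attendee holding two tickets, and `duplicate_ticket_numbers(tickets)` checks the rule that no two tickets share a number.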
There are several steps to be performed to ensure that the conceptual data model successfully evolved into a logical data model.
Why do we need a logical data model?
Unlike the conceptual data model, where the benefits of investing time and effort may not be so obvious, I believe that for the logical data model the potential gains are more evident. First of all, the logical data model serves as the best quality assurance test, because it can expose gaps and issues in understanding the business workflow, thus saving you a lot of time and effort down the road. It’s much easier and less costly to fix these issues at this stage, before locking into a specific platform and building an inefficient physical data model on it.
As we’ve already mentioned, one of the key characteristics of a good logical data model is that iteration and fine-tuning are continuous processes. Therefore, building a logical data model can be considered part of the agile data modeling cycle, which ensures more robust, scalable, and future-proof models.
The ultimate benefit of the logical data model is that it serves as a blueprint for the final implementation of the business logic through the physical data model. Relying on a well-designed logical data model enables database engineers and data architects to create more efficient physical database systems.
Physical Data Model
A physical data model represents the final touch: how the data model will be implemented in a specific database. Unlike the conceptual and logical data models, which are platform- and solution-agnostic, the physical implementation requires defining low-level details that may be specific to a certain database provider.
Transitioning from a logical data model to a physical data model requires more iterations and fine-tuning of the entities and relationships defined in the logical data model.
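To show what "specific to a certain database provider" can look like, here is a minimal sketch that implements the stadium example in SQLite through Python's built-in `sqlite3` module. SQLite is my choice for illustration only, since the article does not name a database. The table names, column names, surrogate keys, and the index are all assumptions, as is the helper `insert_is_rejected`.

```python
import sqlite3

# Physical design decisions that the logical model leaves open:
# data types, surrogate keys, constraints, and indexes.
DDL = """
CREATE TABLE stadium (
    stadium_id INTEGER PRIMARY KEY,
    name       TEXT NOT NULL,
    country    TEXT NOT NULL,
    city       TEXT NOT NULL,
    UNIQUE (name, country, city)
);
CREATE TABLE event (
    event_id   INTEGER PRIMARY KEY,
    title      TEXT NOT NULL,
    held_on    TEXT NOT NULL,  -- ISO date string; SQLite has no DATE storage class
    stadium_id INTEGER NOT NULL REFERENCES stadium (stadium_id)
);
CREATE TABLE attendee (
    attendee_id INTEGER PRIMARY KEY,
    full_name   TEXT NOT NULL
);
CREATE TABLE ticket (
    ticket_number TEXT PRIMARY KEY,
    event_id      INTEGER NOT NULL REFERENCES event (event_id),
    attendee_id   INTEGER NOT NULL REFERENCES attendee (attendee_id)
);
CREATE INDEX idx_ticket_event ON ticket (event_id);
"""


def build_database():
    """Create an in-memory database with foreign key enforcement switched on."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # off by default in SQLite
    conn.executescript(DDL)
    return conn


def insert_is_rejected(conn, sql, params):
    """Return True if the database refuses the insert because of a constraint."""
    try:
        conn.execute(sql, params)
    except sqlite3.IntegrityError:
        return True
    return False


conn = build_database()
conn.execute(
    "INSERT INTO stadium (name, country, city) VALUES (?, ?, ?)",
    ("Example Arena", "Exampleland", "Sample City"),
)
conn.execute(
    "INSERT INTO event (title, held_on, stadium_id) VALUES (?, ?, ?)",
    ("Cup Final", "2024-06-01", 1),
)
conn.execute("INSERT INTO attendee (full_name) VALUES (?)", ("Alex Doe",))
# One attendee buys two tickets for the same event.
conn.execute("INSERT INTO ticket VALUES (?, ?, ?)", ("T-001", 1, 1))
conn.execute("INSERT INTO ticket VALUES (?, ?, ?)", ("T-002", 1, 1))
```

With this in place, the business rules are enforced by the database itself: a second ticket numbered `T-001`, or a ticket for an event that does not exist, is rejected with an integrity error. Another database would need different types and syntax for the same rules, which is exactly why these details belong in the physical model and not earlier.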
As with the logical data model, there is a whole list of steps required to make your data model implementation a success, so let’s focus on the most important ones:
Why do we need a physical data model?
With all the points mentioned previously, the main benefit of having a physical data model in place is to ensure efficiency, optimal performance, and scalability. When we talk about efficiency, we obviously have in mind the two most precious business assets: time and money. Unless you think that time = money, in which case we have only one asset to consider…
To simplify: the more efficient your data model is, the more users it can serve and the faster it can serve them, which in most cases ends up bringing more money to the business.
Side-by-side comparison
Here is a brief overview of the key characteristics of each of the data model types:
Conclusion
I want you to remember three key things about data modeling:
In the next part of the series, we’ll examine the wonderful world of dimensional data modeling…
Thanks for reading!