Confusing words in Data Modeling
Rik Van Bruggen
Helping the world Operationalise Machine Learning and AI in a meaningful, efficient, managed and effective way
For a few weeks now, I have been helping out my friends at Hackolade with some really interesting work around their core product, Hackolade Studio, and how to make it even more successful in the marketplace. This means talking to a LOT of people: current customers, active users, partners, friends in the industry, and more. It’s been fantastic to talk to so many interesting people, and to learn so much from their insights.
During these conversations, I have noticed that there’s quite a bit of work to be done to clarify and straighten out the meaning of the words that we use. In the NOSQL data modeling space, we are not always very precise with our words, and this imprecision can lead to all kinds of misunderstandings. Specifically, I have been struck by the confusion around three words: model, schema, and metadata.
This confusion is not specific to my conversations of the past few weeks. A quick Google search will show you that there have been literally millions of articles and conversations online asking what these terms mean, and this has been going on for a very long time.
Again: I don’t think that this is a new conversation. However, the confusion does seem to have become more acute. In the old days of relational data modeling, people seemed to have a much clearer view of what was meant by modeling terms. Specifically, people always talked about
Note that in these three modeling terms, we never mentioned the word “schema”. That’s completely normal and intentional: models are very different from schemas. Models are visual documentation artefacts that are intended for humans, and that are specifically designed to facilitate conversations about data structure between stakeholders. Schemas, by contrast, are outputs of the physical data modeling exercise: they represent a machine-readable contract on the structure of the data, as it will be guarded and enforced by the database management system.
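To make the "machine-readable contract" idea a bit more tangible, here is a minimal sketch using Python's built-in sqlite3 module. The customer table, its columns, and the try_insert helper are all hypothetical names I made up for illustration; the point is only that the schema, unlike a model diagram, is something the database management system itself enforces.

```python
import sqlite3

# Hypothetical schema for illustration only: the table and column names
# are assumptions, not taken from any real system.
SCHEMA_DDL = """
CREATE TABLE customer (
    id    INTEGER PRIMARY KEY,
    email TEXT NOT NULL UNIQUE,
    age   INTEGER CHECK (age >= 0)
)
"""


def try_insert(conn, email, age):
    """Return True if the DBMS accepts the row, False if the schema rejects it."""
    try:
        # Using the connection as a context manager commits on success
        # and rolls back if an exception is raised.
        with conn:
            conn.execute(
                "INSERT INTO customer (email, age) VALUES (?, ?)",
                (email, age),
            )
        return True
    except sqlite3.IntegrityError:
        # Raised when a NOT NULL, UNIQUE, or CHECK constraint is violated.
        return False


conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA_DDL)

accepted = try_insert(conn, "ann@example.com", 34)        # satisfies the contract
rejected_missing_email = try_insert(conn, None, 34)       # violates NOT NULL
rejected_negative_age = try_insert(conn, "bob@example.com", -1)  # violates CHECK
row_count = conn.execute("SELECT COUNT(*) FROM customer").fetchone()[0]
```

A diagram of the same customer entity would help people discuss the structure, but only the DDL above makes the database refuse the two invalid rows.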
Now here’s the issue: in the world of NOSQL, polyglot persistence architectures are the norm, and many different data backends host many different types of data in many different data models. In that world, we seem to have forgotten some of the above terminology, and we are getting confused by sloppy interpretations of these terms.
What about model / schema / metadata?
So what is the issue? Well, first of all, I think that people are using these terms interchangeably way too easily. I have spoken to very smart people in our industry who sometimes say one thing but mean something else, and then in the next sentence get very specific about the precise meaning that they are trying to articulate. Let’s make up our minds, shall we?
In general, I do think these three terms articulate very different concepts in the data modeling profession, and that as such, we should take deliberate care when using them. I know I am really just a novice at this, but from my point of view
Positioned in a simple Venn diagram, I would articulate the three different words and their relation to one another like this:
Meaning: the database METADATA contains the database MODEL and the database SCHEMA, both of which are touching each other as they are very much connected. At that touchpoint, database architecture would be very important, as it would ensure that the rift between model and schema does not become too wide.
That brings me to the end of this article. In writing it, I hope I have clarified some topics that have been unclear in the NOSQL data modeling space, and that struck me during my first couple of ventures into the important topics that the industry is struggling with. I am sure that this is not the end of the journey - I plan to explore these topics more in future articles.
As always: let me know if you have any thoughts about this article – I would love to discuss.
All the best
Rik