In NOSQL, data modeling doesn't have to be logical!
Rik Van Bruggen
Helping the world Operationalise Machine Learning and AI in a meaningful, efficient, managed and effective way
At Hackolade, I have been really enjoying my journey into the world of data modeling. It really has been a journey: in some ways, I have felt like I had to go back in time a little bit, and re-learn some of the skills that I had known in the old days when relational databases still dominated this blue planet. ER diagrams, documentation requirements, naming conventions... all good things that had seemed completely normal in the old days - and that got a little bit of a bad rep when the hipster NOSQL databases and data formats started to gain popularity. I mean: at Neo4j, we used to say that "your data is your model", and downplay the need for deep modeling thoughts ahead of time. And at the same time, we also realised that every time a project wasn't going great, it was the freakin' data model that was the culprit. Every time.
So great: NOSQL data modeling is a thing, again, and we now have a great set of modern tools like the Hackolade Studio to facilitate it. But I have noticed that there is a lingering question. It has to do with how data modeling facilitates the conversation between business and IT, and why that requires multiple levels of modeling.
It's good to be leveled
In traditional data modeling, there have always been different levels of modeling, at different levels of abstraction. Specifically, we have been working with three levels: conceptual, logical and physical.
Note that this "conceptual" level of data modeling overlaps significantly
with the ideas outlined in Domain Driven Design.
We will explore and explain that link in a future article in more detail.
Note that this "logical" level of data modeling is the one that has been
the source of quite a bit of confusion. Hence the title of this article
- you will see below that we are going to suggest a departure from these
three levels of data modeling... and from their names!
We summarized the characteristics of these different levels in this table:
Now, while it is clearly a good thing to have different levels of data modeling (as it greatly facilitates our ability to have a conversation between business people and technologists - one of the core functions of data modeling in the first place), it is not always easy to understand how we can apply this to a world of agile development and heterogeneous data backends. Why, well
At the end of the day, this is all about striking a balance, and finding the right levels of abstraction to achieve the results that we want. When you look for the most efficient and effective way to do that, you may want to make some changes to the conceptual/logical/physical levels above.
Simplify and solidify: 2 levels for the future
Because of the issues that we outlined above, it is appropriate to revisit the nature of the different modeling levels in our new, agile, NOSQL context. We want to be able to satisfy the different concerns of data modelers, and at the same time provide a more coherent framework that would
This required a change in terminology and tooling. Hackolade introduced its ideas around Polyglot Data Modeling for this very reason - creating a new level of technology-agnostic data modeling that sits across the traditional boundaries between conceptual/logical and logical/physical data modeling. We suggest that you work with two levels to achieve your desired results:
This new, simpler and solidified structure for our data modeling efforts will achieve the same objectives as the conceptual/logical/physical strata, but do so in a more modern way. This may seem illogical at first - but it is actually common sense, and extremely simple and handy once you get your head around it.
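To make the idea of a technology-agnostic model a bit more tangible, here is a minimal sketch in Python. It is not how Hackolade Studio works internally, and every name in it (`Attribute`, `Entity`, `to_sql_ddl`, `to_json_schema`, the type mappings) is a hypothetical one made up for illustration. It only shows the principle: describe an entity once, without committing to a backend, and derive target-specific schemas from that single definition.

```python
from dataclasses import dataclass, field


@dataclass
class Attribute:
    name: str
    type: str              # technology-agnostic type: "string", "integer" or "date"
    required: bool = True


@dataclass
class Entity:
    name: str
    attributes: list[Attribute] = field(default_factory=list)


# Hypothetical, deliberately tiny type mappings (assumption: real targets
# have much richer type systems than this).
SQL_TYPES = {"string": "VARCHAR(255)", "integer": "INTEGER", "date": "DATE"}
JSON_TYPES = {"string": "string", "integer": "integer", "date": "string"}


def to_sql_ddl(entity: Entity) -> str:
    """Derive a relational CREATE TABLE statement from the agnostic entity."""
    columns = []
    for attr in entity.attributes:
        not_null = " NOT NULL" if attr.required else ""
        columns.append(f"  {attr.name} {SQL_TYPES[attr.type]}{not_null}")
    return f"CREATE TABLE {entity.name} (\n" + ",\n".join(columns) + "\n);"


def to_json_schema(entity: Entity) -> dict:
    """Derive a JSON-Schema-style document definition from the same entity."""
    return {
        "title": entity.name,
        "type": "object",
        "properties": {
            attr.name: {"type": JSON_TYPES[attr.type]} for attr in entity.attributes
        },
        "required": [attr.name for attr in entity.attributes if attr.required],
    }


# One agnostic definition, two derived target schemas.
customer = Entity(
    "customer",
    [
        Attribute("id", "integer"),
        Attribute("name", "string"),
        Attribute("birth_date", "date", required=False),
    ],
)

print(to_sql_ddl(customer))
print(to_json_schema(customer))
```

The design choice worth noticing is that the business-facing conversation happens around `customer` and its attributes, while the backend-specific details live only in the two derivation functions.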
Hope this was a useful article for you - and if you have any comments, please reach out and let's discuss!
Cheers
Rik