How to distinguish facts, attributes and relations from bullshit ?

How to distinguish facts, attributes and relations from bullshit ?

We've seen in my previous article ( who said rant ? ) how to easily normalize database items, but how do we know what's what ? It is crucial to clearly define and differentiate between facts, attributes, and relationships to create an efficient and evolutive design . Here’s how you can identify each component and filter out unnecessary information.

Know the domain

It should make perfect sense that nobody ever created a decent data design on a domain it didn't know beforehand. To gather that knowledge or learn the missing bits : collect detailed requirements from stakeholders to understand what data needs to be stored and how it will be used. Without this intimate knowledge, it is impossible to filter out unnecessary information ( aka bullshit ).

Now let's begin by defining the topics :

  • What is a fact ? A fact is a fundamental element or entity that represents a real-world object or concept that you need to store information about. For exemaple customer, product, contract, home are facts. Facts are stored in tables with unique ids.
  • What is an attribute ? An attribute is an intrinsic, usually immutable property or characteristic that provides more detail about a fact. Attributes describe the various facets of said afact. For exemplae, name, first name, colour, size are attributes. Attributes are stored in columns, each having its own.
  • What is a relation ? A relation defines how two or more facts interact with each other within the considered system. For exemplae : possessing, living in, purchasing, using are relations. Relations are stored in table containing the unique ids of the linked facts.
  • Rest is probably useless bullshit ... until it becomes usefull later on, but that's gonna be another discussion ( yeah this means a whole article about that ).

Ain't it easy(peasy) ? Not that much indeed. Let's now have a look to a method that helps identifying the aforementionned 4 categories ( facts, attributes, relations and bullshit ).

Use natural language

Natural language descriptions are invaluable in distinguishing attributes from facts and relationships : a fact is usually a noun that refers to a core concept of the domain we are modeling. Attributes do provide specific details about the previous item (the fact), and natural language helps identify these details through descriptive terms and context. Relations are very often expressed by verbs linking facts between themselves.

Example "Our e-commerce platform needs to manage customers, their orders, and the products they purchase. Customers have personal contact details such as name, first name and brithdate, and unique public ID. Each order includes an order date and can contain multiple products, each with a name and price."

  • Facts : customer, order, product - materialized by different tables having unique ids
  • Attributes : Customer: unique public ID, name, first name, birthplace - materialized by one column per attribute in the corresponding table
  • Order: order date - materialized by one colmun per attribute in the corresponding tableProduct: name, price - materialized by one colmun per attribute in the corresponding table
  • Relationships : A customer places orders - materliazed by a table linking the unique ids of each of the fact tables it does link.An order contains products - materliazed by a table linking the unique ids of each of the fact tables it does link.

Caveats

There are three main mistakes that can lead to wrong design. While it usually happens to attributes, some of those could alos lead to wrongly consider an item as a fact.

  • Ambiguity Attributes may sometimes be ambiguously defined or interpreted, leading to confusion in their representation in the database schema.
  • Overgeneralization It's crucial to avoid overgeneralizing attributes, as this can lead to capturing unnecessary or redundant information that doesn't add value to the database.
  • Context Sensitivity Attributes often depend on the context in which they are used. The same term might have different meanings or requirements in different parts of the system.

The two most infamous caveats are phone number and address. The true wording about phone is : "My phone operator granted me a line publicly identified by this phone number". The number has an existence outisde the person ( and also the device used to answer the calls ) : those are different facts linked toghether by relations. Simiarly the true wording about address is : " i live in/possess a home/building/plot of land that sits at this address." Address is a geographical beacon that exists wether there is something or not, whomever possesses the place sitting there. Address is a fact that is linked to a person by a relation. In both cases linking directly the address fact to the person is a simplification but a necessary one most of the time.

False friends

  • Superficial similarity : Some terms may appear to be straightforward attributes but turn out to require deeper analysis due to their multifaceted nature or context-dependent usage.
  • Misleading names : Certain terms might misleadingly suggest they are attributes when they are actually entities or relationships, requiring careful scrutiny during database schema design.
  • Dynamic data : Some attributes might seem static but actually require dynamic calculation or updating, which affects their storage and retrieval in the database. Since they are not immutable they shoud never be considered as attributes.

The most infamous false friend is age. Despite being seemingly intrinsic age is not immutable. It is also not that intrinsic since all the people born the same day ahev the very sdame age. What should trigger your understanding is that age is a calculation that need knowing another attribute : birthdate and the day of calculation date. This is a dynamic attribute so it should not be stored nor even considered at this stage of data design.

Conclusion

Using natural language along with this structured approach ensures that you correctly identify the core entities (facts) in your database design, providing a solid foundation for further detailing (attributes &relations) and implementation. Beware to truly addressing the caveats and being aware of false friends. By strictly following those guidelines professionals in charge of data design can effectively identify and define attributes that accurately represent the data requirements of the system, fostering a robust and reliable database schema.

Sylvain Arbaudie

// performance and problem-solving minded data architect // jack of all MariaDB trades (and master of a few) // my posts reflect my personal views // praise the Omnissiah //

4 个月
回复

要查看或添加评论,请登录

Sylvain Arbaudie的更多文章

  • Let's go physical(ly separated)

    Let's go physical(ly separated)

    In my previous article, we discussed best practices for securing and monitoring admin access, such as role-based access…

    5 条评论
  • Galera : often overlooked, still powerful

    Galera : often overlooked, still powerful

    Galera Cluster, while often overlooked, is a sophisticated and robust database solution whose full potential is…

    5 条评论
  • Cave adminem

    Cave adminem

    Securing access to a MariaDB database is essential to protect sensitive data and maintain system integrity. Basic…

    11 条评论
  • Choosing the Right Data Types for Database Attributes

    Choosing the Right Data Types for Database Attributes

    Selecting appropriate data types for database attributes is crucial for performance, storage efficiency, and data…

    4 条评论
  • Let the fun begins !!!

    Let the fun begins !!!

    No we have defined our facts, their attributes and the relations that unites all those facts. So we're good aren't we ?…

    1 条评论
  • Three simple rules to not fail (too much) the foundations of your data design

    Three simple rules to not fail (too much) the foundations of your data design

    Designing a robust and efficient relational database is crucial for the smooth operation and management of data in any…

    2 条评论
  • DBAs not dead ( punk ) !

    DBAs not dead ( punk ) !

    Today's post has been inspired ( fueled ? ) by this linkedin post about what devops should worry about. And the analogy…

    12 条评论
  • MariaDB: so much more than just a MySQL fork

    MariaDB: so much more than just a MySQL fork

    In the course of conversations, I realise that MariaDB is still often perceived as a simple fork of MySQL maintained by…

  • MariaDB : bien plus qu'un simple fork de MySQL

    MariaDB : bien plus qu'un simple fork de MySQL

    Au détour de conversations, je m'aper?ois que MariaDB est encore souvent per?u comme un simple fork de MySQL maintenu…

    5 条评论

社区洞察

其他会员也浏览了