登录查看更多内容

How to distinguish facts, attributes and relations from bullshit ?

Sylvain Arbaudie

// performance and problem-solving minded data architect // jack of all MariaDB trades (and master of a few) // my posts reflect my personal views // praise the Omnissiah //

发布日期: 2024年7月15日

We've seen in my previous article ( who said rant ? ) how to easily normalize database items, but how do we know what's what ? It is crucial to clearly define and differentiate between facts, attributes, and relationships to create an efficient and evolutive design . Here’s how you can identify each component and filter out unnecessary information.

Know the domain

It should make perfect sense that nobody ever created a decent data design on a domain it didn't know beforehand. To gather that knowledge or learn the missing bits : collect detailed requirements from stakeholders to understand what data needs to be stored and how it will be used. Without this intimate knowledge, it is impossible to filter out unnecessary information ( aka bullshit ).

Now let's begin by defining the topics :

What is a fact ? A fact is a fundamental element or entity that represents a real-world object or concept that you need to store information about. For exemaple customer, product, contract, home are facts. Facts are stored in tables with unique ids.
What is an attribute ? An attribute is an intrinsic, usually immutable property or characteristic that provides more detail about a fact. Attributes describe the various facets of said afact. For exemplae, name, first name, colour, size are attributes. Attributes are stored in columns, each having its own.
What is a relation ? A relation defines how two or more facts interact with each other within the considered system. For exemplae : possessing, living in, purchasing, using are relations. Relations are stored in table containing the unique ids of the linked facts.
Rest is probably useless bullshit ... until it becomes usefull later on, but that's gonna be another discussion ( yeah this means a whole article about that ).

Ain't it easy(peasy) ? Not that much indeed. Let's now have a look to a method that helps identifying the aforementionned 4 categories ( facts, attributes, relations and bullshit ).

Use natural language

Natural language descriptions are invaluable in distinguishing attributes from facts and relationships : a fact is usually a noun that refers to a core concept of the domain we are modeling. Attributes do provide specific details about the previous item (the fact), and natural language helps identify these details through descriptive terms and context. Relations are very often expressed by verbs linking facts between themselves.

Example "Our e-commerce platform needs to manage customers, their orders, and the products they purchase. Customers have personal contact details such as name, first name and brithdate, and unique public ID. Each order includes an order date and can contain multiple products, each with a name and price."

Facts : customer, order, product - materialized by different tables having unique ids
Attributes : Customer: unique public ID, name, first name, birthplace - materialized by one column per attribute in the corresponding table
Order: order date - materialized by one colmun per attribute in the corresponding tableProduct: name, price - materialized by one colmun per attribute in the corresponding table
Relationships : A customer places orders - materliazed by a table linking the unique ids of each of the fact tables it does link.An order contains products - materliazed by a table linking the unique ids of each of the fact tables it does link.

Lukasz Olejniczak, PhD 2 个月前

How Human JSON helps in converting more customers for…

Pranavjeet Mishra 3 个月前

Building a Semantic Matching Engine for Short…

Sarvaha Systems 2 个月前

Caveats

There are three main mistakes that can lead to wrong design. While it usually happens to attributes, some of those could alos lead to wrongly consider an item as a fact.

Ambiguity Attributes may sometimes be ambiguously defined or interpreted, leading to confusion in their representation in the database schema.
Overgeneralization It's crucial to avoid overgeneralizing attributes, as this can lead to capturing unnecessary or redundant information that doesn't add value to the database.
Context Sensitivity Attributes often depend on the context in which they are used. The same term might have different meanings or requirements in different parts of the system.

The two most infamous caveats are phone number and address. The true wording about phone is : "My phone operator granted me a line publicly identified by this phone number". The number has an existence outisde the person ( and also the device used to answer the calls ) : those are different facts linked toghether by relations. Simiarly the true wording about address is : " i live in/possess a home/building/plot of land that sits at this address." Address is a geographical beacon that exists wether there is something or not, whomever possesses the place sitting there. Address is a fact that is linked to a person by a relation. In both cases linking directly the address fact to the person is a simplification but a necessary one most of the time.

False friends

Superficial similarity : Some terms may appear to be straightforward attributes but turn out to require deeper analysis due to their multifaceted nature or context-dependent usage.
Misleading names : Certain terms might misleadingly suggest they are attributes when they are actually entities or relationships, requiring careful scrutiny during database schema design.
Dynamic data : Some attributes might seem static but actually require dynamic calculation or updating, which affects their storage and retrieval in the database. Since they are not immutable they shoud never be considered as attributes.

The most infamous false friend is age. Despite being seemingly intrinsic age is not immutable. It is also not that intrinsic since all the people born the same day ahev the very sdame age. What should trigger your understanding is that age is a calculation that need knowing another attribute : birthdate and the day of calculation date. This is a dynamic attribute so it should not be stored nor even considered at this stage of data design.

Conclusion

Using natural language along with this structured approach ensures that you correctly identify the core entities (facts) in your database design, providing a solid foundation for further detailing (attributes &relations) and implementation. Beware to truly addressing the caveats and being aware of false friends. By strictly following those guidelines professionals in charge of data design can effectively identify and define attributes that accurately represent the data requirements of the system, fostering a robust and reliable database schema.

Sylvain Arbaudie

// performance and problem-solving minded data architect // jack of all MariaDB trades (and master of a few) // my posts reflect my personal views // praise the Omnissiah //

4 个月

any comments /.thoughts ? Federico Razzoli Patrizio Tamorri Richard Bensley Richard M. Sebastien GIRAUD Anders Karlsson Kathryn Sizemore Kaj Arn? Michael Amadi MariaDB Foundation MariaDB Michael "Monty" Widenius Alejandro Duarte Daniel Black Oli Sennhauser ?

要查看或添加评论，请登录

Sylvain Arbaudie的更多文章

Let's go physical(ly separated)

2024年10月7日

Let's go physical(ly separated)

In my previous article, we discussed best practices for securing and monitoring admin access, such as role-based access…

5 条评论
Galera : often overlooked, still powerful

2024年10月1日

Galera : often overlooked, still powerful

Galera Cluster, while often overlooked, is a sophisticated and robust database solution whose full potential is…

5 条评论
Cave adminem

2024年9月24日

Cave adminem

Securing access to a MariaDB database is essential to protect sensitive data and maintain system integrity. Basic…

11 条评论
Choosing the Right Data Types for Database Attributes

2024年7月29日

Choosing the Right Data Types for Database Attributes

Selecting appropriate data types for database attributes is crucial for performance, storage efficiency, and data…

4 条评论
Let the fun begins !!!

2024年7月22日

Let the fun begins !!!

No we have defined our facts, their attributes and the relations that unites all those facts. So we're good aren't we ?…

1 条评论
Three simple rules to not fail (too much) the foundations of your data design

2024年7月10日

Three simple rules to not fail (too much) the foundations of your data design

Designing a robust and efficient relational database is crucial for the smooth operation and management of data in any…

2 条评论
DBAs not dead ( punk ) !

2024年7月3日

DBAs not dead ( punk ) !

Today's post has been inspired ( fueled ? ) by this linkedin post about what devops should worry about. And the analogy…

12 条评论
MariaDB: so much more than just a MySQL fork

2024年6月21日

MariaDB: so much more than just a MySQL fork

In the course of conversations, I realise that MariaDB is still often perceived as a simple fork of MySQL maintained by…
MariaDB : bien plus qu'un simple fork de MySQL

2024年6月21日

MariaDB : bien plus qu'un simple fork de MySQL

Au détour de conversations, je m'aper?ois que MariaDB est encore souvent per?u comme un simple fork de MySQL maintenu…

5 条评论

See all articles

How to distinguish facts, attributes and relations from bullshit ?

Sylvain Arbaudie

// performance and problem-solving minded data architect // jack of all MariaDB trades (and master of a few) // my posts reflect my personal views // praise the Omnissiah //

Know the domain

Use natural language

领英推荐

Caveats

False friends

Conclusion

Sylvain Arbaudie的更多文章

社区洞察

其他会员也浏览了

Of Superheroes, Hypergraphs and the Intricacy of Roles

Approximate Search

Swift 5.7: Regex. Shorthands for optional unwrapping. Unlock existentials for all protocols.

The Entity Resolution Trap: Where Dunning-Kruger Lurks

Text to SQL and the Translation Problem in Data Disciplines

When Short Queries Meet Long Documents:

?? Leveraging Prompt Chaining and Function Calling in Text-to-SQL Conversion with ??

Build an Advanced RAG App: Query Routing

Data Scraping Made Simple: Uncover Hidden Insights with the Top Tools

Semantic Web

Know the domain

Use natural language

领英推荐

Caveats

False friends

Conclusion

Sylvain Arbaudie的更多文章

Let's go physical(ly separated)

Galera : often overlooked, still powerful

Cave adminem

Choosing the Right Data Types for Database Attributes

Let the fun begins !!!

Three simple rules to not fail (too much) the foundations of your data design

DBAs not dead ( punk ) !

MariaDB: so much more than just a MySQL fork

MariaDB : bien plus qu'un simple fork de MySQL

社区洞察

其他会员也浏览了

Of Superheroes, Hypergraphs and the Intricacy of Roles

Approximate Search

Swift 5.7: Regex. Shorthands for optional unwrapping. Unlock existentials for all protocols.

The Entity Resolution Trap: Where Dunning-Kruger Lurks

Text to SQL and the Translation Problem in Data Disciplines

When Short Queries Meet Long Documents:

?? Leveraging Prompt Chaining and Function Calling in Text-to-SQL Conversion with ??

Build an Advanced RAG App: Query Routing

Data Scraping Made Simple: Uncover Hidden Insights with the Top Tools

Semantic Web