Stop normalising Data-items
https://www.youtube.com/watch?v=yYioLVWgh64

Stop normalising Data-items

(Created 6 Oct 2022 - Updated 7 Oct 2022)

If you still do not believe that normalisation is flawed try this exercise to develop a database design, using all 7 normal forms, from the following data-items:

1. person_name

2. parents_name

3. date_of_birth

4. place_of_birth

5. parents_address

6. parents_phone-no

The futility of Normalisation

For those of you who think that you will be able to use the 7 normal forms of normalising data_items let me show you why this is futile.

Let us examine the first of the 6 data-items and try to explain it in its unnormalised form. This should prove that normalising data-items is a waste of time and should be avoided at all costs.

person_name:

This is already in first normal form as it is not a repeating group. Think again! A person_name can be regarded as a repeating group with ((surname and given_name)). So you would have to apply Codd’s first normal form and end up with 3 entities, each with their own candidate key which could then be considered to be the primary key and underlined to represent this concept:

Person (person_id)

Surname (person_surname_id, surname, person_id)

GivenName (given_name_id, given_name, person_id)

At first this may be frivolous but think about a business operative needs who wants to be able to list every occurrence of a person with the surname ‘Smith’ and then work out if the people are related as being a parent or a sibling.

However let us apply Codd’s second normal form, namely “It is in first normal form. It does not have any non-prime attribute that is functionally dependent on any proper subset of any candidate key of the relation.”

In the Person entity person_id is not a non-prime-attribute therefore Person is in second normal form. The same logic can be applied for Surname and GivenName.

Let us apply Codd’s third normal form, namely “This is about removing the transitive dependencies by identifying the determinant of each transitive dependency, which then becomes a primary key of a new table and then assigning the appropriate attributes. An appropriate name of the new table needs to be chosen as well”

But first what does transitive dependency mean? One definition is “A transitive dependency occurs when one non-prime attribute is dependent on another non-prime attribute:

In the Person, Surname and GivenName entities there are no transitive dependencies. Therefore these entities are in third normal form.

How about Bryce-Codd’s normal form (BCNF or 3.5). This states “a relational schema is in BCNF then all redundancy based on functional dependency has been removed, although other types of redundancy may still exist. A relational schema R is in Boyce–Codd normal form if and only if for every one of its dependencies X → Y, at least one of the following conditions hold:

  • XY is a trivial functional dependency (Y ? X),
  • X is a superkey for schema R"

where X is the superkey and Y the data-item

The relational schema between these 3 entities looks like the following:

No alt text provided for this image

This demonstrates that a given_name and surname can belong to many Person(s). This shows that a superkey may exist and will need to be resolved. But how do you resolve this? We may need to wait until we have done a bit more normalising. And this is the first sign of futility!

What about fourth, fifth and sixth normal forms?

Fourth normal form is built on the 4 previous forms and states: “there are no non-trivial multivalued dependencies other than a candidate key.

Firstly what does a non-trivial multivalued dependency mean? One explanation is “the dependent is strictly not a subset of the determinant. i.e. If X → Y and Y is not a subset of X, then it is called Non-trivial functional dependency”. where X is the candidate key and Y, a data-item, that is not a subset of X”. Now what is a subset? One explanation is “a part of a larger group of related things”. So are the 3 entities in fourth normal form? According to Fagin (who developed this form) it is indeterminate as we can almost state categorically that the 3 are in third normal form but may not be in BCNF so we may have to wait until later. This is the second sign of futility!

Fifth normal form, also known as projection–join normal form, states it is “a level of database normalization designed to remove redundancy in relational databases recording multi-valued facts by isolating semantically related multiple relationships”.

As there seems to be multiple relationships between Person and Surname and GivenName these tables are not in fifth normal form so more work needs to be done to resolve this issue. This is the third sign of futility!

Sixth normal form states it is “based on an extension of the relational algebra.

Relational operators, such as join, are generalized to support a natural treatment of interval data, such as sequences of dates or moments in time, for instance in temporal databases.. Sixth normal form is then based on this generalized join, as follows:

A relvar (relation variable) R [table] is in sixth normal form (abbreviated 6NF) if and only if it satisfies no nontrivial join dependencies at all — where, as before, a join dependency is trivial if and only if at least one of the projections (possibly U_projections) involved is taken over the set of all attributes of the relvar [table] concerned”

or

“Relvar R is in sixth normal form (6NF) if and only if every JD [Join Dependency] of R is trivial — where a JD is trivial if and only if one of its components is equal to the pertinent heading in its entirety”.


Now what is ‘interval data’? One explanation is “measured along a numerical scale that has equal distances between adjacent values. These distances are called “intervals.” There is no true zero on an interval scale, which is what distinguishes it from a ratio scale”.

What is a temporal database? One explanation is that it is one that “stores data relating to time instances. It offers temporal data types and stores information relating to past, present and future time”.

So are the 3 tables we have just produced in third and BCNF in 6th normal form. Your guess is as good as mine. This is the fourth sign of futility.

What about parents_name?

If you think you are going to be able to normalise this data-item you are in for a rude awakening. But why not give it a try. You will ultimately find the same level of futility as trying to normalise person_name.

Allow me to go through this exercise:

Is parents_name in first NF? My answer is no! The semantics of the name proves that it is plural and therefor a repeating group. So how do you get it into first NF?

Parent (parent_id, parent name)

But as I have shown in the person_name example parent_name is a possible repeating group (and hence unnormalised) so you would need to normalise this further as

Parent (parent_id)

Surname (surname_id, surname, parent_id)

GivenName (given_name_id, given_name, parent_id)

Now is not a Parent also a Person?

You will now probably need to apply 5NF or 6NF to resolve this issue.

What about date_of_birth?

It is obviously in first NF as it is not a repeating data-item.

How about second NF? This is where you have to make a decision:

1. Create a Date Of Birth table

Date Of Birth (date_of_birth_id, person_id)

or

2. Add date_of_birth to the Person Table

Person (person_id, date_of_birth)

Case 1 seems to be a solution but date_of_birth could now become a candidate key as it will be repeated multiple times. While this is not a bad idea it could prove to be pointless unless some business operative is interested in listing all people born on the same day. But place_of_birth is not dependent on the primary key date_of_birth & so it is not in second NF so more work is needed,

Case 2 seems to be fully normalised but date_of_birth may first appear to be functionally dependent on person_id but is could also be a candidate key which is no longer functionally dependent on person_id.

Now what happens when you add additional data_items like place_of_birth or time_of_birth?

You will now have to deal with:

1. Date Of Birth (date_of_birth_id, date_of_birth, time_of_birth, place_of_birth, person_id)

or – but place_of_birth is not functionally dependent on date_of_birth_id so this entity needs to be normalised further

2. Add date_of_birth to the Person Table

Person (person_id, date_of_birth, time_of_birth, place_of_birth) - but place_of_birth is not functionally dependent on person_id so this entity needs to be normalised further

Once again I have proven that normalisation is futile.

As for the rest well the same applies, so I will not bother to bore you with the mind numbing exercise other normalisation experts will put you through.

In my Ripose in Pictures (to be completed but a sample is now available) I show how, using my approach (and especially the use of my Knowledge model), I am able to produce a 'fully normalised' database design without the use of normalisation.

In conclusion

I have continually railed against normalisation and for good reason. It will only ever produce 'paralysis by analysis' leading to compromises, leading to data mining errors leading to poor decision making on the part of senior managers. I challenge anyone to prove me wrong.

Regards

Charles Meyer Richter

Ripose Pty Limited

Information Architect Grade Zero

Charles Meyer Richter

Principal information architect & diagnostician at Ripose Pty Limited

2 年

9 Oct 2022 - Update to my 'Ripose Picture Book' - https://lnkd.in/gUdFzsH4 After reading a comment from a colleague stating that he felt I was using normalisation to produce my Logical Data Model I have now made the necessary update on page 74 to counter his 'assertion'. In addition I have added my logic behind why you should stop using brainstorming Objectives, Strategies & the Conceptual Data model (pages 62 - 65). I am thinking of writing another article to describe my opposition to Brainstorming. Regards ps If you wish to stop getting notified of my rather controversial views you can always disconnect from or stop following me. I am here to enlighten you as per the statements on the LinkedIn about page - https://about.linkedin.com/ and especially their Mission Statement

回复
Charles Meyer Richter

Principal information architect & diagnostician at Ripose Pty Limited

2 年

I want to thank everyone who has contributed their thoughts on this topic. The assertion that I am making is that using the 7 normal forms (NFs) will NOT produce an efficient nor effective Logical Data Model. I am asking you to prove that you CAN use the 7 NFs to produce an efficient & effective Logical Data Model & prove my assertion wrong. In logic and mathematics, proof by contradiction is a form of proof that establishes the truth or the validity of a proposition, by showing that assuming the proposition to be false leads to a contradiction. Proof by contradiction is also known as indirect proof, proof by assuming the opposite'. Please prove me wrong with logic & not rhetoric! Regards

回复
Karel van der Walt

Director at MentalArrow (Pty) Ltd

2 年
回复
Marco Wobben

CaseTalk - Business Information Modeling | Engineer | Trainer | Author | Artist | Human

2 年

The PDF is very informative on Ripose, more than anything I've seen so far. I'm not sure your stated person example are usable to show normalization is futile. We all know names are not identifying for people when the group of people is enlarged. This why organizations provide customers with customer-numbers, governments provide social security numbers, etc. Instead of staying close to an interpreted reality and apply our simple thoughts on them, we should look at the administrative needs of the business (employees) instead. But even then, normalization is optional.

要查看或添加评论,请登录

Charles Meyer Richter的更多文章

  • My fascination with Values

    My fascination with Values

    Preamble This transcript was the result of my reading a LinkedIn post by Christine Claire Graeff . As an Information…

  • My fascination with Benefits

    My fascination with Benefits

    7 Mar 2025 Preamble This transcript was the result of my reading a LinkedIn post by Mark Jaski. As an Information…

  • My fascination with the Architecture Development Method (ADM)

    My fascination with the Architecture Development Method (ADM)

    7 Mar 2025 Preamble This transcript was the result of my reading a LinkedIn post by a member of the Vintage LinkedIn…

  • My fascination with the Product-Centric Model Theory

    My fascination with the Product-Centric Model Theory

    Preamble This transcript was the result of my reading a LinkedIn post by Chris Chamberlain . As an Information…

  • Enterprise Architect vs Information Architect

    Enterprise Architect vs Information Architect

    Preamble There have been many a discussion as to the difference between an Enterprise Architect and an Information…

  • My Fascination with AI Agents

    My Fascination with AI Agents

    2 Mar 2025 Preamble This transcript was the result of my reading a LinkedIn post by Brij kishore Pandey on the subject…

  • My Fascination with Agile and Technical Debt

    My Fascination with Agile and Technical Debt

    Preamble This transcript was the result of my reading a LinkedIn post by Jesper Lowgren on the subject of Technical…

    2 条评论
  • A history of the Causes of IT failures

    A history of the Causes of IT failures

    28 Feb 2025 Preamble This transcript was the result of my reading a LinkedIn post by Dennis Mulder. As an Information…

  • My fascination with Critical Thinking

    My fascination with Critical Thinking

    26 Feb 2025 Preamble This transcript was the result of my reading a LinkedIn post by Allan M Lee, PE. As an Information…

    1 条评论
  • My fascination with Knowledge Graphs and the Resource Description Framework

    My fascination with Knowledge Graphs and the Resource Description Framework

    25 Feb 2025 Preamble This article was the result of my reading a LinkedIn post by Adrian Gschwend. As an Information…

社区洞察

其他会员也浏览了