Database Normalization II
About a month ago, I wrote the first part of the Database Normalization series, which focused on explaining what normalization meant, and the problems it solves in databases. In this second part, I’ll be explaining how you can achieve proper normalization within your database.
Just to reiterate, Database normalization aims to minimize data duplication, avoid errors during data modifications, and simplify data queries from the database. There are three fundamental ways of normalizing a database. They are;
These three different forms of normalization are the recommended processes to follow when working in your database. To explain how to implement each of them, we will consider the table below.
This data present in this table shows it is in an unnormalized form. We can see that there are repeating groups of data in a couple of cases, for example, the DoctorID, DoctorName , Region, Council columns have duplicated data. There are also instances of multiple data stored in the same cell, as in the case of PatientName, SlotID, and TotalCost columns. In this state, the table offers little to no value as updating, and querying of data will be very difficult, hence the need for normalization.
When faced with tables like this, the three normalization forms are applied one after the other. So we’ll start with the first one:
First Normal Form
First Normal Form (1NF) is the first concept in the process of database normalization. In the context of 1NF, there are three principles to adhere to:
- Atomic Principle: This principle states that each attribute (or column) in a table must contain atomic values. In order words, your table should have one value per field. For example, in our table above, the “Name” column should not contain comma-separated values, but instead a single name per row. It will be easier to manipulate and query the table if each piece of data is represented by a single value.
- Repeat Principle: This principle states that each column should contain only one value, and each row should be unique. There should be no repeating groups of attributes within a table.
Knowing these principles of 1NF, let us go ahead and make our table conform to the first normal form.
We can see that the atomicity problem only exists in the columns of data related to patients, so to fix this, we have to create a new table for patients. This way we organize data that relates to the patient entity in a separate table that has each column cell containing only one single instance of data. See below:
With this separation, we now we have a patients table that uses composite primary keys (PatientID and SlotID) as unique identifiers for each record in it. It resolves the atomicity problem that initially existed. The rest of the table containing the doctors details will look like this:
In this Doctors table, we still have to remove the repeating groups of data in each column. To fix this, we’ll go ahead and separate the table into two: one for doctors, the other for surgery, with each table dealing with one entity.
Normalizing our table has led us to breaking it into 3 different tables: Patient, Doctor, Surgery. By ensuring it meets the atomicity & repetition principle, we can say our database now meets the first normal form.
Second Normal Form
Normalization in databases is a progressive process, this means each form builds upon the one before it. Second Normal Form can therefore only be implemented after the table has undergone First Normal form. 2NF has just one principle, which is the Partial Dependency principle.
This principle states that a table should not have any partial dependency, which occurs when a non-key attribute (columns that are not the primary key, and not part of the composite key) depends on only a part of the primary key. Basically, partial dependency exists when a column depends on only one columns that make up the composite key.
In our database, the patient table is the only table with a composite primary key. This is how the table currently looks.
This table uses a composite primary which is made up of the PatientID & SlotID columns to uniquely identify each record. The existence of this is what is causing the partial dependency in the table. The PatientName column is a non-key column, and it is only related to the PatientID column. Similarly, the TotalCost only relates to the SlotID. This is essentially what partial dependency means.
To fix this, we need to split the table into two; Patients table and Appointment Table
For the other half of the table, SlotID is not unique, so we will introduce a unique column that will act as the primary key for each record in the table, we then call the table Appointments Table.
With these done, we have eliminated partial dependencies from all tables, and our database now conforms to the first and second normal forms.
Third Normal Form
This form of normalization deals with eliminating transitive dependency. Transitive Dependency occurs when non-key attributes are dependent on another non-key attribute. To normalize a table in the third normal form, the table must have undergone the first and second normalization, after which you look out for the columns that are dependent on non-primary key columns.
In our example database, we have established the first and second normal forms, but if we take a look at the surgery table below, we will notice a case of transitive dependency, as two columns - SurgeryCouncil & Postcode - are dependent on each other rather than the primary key, SurgeryNumber.
Basically, changing the council value requires that you also change the postcode, as each postcode belongs to a council. This kind of dependency needs to be eliminated for the table to be in third normal form. To do this, we need to split the table to cater for the surgery and the region separately.
By creating these two tables, all the tables in our database have now become fully normalized. ?We now have a database that is well-structured, organized, and easy-to-query.
The image below is a high-level diagram of how we broke the table from its initial state to a fully normalized database.