Database Normalization I
If you work as a Data Analyst, Data Engineer, Database Developer, or in any role that revolves around data and databases, you have almost certainly heard about Database Normalization. It is quite a popular term in the data space, but what exactly does it mean, and when do you normalize a database? Let me attempt to answer these questions for you. A database functions like a bookshelf holding various types of books. Each of these books is supposed to be connected and related to the others, but there is a problem: some of these books have been duplicated inside other books. You now have a book that contains two other books in it - crazy, right?
Normalization is a process where tables are structured in a way that reduces data duplication and avoids the complications that arise when data is modified. The table below will help illustrate this.
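The original table isn't reproduced here, so as a stand-in, here is a minimal sketch in Python using the standard-library sqlite3 module. It builds a single table that mixes students, courses, and departments in one place, matching the scenarios discussed below. The column names and most sample rows are assumptions; only Dr. Lawrence (director of Mgt & Accounting) and Seyifunmi (Student_ID 4, Sociology) come from the article itself.

```python
import sqlite3

# One table serving multiple purposes: students, courses, AND departments.
# Column names and most rows are hypothetical illustrations.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE enrollment (
        Student_ID   INTEGER PRIMARY KEY,
        Student_Name TEXT NOT NULL,
        Course_Code  TEXT NOT NULL,
        Department   TEXT NOT NULL,
        Director     TEXT NOT NULL
    )
""")
conn.executemany("INSERT INTO enrollment VALUES (?, ?, ?, ?, ?)", [
    (1, "Adaeze",    "ACC101", "Mgt & Accounting", "Dr. Lawrence"),
    (2, "Tunde",     "ACC101", "Mgt & Accounting", "Dr. Lawrence"),
    (3, "Chiamaka",  "ACC102", "Mgt & Accounting", "Dr. Lawrence"),
    (4, "Seyifunmi", "SOC201", "Sociology",        "Dr. Adeyemi"),
])
conn.commit()
```

Notice how the department and director values repeat on every row; that duplication is exactly what the anomalies below exploit.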
The table above serves multiple purposes: it lists college students, courses, and departments all at once. Tables that mix purposes like this create serious problems for a database, and that is why normalization is important.
Three problems that arise in tables with the kind of structure above are:
- Insertion Anomalies
- Update Anomalies
- Deletion Anomalies
An insertion anomaly occurs when new data cannot be inserted into a table without also inserting additional, unrelated data. Let me explain this using the table above. In that table, the Student_ID column serves as the primary key, and as you know, the primary key column must contain a value before a record can exist in the table at all. This means I cannot enter a new course code into the table until I enroll a new student, because without a student there is no Student_ID to anchor the row. In short, there can't be a new course until there is new student data. This is the insertion anomaly.
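A quick sketch of the insertion anomaly, again with sqlite3 (table and course code are hypothetical; a TEXT primary key with NOT NULL is used so the database cannot silently auto-assign an id the way SQLite does for INTEGER PRIMARY KEY columns):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE enrollment (
        Student_ID  TEXT NOT NULL PRIMARY KEY,
        Course_Code TEXT NOT NULL
    )
""")
try:
    # Try to record a brand-new course with no enrolled student
    conn.execute("INSERT INTO enrollment (Course_Code) VALUES ('PHY101')")
    inserted = True
except sqlite3.IntegrityError:
    inserted = False  # rejected: Student_ID may not be empty
```

The insert fails with an IntegrityError: the course simply cannot be recorded until some student supplies a Student_ID.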
An update anomaly occurs when an attempt to update a single record reveals that further updates are required across the table. Looking at the table above, the course code and department information are repeated for each student on a course. This duplication increases database storage and makes data changes harder to maintain. Consider a scenario where Dr. Lawrence, the director of the Mgt & Accounting department, is replaced by another director. Every instance of Dr. Lawrence in the table must now be updated with the new director's name, which means the records of all students enrolled in that department must change as well. No student's row can be missed, or the table will contain inconsistent and inaccurate records. This is the update anomaly.
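The update anomaly in code form: one logical fact (the department's director) lives on many rows, so one change fans out across the table. The names and the replacement director are hypothetical illustrations:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE enrollment (Student_ID INTEGER, Department TEXT, Director TEXT)"
)
conn.executemany("INSERT INTO enrollment VALUES (?, ?, ?)", [
    (1, "Mgt & Accounting", "Dr. Lawrence"),
    (2, "Mgt & Accounting", "Dr. Lawrence"),
    (3, "Mgt & Accounting", "Dr. Lawrence"),
    (4, "Sociology",        "Dr. Adeyemi"),
])
# Replacing the director means touching every duplicated row;
# miss even one and the table becomes inconsistent.
cur = conn.execute(
    "UPDATE enrollment SET Director = 'Dr. Okafor' WHERE Director = 'Dr. Lawrence'"
)
rows_touched = cur.rowcount  # three rows changed for a single logical fact
```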
The last anomaly I must mention is the deletion anomaly. This kind of database problem occurs when deleting one record causes the unintended loss of other data in the database. In our example table, Seyifunmi, with the student ID of 4, has decided to leave her course. Because she is the only student enrolled in the Sociology department, deleting her record also erases every trace of that department: its information exists nowhere else in the table. The removal of one record causes the loss of others. This is the deletion anomaly.
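And a sketch of the deletion anomaly (the first row is a hypothetical second student; Seyifunmi and Sociology are from the article):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE enrollment (Student_ID INTEGER, Student_Name TEXT, Department TEXT)"
)
conn.executemany("INSERT INTO enrollment VALUES (?, ?, ?)", [
    (1, "Adaeze",    "Mgt & Accounting"),
    (4, "Seyifunmi", "Sociology"),  # the only row that mentions Sociology
])
# Seyifunmi leaves her course, so her record is deleted...
conn.execute("DELETE FROM enrollment WHERE Student_ID = 4")
# ...and the Sociology department disappears from the database with her.
sociology_rows = conn.execute(
    "SELECT COUNT(*) FROM enrollment WHERE Department = 'Sociology'"
).fetchone()[0]
```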
With the challenges outlined, what then is the solution? That is where database normalization comes in. Normalization optimizes the design of a database by giving each table a single purpose. For our example table, that means a total redesign: the single table is split into three different tables - a Student table to record information about each student, a Course table to record information about each course, and a Department table to record information about each department.
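The redesigned schema can be sketched as three single-purpose tables linked by foreign keys. The exact column names and sample values are assumptions, but the structure shows how the three anomalies go away:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript("""
    CREATE TABLE department (
        Dept_ID  INTEGER PRIMARY KEY,
        Name     TEXT NOT NULL,
        Director TEXT NOT NULL
    );
    CREATE TABLE course (
        Course_Code TEXT PRIMARY KEY,
        Dept_ID     INTEGER NOT NULL REFERENCES department(Dept_ID)
    );
    CREATE TABLE student (
        Student_ID  INTEGER PRIMARY KEY,
        Name        TEXT NOT NULL,
        Course_Code TEXT NOT NULL REFERENCES course(Course_Code)
    );
""")
# A course can now exist before any student enrolls (no insertion anomaly):
conn.execute("INSERT INTO department VALUES (1, 'Sociology', 'Dr. Adeyemi')")
conn.execute("INSERT INTO course VALUES ('SOC201', 1)")
# Replacing a director is a single-row update (no update anomaly),
# and deleting a student can never take a department with it.
conn.execute("UPDATE department SET Director = 'Dr. Okafor' WHERE Dept_ID = 1")
```

Each fact now lives in exactly one place, which is the whole point of the redesign.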
In summary, database normalization is a fundamental principle of database design, offering a structured approach to optimizing data management. By eliminating insertion, update, and deletion anomalies, normalization protects data integrity and leads to a more efficient and scalable design. It also makes maintenance and updates easier and can improve query performance, providing a solid foundation for managing data effectively as an organization's needs evolve.