What is Normalization in DBMS (SQL)?
Normalization is a fundamental database design methodology aimed at minimizing data redundancy and mitigating anomalies such as Insertion, Update, and Deletion Anomalies. It achieves this by organizing data into smaller, related tables connected by relationships. The primary goal of normalization in SQL is to ensure data integrity by eliminating redundant information and structuring data logically.
The conceptual foundation of normalization traces back to Edgar Codd, the inventor of the relational model. Codd introduced the concept of normalization with the establishment of the First Normal Form and subsequently extended the theory with the introduction of the Second and Third Normal Forms. Later, in collaboration with Raymond F. Boyce, Codd further developed the Boyce-Codd Normal Form, refining the principles of database normalization.
Database Normal Forms
Here is a list of Normal Forms in SQL:
1NF (First Normal Form)
The First Normal Form (1NF) is a database normalization form that eliminates repeating groups within a table. In 1NF, each column in a table contains atomic (indivisible) values, and each column must have a unique name.
Here's an example to illustrate 1NF:
Let's say we have a table called "StudentCourses" that stores information about students and the courses they are enrolled in. A non-normalized version of this table might look like this:
| Student ID | Student Name | Course 1 | Course 2 | Course 3 |
|------------|--------------|-------------|------------|------------|
| 1 | Alice | Math | Physics | Chemistry |
| 2 | Bob | Physics | | |
| 3 | Charlie | Chemistry | Biology | |
In the non-normalized table above, the columns "Course 1", "Course 2", and "Course 3" represent repeating groups. To bring this table into 1NF, we would need to eliminate these repeating groups by creating a separate table for courses and another table to represent the relationship between students and courses.
Here's how we can normalize this table into 1NF:
Student Table:
| Student ID | Student Name |
|------------|--------------|
| 1 | Alice |
| 2 | Bob |
| 3 | Charlie |
Course Table:
| Course ID | Course Name |
|-----------|-------------|
| 1 | Math |
| 2 | Physics |
| 3 | Chemistry |
| 4 | Biology |
StudentCourses Table (Relationship Table):
| Student ID | Course ID |
|------------|-----------|
| 1 | 1 |
| 1 | 2 |
| 1 | 3 |
| 2 | 2 |
| 3 | 3 |
| 3 | 4 |
In this normalized structure:
This normalized structure adheres to 1NF because each column contains atomic values, and there are no repeating groups.
2NF (Second Normal Form)
The Second Normal Form (2NF) is a database normalization form that eliminates partial dependencies within a table. A table is in 2NF if it is already in 1NF and if no non-prime attribute (an attribute that is not part of any candidate key) is dependent on only a portion of a candidate key.
To understand 2NF better, let's consider an example:
Suppose we have a table called "EmployeeProjects" that tracks information about employees and the projects they are assigned to, along with additional attributes like project details. Here's a non-normalized version of this table:
| Employee ID | Employee Name | Project ID | Project Name | Project Location |
|-------------|---------------|------------|--------------|-----------------|
| 1 | Alice | 101 | Project A | New York |
| 2 | Bob | 102 | Project B | Los Angeles |
| 3 | Charlie | 101 | Project A | New York |
| 1 | Alice | 102 | Project B | Los Angeles |
In the above table, the candidate key (a unique identifier) is {Employee ID, Project ID}. However, there is a partial dependency because the attribute "Project Name" depends only on "Project ID", which is part of the candidate key, while "Project Location" also depends on "Project ID". Therefore, this table is not in 2NF.
To bring this table into 2NF, we need to separate out the attributes that are functionally dependent on part of the candidate key. We'll split the table into two:
EmployeeProjects Table:
| Employee ID | Project ID |
|-------------|------------|
| 1 | 101 |
| 2 | 102 |
| 3 | 101 |
| 1 | 102 |
ProjectDetails Table:
| Project ID | Project Name | Project Location |
|------------|--------------|-----------------|
| 101 | Project A | New York |
| 102 | Project B | Los Angeles |
Now, both tables are in 1NF:
In this 2NF structure:
This normalization process ensures that each table contains non-redundant data and avoids anomalies associated with partial dependencies.
3NF (Third Normal Form)
The Third Normal Form (3NF) is a database normalization form that eliminates transitive dependencies within a table. A table is in 3NF if it is already in 2NF and if no non-prime attribute (an attribute that is not part of any candidate key) is transitively dependent on a candidate key.
To understand 3NF better, let's consider an example:
Suppose we have a table called "EmployeeDepartments" that stores information about employees, their departments, and the locations of those departments. Here's a non-normalized version of this table:
| Employee ID | Employee Name | Department | Department Location |
|-------------|---------------|------------|--------------------|
| 1 | Alice | IT | New York |
| 2 | Bob | HR | Los Angeles |
| 3 | Charlie | IT | New York |
| 1 | Alice | HR | Los Angeles |
In this table, the candidate key is {Employee ID}, and there are transitive dependencies. "Department Location" depends on "Department", which is not a candidate key, but instead depends on the candidate key "Employee ID". Therefore, this table is not in 3NF.
To bring this table into 3NF, we need to eliminate the transitive dependency. We can do this by separating out the attributes into two tables
Employees Table:
| Employee ID | Employee Name |
|-------------|---------------|
| 1 | Alice |
| 2 | Bob |
| 3 | Charlie |
Departments Table:
| Department | Department Location |
|------------|--------------------|
| IT | New York |
| HR | Los Angeles |
EmployeeDepartmentAssignment Table:
| Employee ID | Department |
|-------------|------------|
| 1 | IT |
| 2 | HR |
| 3 | IT |
| 1 | HR |
Now, both tables are in 1NF and 2NF:
In this 3NF structure:
This normalization process ensures that each table contains non-redundant data and avoids anomalies associated with transitive dependencies.
BCNF (Boyce-Codd Normal Form)
The Boyce-Codd Normal Form (BCNF) is a stricter form of normalization than 3NF. A table is in BCNF if, for every non-trivial functional dependency X→Y, the determinant X is a superkey. In other words, BCNF ensures that every functional dependency in the table is a dependency on a candidate key.
Let's illustrate BCNF with an example:
Suppose we have a table called "Employees" that stores information about employees, including their ID, name, and department. Additionally, each department has a department code and a manager. Here's a non-normalized version of this table:
| Employee ID | Employee Name | Department Code | Department | Manager |
|-------------|---------------|-----------------|------------|-----------|
| 1 | Alice | IT | IT | Charlie |
| 2 | Bob | HR | HR | Alice |
| 3 | Charlie | IT | IT | Charlie |
| 4 | David | Sales | Sales | Bob |
In this table, we can see that the determinant "Department Code" determines both "Department" and "Manager". However, "Department Code" is not a superkey, as it does not uniquely identify each row in the table. Therefore, this table is not in BCNF.
To bring this table into BCNF, we need to decompose it into smaller tables. We start by identifying the functional dependencies and candidate keys:
We'll decompose the table into two tables:
领英推荐
EmployeeInfo Table:
| Employee ID | Employee Name | Department Code |
|-------------|---------------|-----------------|
| 1 | Alice | IT |
| 2 | Bob | HR |
| 3 | Charlie | IT |
| 4 | David | Sales |
DepartmentInfo Table:
| Department Code | Department | Manager |
|-----------------|------------|-----------|
| IT | IT | Charlie |
| HR | HR | Alice |
| Sales | Sales | Bob |
Now, both tables are in BCNF:
This decomposition ensures that every functional dependency is a dependency on a candidate key, satisfying the conditions of BCNF.
4NF (Fourth Normal Form)
The Fourth Normal Form (4NF) is a database normalization form that deals with multi-valued dependencies (MVDs) within a table. A table is in 4NF if it is already in BCNF and if it has no non-trivial multi-valued dependencies.
To understand 4NF better, let's consider an example:
Suppose we have a table called "EmployeeSkills" that stores information about employees and their skills. Each employee may have multiple skills, and each skill may be associated with multiple employees. Here's a non-normalized version of this table:
| Employee ID | Employee Name | Skill |
|-------------|---------------|-------------|
| 1 | Alice | Programming |
| 1 | Alice | Database |
| 2 | Bob | Programming |
| 2 | Bob | Design |
| 3 | Charlie | Database |
| 3 | Charlie | Testing |
In this table, the determinant is {Employee ID, Employee Name}, and the multi-valued attribute is "Skill". This means that for each combination of {Employee ID, Employee Name}, there can be multiple values of "Skill". For example, both Alice and Bob have the skill "Programming". This is a multi-valued dependency.
To bring this table into 4NF, we need to separate the multi-valued attribute into a separate table. We'll create two tables:
Employees Table:
| Employee ID | Employee Name |
|-------------|---------------|
| 1 | Alice |
| 2 | Bob |
| 3 | Charlie |
Skills Table:
| Employee ID | Skill |
|-------------|-------------|
| 1 | Programming |
| 1 | Database |
| 2 | Programming |
| 2 | Design |
| 3 | Database |
| 3 | Testing |
Now, both tables are in 1NF, 2NF, 3NF, and BCNF:
In this 4NF structure:
This normalization process ensures that the database schema is free from redundancies and anomalies associated with multi-valued dependencies.
5NF (Fifth Normal Form)
The Fifth Normal Form (5NF), also known as Project-Join Normal Form (PJ/NF), is a database normalization form that addresses cases where there are overlapping composite keys and dependencies. A table is in 5NF if it is already in 4NF and if it is lossless-join decomposable into smaller tables.
To understand 5NF better, let's consider an example:
Suppose we have a table called "EmployeeProjects" that stores information about employees, projects, and their respective roles. Each employee may work on multiple projects, and each project may involve multiple employees with different roles. Here's a non-normalized version of this table:
| Employee ID | Project ID | Role |
|-------------|------------|------------|
| 1 | 101 | Developer |
| 1 | 102 | Designer |
| 2 | 101 | Designer |
| 3 | 102 | Developer |
| 3 | 103 | Tester |
In this table, the composite key is {Employee ID, Project ID}. Each row represents an employee's role in a specific project.
To bring this table into 5NF, we first need to identify any overlapping composite keys and dependencies. In this case, there's no such overlap, but we'll illustrate how to decompose it for 5NF:
Employees Table:
| Employee ID |
|-------------|
| 1 |
| 2 |
| 3 |
Projects Table:
| Project ID |
|------------|
| 101 |
| 102 |
| 103 |
EmployeeRoles Table:
| Employee ID | Project ID | Role |
|-------------|------------|------------|
| 1 | 101 | Developer |
| 1 | 102 | Designer |
| 2 | 101 | Designer |
| 3 | 102 | Developer |
| 3 | 103 | Tester |
Now, each table represents a distinct entity without any overlapping composite keys or dependencies. Additionally, we can reconstruct the original table by performing a natural join between these tables.
In this 5NF structure:
This normalization process ensures that the database schema is free from redundancies and anomalies associated with overlapping composite keys and dependencies.
6NF (Sixth Normal Form)
The Sixth Normal Form (6NF) is a theoretical level of normalization that addresses situations where there are non-trivial join dependencies involving more than one key. In simple terms, 6NF aims to eliminate any remaining redundancies or dependencies that may exist after applying lower normalization forms.
However, 6NF is rarely encountered in practice and is more of a theoretical concept. It's typically only relevant in highly specialized scenarios where extreme normalization is required, such as in some academic or research contexts.
To illustrate 6NF, let's imagine a hypothetical scenario where we have a complex data model with multiple interrelated entities and dependencies. Consider a case where we have a database to manage conference proceedings. Each proceeding has multiple authors, and each author can be associated with multiple proceedings. Additionally, each proceeding may contain multiple papers, and each paper may have multiple authors.
Here's a simplified representation of such a scenario:
Proceedings
Proceeding ID Proceeding Title
-------------------------------------------------
1 Proceeding1
2 Proceeding2
3 Proceeding3
Authors
Author ID Author Name
-------------------------------------------------
1 Vinesh
2 Amit
3 Aagam
Papers
Paper ID Paper Title Proceeding ID
--------------------------------------------------------------------------
1 Time Of India 1
2 Indian Express 1
3 News India 2
4 BBC News 1
PaperAuthors
Paper ID Author ID
--------------------------------------------------
1 1
1 2
1 3
2 2
3 1
4 3
In this scenario:
To achieve 6NF, we would analyze the dependencies and relationships among these tables to ensure that there are no non-trivial join dependencies involving more than one key. This process might involve further normalization or restructuring of the data model, potentially leading to the creation of additional tables or the introduction of more complex relationships.
In practical database design, reaching 6NF is often unnecessary and may even be counterproductive, as it can lead to overly complex schemas that are difficult to maintain and query efficiently. In most cases, normalization up to 3NF or BCNF is sufficient to ensure data integrity and minimize redundancies.
Denormalization
Sometimes, Denormalization can be preferable to normalization in SQL Server environments due to its potential to improve query performance, simplify schema complexity, enhance read performance, aid scalability, better support reporting and analytics needs, and optimize for specific use cases. However, it's crucial to carefully consider the trade-offs, such as increased data redundancy and decreased data consistency, and to apply denormalization judiciously to avoid over-complication and maintain data integrity. for several reasons:
Example: Let's consider a scenario where you have an e-commerce application with a high volume of orders. In the normalized schema, you have separate tables for orders, customers, and products. However, to improve query performance for order retrieval, especially in scenarios where orders are frequently queried along with customer and product information, you might choose to denormalize the order table by including customer and product details directly within the order table. This denormalized structure can reduce the need for joins when querying orders and improve overall system performance, especially in read-heavy workloads. However, you need to carefully evaluate the trade-offs and ensure that data integrity is maintained, especially in scenarios involving data modification operations.