What is Normalization in DBMS (SQL)?

Vinesh Patel

Database Specialist, Database Development Team Leader

Published: March 21, 2024

Normalization is a fundamental database design methodology aimed at minimizing data redundancy and mitigating anomalies such as Insertion, Update, and Deletion Anomalies. It achieves this by organizing data into smaller, related tables connected by relationships. The primary goal of normalization in SQL is to ensure data integrity by eliminating redundant information and structuring data logically.

The conceptual foundation of normalization traces back to Edgar Codd, the inventor of the relational model. Codd introduced the concept of normalization with the establishment of the First Normal Form and subsequently extended the theory with the introduction of the Second and Third Normal Forms. Later, in collaboration with Raymond F. Boyce, Codd further developed the Boyce-Codd Normal Form, refining the principles of database normalization.

Database Normal Forms

Here is a list of Normal Forms in SQL:

1NF (First Normal Form)
2NF (Second Normal Form)
3NF (Third Normal Form)
BCNF (Boyce-Codd Normal Form)
4NF (Fourth Normal Form)
5NF (Fifth Normal Form)
6NF (Sixth Normal Form)

1NF (First Normal Form)

The First Normal Form (1NF) is a database normalization form that eliminates repeating groups within a table. In 1NF, each column in a table contains atomic (indivisible) values, and each column must have a unique name.

Here's an example to illustrate 1NF:

Let's say we have a table called "StudentCourses" that stores information about students and the courses they are enrolled in. A non-normalized version of this table might look like this:

| Student ID | Student Name | Course 1    | Course 2   | Course 3   |
|------------|--------------|-------------|------------|------------|
| 1          | Alice        | Math        | Physics    | Chemistry  |
| 2          | Bob          | Physics     |            |            |
| 3          | Charlie      | Chemistry   | Biology    |            |

In the non-normalized table above, the columns "Course 1", "Course 2", and "Course 3" represent repeating groups. To bring this table into 1NF, we would need to eliminate these repeating groups by creating a separate table for courses and another table to represent the relationship between students and courses.

Here's how we can normalize this table into 1NF:

Student Table:

| Student ID | Student Name |
|------------|--------------|
| 1          | Alice        |
| 2          | Bob          |
| 3          | Charlie      |

Course Table:

| Course ID | Course Name |
|-----------|-------------|
| 1         | Math        |
| 2         | Physics     |
| 3         | Chemistry   |
| 4         | Biology     |

StudentCourses Table (Relationship Table):

| Student ID | Course ID |
|------------|-----------|
| 1          | 1         |
| 1          | 2         |
| 1          | 3         |
| 2          | 2         |
| 3          | 3         |
| 3          | 4         |

In this normalized structure:

The Student table contains unique student information.
The Course table contains unique course information.
The StudentCourses table represents the relationship between students and courses, with each row indicating a student enrolled in a particular course.

This normalized structure adheres to 1NF because each column contains atomic values, and there are no repeating groups.
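As a rough sketch, the three tables above could be created and queried as follows. It uses Python's built-in sqlite3 module with an in-memory database; the snake_case column names are assumptions chosen for this illustration and are not part of the original example.

```python
import sqlite3

# In-memory database; the tables mirror the Student, Course and
# StudentCourses tables shown above (column names are illustrative).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Student (
        student_id   INTEGER PRIMARY KEY,
        student_name TEXT NOT NULL
    );
    CREATE TABLE Course (
        course_id   INTEGER PRIMARY KEY,
        course_name TEXT NOT NULL UNIQUE
    );
    CREATE TABLE StudentCourses (
        student_id INTEGER NOT NULL REFERENCES Student(student_id),
        course_id  INTEGER NOT NULL REFERENCES Course(course_id),
        PRIMARY KEY (student_id, course_id)
    );
""")
conn.executemany("INSERT INTO Student VALUES (?, ?)",
                 [(1, "Alice"), (2, "Bob"), (3, "Charlie")])
conn.executemany("INSERT INTO Course VALUES (?, ?)",
                 [(1, "Math"), (2, "Physics"), (3, "Chemistry"), (4, "Biology")])
conn.executemany("INSERT INTO StudentCourses VALUES (?, ?)",
                 [(1, 1), (1, 2), (1, 3), (2, 2), (3, 3), (3, 4)])

# One row per (student, course) pair replaces the "Course 1..3" columns.
enrollments = conn.execute("""
    SELECT s.student_name, c.course_name
    FROM StudentCourses sc
    JOIN Student s ON s.student_id = sc.student_id
    JOIN Course  c ON c.course_id  = sc.course_id
    ORDER BY s.student_id, c.course_id
""").fetchall()

for student_name, course_name in enrollments:
    print(student_name, course_name)
```

With this layout, enrolling a student in a fourth course is just one more row in StudentCourses; no new column is needed.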

2NF (Second Normal Form)

The Second Normal Form (2NF) is a database normalization form that eliminates partial dependencies within a table. A table is in 2NF if it is already in 1NF and if no non-prime attribute (an attribute that is not part of any candidate key) is dependent on only a portion of a candidate key.

To understand 2NF better, let's consider an example:

Suppose we have a table called "EmployeeProjects" that tracks information about employees and the projects they are assigned to, along with additional attributes like project details. Here's a non-normalized version of this table:

| Employee ID | Employee Name | Project ID | Project Name | Project Location |
|-------------|---------------|------------|--------------|-----------------|
| 1           | Alice         | 101        | Project A    | New York        |
| 2           | Bob           | 102        | Project B    | Los Angeles     |
| 3           | Charlie       | 101        | Project A    | New York        |
| 1           | Alice         | 102        | Project B    | Los Angeles     |

In the above table, the candidate key (a unique identifier) is {Employee ID, Project ID}. However, there are partial dependencies: "Project Name" and "Project Location" depend only on "Project ID", and "Employee Name" depends only on "Employee ID". Each of these determinants is just one part of the candidate key, so this table is not in 2NF.

To bring this table into 2NF, we need to separate out the attributes that are functionally dependent on part of the candidate key. We'll split the table into three, moving employee names into their own table so that no information is lost:

Employees Table:

| Employee ID | Employee Name |
|-------------|---------------|
| 1           | Alice         |
| 2           | Bob           |
| 3           | Charlie       |

EmployeeProjects Table:

| Employee ID | Project ID |
|-------------|------------|
| 1           | 101        |
| 2           | 102        |
| 3           | 101        |
| 1           | 102        |

ProjectDetails Table:

| Project ID | Project Name | Project Location |
|------------|--------------|-----------------|
| 101        | Project A    | New York        |
| 102        | Project B    | Los Angeles     |

Now, all of the resulting tables are in 1NF:

The EmployeeProjects table contains only attributes directly related to the relationship between employees and projects.
The ProjectDetails table contains attributes related to project details.

In this 2NF structure:

The EmployeeProjects table has a composite primary key {Employee ID, Project ID} and no non-key attributes, so nothing in it can depend on only part of the key.
The ProjectDetails table has a primary key {Project ID}, and both "Project Name" and "Project Location" are functionally dependent on the entire primary key.

This normalization process ensures that each table contains non-redundant data and avoids anomalies associated with partial dependencies.
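The partial dependencies discussed above can be spotted in sample data with a small helper. The sketch below is illustrative only: `holds` is a hypothetical function name, and checking sample rows can refute a functional dependency but never prove it, since a dependency is a rule about all data the table may ever hold.

```python
def holds(rows, lhs, rhs):
    """Return True if no two rows agree on lhs but differ on rhs."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        val = tuple(row[a] for a in rhs)
        # setdefault stores the first value seen for this key and
        # returns it on later visits, so a mismatch means lhs -/-> rhs.
        if seen.setdefault(key, val) != val:
            return False
    return True


COLUMNS = ("employee_id", "employee_name", "project_id",
           "project_name", "project_location")
# Rows of the original (pre-2NF) EmployeeProjects table.
ROWS = [dict(zip(COLUMNS, r)) for r in [
    (1, "Alice",   101, "Project A", "New York"),
    (2, "Bob",     102, "Project B", "Los Angeles"),
    (3, "Charlie", 101, "Project A", "New York"),
    (1, "Alice",   102, "Project B", "Los Angeles"),
]]

# The full composite key identifies every row.
key_ok = holds(ROWS, ["employee_id", "project_id"], list(COLUMNS))
# Partial dependencies: only part of the key is needed.
project_partial = holds(ROWS, ["project_id"], ["project_name", "project_location"])
employee_partial = holds(ROWS, ["employee_id"], ["employee_name"])
# Not a dependency: employee 1 works on two projects.
employee_to_project = holds(ROWS, ["employee_id"], ["project_id"])

print(key_ok, project_partial, employee_partial, employee_to_project)
```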

3NF (Third Normal Form)

The Third Normal Form (3NF) is a database normalization form that eliminates transitive dependencies within a table. A table is in 3NF if it is already in 2NF and if no non-prime attribute (an attribute that is not part of any candidate key) is transitively dependent on a candidate key.

To understand 3NF better, let's consider an example:

Suppose we have a table called "EmployeeDepartments" that stores information about employees, their departments, and the locations of those departments. Here's a non-normalized version of this table:

| Employee ID | Employee Name | Department | Department Location |
|-------------|---------------|------------|--------------------|
| 1           | Alice         | IT         | New York           |
| 2           | Bob           | HR         | Los Angeles        |
| 3           | Charlie       | IT         | New York           |
| 1           | Alice         | HR         | Los Angeles        |

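In this table, the intended candidate key is {Employee ID}, and there is a transitive dependency: "Employee ID" determines "Department", and "Department" (which is not a candidate key) determines "Department Location". Note that the sample data lets employee 1 belong to both IT and HR, so the key of the table exactly as shown is {Employee ID, Department}. Either way, "Department Location" is determined by "Department" alone rather than by the whole key, so this table is not in 3NF.

To bring this table into 3NF, we need to eliminate this dependency. We can do this by separating the attributes into three tables: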

Employees Table:

| Employee ID | Employee Name |
|-------------|---------------|
| 1           | Alice         |
| 2           | Bob           |
| 3           | Charlie       |

Departments Table:

| Department | Department Location |
|------------|--------------------|
| IT         | New York           |
| HR         | Los Angeles        |

EmployeeDepartmentAssignment Table:

| Employee ID | Department |
|-------------|------------|
| 1           | IT         |
| 2           | HR         |
| 3           | IT         |
| 1           | HR         |

Now, all three tables are in 1NF and 2NF:

The Employees table contains only attributes directly related to employees.
The Departments table contains attributes related to departments.
The EmployeeDepartmentAssignment table contains the relationship between employees and departments.

In this 3NF structure:

The EmployeeDepartmentAssignment table has a composite primary key {Employee ID, Department}.
There are no transitive dependencies because "Department Location" is moved to the Departments table, where it is functionally dependent only on the primary key {Department}.

This normalization process ensures that each table contains non-redundant data and avoids anomalies associated with transitive dependencies.
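To see the update anomaly disappear, the sketch below loads the three tables into an in-memory SQLite database and moves the IT department. The column names and the new location "Boston" are assumptions made up for this illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Employees (
        employee_id   INTEGER PRIMARY KEY,
        employee_name TEXT NOT NULL
    );
    CREATE TABLE Departments (
        department          TEXT PRIMARY KEY,
        department_location TEXT NOT NULL
    );
    CREATE TABLE EmployeeDepartmentAssignment (
        employee_id INTEGER NOT NULL REFERENCES Employees(employee_id),
        department  TEXT    NOT NULL REFERENCES Departments(department),
        PRIMARY KEY (employee_id, department)
    );
""")
conn.executemany("INSERT INTO Employees VALUES (?, ?)",
                 [(1, "Alice"), (2, "Bob"), (3, "Charlie")])
conn.executemany("INSERT INTO Departments VALUES (?, ?)",
                 [("IT", "New York"), ("HR", "Los Angeles")])
conn.executemany("INSERT INTO EmployeeDepartmentAssignment VALUES (?, ?)",
                 [(1, "IT"), (2, "HR"), (3, "IT"), (1, "HR")])

# The location is stored once, so relocating IT touches exactly one row.
cur = conn.execute(
    "UPDATE Departments SET department_location = ? WHERE department = ?",
    ("Boston", "IT"),  # hypothetical new location
)
rows_touched = cur.rowcount

# Every IT employee sees the new location through the join.
it_rows = conn.execute("""
    SELECT e.employee_name, d.department, d.department_location
    FROM EmployeeDepartmentAssignment a
    JOIN Employees   e ON e.employee_id = a.employee_id
    JOIN Departments d ON d.department  = a.department
    WHERE d.department = 'IT'
    ORDER BY e.employee_id
""").fetchall()
print(rows_touched, it_rows)
```

In the original single table, the same change would have to be applied to every row that mentions IT, and missing one of them would leave the data inconsistent.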

BCNF (Boyce-Codd Normal Form)

The Boyce-Codd Normal Form (BCNF) is a stricter form of normalization than 3NF. A table is in BCNF if, for every non-trivial functional dependency X→Y, the determinant X is a superkey. In other words, BCNF ensures that every functional dependency in the table is a dependency on a candidate key.

Let's illustrate BCNF with an example:

Suppose we have a table called "Employees" that stores information about employees, including their ID, name, and department. Additionally, each department has a department code and a manager. Here's a non-normalized version of this table:

| Employee ID | Employee Name | Department Code | Department | Manager   |
|-------------|---------------|-----------------|------------|-----------|
| 1           | Alice         | IT              | IT         | Charlie   |
| 2           | Bob           | HR              | HR         | Alice     |
| 3           | Charlie       | IT              | IT         | Charlie   |
| 4           | David         | Sales           | Sales      | Bob       |

In this table, we can see that the determinant "Department Code" determines both "Department" and "Manager". However, "Department Code" is not a superkey, as it does not uniquely identify each row in the table. Therefore, this table is not in BCNF.

To bring this table into BCNF, we need to decompose it into smaller tables. We start by identifying the functional dependencies and candidate keys:

Candidate Key: {Employee ID}
Functional Dependencies:
Employee ID → Employee Name, Department Code
Department Code → Department, Manager

We'll decompose the table into two tables:

EmployeeInfo Table:

| Employee ID | Employee Name | Department Code |
|-------------|---------------|-----------------|
| 1           | Alice         | IT              |
| 2           | Bob           | HR              |
| 3           | Charlie       | IT              |
| 4           | David         | Sales           |

DepartmentInfo Table:

| Department Code | Department | Manager   |
|-----------------|------------|-----------|
| IT              | IT         | Charlie   |
| HR              | HR         | Alice     |
| Sales           | Sales      | Bob       |

Now, both tables are in BCNF:

In the EmployeeInfo table, the candidate key {Employee ID} determines all other attributes.
In the DepartmentInfo table, the candidate key {Department Code} determines all other attributes.

This decomposition ensures that every functional dependency is a dependency on a candidate key, satisfying the conditions of BCNF.
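The superkey test behind BCNF can be sketched with the standard attribute-closure algorithm. The function names below are hypothetical, and `bcnf_violations` is a simplification: it only inspects the dependencies that are listed explicitly, not every dependency they imply.

```python
def closure(attrs, fds):
    """All attributes determined by attrs under the given dependencies."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def bcnf_violations(relation, fds):
    """Listed dependencies whose determinant is not a superkey of relation."""
    violations = []
    for lhs, rhs in fds:
        # Only consider dependencies that say something non-trivial
        # about attributes inside this relation.
        if lhs <= relation and (rhs & relation) - lhs:
            if not relation <= closure(lhs, fds):
                violations.append((lhs, rhs))
    return violations


FDS = [
    (frozenset({"Employee ID"}), frozenset({"Employee Name", "Department Code"})),
    (frozenset({"Department Code"}), frozenset({"Department", "Manager"})),
]
EMPLOYEES = frozenset({"Employee ID", "Employee Name", "Department Code",
                       "Department", "Manager"})
EMPLOYEE_INFO = frozenset({"Employee ID", "Employee Name", "Department Code"})
DEPARTMENT_INFO = frozenset({"Department Code", "Department", "Manager"})

# "Department Code" does not determine every column of the original table.
dept_closure = closure({"Department Code"}, FDS)

before = bcnf_violations(EMPLOYEES, FDS)
after = bcnf_violations(EMPLOYEE_INFO, FDS) + bcnf_violations(DEPARTMENT_INFO, FDS)
print(sorted(dept_closure), len(before), len(after))
```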

4NF (Fourth Normal Form)

The Fourth Normal Form (4NF) is a database normalization form that deals with multi-valued dependencies (MVDs) within a table. A table is in 4NF if it is already in BCNF and if, for every non-trivial multi-valued dependency X ↠ Y that holds in it, the determinant X is a superkey.

To understand 4NF better, let's consider an example:

Suppose we have a table called "EmployeeSkills" that stores information about employees and their skills. Each employee may have multiple skills, and each skill may be associated with multiple employees. Here's a non-normalized version of this table:

| Employee ID | Employee Name | Skill       |
|-------------|---------------|-------------|
| 1           | Alice         | Programming |
| 1           | Alice         | Database    |
| 2           | Bob           | Programming |
| 2           | Bob           | Design      |
| 3           | Charlie       | Database    |
| 3           | Charlie       | Testing     |

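In this table, the key is {Employee ID, Skill}. Each employee can have several skills, so "Employee ID" multi-determines "Skill" (Employee ID ↠ Skill), and because "Employee Name" depends on "Employee ID" alone, the name is repeated once per skill. Strictly speaking, that repetition is a partial dependency. The textbook 4NF violation arises when one table stores two or more independent multi-valued facts about the same key, for example an employee's skills and an employee's spoken languages side by side. The remedy is the same in both cases: give each independent multi-valued fact its own table.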

To bring this table into 4NF, we need to separate the multi-valued attribute into a separate table. We'll create two tables:

Employees Table:

| Employee ID | Employee Name |
|-------------|---------------|
| 1           | Alice         |
| 2           | Bob           |
| 3           | Charlie       |

Skills Table:

| Employee ID | Skill       |
|-------------|-------------|
| 1           | Programming |
| 1           | Database    |
| 2           | Programming |
| 2           | Design      |
| 3           | Database    |
| 3           | Testing     |

Now, both tables are in 1NF, 2NF, 3NF, and BCNF:

The Employees table contains only attributes directly related to employees.
The Skills table contains a composite primary key {Employee ID, Skill}, where each row represents a single skill associated with an employee.

In this 4NF structure:

There are no non-trivial multi-valued dependencies because the Skills table represents a many-to-many relationship between employees and skills in a separate table.

This normalization process ensures that the database schema is free from redundancies and anomalies associated with multi-valued dependencies.
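A textbook-style 4NF violation needs two independent multi-valued facts in one table. The sketch below invents a "Language" attribute for that purpose (it is an assumption, not part of the example above) and checks that splitting the table into two projections loses nothing.

```python
# Hypothetical table: skills and languages are independent facts about an
# employee, so every skill has to be paired with every language.
employee_skill_language = {
    (1, "Programming", "English"),
    (1, "Programming", "Hindi"),
    (1, "Database",    "English"),
    (1, "Database",    "Hindi"),
    (2, "Design",      "English"),
}

# 4NF decomposition: one table per independent multi-valued fact.
employee_skills = {(e, s) for e, s, _ in employee_skill_language}
employee_languages = {(e, l) for e, _, l in employee_skill_language}

# Natural join on the employee ID rebuilds the original rows exactly.
rejoined = {
    (e, s, l)
    for (e, s) in employee_skills
    for (e2, l) in employee_languages
    if e == e2
}
lossless = rejoined == employee_skill_language

# Cost of recording one new language for employee 1:
rows_in_combined_table = sum(1 for e, _ in employee_skills if e == 1)  # one per skill
rows_in_decomposed_table = 1

print(lossless, rows_in_combined_table, rows_in_decomposed_table)
```

In the combined table, forgetting one of the skill and language pairings would silently break the rule that the two facts are independent; the decomposed tables cannot get into that state.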

5NF (Fifth Normal Form)

The Fifth Normal Form (5NF), also known as Project-Join Normal Form (PJ/NF), is a database normalization form that addresses join dependencies. A table is in 5NF if it is already in 4NF and every non-trivial join dependency that holds in it is implied by its candidate keys. In other words, the table cannot be split into smaller tables and rebuilt losslessly by joining them, except in ways that follow directly from its keys.

To understand 5NF better, let's consider an example:

Suppose we have a table called "EmployeeProjects" that stores information about employees, projects, and their respective roles. Each employee may work on multiple projects, and each project may involve multiple employees with different roles. Here's a non-normalized version of this table:

| Employee ID | Project ID | Role       |
|-------------|------------|------------|
| 1           | 101        | Developer  |
| 1           | 102        | Designer   |
| 2           | 101        | Designer   |
| 3           | 102        | Developer  |
| 3           | 103        | Tester     |

In this table, the composite key is {Employee ID, Project ID}. Each row represents an employee's role in a specific project.

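To decide whether this table is in 5NF, we look for join dependencies: ways of splitting it into three or more projections that can be joined back into exactly the original rows. Assuming the only rule here is that {Employee ID, Project ID} determines "Role", no such split exists and the table is already in 5NF. For illustration, the layout below keeps the role assignments together and adds reference tables for employees and projects: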

Employees Table:

| Employee ID |
|-------------|
| 1           |
| 2           |
| 3           |

Projects Table:

| Project ID |
|------------|
| 101        |
| 102        |
| 103        |

EmployeeRoles Table:

| Employee ID | Project ID | Role       |
|-------------|------------|------------|
| 1           | 101        | Developer  |
| 1           | 102        | Designer   |
| 2           | 101        | Designer   |
| 3           | 102        | Developer  |
| 3           | 103        | Tester     |

Now, each table represents a distinct entity without any overlapping composite keys or dependencies. Additionally, we can reconstruct the original table by performing a natural join between these tables.

In this 5NF structure:

Each table is in 1NF, 2NF, 3NF, and 4NF.
The decomposition is lossless-join because we can reconstruct the original table by joining the decomposed tables.

This normalization process ensures that the database schema is free from redundancies and anomalies associated with overlapping composite keys and dependencies.
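Whether a three-way split is lossless can be tested directly on the sample rows. The sketch below projects the original table onto its three column pairs and joins them back; the variable names are made up for this illustration. For this data the join produces rows that were never in the table, which shows that such a split would not be a valid 5NF decomposition here.

```python
# Rows of the original table: (employee_id, project_id, role).
rows = {
    (1, 101, "Developer"),
    (1, 102, "Designer"),
    (2, 101, "Designer"),
    (3, 102, "Developer"),
    (3, 103, "Tester"),
}

# The three binary projections a join dependency would rely on.
emp_proj = {(e, p) for e, p, _ in rows}
proj_role = {(p, r) for _, p, r in rows}
emp_role = {(e, r) for e, _, r in rows}

# Natural join of the three projections.
rejoined = {
    (e, p, r)
    for (e, p) in emp_proj
    for (p2, r) in proj_role
    if p2 == p and (e, r) in emp_role
}

# Rows invented by the join; an empty set would mean the split is lossless
# for this data.
spurious = rejoined - rows
print(sorted(spurious))
```

A table only needs this kind of decomposition when a business rule guarantees that the join never invents rows; without such a rule the three-column table has to stay in one piece.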

6NF (Sixth Normal Form)

The Sixth Normal Form (6NF) is the strictest level of normalization. A table is in 6NF if it satisfies no non-trivial join dependencies at all, which in practice means that each table consists of a key plus at most one non-key attribute. The form was introduced by C. J. Date, Hugh Darwen, and Nikos Lorentzos in their work on temporal data.

6NF is rarely encountered in everyday transactional design. It's mainly relevant in specialized scenarios where extreme normalization is useful, such as temporal databases that track the history of each attribute separately, and in some academic or research contexts.

To illustrate 6NF, let's imagine a hypothetical scenario where we have a complex data model with multiple interrelated entities and dependencies. Consider a case where we have a database to manage conference proceedings. Each proceeding has multiple authors, and each author can be associated with multiple proceedings. Additionally, each proceeding may contain multiple papers, and each paper may have multiple authors.

Here's a simplified representation of such a scenario:

Proceedings

| Proceeding ID | Proceeding Title |
|---------------|------------------|
| 1             | Proceeding1      |
| 2             | Proceeding2      |
| 3             | Proceeding3      |

Authors

| Author ID | Author Name |
|-----------|-------------|
| 1         | Vinesh      |
| 2         | Amit        |
| 3         | Aagam       |

Papers

| Paper ID | Paper Title    | Proceeding ID |
|----------|----------------|---------------|
| 1        | Time Of India  | 1             |
| 2        | Indian Express | 1             |
| 3        | News India     | 2             |
| 4        | BBC News       | 1             |

PaperAuthors

| Paper ID | Author ID |
|----------|-----------|
| 1        | 1         |
| 1        | 2         |
| 1        | 3         |
| 2        | 2         |
| 3        | 1         |
| 4        | 3         |

In this scenario:

The Proceedings table contains information about each conference proceeding.
The Authors table stores details about each author.
The Papers table contains information about each paper, along with the proceeding it belongs to.
The PaperAuthors table serves as a junction table to represent the many-to-many relationship between papers and authors.

To achieve 6NF, we would check each table for non-trivial join dependencies, which in practice means checking whether it has more than one non-key attribute. Proceedings and Authors each have a key plus one attribute, and PaperAuthors consists only of its key, so they need no change. Papers has two non-key attributes ("Paper Title" and "Proceeding ID"), so it would be split into one table holding the title of each paper and another holding the proceeding of each paper, both keyed by "Paper ID".

In practical database design, reaching 6NF is often unnecessary and may even be counterproductive, as it can lead to overly complex schemas that are difficult to maintain and query efficiently. In most cases, normalization up to 3NF or BCNF is sufficient to ensure data integrity and minimize redundancies.
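As a small sketch of what 6NF would mean for the Papers table, assuming the usual reading of 6NF as "a key plus at most one non-key attribute per table", the code below splits it into two single-attribute tables and joins them back. The dictionary names are hypothetical.

```python
# Papers table from the example: (paper_id, paper_title, proceeding_id).
papers = [
    (1, "Time Of India", 1),
    (2, "Indian Express", 1),
    (3, "News India", 2),
    (4, "BBC News", 1),
]

# 6NF-style split: each table holds the key and exactly one attribute.
paper_title = {paper_id: title for paper_id, title, _ in papers}
paper_proceeding = {paper_id: proceeding_id for paper_id, _, proceeding_id in papers}

# Joining on the shared key rebuilds the original rows.
rejoined = sorted(
    (paper_id, paper_title[paper_id], paper_proceeding[paper_id])
    for paper_id in paper_title.keys() & paper_proceeding.keys()
)
print(rejoined == sorted(papers))
```

The price is visible even at this size: reading one complete paper record now takes a join, which is part of why this level of normalization is usually reserved for special cases.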

Denormalization

Sometimes denormalization is preferable to normalization, in SQL Server and other database environments alike. Deliberately storing some data redundantly can improve query performance, simplify the schema, speed up reads, aid scalability, and better support reporting and analytics. The trade-offs are increased data redundancy and a higher risk of inconsistent data, so denormalization should be applied judiciously. The main reasons to consider it are:

Improved Query Performance: By denormalizing data, you can reduce the need for complex joins and aggregations, which can significantly improve query performance, especially in scenarios where large datasets are involved or where queries are complex and frequent. Join operations can be resource-intensive, and reducing them can lead to faster query execution times.
Reduced Complexity: Denormalized schemas are often simpler to understand and query, especially for developers who are not familiar with the intricacies of the database schema. Simplified schemas can lead to more straightforward SQL queries, reducing development time and potential errors.
Enhanced Read Performance: In read-heavy applications, denormalization can be particularly beneficial as it allows data to be structured in a way that optimizes read operations. Aggregates, summaries, and frequently accessed data can be precomputed and stored, eliminating the need for expensive calculations during query execution.
Improved Scalability: In distributed environments or NoSQL databases where scalability is a concern, denormalization can help improve scalability by reducing the need for distributed joins and allowing data to be stored closer to where it's needed. This can help distribute the workload more evenly across the database infrastructure.
Better Support for Reporting and Analytics: Denormalized schemas are often better suited for reporting and analytics purposes, as they can provide a consolidated view of data that is optimized for analytical queries. By precomputing and aggregating data, denormalized schemas can support complex reporting requirements more efficiently.
Optimized for Specific Use Cases: In some cases, denormalization is necessary to optimize the database schema for specific use cases or application requirements. For example, in data warehousing environments where query performance is critical, denormalization may be necessary to achieve the desired level of performance.

Example: Let's consider a scenario where you have an e-commerce application with a high volume of orders. In the normalized schema, you have separate tables for orders, customers, and products. However, to improve query performance for order retrieval, especially in scenarios where orders are frequently queried along with customer and product information, you might choose to denormalize the order table by including customer and product details directly within the order table. This denormalized structure can reduce the need for joins when querying orders and improve overall system performance, especially in read-heavy workloads. However, you need to carefully evaluate the trade-offs and ensure that data integrity is maintained, especially in scenarios involving data modification operations.
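The e-commerce scenario can be sketched with an in-memory SQLite database. All table names, column names, and sample rows below are assumptions invented for this illustration. The sketch shows both sides of the trade-off: the denormalized table answers the query without joins, but it goes stale when the source data changes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (
        customer_id   INTEGER PRIMARY KEY,
        customer_name TEXT NOT NULL
    );
    CREATE TABLE products (
        product_id   INTEGER PRIMARY KEY,
        product_name TEXT NOT NULL
    );
    CREATE TABLE orders (
        order_id    INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customers(customer_id),
        product_id  INTEGER NOT NULL REFERENCES products(product_id),
        quantity    INTEGER NOT NULL
    );
    INSERT INTO customers VALUES (1, 'Alice'), (2, 'Bob');
    INSERT INTO products  VALUES (10, 'Keyboard'), (20, 'Monitor');
    INSERT INTO orders    VALUES (1000, 1, 10, 2), (1001, 2, 20, 1), (1002, 1, 20, 1);

    -- Denormalized copy: names are stored on every order row.
    CREATE TABLE orders_denorm AS
    SELECT o.order_id, c.customer_name, p.product_name, o.quantity
    FROM orders o
    JOIN customers c ON c.customer_id = o.customer_id
    JOIN products  p ON p.product_id  = o.product_id;
""")

# Normalized read: two joins.
joined = conn.execute("""
    SELECT o.order_id, c.customer_name, p.product_name, o.quantity
    FROM orders o
    JOIN customers c ON c.customer_id = o.customer_id
    JOIN products  p ON p.product_id  = o.product_id
    ORDER BY o.order_id
""").fetchall()

# Denormalized read: a single-table scan returns the same rows.
flat = conn.execute("""
    SELECT order_id, customer_name, product_name, quantity
    FROM orders_denorm
    ORDER BY order_id
""").fetchall()
same_result = joined == flat

# The trade-off: after a rename, the redundant copies are out of date
# until something updates them as well.
conn.execute("UPDATE customers SET customer_name = 'Alice Smith' WHERE customer_id = 1")
stale_rows = conn.execute(
    "SELECT COUNT(*) FROM orders_denorm WHERE customer_name = 'Alice'"
).fetchone()[0]
print(same_result, stale_rows)
```

In a real system the redundant columns would need to be kept in sync by application code, triggers, or a scheduled refresh, which is the maintenance cost the paragraph above warns about.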
