Key Concepts in Databases: Keys, Joins, Query Optimization, and Normalization
Momen Negm
Chief Technology Officer @ T-Vencubator | Data Scientist, Generative AI | Tech entrepreneur - Engineering leader
Save this article, as it covers the core concepts of databases: types of keys, types of joins, query optimization, and normalization in SQL databases.
Keys in Databases: SQL vs. NoSQL
Keys are essential components in databases, serving as unique identifiers or references that uphold data integrity and enable efficient data retrieval. Below is an overview of key types in both SQL and NoSQL databases:
SQL Databases:
Example: Table: Customer (customer_id (PK), name, email)
customer_id uniquely identifies each customer record.
2. Candidate Key:
Example: Table: Order (order_id (PK), customer_id, order_date)
Both order_id and customer_id (alone) can uniquely identify an order, but order_id is chosen as the primary key for efficiency.
3. Foreign Key (FK):
Example: Table: Order (order_id (PK), customer_id (FK), order_date)
customer_id in the Order table references the primary key customer_id in the Customer table, linking orders to specific customers.
4. Unique Key:
Example: Table: Product (product_id (PK), name, unique_code)
unique_code can ensure no duplicate product codes exist, even if some products might not have a value for it.
NoSQL Databases:
Due to the flexible schema nature of NoSQL databases, the concept of keys can differ slightly. Here are common approaches:
Example: JSON
{
"_id": ObjectId("...") // Unique document identifier
"name": "John Doe",
"email": "[email protected]"
}
2. Partition Key (Sharded Databases):
Example: (Consider a sharded database storing user data)
Partition key: user_id
Documents with similar user IDs would be placed on the same shard, allowing faster access to user-specific data.
Key Considerations:
SQL Query Optimization: Strategies and Techniques
Optimizing SQL queries is essential for ensuring efficient data retrieval and improving the performance of your database applications.
Understanding the Problem:
Optimization Techniques:
Example (Before):SQL
SELECT * FROM Products;
Example (After):SQL
SELECT product_id, name, price FROM Products;
2. Filter Data Effectively (WHERE):
Example (Before): SQL
SELECT * FROM Orders WHERE order_date > '2023-11-01';
Example (After): SQL
CREATE INDEX idx_order_date ON Orders (order_date); -- Assuming an index doesn't exist
SELECT * FROM Orders WHERE order_date > '2023-11-01';
3. Optimize JOINs:
Example (Before): SQL
SELECT o.order_id, p.name, c.email
FROM Orders o
JOIN Products p ON o.product_id = p.product_id
JOIN Customers c ON o.customer_id = c.customer_id;
Example (After) (Assuming a one-to-many relationship between Orders and Customers): SQL
SELECT o.order_id, p.name, c.email
FROM Orders o
INNER JOIN Products p ON o.product_id = p.product_id
INNER JOIN Customers c ON o.customer_id = c.customer_id;
4. Utilize Subqueries Wisely:
5. Leverage Temporary Tables:
6. Use LIMIT or TOP (for Pagination):
Important Queries and Keywords:
Joins in SQL vs. NoSQL Databases
Joins are a fundamental concept in relational databases (SQL) used to combine data from multiple tables based on a specific relationship. NoSQL databases, due to their schema flexibility, handle data differently, but techniques exist to achieve similar results.
Why Joins in SQL?
Example (SQL):
SELECT o.order_id, c.name, p.product_name, p.price
FROM Orders o
INNER JOIN Customers c ON o.customer_id = c.customer_id
INNER JOIN Products p ON o.product_id = p.product_id;
This query retrieves order details, customer names, product names, and prices by joining the Orders, Customers, and Products tables based on their relationships.
Why Not Joins in NoSQL (Technically)?
Approaches for Combining Data in NoSQL:
Advanced Join Example (SQL):
Imagine a database with these tables:
You want to find all Computer Science majors and the courses they’re enrolled in, along with their grades.
SQL
SELECT s.name, c.name, e.grade
FROM Students s
INNER JOIN Enrollments e ON s.student_id = e.student_id
INNER JOIN Courses c ON e.course_id = c.course_id
WHERE s.major = 'Computer Science';
This advanced join combines three tables to retrieve the desired information.
Advanced Data Retrieval in NoSQL (Example):
Consider a document-oriented NoSQL database with these collections:
To achieve a similar result as in the SQL example, you might use a combination of queries:
db.Students.find({ major: "Computer Science" })
2. For each student document, use the student_id to retrieve their enrolled courses from the Enrollments collection and their details from the Courses collection (potentially using multiple queries).
The choice between SQL and NoSQL depends on your data model and query requirements.
While SQL joins are powerful for predefined relationships, NoSQL offers flexibility but might require more complex retrieval logic at the application layer.
Normalization in SQL databases is a process of organizing your data tables to minimize redundancy, improve data integrity, and enhance database efficiency. It involves breaking down complex tables into smaller, more focused tables with defined relationships.
There are several levels of normalization, each aiming to further reduce redundancy:
First Normal Form (1NF):
Definition: Eliminates duplicate data within a single table. Each table cell (attribute value) should be atomic (cannot be further divided).
Each table must have a primary key (unique identifier for each row).
CREATE TABLE Customers (
customer_id INT PRIMARY KEY,
name VARCHAR(255) NOT NULL
Second Normal Form (2NF):
Definition: Meets all the requirements of 1NF. Eliminates partial dependencies. A column (attribute) should depend only on the entire primary key, not on a part of it.
Before (In 1NF, but not 2NF):
| Order ID | Customer ID | City | Country |
| - - - - -| - - - - - - | - - - - -| - - - - |
| 100 | 1 | New York | USA |
| 101 | 2 | London | UK |
| 102 | 1 | Chicago | USA |
The City attribute depends only on the Country attribute, not on the entire primary key (Order ID and Customer ID).
After (In 2NF):
| Order ID | Customer ID |
| - - - - -| - - - - - - |
| 100 | 1 |
| 101 | 2 |
| 102 | 1 |
(Separate table CustomerLocations created, not shown here)
| Customer ID | City | Country |
| - - - - - - | - - - - -| - - - - |
| 1 | New York | USA |
| 2 | London | UK |
We create a separate CustomerLocations table to store city and country information, linked to the Customers table using the customer_id foreign key. 2NF Example (Normalization Script):
-- Assuming existing tables: Orders (order_id, customer_id, city, country) and Customers (customer_id, name)
-- Create a new table for customer locations
CREATE TABLE CustomerLocations (
customer_id INT PRIMARY KEY,
FOREIGN KEY (customer_id) REFERENCES Customers(customer_id),
city VARCHAR(255),
country VARCHAR(255)
);
-- Update Orders table to remove redundant location columns
ALTER TABLE Orders
DROP COLUMN city,
DROP COLUMN country;
-- Insert location data into the new table (assuming you have the data)
INSERT INTO CustomerLocations (customer_id, city, country)
SELECT customer_id, city, country FROM Orders;
-- Clean up any duplicate entries in CustomerLocations (if applicable)
Normalization is an iterative process. Analyze your data model and query patterns to determine the appropriate level of normalization.
While it reduces redundancy, it can sometimes introduce complexity with more joins required in queries.