?? Day 36: Exploring Practical SQL Implementation - Part 8??
JIGNESH KUMAR
Data Science Intern at Alma Better || Electrical and Instrumentation Engineer at SIC
A. Window/Analytic Functions in SQL:-
In SQL, Window/Analytic Functions bring a new dimension to data analysis by providing insights within a specific window of data. These functions operate over a range of rows related to the current row, allowing for advanced computations.
Here is a list of common Window/Analytic Functions in SQL:
1. RANK(): - Assigns a unique rank to each distinct row within the result set based on the ORDER BY clause. Rows with equal values receive the same rank, and the next rank is skipped.
2. DENSE_RANK():- Similar to RANK(), but it doesn't skip the next rank if there are equal values. It assigns consecutive ranks to rows with equal values.
3. ROW_NUMBER():- Assigns a unique number to each row within the result set. Unlike RANK() and DENSE_RANK(), it doesn't consider equal values.
4. LEAD():- Provides access to a row at a specified physical offset following that row within the result set. It's useful for comparing a row with subsequent rows.
5. LAG():- Similar to LEAD(), but provides access to a row at a specified physical offset preceding that row within the result set. It's useful for comparing a row with preceding rows.
6. FIRST_VALUE():- Returns the value of a specified expression for the first row within the window frame.
7. LAST_VALUE():- Returns the value of a specified expression for the last row within the window frame.
8. CUME_DIST():- Calculates the cumulative distribution of a value within a group of values.
9. PERCENT_RANK():- Calculates the relative rank of a value within a group of values.
10. NTILE():- Divides the result set into a specified number of roughly equal parts, assigning a bucket number to each row.
11. SUM():- Calculates the sum of a numeric column over a specified window frame.
12. AVG():- Calculates the average of a numeric column over a specified window frame.
13. MIN():- Returns the minimum value of a column over a specified window frame.
14. MAX():- Returns the maximum value of a column over a specified window frame.
15. COUNT():- Counts the number of rows in a window frame.
These functions are powerful tools for performing complex analytical operations within specific subsets of data in SQL queries.
Let's walk through examples for each of the Window/Analytic Functions:
1. RANK():
SELECT student_id, first_name, last_name, gender, fees_paid,
RANK() OVER (PARTITION BY gender ORDER BY fees_paid) AS rank_val
FROM student;
| student_id | first_name | last_name | gender | fees_paid | rank_val |
|------------|------------|-----------|--------|-----------|----------|
| 1 | Jignesh | Kumar | Male | 1000 | 1 |
| 2 | Kriti | Kumari | Female | 1200 | 1 |
| 3 | Siddharth | Kumar | Male | 800 | 2 |
2. DENSE_RANK():
SELECT department_id, salary, DENSE_RANK() OVER (PARTITION BY department_id ORDER BY salary) AS dense_rank_val
FROM employees;
| department_id | salary | dense_rank_val |
|---------------|--------|-----------------|
| 1 | 5000 | 1 |
| 1 | 6000 | 2 |
| 2 | 7000 | 1 |
3. ROW_NUMBER():
SELECT department_id, salary, ROW_NUMBER() OVER (PARTITION BY department_id ORDER BY salary) AS row_number_val
FROM employees;
| department_id | salary | row_number_val |
|---------------|--------|----------------|
| 1 | 5000 | 1 |
| 1 | 6000 | 2 |
| 2 | 7000 | 1 |
4. LEAD():
SELECT employee_id, salary, LEAD(salary) OVER (ORDER BY salary) AS next_salary
FROM employees;
| employee_id | salary | next_salary |
|-------------|--------|-------------|
| 1 | 5000 | 6000 |
| 2 | 6000 | 7000 |
| 3 | 7000 | NULL |
5. LAG():
SELECT employee_id, salary, LAG(salary) OVER (ORDER BY salary) AS prev_salary
FROM employees;
| employee_id | salary | prev_salary |
|-------------|--------|-------------|
| 1 | 5000 | NULL |
| 2 | 6000 | 5000 |
| 3 | 7000 | 6000 |
6. FIRST_VALUE():
SELECT department_id, employee_id, salary,
FIRST_VALUE(employee_id) OVER (PARTITION BY department_id ORDER BY salary) AS first_emp_id
FROM employees;
| department_id | employee_id | salary | first_emp_id |
|---------------|-------------|--------|--------------|
| 1 | 1 | 5000 | 1 |
| 1 | 2 | 6000 | 1 |
| 2 | 3 | 7000 | 3 |
7. LAST_VALUE():
SELECT employee_id, salary, LAST_VALUE(salary) OVER (ORDER BY salary) AS last_salary
FROM employees;
| employee_id | salary | last_salary |
|-------------|--------|-------------|
| 1 | 5000 | 7000 |
| 2 | 6000 | 7000 |
| 3 | 7000 | 7000 |
领英推荐
8. CUME_DIST():
SELECT employee_id, salary, CUME_DIST() OVER (ORDER BY salary) AS cumulative_dist
FROM employees;
| employee_id | salary | cumulative_dist |
|-------------|--------|------------------|
| 1 | 5000 | 0.3333 |
| 2 | 6000 | 0.6667 |
| 3 | 7000 | 1.0 |
9. PERCENT_RANK():
SELECT employee_id, salary, PERCENT_RANK() OVER (ORDER BY salary) AS percent_rank_val
FROM employees;
| employee_id | salary | percent_rank_val |
|-------------|--------|------------------|
| 1 | 5000 | 0.0 |
| 2 | 6000 | 0.5 |
| 3 | 7000 | 1.0 |
10. NTILE():
SELECT employee_id, salary, NTILE(2) OVER (ORDER BY salary) AS salary_bucket
FROM employees;
| employee_id | salary | salary_bucket |
|-------------|--------|---------------|
| 1 | 5000 | 1 |
| 2 | 6000 | 1 |
| 3 | 7000 | 2 |
11. SUM():
SELECT department_id, salary, SUM(salary) OVER (PARTITION BY department_id) AS total_salary
FROM employees;
| department_id | salary | total_salary |
|---------------|--------|--------------|
| 1 | 5000 | 11000 |
| 1 | 6000 | 11000 |
| 2 | 7000 | 7000 |
12. AVG():
SELECT department_id, salary, AVG(salary) OVER (PARTITION BY department_id) AS avg_salary
FROM employees;
| department_id | salary | avg_salary |
|---------------|--------|------------|
| 1 | 5000 | 5500 |
| 1 | 6000 | 5500 |
| 2 | 7000 | 7000 |
13. MIN():
SELECT department_id, salary, MIN(salary) OVER (PARTITION BY department_id) AS min_salary
FROM employees;
| department_id | salary | min_salary |
|---------------|--------|------------|
| 1 | 5000 | 5000 |
| 1 | 6000 | 5000 |
| 2 | 7000 | 7000 |
14. MAX():
SELECT department_id, salary, MAX(salary) OVER (PARTITION BY department_id) AS max_salary
FROM employees;
| department_id | salary | max_salary |
|---------------|--------|------------|
| 1 | 5000 | 6000 |
| 1 | 6000 | 6000 |
| 2 | 7000 | 7000 |
15. COUNT():
SELECT department_id, employee_id, COUNT(employee_id) OVER (PARTITION BY department_id) AS employee_count
FROM employees;
| department_id | employee_id | employee_count |
|---------------|-------------|----------------|
| 1 | 1 | 2 |
| 1 | 2 | 2 |
| 2 | 3 | 1 |
B. Subqueries in SQL:
Subqueries, or nested queries, are SQL queries embedded within another query. They serve various purposes, such as filtering results, performing calculations, or acting as a data source for the main query.
1. Single Row Subquery Example:
SELECT id, last_name, salary
FROM employee
WHERE salary = (
SELECT MAX(salary)
FROM employee
);
| id | last_name | salary |
|-----|-----------|--------|
| 101 | Samir | 8000 |
2. Multi-Row Subquery Example:
SELECT id, last_name, salary
FROM employee
WHERE salary IN (
SELECT salary
FROM employee
WHERE last_name LIKE 'C%'
);
| id | last_name | salary |
|-----|-----------|--------|
| 103 | Mahesh | 6500 |
| 105 | Kuldeep | 6700 |