PROSQL 45: Complex SQL Questions Solution
Suparna Chowdhury
Data Scientist | Python, SQL, and Tableau Expert | Driving Data Insights
DAY 3: PROSQL45 CHALLENGE (18 NOVEMBER, 2024)
A big thanks to Ankit Bansal the year-end challenge! I’m taking it on by solving 2 complex SQL questions every day from a playlist of 79 problems.
?
PROBLEM 5: PARETO PRINCIPLE IN SQL
The Pareto Principle, also known as the 80/20 Rule, suggests that roughly 80% of outcomes result from 20% of causes.
Challenge Inspiration: https://www.youtube.com/watch?v=oGgE180oaTs&list=PLBTZqjSKn0IeKBQDjLmzisazhqQy4iGkb&index=6
For this we used the superstore orders data.
Step-by-Step Solution Approach:
1. product_sales CTE: We begin by calculating the total sales for each product.
2. running_total CTE: Next, we calculate the running total sales for each product, starting with the highest-selling products. The SUM(Product_sales) OVER(ORDER BY Product_sales DESC) function allows us to accumulate the sales in descending order, ensuring that the most sold products are accounted for first.
We also compute the total sales (total_sales) across all products and the total number of products (total_products) in this CTE.
3. Final SELECT query: The main query filters for products where the cumulative sales are less than or equal to 80% of the total sales. We then use COUNT(*) to determine how many products fall into this category and calculate the proportion of these products by dividing the count by the total number of products.
The result is approximately 22.18%, which supports the Pareto Principle (80/20 rule)—indicating that a small proportion of products contribute to the majority of sales.
WITH product_sales AS (
SELECT product_id, SUM(sales) AS Product_sales
FROM orders
GROUP BY product_id
),
running_total AS (
SELECT *,
SUM(Product_sales) OVER(ORDER BY Product_sales DESC) AS running_total,
SUM(Product_sales) OVER() AS total_sales,
COUNT(1) OVER() AS total_products
FROM product_sales
)
SELECT COUNT(*) * 1.0 / MIN(total_products) * 100 AS percentage_of_products
FROM running_total
WHERE running_total <= 0.8 * total_sales;
领英推荐
PROBLEM 6: Write a query to find PersonID, name, number of friends, sum of marks of person who have friends with total score greater than 100.
Challenge Inspiration: https://www.youtube.com/watch?v=SfzbR69LquU&list=PLBTZqjSKn0IeKBQDjLmzisazhqQy4iGkb&index=6
Create the Tables:
-- persons table
CREATE TABLE persons (
PersonID INT PRIMARY KEY,
Name VARCHAR(100),
Email VARCHAR(100),
Score INT
);
INSERT INTO persons (PersonID, Name, Email, Score) VALUES
(1, 'Alice', '[email protected]', 88),
(2, 'Bob', '[email protected]', 11),
(3, 'Davis', '[email protected]', 27),
(4, 'Tara', '[email protected]', 45),
(5, 'John', '[email protected]', 63);
-- Create the friends table
CREATE TABLE friends (
PersonID INT,
FriendID INT,
PRIMARY KEY (PersonID, FriendID)
);
INSERT INTO friends (PersonID, FriendID) VALUES
(1, 2),
(1, 3),
(2, 1),
(2, 3),
(3, 5),
(4, 2),
(4, 3),
(4, 5);
Step-by-Step Solution Approach:
WITH cte AS (
SELECT f.PersonID
,COUNT(f.FriendID) AS no_of_friends
, SUM(p.Score) AS total_marks
FROM friendS F
INNER JOIN persons p
ON f.FriendID = p.PersonID
GROUP BY f.PersonID
HAVING SUM(p.Score) >= 100)
SELECT p.PersonID
, p.Name
, cte.no_of_friends
,cte.total_marks
FROM cte
INNER JOIN persons p
ON p.PersonID = cte.PersonID;
Let’s level up our SQL skills!
Join me on this journey as I tackle more challenges each day. Let’s improve our SQL skills together before the year ends!
?
Check out the full solutions on GitHub here!
?