Learning SQL for a career in analytics? You need to learn Window Functions
Learning #sql for your #analytics career? Learn how to use Window Functions.
Here's a table of population by country.
One row per country. There's also a region column.
If you're new to SQL, you might think that to get the total population by region, you would need to first sum the population by region. Like this:
Then join it to the original table:
Don't do that. Use a Window Function instead.
SELECT??country
?? ??? ?region,
?? ??? ?country_population,
?? ??? ?SUM(country_population) OVER (PARTITION BY region) AS region_population
FROM wikipedia_population
ORDER BY country;
It's simpler and it produces the same result.
You are summing the country_population over each unique value in the region column.
领英推荐
See how there's a unique value of region_population for each value of region?
On top of that, you can use the Window Function in other calculations.
Let's get the proportion of the region that each country represents. If you haven't read my earlier article about how integer division behaves in SQL and you want to know why that * 1.0 is in there, read this.
We can also get the proportion of the global population that each country represents by removing the PARTITION BY clause. When we remove the PARTITION BY, we are telling the function to aggregate across the whole table. While we're at it, let's calculate the proportion that each region represents of the global population.
So there you have it. Window functions can simplify your work. You can also use other aggregations instead of SUM. Here's a more interesting example using LAG, which lets you get data from a row relative to the current row.
There's a whole lot more depth to Window Functions than I've outlined here, but hopefully this has given you a taste for what's possible.
--using a join and a subquer
SELECT wp.country, wp.region, wp.country_population, rp.region_population
FROM wikipedia_population wp
?? ?INNER JOIN (
?? ??? ??? ??? ?SELECT region, SUM(country_population) AS region_population
?? ??? ??? ??? ?FROM wikipedia_population
?? ??? ??? ??? ?GROUP BY region
?? ??? ??? ??? ?) rp ON wp.region = rp.region
ORDER BY wp.country;
--using window functions
SELECT?? ?country,
?? ??? ?region,
?? ??? ?country_population,
?? ??? ?SUM(country_population) OVER (PARTITION BY region) AS region_population,
?? ??? ?country_population * 1.0 / SUM(country_population) OVER (PARTITION BY region) AS country_proportion_of_region,
?? ??? ?country_population * 1.0 / SUM(country_population) OVER () AS country_proportion_of_world,
?? ??? ?SUM(country_population) OVER (PARTITION BY region) * 1.0 / SUM(country_population) OVER () AS region_proportion_of_world
FROM wikipedia_population
ORDER BY country;
--get the next biggest country
SELECT? country,
?? ??? ?country_population,
?? ??? ?LAG(country,1) OVER (ORDER BY country_population) AS next_biggest_country,
?? ??? ?LAG(country_population,1) OVER (ORDER BY country_population) AS next_biggest_country_population,
?? ??? ?country_population - LAG(country_population,1) OVER (ORDER BY country_population) population_difference
FROM wikipedia_population
ORDER BY country_population DESC;y