Learning SQL for a career in analytics? You need to learn Window Functions

Learning SQL for a career in analytics? You need to learn Window Functions

Learning #sql for your #analytics career? Learn how to use Window Functions.

Here's a table of population by country.

One row per country. There's also a region column.

No alt text provided for this image

If you're new to SQL, you might think that to get the total population by region, you would need to first sum the population by region. Like this:

No alt text provided for this image

Then join it to the original table:

No alt text provided for this image

Don't do that. Use a Window Function instead.

SELECT??country
?? ??? ?region,
?? ??? ?country_population,
?? ??? ?SUM(country_population) OVER (PARTITION BY region) AS region_population
FROM wikipedia_population
ORDER BY country;        

It's simpler and it produces the same result.

You are summing the country_population over each unique value in the region column.

See how there's a unique value of region_population for each value of region?

No alt text provided for this image

On top of that, you can use the Window Function in other calculations.

Let's get the proportion of the region that each country represents. If you haven't read my earlier article about how integer division behaves in SQL and you want to know why that * 1.0 is in there, read this.

No alt text provided for this image

We can also get the proportion of the global population that each country represents by removing the PARTITION BY clause. When we remove the PARTITION BY, we are telling the function to aggregate across the whole table. While we're at it, let's calculate the proportion that each region represents of the global population.

No alt text provided for this image

So there you have it. Window functions can simplify your work. You can also use other aggregations instead of SUM. Here's a more interesting example using LAG, which lets you get data from a row relative to the current row.

No alt text provided for this image

There's a whole lot more depth to Window Functions than I've outlined here, but hopefully this has given you a taste for what's possible.

--using a join and a subquer
SELECT wp.country, wp.region, wp.country_population, rp.region_population
FROM wikipedia_population wp
?? ?INNER JOIN (
?? ??? ??? ??? ?SELECT region, SUM(country_population) AS region_population
?? ??? ??? ??? ?FROM wikipedia_population
?? ??? ??? ??? ?GROUP BY region
?? ??? ??? ??? ?) rp ON wp.region = rp.region
ORDER BY wp.country;

--using window functions
SELECT?? ?country,
?? ??? ?region,
?? ??? ?country_population,
?? ??? ?SUM(country_population) OVER (PARTITION BY region) AS region_population,
?? ??? ?country_population * 1.0 / SUM(country_population) OVER (PARTITION BY region) AS country_proportion_of_region,
?? ??? ?country_population * 1.0 / SUM(country_population) OVER () AS country_proportion_of_world,
?? ??? ?SUM(country_population) OVER (PARTITION BY region) * 1.0 / SUM(country_population) OVER () AS region_proportion_of_world
FROM wikipedia_population
ORDER BY country;

--get the next biggest country
SELECT? country,
?? ??? ?country_population,
?? ??? ?LAG(country,1) OVER (ORDER BY country_population) AS next_biggest_country,
?? ??? ?LAG(country_population,1) OVER (ORDER BY country_population) AS next_biggest_country_population,
?? ??? ?country_population - LAG(country_population,1) OVER (ORDER BY country_population) population_difference
FROM wikipedia_population
ORDER BY country_population DESC;y        


要查看或添加评论,请登录

Owen Price的更多文章

  • Excel LAMBDA Spotlight: Bisected Map with BMAPλ

    Excel LAMBDA Spotlight: Bisected Map with BMAPλ

    In this article I'll look at a LAMBDA function called BMAPλ, which was written by Peter Bartholomew. You can grab the…

    8 条评论
  • Excel LAMBDA Spotlight: Sudoku Solver!

    Excel LAMBDA Spotlight: Sudoku Solver!

    In this article I'll look at a LAMBDA function called solver, which was recommended to me by Bhavya Gupta and written…

    16 条评论
  • Excel LAMBDA Spotlight: SumColumnsλ

    Excel LAMBDA Spotlight: SumColumnsλ

    In this article I'll look at a LAMBDA function called SumColumnsλ, written by Craig Hatmaker. Craig is very active in…

    4 条评论
  • Excel LAMBDA Spotlight: INFLECTIONPOINTS

    Excel LAMBDA Spotlight: INFLECTIONPOINTS

    In this article I'll look at a LAMBDA function called INFLECTIONPOINTS. This is a function I wrote last year to quickly…

    8 条评论
  • Excel LAMBDA Spotlight: SHOWCLOCK

    Excel LAMBDA Spotlight: SHOWCLOCK

    In this article I'll look at a LAMBDA function called SHOWCLOCK. I don't know the full name of the person who wrote…

    8 条评论
  • Excel LAMBDA Spotlight: UNPIVOT

    Excel LAMBDA Spotlight: UNPIVOT

    CREDITS In this article I'll look at a LAMBDA function called UNPIVOT. This one was written by Erik Oehm, creator of…

    18 条评论
  • Excel LAMBDA Spotlight: CROSSJOIN

    Excel LAMBDA Spotlight: CROSSJOIN

    CREDITS The CROSSJOIN LAMBDA function is the product of Erik Oehm, creator of the amazing Excel Robot. If you haven't…

    10 条评论
  • LAMBDA spotlight: Text.DropSliceBetween

    LAMBDA spotlight: Text.DropSliceBetween

    CREDITS This article will look at a single LAMBDA function called Text.DropSliceBetween.

    7 条评论
  • LAMBDA spotlight: A.XMATCH.ROWS

    LAMBDA spotlight: A.XMATCH.ROWS

    CREDITS Today I'm going to try something a little different. This article will throw a spotlight on a function that I…

    19 条评论
  • SQL integer division is your new favorite

    SQL integer division is your new favorite

    I saw a post today which got me thinking about learning SQL and the things that tripped me up in the early days. The…

    7 条评论

社区洞察

其他会员也浏览了