
Debunking Window Functions

Parijat Bose

Data | Cloud | GenAI

Published: February 7, 2024

A retail company has an employee_sales table that logs each employee's sales for every city, department, and month they work in.

The sales head wants to get a couple of reports:

Total sales for each department in the month of January
Performance of the best employee of each department for the month of January.

For the first report, you only need to filter the rows to January, group them by department, and sum sales_amount:

SELECT dept, SUM(sales_amount) AS total_sales
FROM employee_sales
WHERE month = 'January'
GROUP BY dept;
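To try the first report locally, here is a minimal sketch that uses Python's built-in sqlite3 module with an in-memory table. The sample rows and the city column are hypothetical. The emp_id, dept, month, and sales_amount names come from the article's queries. The ORDER BY dept line is added only to make the output order predictable.

```python
import sqlite3

# Hypothetical sample data: (emp_id, city, dept, month, sales_amount).
# The city column is an assumption; the other names match the article's queries.
ROWS = [
    (1, "Austin", "Electronics", "January", 300),
    (1, "Dallas", "Electronics", "January", 200),
    (2, "Austin", "Electronics", "January", 700),
    (3, "Austin", "Grocery", "January", 400),
    (4, "Dallas", "Grocery", "January", 250),
    (3, "Austin", "Grocery", "February", 900),
]


def january_totals(rows):
    """Load rows into an in-memory employee_sales table and run the first report."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE employee_sales ("
        "emp_id INTEGER, city TEXT, dept TEXT, month TEXT, sales_amount INTEGER)"
    )
    conn.executemany("INSERT INTO employee_sales VALUES (?, ?, ?, ?, ?)", rows)
    result = conn.execute(
        """
        SELECT dept, SUM(sales_amount) AS total_sales
        FROM employee_sales
        WHERE month = 'January'
        GROUP BY dept
        ORDER BY dept  -- added only for a stable output order
        """
    ).fetchall()
    conn.close()
    return result


print(january_totals(ROWS))  # [('Electronics', 1200), ('Grocery', 650)]
```

The WHERE clause drops employee 3's February row before grouping happens. That is why Grocery totals 650 rather than 1550.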

The second report is more involved: the sales head wants to know which employee made the highest sales in each department.

To produce it:

Partition the rows by department, so that each department forms its own window.
Sort each window by sales_amount in descending order.
Pick the row in the first position of each window. That's it!
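The three steps above can be mimicked in plain Python, which may help build intuition before moving to SQL. This is a hypothetical sketch over made-up January rows that are already filtered. It is not how a database engine evaluates window functions internally.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical January rows: (emp_id, dept, sales_amount).
JANUARY_ROWS = [
    (1, "Electronics", 300),
    (1, "Electronics", 200),
    (2, "Electronics", 700),
    (3, "Grocery", 400),
    (4, "Grocery", 250),
]


def best_per_dept(rows):
    """Partition rows by dept, sort each partition by sales_amount DESC, keep the first row."""
    best = {}
    # Step 1: form one "window" per department (groupby needs input sorted by the key).
    by_dept = sorted(rows, key=itemgetter(1))
    for dept, window in groupby(by_dept, key=itemgetter(1)):
        # Step 2: sort the window by sales_amount, highest first.
        ordered = sorted(window, key=itemgetter(2), reverse=True)
        # Step 3: the row in the first position is the top performer.
        best[dept] = ordered[0]
    return best


print(best_per_dept(JANUARY_ROWS))
# {'Electronics': (2, 'Electronics', 700), 'Grocery': (3, 'Grocery', 400)}
```

Python's sort is stable, so tied rows keep their input order and the first one wins. In SQL, the order of tied rows is not guaranteed unless the ORDER BY includes a tie-breaker.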

In the table above, we can see what happens to the original table when we window (partition) it by dept and then sort each window in descending order of sales_amount. The ROW_NUMBER() window function numbers the rows in each window starting from 1, so the row numbered 1 is the best performer. Yay!
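To see the intermediate numbered windows, here is a sketch with made-up rows. It runs the windowing step but does not yet filter to the top row. It uses Python's sqlite3 module, which needs SQLite 3.25 or later for window functions; recent Python builds typically bundle that. The table layout and city column are assumptions.

```python
import sqlite3

# Hypothetical sample data: (emp_id, city, dept, month, sales_amount).
ROWS = [
    (1, "Austin", "Electronics", "January", 300),
    (1, "Dallas", "Electronics", "January", 200),
    (2, "Austin", "Electronics", "January", 700),
    (3, "Austin", "Grocery", "January", 400),
    (4, "Dallas", "Grocery", "January", 250),
    (3, "Austin", "Grocery", "February", 900),
]


def numbered_windows(rows):
    """Number each January row within its department, highest sales_amount first."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE employee_sales ("
        "emp_id INTEGER, city TEXT, dept TEXT, month TEXT, sales_amount INTEGER)"
    )
    conn.executemany("INSERT INTO employee_sales VALUES (?, ?, ?, ?, ?)", rows)
    result = conn.execute(
        """
        SELECT emp_id, dept, sales_amount, rn
        FROM (
            SELECT emp_id, dept, sales_amount,
                   ROW_NUMBER() OVER (
                       PARTITION BY dept ORDER BY sales_amount DESC
                   ) AS rn
            FROM employee_sales
            WHERE month = 'January'
        )
        ORDER BY dept, rn
        """
    ).fetchall()
    conn.close()
    return result


for row in numbered_windows(ROWS):
    print(row)
```

Numbering restarts at 1 in every partition. That restart is what lets a later filter on the number pick one row per department.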

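WITH january_sales AS (
    -- An employee can have several January rows (one per city), so total them
    -- per employee and department, then number employees within each department
    -- from the highest total to the lowest
    SELECT emp_id, dept, SUM(sales_amount) AS total_sales,
           ROW_NUMBER() OVER (PARTITION BY dept ORDER BY SUM(sales_amount) DESC) AS rn
    FROM employee_sales
    WHERE month = 'January'
    GROUP BY emp_id, dept)
-- Keep only the top-ranked employee in each department
SELECT emp_id, dept, total_sales
FROM january_sales
WHERE rn = 1;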
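Below is a runnable sketch of the top-performer query using Python's sqlite3 module, which needs SQLite 3.25 or later for window functions. It totals each employee's January sales per department before ranking, because an employee can have several rows, one per city. The sample rows, the city column, and the hypothetical `ranking` parameter are assumptions. The `ranking` parameter lets you swap ROW_NUMBER() for RANK() to see how ties behave.

```python
import sqlite3

# Hypothetical sample data: (emp_id, city, dept, month, sales_amount).
ROWS = [
    (1, "Austin", "Electronics", "January", 300),
    (1, "Dallas", "Electronics", "January", 200),
    (2, "Austin", "Electronics", "January", 700),
    (3, "Austin", "Grocery", "January", 400),
    (4, "Dallas", "Grocery", "January", 250),
    (3, "Austin", "Grocery", "February", 900),
]
# Same data plus one January row that brings employee 4 level with employee 3.
TIED_ROWS = ROWS + [(4, "Austin", "Grocery", "January", 150)]


def top_per_dept(rows, ranking="ROW_NUMBER"):
    """Return each department's top employee(s) by total January sales."""
    # Whitelist the function name, since it is pasted into the SQL text.
    if ranking not in ("ROW_NUMBER", "RANK"):
        raise ValueError(f"unsupported ranking function: {ranking}")
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE employee_sales ("
        "emp_id INTEGER, city TEXT, dept TEXT, month TEXT, sales_amount INTEGER)"
    )
    conn.executemany("INSERT INTO employee_sales VALUES (?, ?, ?, ?, ?)", rows)
    result = conn.execute(
        f"""
        WITH january_sales AS (
            SELECT emp_id, dept, SUM(sales_amount) AS total_sales,
                   {ranking}() OVER (
                       PARTITION BY dept ORDER BY SUM(sales_amount) DESC
                   ) AS rn
            FROM employee_sales
            WHERE month = 'January'
            GROUP BY emp_id, dept)
        SELECT emp_id, dept, total_sales
        FROM january_sales
        WHERE rn = 1
        ORDER BY dept, emp_id
        """
    ).fetchall()
    conn.close()
    return result


print(top_per_dept(ROWS))               # [(2, 'Electronics', 700), (3, 'Grocery', 400)]
print(top_per_dept(TIED_ROWS, "RANK"))  # both tied Grocery employees are returned
```

With tied totals, ROW_NUMBER() still returns exactly one row per department, but which tied employee you get is not guaranteed. RANK() returns every employee who shares the top total.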

For the first report, the aggregate function collapses each department's January rows into a single total. For the second report, the rows are first partitioned into windows, each row is numbered within its window, and then the top row from each window is selected.
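The difference between collapsing rows and keeping them can be seen by running the same SUM() two ways. Here is a minimal sketch with hypothetical rows and Python's sqlite3 module (SQLite 3.25 or later). The GROUP BY version returns one row per department. The OVER (PARTITION BY dept) version keeps every January row and attaches the department total to each one.

```python
import sqlite3

# Hypothetical sample data: (emp_id, city, dept, month, sales_amount).
ROWS = [
    (1, "Austin", "Electronics", "January", 300),
    (1, "Dallas", "Electronics", "January", 200),
    (2, "Austin", "Electronics", "January", 700),
    (3, "Austin", "Grocery", "January", 400),
    (4, "Dallas", "Grocery", "January", 250),
    (3, "Austin", "Grocery", "February", 900),
]


def compare_row_counts(rows):
    """Run the same SUM as a GROUP BY aggregate and as a window function."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE employee_sales ("
        "emp_id INTEGER, city TEXT, dept TEXT, month TEXT, sales_amount INTEGER)"
    )
    conn.executemany("INSERT INTO employee_sales VALUES (?, ?, ?, ?, ?)", rows)
    # Aggregate: the rows of each department collapse into one output row.
    grouped = conn.execute(
        """
        SELECT dept, SUM(sales_amount)
        FROM employee_sales
        WHERE month = 'January'
        GROUP BY dept
        """
    ).fetchall()
    # Window: every January row survives, each carrying its department's total.
    windowed = conn.execute(
        """
        SELECT emp_id, dept, sales_amount,
               SUM(sales_amount) OVER (PARTITION BY dept) AS dept_total
        FROM employee_sales
        WHERE month = 'January'
        """
    ).fetchall()
    conn.close()
    return grouped, windowed


grouped, windowed = compare_row_counts(ROWS)
print(len(grouped), len(windowed))  # 2 5
```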

There are several types of window functions in SQL, typically falling into the following categories:

Ranking Functions: These return a ranking value for each row in a window and are handy for ranking rows or finding the top N rows per group. Examples include ROW_NUMBER(), which numbers rows uniquely; RANK(), which gives tied rows the same rank and leaves gaps after them; and DENSE_RANK(), which gives tied rows the same rank without leaving gaps.
Aggregate Functions: SUM(), AVG(), MIN(), MAX(), and COUNT() normally collapse a group of rows into a single output row. When followed by an OVER clause, they compute the same result across the window but return it on every row.
Analytic Functions: These return a value taken from another row of the window, either relative to the current row or at the window's boundaries. Examples include FIRST_VALUE(), LAST_VALUE(), LAG(), and LEAD().
Cumulative Distribution and Percent Rank Functions: These express a row's relative position within its window. PERCENT_RANK() and CUME_DIST() return a fraction between 0 and 1. NTILE(n) divides the ordered rows into n roughly equal buckets and returns each row's bucket number.
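To compare the categories side by side, here is a sketch that applies functions from each one to the same hypothetical department, in which two employees are tied. It uses Python's sqlite3 module (SQLite 3.25 or later, which supports the named WINDOW clause). The simplified three-column table is an assumption. ROW_NUMBER(), LAG(), and NTILE() use a second window with emp_id as a tie-breaker, so their output is deterministic. The ranking and percentage functions use the window without the tie-breaker, so the tie is visible.

```python
import sqlite3

# Hypothetical January totals for one department; employees 2 and 3 are tied.
ROWS = [
    (1, "Grocery", 700),
    (2, "Grocery", 400),
    (3, "Grocery", 400),
    (4, "Grocery", 250),
]


def window_function_tour(rows):
    """Compute ranking, percentage, analytic, and aggregate window functions."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE employee_sales (emp_id INTEGER, dept TEXT, sales_amount INTEGER)"
    )
    conn.executemany("INSERT INTO employee_sales VALUES (?, ?, ?)", rows)
    result = conn.execute(
        """
        SELECT emp_id, sales_amount,
               -- Ranking: tied rows share RANK and DENSE_RANK values
               RANK()            OVER by_amount AS rnk,
               DENSE_RANK()      OVER by_amount AS dense_rnk,
               -- Relative position expressed as a fraction between 0 and 1
               PERCENT_RANK()    OVER by_amount AS pct_rank,
               CUME_DIST()       OVER by_amount AS cume_dist,
               -- These three use the tie-broken window for a deterministic order
               ROW_NUMBER()      OVER by_amount_then_id AS row_num,
               LAG(sales_amount) OVER by_amount_then_id AS prev_amount,
               NTILE(2)          OVER by_amount_then_id AS half,
               -- Aggregate used as a window function: every row keeps the total
               SUM(sales_amount) OVER (PARTITION BY dept) AS dept_total
        FROM employee_sales
        WINDOW by_amount AS (PARTITION BY dept ORDER BY sales_amount DESC),
               by_amount_then_id AS (PARTITION BY dept ORDER BY sales_amount DESC, emp_id)
        ORDER BY emp_id
        """
    ).fetchall()
    conn.close()
    return result


for row in window_function_tour(ROWS):
    print(row)
```

In the output, RANK() jumps from 2 to 4 after the tie, while DENSE_RANK() continues with 3. LAG() returns None (SQL NULL) for the first row because that row has no predecessor.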
