登录查看更多内容

Intermediate SQL takeaways

Benedict Neo

MSDS @ USF | Z by HP Global Data Science Ambassador

发布日期: 2021年5月15日

+ 关注

I just finished the Intermediate SQL course on DataCamp, here are my takeaways.

Click here if you want the full version of my notes.

Case statements

Used for returning a value on a specified condition
goes through conditions and returns value as the given result if it is met. If not, the else result is returned.
If there is no ELSE, it gives NULL.
To filter NULL, place the entire CASE WHEN into a WHERE clause and end with IS NOT NULL
can be used with aggregate functions - COUNT, SUM, AVG, Rounded AVG
You can sum logical values by converting them to 1 and 0 using CASE WHEN so you can calculate percentages.

Case statement code:

Subqueries

in SELECT, FROM, WHERE, GROUP BY
Returns -> scalar, list, table

Why Subqueries?

Compare groups to summarized values
Reshape data
Combining data that cannot be joined

Subqueries in FROM

1. restructuring and transform data

transform data from long → wide
prefilter data

2. calculate aggregates of aggregates

To remember

give each subquery an alias, and join them
Join subquery to a table in FROM
include a joining column in both tables (ON or USING)

Subqueries in SELECT

1. return single value

include aggregate values to compare individual values (you can't include aggregate values in ungrouped SQL query)

2. Mathematical calculation

deviation from average
ex: compare single average vs overall average.

To remember

need to return single value
make sure you have filters in the right place (include where clause in subquery and in main query)

Tips for subqueries

format your query (line up SELECT, FROM, WHERE and GROUP BY)
Annotate your queries with comments
indent your queries (subqueries)
make sure subqueries are necessary as they can slow down computing
Properly filter each subquery (include in all subquery and outer query)

Example of subquery in SELECT, FROM and WHERE from the course

Correlated Queries

use values from outer query to generate results
re-run for every row generated in final dataset
used for advanced joining, filtering and evaluating.

Simple query vs correlated query

simple

extracting, structuring, and filtering
run independent of main query
Evaluated once

correlated

dependent of main query
Evaluated in loops (slows down query time)

Nested Queries

subquery inside another subquery
perform multiple layers of transformation
can be either simple or correlated.

Common Table Expression (CTE)

What

a temporary named table for easy use and easier to read code
An alternative to subqueries

Code

s1 and s2 are the table names
you put subqueries into CTE

Benefits

1. executed once

CTE are stored in memory
improves query performance

2. improves organization of queries

3. Allows referencing of other CTEs

4. Allows referencing itself (SELF JOIN)

Deciding Techniques

JOINS
Correlated Subqueries
Multiple/nested Subqueries
CTEs

JOINS

Pros

Combines 2+ tables
Simple operations / aggregations

Cons

can't join two columns to a table in a single column in another at a time

Correlated subquery

Pros

circumvents multiple complex joins
reduce the length of your query
Solves limit of joins

Cons

High processing time
Who does each employee report to in a company

Multiple/nested subquery

Pros

useful when it requires multi-step transformation
improves accuracy and reproducibility
processed first before main query

Cons

Sensitive to changes to data

CTE

Pros

organize subquery sequentially
declared first, improving query run time
Reference other CTEs

Cons

Takes up space
Cannot be nested

Which to use?

depends on database/question
Use technique that best allows you to use and reuse queries and generate clear and accurate results

Window Function

What is it

a class of function that performs calculations on a result set that has been already generated (window)

Use case

use window to perform aggregate calculations without having to group data using the GROUP clause
calculate information → running total, rankings, and moving averages

Why

to use aggregate functions, you need to group your results (every singly non-aggregate value)
This means you can't compare aggregate values to non-aggregate data
using window functions, we can perform calculations on results that has been generated
simpler syntax and faster processing time

IMPORTANT: it's processed after every part of the query except ORDER BY

uses info in result set rather than database

OVER()

used to create windows
Placed after the aggregate function and before the AS alias
Example:

PARTITION BY

what

calculate separate windows based on cols you want to divide your results
create a single column that calculates average of goals scored for each seasons
can partition 1 or more columns
can be used with aggregate functions, ranks, etc.

Example

{aggregate func} OVER(PARTITION BY column_you_want_to_divide_by ) AS name

Extract

extract specific month, day or year from date columns
ex: grab august from date

EXTRACT(month FROM date) = 08)

RANK()

Creates ranking of your columns
include RANK() in the front
remember to include ORDER BY inside the OVER clause to specify the column you want to rank
default order is smallest → largest
add a DESC inside your agg function to reverse

Example:

RANK() OVER(ORDER BY {aggregate function} desc) AS col_name

Sliding windows

create running calculations between any two points in a window using functions
PRECEDING - before
FOLLOWING - after
CURRENT ROW
UNBOUNDED PRECEDING/FOLLOWING - no limit
You can calculate running counts, sums, averages, and other aggregate functions between any two points you specify in the data set.

Next course: PostgreSQL Summary Stats and Window Functions

Resources

Course notes with code
SQLZOO - an awesome series of practical exercises and quizzes for students of SQL
Stack Exchange Data Explorer - Write SQL code to query Stack Overflow

Thank you.

View my certificate

要查看或添加评论，请登录

Benedict Neo的更多文章

Joining Data In SQL takeaways

2021年5月14日

Joining Data In SQL takeaways

I just finished Joining Data In SQL on DataCamp. Here are my takeaways.
Introduction to SQL takeaways

2021年5月12日

Introduction to SQL takeaways

I just finished the Introduction to SQL course on DataCamp. Here are my takeaways.

Intermediate SQL takeaways

Benedict Neo

MSDS @ USF | Z by HP Global Data Science Ambassador

Case statements

Subqueries

Why Subqueries?

Subqueries in FROM

Subqueries in SELECT

Tips for subqueries

Correlated Queries

Simple query vs correlated query

Nested Queries

Common Table Expression (CTE)

Deciding Techniques

Which to use?

Window Function

What is it

Use case

Why

OVER()

PARTITION BY

Extract

RANK()

Sliding windows

Resources

Benedict Neo的更多文章

社区洞察

其他会员也浏览了

Endless SQL Possibilities

SQL QuickStart Guide

Optimizing SQL Queries for Performance

SQL Simplified: Key Concepts Every Beginner Must Know

SQL 101: Top Mistakes Beginners Should Avoid

It's The Assumptions That Get You

Dynamic SQL for Data Analysts: Tools, Techniques, and Tips

Understanding the Execution Cycle of an SQL Query: A Key to Optimizing Performance

Top 9 Best Practices When Writing SQL

5 things I wish I had learned early in my SQL career

Case statements

Subqueries

Why Subqueries?

Subqueries in FROM

Subqueries in SELECT

Tips for subqueries

Correlated Queries

Simple query vs correlated query

Nested Queries

Common Table Expression (CTE)

Deciding Techniques

Which to use?

Window Function

What is it

Use case

Why

OVER()

PARTITION BY

Extract

RANK()

Sliding windows

Resources

Benedict Neo的更多文章

Joining Data In SQL takeaways

Introduction to SQL takeaways

社区洞察

其他会员也浏览了

Endless SQL Possibilities

SQL QuickStart Guide

Optimizing SQL Queries for Performance

SQL Simplified: Key Concepts Every Beginner Must Know

SQL 101: Top Mistakes Beginners Should Avoid

It's The Assumptions That Get You

Dynamic SQL for Data Analysts: Tools, Techniques, and Tips

Understanding the Execution Cycle of an SQL Query: A Key to Optimizing Performance

Top 9 Best Practices When Writing SQL

5 things I wish I had learned early in my SQL career