Intermediate SQL takeaways

Intermediate SQL takeaways

I just finished the Intermediate SQL course on DataCamp, here are my takeaways.


Click here if you want the full version of my notes.


Case statements

  • Used for returning a value on a specified condition
  • goes through conditions and returns value as the given result if it is met. If not, the else result is returned.
  • If there is no ELSE, it gives NULL.
  • To filter NULL, place the entire CASE WHEN into a WHERE clause and end with IS NOT NULL
  • can be used with aggregate functions - COUNT, SUM, AVG, Rounded AVG
  • You can sum logical values by converting them to 1 and 0 using CASE WHEN so you can calculate percentages.

Case statement code:

case when code


Subqueries

  • in SELECT, FROM, WHERE, GROUP BY
  • Returns -> scalar, list, table


Why Subqueries?

  1. Compare groups to summarized values
  2. Reshape data
  3. Combining data that cannot be joined


Subqueries in FROM

1. restructuring and transform data

  • transform data from long → wide
  • prefilter data

2. calculate aggregates of aggregates

To remember

  • give each subquery an alias, and join them
  • Join subquery to a table in FROM
  • include a joining column in both tables (ON or USING)


Subqueries in SELECT

1. return single value

  • include aggregate values to compare individual values (you can't include aggregate values in ungrouped SQL query)

2. Mathematical calculation

  • deviation from average
  • ex: compare single average vs overall average.

To remember

  • need to return single value
  • make sure you have filters in the right place (include where clause in subquery and in main query)


Tips for subqueries

  • format your query (line up SELECT, FROM, WHERE and GROUP BY)
  • Annotate your queries with comments
  • indent your queries (subqueries)
  • make sure subqueries are necessary as they can slow down computing
  • Properly filter each subquery (include in all subquery and outer query)


Example of subquery in SELECT, FROM and WHERE from the course

example of subqueries


Correlated Queries

  • use values from outer query to generate results
  • re-run for every row generated in final dataset
  • used for advanced joining, filtering and evaluating.


Simple query vs correlated query


simple

  • extracting, structuring, and filtering
  • run independent of main query
  • Evaluated once

correlated

  • dependent of main query
  • Evaluated in loops (slows down query time)


Nested Queries

  • subquery inside another subquery
  • perform multiple layers of transformation
  • can be either simple or correlated.


Common Table Expression (CTE)

What

  • a temporary named table for easy use and easier to read code
  • An alternative to subqueries

Code

  • s1 and s2 are the table names
  • you put subqueries into CTE
No alt text provided for this image


Benefits

1. executed once

  • CTE are stored in memory
  • improves query performance

2. improves organization of queries

3. Allows referencing of other CTEs

4. Allows referencing itself (SELF JOIN)


Deciding Techniques

  1. JOINS
  2. Correlated Subqueries
  3. Multiple/nested Subqueries
  4. CTEs


JOINS

Pros

  • Combines 2+ tables
  • Simple operations / aggregations

Cons

  • can't join two columns to a table in a single column in another at a time


Correlated subquery

Pros

  • circumvents multiple complex joins
  • reduce the length of your query
  • Solves limit of joins

Cons

  • High processing time
  • Who does each employee report to in a company


Multiple/nested subquery

Pros

  • useful when it requires multi-step transformation
  • improves accuracy and reproducibility
  • processed first before main query

Cons

  • Sensitive to changes to data


CTE

Pros

  • organize subquery sequentially
  • declared first, improving query run time
  • Reference other CTEs

Cons

  • Takes up space
  • Cannot be nested


Which to use?

  1. depends on database/question
  2. Use technique that best allows you to use and reuse queries and generate clear and accurate results


Window Function

What is it

  • a class of function that performs calculations on a result set that has been already generated (window)

Use case

  • use window to perform aggregate calculations without having to group data using the GROUP clause
  • calculate information → running total, rankings, and moving averages

Why

  • to use aggregate functions, you need to group your results (every singly non-aggregate value)
  • This means you can't compare aggregate values to non-aggregate data
  • using window functions, we can perform calculations on results that has been generated
  • simpler syntax and faster processing time


IMPORTANT: it's processed after every part of the query except ORDER BY

  • uses info in result set rather than database


OVER()

  • used to create windows
  • Placed after the aggregate function and before the AS alias
  • Example:
No alt text provided for this image


PARTITION BY

what

  • calculate separate windows based on cols you want to divide your results
  • create a single column that calculates average of goals scored for each seasons
  • can partition 1 or more columns
  • can be used with aggregate functions, ranks, etc.

Example

{aggregate func} OVER(PARTITION BY column_you_want_to_divide_by ) AS name


Extract

  • extract specific month, day or year from date columns
  • ex: grab august from date
EXTRACT(month FROM date) = 08) 


RANK()

  • Creates ranking of your columns
  • include RANK() in the front
  • remember to include ORDER BY inside the OVER clause to specify the column you want to rank
  • default order is smallest → largest
  • add a DESC inside your agg function to reverse

Example:

RANK() OVER(ORDER BY {aggregate function} desc) AS col_name


Sliding windows

  • create running calculations between any two points in a window using functions
  • PRECEDING - before
  • FOLLOWING - after
  • CURRENT ROW
  • UNBOUNDED PRECEDING/FOLLOWING - no limit
  • You can calculate running counts, sums, averages, and other aggregate functions between any two points you specify in the data set.
No alt text provided for this image



Next course: PostgreSQL Summary Stats and Window Functions


Resources


Thank you.

View my certificate

要查看或添加评论,请登录

Benedict Neo的更多文章

  • Joining Data In SQL takeaways

    Joining Data In SQL takeaways

    I just finished Joining Data In SQL on DataCamp. Here are my takeaways.

  • Introduction to SQL takeaways

    Introduction to SQL takeaways

    I just finished the Introduction to SQL course on DataCamp. Here are my takeaways.

社区洞察

其他会员也浏览了