SQL Tips and Tricks
Guilherme Lisb?a
Senior Data Engineer | Google Cloud Certified | Python | GCP | Bigquery | Dataflow
Here we'll propose a advantage use cases to improve your query performance and make your query less complex.
Common Table Expressions ( CTEs )
CTEs allow us to break down complex queries into smaller ones, we can reuse those query in multiples times within a single query.
We can stop use concatenate queries instead CTEs
SELECT *
FROM
(
SELECT *
FROM users
WHERE sex = 'F'
) men
INNER JOIN
(
SELECT *
FROM films
WHERE total>100
) catalog
ON men.id = catalog.user_id*
We can replace the query above to
WITH user AS
SELECT *
FROM users
WHERE sex = 'M'
),
catalog AS (
SELECT *
FROM films
WHERE total>100
)
SELECT *
FROM user
INNER JOIN catalog ON user.id = catalog.user_id
Window Functions ( WF )
Window functions are a useful technique that allows for executing analytical functions and aggregations on specific data windows. With window functions, we can calculate metrics such as moving averages or growth rates in a time series. Additionally, window functions also allow for specifying the sorting order of data in a query and defining the window based on values in other columns. This makes it easier to analyze data and gain valuable insights.
WF allows you to make calculations across rows without needing self-joins, that is a good alternative to avoid self-join and multiples querys.
SELECT customer_id, amount,
SUM(amount)
OVER (
PARTITION BY customer_id ORDER BY films_date
) AS amount_collected
FROM films
In the above example we some all amount e put the result in a column in that query we don't need to make a group.
Pivot Tables
That approach let us to turno rows into columns, it's help us in some situations
Imagine you have that table:
领英推荐
+--------+----------+-------
| region | year | sales |
+--------+----------+-------+
| North | 2010 | 100 |
| North | 2011 | 200 |
| South | 2010 | 150 |
| South | 2011 | 75 |
| East | 2010 | 50 |
+--------+----------+-------++
And we can make a analises to know how the sales how in each year, we can pivot that table into
+--------+----------+----------
| region | 2010 | 2011 |
+--------+----------+----------+
| East | 50 | NULL |
| North | 100 | 200 |
| South | 150 | 75 |
+--------+----------+----------++
A example of that query could be
WITH pivot_data AS
SELECT region, year, sales
FROM sales_data
)
SELECT *
FROM pivot_data
PIVOT (SUM(sales) FOR year IN ([2010], [2011])) AS pivot_table(
Wild Table
Wild Table is a powerful technique that allows for querying multiple tables without the need to join or merge them. With this technique, we can access multiple tables without creating a temporary table to store the results. Instead, Wild Table uses wildcard syntax, which allows BigQuery to filter tables based on name patterns. For example, if we have several sales tables with names like "sales_2018", "sales_2019", "sales_2020", we can query all of these tables at once with a single query by using Wild Table and specifying the pattern "sales_*".
Federated Tables
Federated tables are an advanced SQL technique that allows for querying data from external sources such as Google Cloud Storage or Google Drive without the need to import the data into BigQuery. With federated tables, we can access data from external sources and incorporate them into our SQL queries. This can be useful for accessing historical data or data that is not stored in BigQuery. By using federated tables, we can save time and resources on data transfer and import.
Materialized Views
Materialized views are an advanced SQL technique that allows for precomputing and storing the results of a query. With materialized views, we can define a query that generates a result set, which can then be stored as a table. This can be useful for optimizing query performance by avoiding the need to repeatedly execute a complex query.
Unnest + Generate_array:
Unnest and Generate_array are advanced SQL techniques that allow for generating arrays and unnesting them into individual rows. With these techniques, we can transform data in a way that makes it easier to analyze and gain valuable insights. This can be useful for working with complex data structures such as arrays or JSON objects.
Those are some Technics that will help you to improve your queries.