SQL Refresher: Essential Queries for Data Analysts

SQL Refresher: Essential Queries for Data Analysts

WSDA News | December 31, 2024

As the year wraps up, there’s no better time to revisit SQL—one of the most indispensable tools for any data analyst. Whether you’re building retention models, analyzing user engagement, or preparing datasets for machine learning, SQL is the backbone of many data workflows.

Here’s a concise refresher on some of the most critical SQL techniques and queries I’ve used in my journey as a data analyst. These examples can save you time and elevate your data analysis game.


1. SQL for Retention and Cohort Analysis

Retention analysis is a cornerstone of understanding user engagement. Creating a consolidated table with user activity flags (e.g., R(D1), R(D7), R(D30)) simplifies this process, but setting it up can be time-intensive.

Example: Cohorted Retention Query

WITH UserCohorts AS (
    SELECT user_id, MIN(event_date) AS cohort_date
    FROM user_events
    GROUP BY user_id
),
RetentionMetrics AS (
    SELECT 
        uc.cohort_date,
        ue.event_date,
        COUNT(DISTINCT ue.user_id) AS retained_users
    FROM UserCohorts uc
    JOIN user_events ue ON uc.user_id = ue.user_id
    WHERE DATEDIFF(ue.event_date, uc.cohort_date) BETWEEN 0 AND 30
    GROUP BY uc.cohort_date, ue.event_date
)
SELECT cohort_date, 
       event_date, 
       retained_users
FROM RetentionMetrics;        

This query tracks user retention over a 30-day period, grouped by cohorts.


2. Ranking Top-Used Product Features

Understanding which product features are most used requires well-structured activity data. If your raw event data isn’t already mapped to features, you’ll need to create relationships between event types and feature categories.

Example: Ranking Features by Usage

SELECT feature_name, 
       COUNT(*) AS usage_count
FROM user_actions
JOIN features ON user_actions.action_type = features.action_type
GROUP BY feature_name
ORDER BY usage_count DESC
LIMIT 10;        

This query identifies the top 10 most used features by mapping user actions to feature categories.


3. Frequency of Usage Analysis

For products with episodic usage patterns (e.g., Airbnb or LinkedIn), analyzing frequency instead of DAU/MAU ratios provides more meaningful insights.

Example: Usage Frequency Query

SELECT user_id, 
       COUNT(DISTINCT DATE(event_date)) AS active_days, 
       COUNT(*) AS total_events
FROM user_events
GROUP BY user_id
HAVING active_days BETWEEN 5 AND 10;        

This query identifies users with moderate activity levels, helping you analyze different user personas.


4. Analyzing Weekday Product Usage

Identifying trends in product usage across weekdays can help optimize resource allocation and feature updates.

Example: Weekday Usage Analysis

SELECT DAYNAME(event_date) AS weekday, 
       COUNT(*) AS usage_count
FROM user_events
GROUP BY weekday
ORDER BY FIELD(weekday, 'Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday', 'Sunday');        

This query aggregates user activity by weekday, providing insights into daily usage patterns.


5. Average Logins Per User Per Day

Tracking average logins helps measure user engagement and identify anomalies.

Example: Average Logins Query

SELECT user_id, 
       AVG(login_count) AS avg_logins_per_day
FROM (
    SELECT user_id, 
           event_date, 
           COUNT(*) AS login_count
    FROM user_events
    WHERE event_type = 'login'
    GROUP BY user_id, event_date
) daily_logins
GROUP BY user_id;        

This query calculates the daily average logins for each user.


6. Selecting a Random Set of Users for Analysis

For certain machine learning or analytics tasks, you may need a random sample of users.

Example: Random User Selection Query

SELECT user_id
FROM users
ORDER BY RAND()
LIMIT 100;        

This query selects 100 random users for sampling purposes.


7. Flattening JSON Data

JSON data is common in modern datasets but requires transformation for SQL analysis.

Example: Flattening JSON

SELECT 
    json_extract_scalar(json_column, '$.key1') AS value1, 
    json_extract_scalar(json_column, '$.key2') AS value2
FROM json_table;        

This query extracts specific fields from a JSON column into separate columns.


8. Joining Tables Without a Foreign Key

Joining tables without a predefined relationship requires creativity, often using approximate matches or calculated relationships.

Example: Joining Unrelated Tables

SELECT a.user_id, 
       b.session_id
FROM table_a a
JOIN table_b b ON a.timestamp BETWEEN b.start_time AND b.end_time;        

This query joins two tables based on overlapping time ranges.


9. Modeling Tables for Subscription Reporting

Subscription reporting often involves creating a clean schema for recurring revenue metrics.

Example: Subscription Modeling Query

SELECT customer_id, 
       subscription_start_date, 
       subscription_end_date, 
       DATEDIFF(subscription_end_date, subscription_start_date) AS subscription_duration
FROM subscriptions;        

This query calculates subscription durations, which can be used for retention or revenue forecasting.


Conclusion

SQL is an incredibly versatile tool for data analysis, and mastering these practical queries will save time while delivering insights. Whether you’re analyzing retention, ranking features, or preparing data for machine learning, these examples provide a strong foundation for your analytics journey.

Data No Doubt! Check out WSDALearning.ai and start learning Data Analytics and Data Science Today!

要查看或添加评论,请登录

Walter Shields的更多文章

社区洞察

其他会员也浏览了