Instagram User Analytics: Unveiling Insights with SQL Fundamentals
Pintu Kumar Kushwaha
?? Data Science & Business Analyst Intern @The Spark Foundation ?? Data Analyst | Excel | SQL | Tableau | Power BI | Python ?? | Snowflake?? |
Description:
Imagine you're a data analyst working with the product team at Instagram. Your role involves analyzing user interactions and engagement with the Instagram app to provide valuable insights that can help the business grow.
User analysis involves tracking how users engage with a digital product, such as a software application or a mobile app. The insights derived from this analysis can be used by various teams within the business. For example, the marketing team might use these insights to launch a new campaign, the product team might use them to decide on new features to build, and the development team might use them to improve the overall user experience.
In this project, you'll be using SQL and MySQL Workbench as your tool to analyze Instagram user data and answer questions posed by the management team. Your insights will help the product manager and the rest of the team make informed decisions about the future direction of the Instagram app.
Remember, the goal of this project is to use your SQL skills to extract meaningful insights from the data. Your findings could potentially influence the future development of one of the world's most popular social media platforms.
TECH-STACK USED?
Approach
The approach towards this this project is to use SQL queries to analyze the database provided.?
CREATE DATABASE ig_clone;
B. Creating tables
User table
/*Users*/
CREATE TABLE users(
id INT AUTO_INCREMENT UNIQUE PRIMARY KEY,
username VARCHAR(255) NOT NULL,
created_at TIMESTAMP DEFAULT NOW()
);
This SQL code creates a table named users to store information about users. Here's a breakdown of each part of the code:
In summary, this SQL code creates a users table to store user information. Each user has a unique ID, a username, and a timestamp indicating when they were created. This table structure is common for managing user data in a database system.
Photos table
/*Photos*/
CREATE TABLE photos(
id INT AUTO_INCREMENT PRIMARY KEY,
image_url VARCHAR(355) NOT NULL,
user_id INT NOT NULL,
created_dat TIMESTAMP DEFAULT NOW(),
FOREIGN KEY(user_id) REFERENCES users(id)
);
This SQL command creates a table named photos to store information about photos uploaded by users. Here's a breakdown of the different parts of the code:
In summary, the SQL code creates a photos table to store information about photos. Each photo has a unique ID, an image URL, the ID of the user who uploaded it, and a timestamp indicating when it was uploaded. The user_id column establishes a relationship with the users table through a foreign key constraint. This table structure is commonly used for managing photos and their associated data in a database system.
Comments table
/*Comments*/
CREATE TABLE comments(
id INT AUTO_INCREMENT PRIMARY KEY,
comment_text VARCHAR(255) NOT NULL,
user_id INT NOT NULL,
photo_id INT NOT NULL,
created_at TIMESTAMP DEFAULT NOW(),
FOREIGN KEY(user_id) REFERENCES users(id),
FOREIGN KEY(photo_id) REFERENCES photos(id)
);
This SQL code creates a table named comments to store information about comments made by users on photos. Here's a breakdown of the different parts of the code:
In summary, the SQL code creates a comments table to store information about comments made by users on photos. Each comment has a unique ID, the text of the comment, the ID of the user who made it, the ID of the related photo, and a timestamp indicating when it was created. The foreign keys establish relationships with the users and photos tables. This structure is commonly used for managing user comments in a database system.
Like table
/*Likes*
CREATE TABLE likes(
user_id INT NOT NULL,
photo_id INT NOT NULL,
created_at TIMESTAMP DEFAULT NOW(),
FOREIGN KEY(user_id) REFERENCES users(id),
FOREIGN KEY(photo_id) REFERENCES photos(id),
PRIMARY KEY(user_id,photo_id)
);
This SQL code creates a table named likes to store information about likes that users give to photos. Here's an explanation of each part of the code:
In summary, the SQL code creates a likes table to store information about likes. Each like is associated with a user and a photo, along with the timestamp of when the like was given. The foreign keys link the likes table to the users and photos tables, and the composite primary key ensures the uniqueness of like relationships. This structure is commonly used for tracking likes or similar interactions in a database system.
Follows table
/*follows*/
CREATE TABLE follows(
follower_id INT NOT NULL,
followee_id INT NOT NULL,
created_at TIMESTAMP DEFAULT NOW(),
FOREIGN KEY (follower_id) REFERENCES users(id),
FOREIGN KEY (followee_id) REFERENCES users(id),
PRIMARY KEY(follower_id,followee_id)
);
This SQL code snippet defines a table named follows that represents a relationship between users who follow each other on a platform. Here's a breakdown of what each part of the code does:
In summary, this SQL code creates a table that stores information about users following each other. The table captures the IDs of the follower and the followee, along with the timestamp of the follow action. Foreign keys link the user IDs to the users table, and a composite primary key ensures uniqueness of the follow relationships. This is a common way to model social network-like interactions in a relational database.
Tags table
/*Tags*/
CREATE TABLE tags(
id INTEGER AUTO_INCREMENT PRIMARY KEY,
tag_name VARCHAR(255) UNIQUE NOT NULL,
created_at TIMESTAMP DEFAULT NOW()
);
This SQL code creates a table named tags to store information about tags associated with photos or other entities. Here's an explanation of each part of the code:
In summary, the SQL code creates a tags table to store information about tags. Each tag has a unique ID, a tag name, and a timestamp indicating when it was created. The tag_name column has a UNIQUE constraint to ensure uniqueness, preventing multiple tags with the same name. This table structure is commonly used to manage tags associated with various entities, such as photos, posts, or articles.
Junction table: photo_tags table
/*junction table: Photos - Tags*/
CREATE TABLE photo_tags(
photo_id INT NOT NULL,
tag_id INT NOT NULL,
FOREIGN KEY(photo_id) REFERENCES photos(id),
FOREIGN KEY(tag_id) REFERENCES tags(id),
PRIMARY KEY(photo_id,tag_id)
);
This SQL code creates a junction table named photo_tags to establish a many-to-many relationship between photos and tags. This junction table is used to associate tags with photos. Here's an explanation of each part of the code:
领英推荐
In summary, the SQL code creates a photo_tags junction table to manage the many-to-many relationship between photos and tags. Each row in this table indicates that a specific photo is associated with a specific tag. The foreign keys connect the junction table to the photos and tags tables, and the composite primary key ensures the uniqueness of the associations. This type of junction table is commonly used to handle many-to-many relationships in a relational database.
Inserting Values into tables
Now we will insert values into all tables.
-- CHECKING TABLES
select * FROM users;
select * FROM photos;
select * FROM tags;
select * FROM photo_tags;
select * FROM likes;
select * FROM follows;
select * FROM comments;
SQL Tasks :
A) Marketing Analysis:
Your Task: Identify the five oldest users on Instagram from the provided database.
select * from users
order by created_at
limit 5;?
The SQL query you provided retrieves rows from the users table, orders them by the created_at column in ascending order (earliest to latest), and limits the result to the first 5 rows.This query will return the earliest created users (based on the created_at timestamp) and show the details of the first 5 users in ascending order of their creation.
2. Inactive User Engagement:?The team wants to encourage inactive users to start posting by sending them promotional emails.
Your Task: Identify users who have never posted a single photo on Instagram.
-- Your Task: Identify users who have never posted a single photo on Instagram
select * from users
where id not in
(select user_id from photos);
--Another Method (Using left join)
select u.id, u.username from? users u
left join photos p
on p.user_id = u.id
where p.user_id is null;
3. Contest Winner Declaration:?The team has organized a contest where the user with the most likes on a single photo wins.
Your Task: Determine the winner of the contest and provide their details to the team.
-- 1. Identify the Photo with the Most Likes:
SELECT?
? ? photo_id
FROM
? ? likes
GROUP BY photo_id
ORDER BY COUNT(user_id) DESC
LIMIT 1;
--2. Retrieve the Winner's Details:
with MostLikedPhoto as (SELECT?
? ? photo_id, COUNT(user_id) as total_likes
FROM
? ? likes
GROUP BY photo_id
ORDER BY COUNT(user_id) DESC
LIMIT 1)
select u.username, u.id, p.id as photo_id, MostLikedPhoto.total_likes from MostLikedPhoto
join photos p on MostLikedPhoto.photo_id = p.id
join users u on p.user_id = u.id;
In the first part, you're querying the likes table to identify the photo that has received the most likes. The steps are as follows:
In the second part, we are building upon the previous query to retrieve the details of the user who posted the most liked photo. Here's how it works:
Overall, this combination of SQL queries helps you identify the photo with the most likes, and then retrieve the associated user's details along with the photo's details.
4. Hashtag Research:?A partner brand wants to know the most popular hashtags to use in their posts to reach the most people.
Your Task: Identify and suggest the top five most commonly used hashtags on the platform.
/*Hashtag Research: A partner brand wants to know the most popular hashtags to use in their posts to reach the most people
Your Task: Identify and suggest the top five most commonly used hashtags on the platform.
*/
select * from tags;
with top_tags as
(select tag_id from photo_tags?
group by tag_id
order by count(tag_id) desc?
limit 5)
select t.tag_name from top_tags
join tags t on top_tags.tag_id = t.id;
Here, we're using a common table expression (CTE) named top_tags to identify the top five most commonly used hashtags. The steps are as follows:
In the final part of the query, we're joining the top_tags CTE with the tags table to retrieve the names of the top tags. This allows us to present the actual hashtag names that are the most commonly used on the platform.
In summary, our query uses a CTE to identify the top five hashtags and then joins that information with the tags table to retrieve the names of those hashtags. This gives a clear and actionable suggestion for the partner brand looking to reach a wider audience through popular hashtags.
5. Ad Campaign Launch:?The team wants to know the best day of the week to launch ads.
Your Task: Determine the day of the week when most users register on Instagram. Provide insights on when to schedule an ad campaign.
/*
Ad Campaign Launch: The team wants to know the best day of the week to launch ads.
Your Task: Determine the day of the week when most users register on Instagram.?
Provide insights on when to schedule an ad campaign.
*/
select dayname(created_at) as days_of_week,
? ?count(*) as num_of_users_resisters
from users
group by dayname(created_at)
order by num_of_users_resisters desc;?
In this part of the query, we're gathering data on user registrations and organizing it by day of the week. The steps are as follows:
In essence, this query provides a breakdown of the number of users who registered on each day of the week.
The insights gained from this query will help us determine the best day of the week to launch ad campaigns, as we'll be able to identify the day when most users tend to register on Instagram. This information can guide your ad campaign scheduling strategy for maximum impact.
B) Investor Metrics:
Your Task: Calculate the average number of posts per user on Instagram. Also, provide the total number of photos on Instagram divided by the total number of users.
B) Investor Metrics
User Engagement: Investors want to know if users are still active and posting on Instagram or if they are making fewer posts.
Your Task: Calculate the average number of posts per user on Instagram. Also, provide the?
total number of photos on Instagram divided by the total number of users.
*/
-- total number of users on instagram
SELECT COUNT(DISTINCT id) AS total_users_on_instagram
FROM users;
-- total number of photos on instagram
SELECT COUNT(*) AS total_photos_on_instagram
FROM photos;
-- the?total number of photos on Instagram divided by the total number of users
SELECT (SELECT Count(*)?
? ? ? ? FROM? ?photos) / (SELECT Count(*)?
? ? ? ? ? ? ? ? ? ? ? ? ? FROM? ?users) AS avg;?
? ? ? ? ? ? ? ? ? ? ? ? ??
--? Calculate the average number of posts per user on Instagram
-- post count by user
select user_id, count(*)? as posts_count from photos
group by user_id
order by posts_count desc;
-- average_post per user
SELECT AVG(posts_count) as avg_posts_per_user
FROM (
select user_id, count(*)? as posts_count from photos
group by user_id
order by posts_count desc) as user_posts;
2. Bots & Fake Accounts:?Investors want to know if the platform is crowded with fake and dummy accounts.
Your Task: Identify users (potential bots) who have liked every single photo on the site, as this is not typically possible for a normal user.
/
Bots & Fake Accounts: Investors want to know if the platform is crowded with fake and dummy accounts.
Your Task: Identify users (potential bots) who have liked every single photo on the site,?
as this is not typically possible for a normal user.
*/
-- total number of photos?
select count(*) from photos;
-- Bots & Fake Accounts:?
select username, count(*) as num_likes
from users u?
join likes l
on u.id = l.user_id
group by l.user_id
having num_likes = (select count(*) from photos)*
Results:
From this project, I got an idea about how as a business
or data analyst we work on real-time data to take any
data-driven decision.
One thing I infer about this project is the dataset
provided was very limited and small in respect of Rows
and columns, But still, it was a very good experience
working on such kind of project.
It helped me a lot to understand the analysis process
well, and to provide insights for the best decision
possible.
MSc Applied Economics with Banking & Financial Markets | Data-Driven Decision Making | Python and R for Data Analysis
1 年Hey Pintu Kumar Kushwaha this is a great project! I am planning to mine Instagram data analysis project too ??