Instagram User Analytics: Unveiling Insights with SQL Fundamentals

Instagram User Analytics: Unveiling Insights with SQL Fundamentals

Description:

Imagine you're a data analyst working with the product team at Instagram. Your role involves analyzing user interactions and engagement with the Instagram app to provide valuable insights that can help the business grow.

User analysis involves tracking how users engage with a digital product, such as a software application or a mobile app. The insights derived from this analysis can be used by various teams within the business. For example, the marketing team might use these insights to launch a new campaign, the product team might use them to decide on new features to build, and the development team might use them to improve the overall user experience.

In this project, you'll be using SQL and MySQL Workbench as your tool to analyze Instagram user data and answer questions posed by the management team. Your insights will help the product manager and the rest of the team make informed decisions about the future direction of the Instagram app.

Remember, the goal of this project is to use your SQL skills to extract meaningful insights from the data. Your findings could potentially influence the future development of one of the world's most popular social media platforms.



TECH-STACK USED?

  • To complete this project we have use software? MySQL workbench 8.0 CE. It is used to creating database and writing SQL queries. We also use notepad for saving SQL query and Microsoft power point presentation to create PPT.


Approach

The approach towards this this project is to use SQL queries to analyze the database provided.?

A. CREATE DATABASE

CREATE DATABASE ig_clone;        

B. Creating tables

User table

/*Users*/

CREATE TABLE users(

	id INT AUTO_INCREMENT UNIQUE PRIMARY KEY,

	username VARCHAR(255) NOT NULL,

	created_at TIMESTAMP DEFAULT NOW()

);        

This SQL code creates a table named users to store information about users. Here's a breakdown of each part of the code:

  1. Table Name: users

  • This is the name of the table you're creating to store user-related information.

  1. Columns:

  • id INT AUTO_INCREMENT UNIQUE PRIMARY KEY: This column is used as the primary key of the table. It's an integer (INT) data type and is set to auto-increment, meaning each new user entry will automatically get a unique ID value. The combination of UNIQUE and PRIMARY KEY ensures that each ID is unique and serves as the main identifier for each user.
  • username VARCHAR(255) NOT NULL: This column stores the username of the user. It's a variable-length character (VARCHAR) data type with a maximum length of 255 characters. The NOT NULL constraint ensures that a username must be provided for every user.
  • created_at TIMESTAMP DEFAULT NOW(): This column records the timestamp when the user was created. It uses the TIMESTAMP data type and has a default value of the current timestamp (NOW()).

In summary, this SQL code creates a users table to store user information. Each user has a unique ID, a username, and a timestamp indicating when they were created. This table structure is common for managing user data in a database system.

Photos table

/*Photos*/

CREATE TABLE photos(

	id INT AUTO_INCREMENT PRIMARY KEY,

	image_url VARCHAR(355) NOT NULL,

	user_id INT NOT NULL,

	created_dat TIMESTAMP DEFAULT NOW(),

	FOREIGN KEY(user_id) REFERENCES users(id)

);        

This SQL command creates a table named photos to store information about photos uploaded by users. Here's a breakdown of the different parts of the code:

  1. Table Name: photos

  • This is the name of the table you're creating to store photo-related information.

  1. Columns:

  • id INT AUTO_INCREMENT PRIMARY KEY: This column is used as the primary key of the table. It's an integer (INT) data type and is set to auto-increment, ensuring each new photo entry gets a unique ID value. This unique ID acts as the main identifier for each photo.
  • image_url VARCHAR(355) NOT NULL: This column stores the URL (web address) of the photo image. It's a variable-length character (VARCHAR) data type with a maximum length of 355 characters. The NOT NULL constraint ensures that an image URL must be provided for every photo.
  • user_id INT NOT NULL: This column stores the ID of the user who uploaded the photo. It's an integer (INT) data type and cannot be left empty (NOT NULL).
  • created_dat TIMESTAMP DEFAULT NOW(): This column records the timestamp when the photo was uploaded. It uses the TIMESTAMP data type and has a default value of the current timestamp (NOW()).

  1. Foreign Key:

  • FOREIGN KEY(user_id) REFERENCES users(id): This establishes a connection between the user_id column in the photos table and the id column in the users table. It creates a link between photos and the users who uploaded them.

In summary, the SQL code creates a photos table to store information about photos. Each photo has a unique ID, an image URL, the ID of the user who uploaded it, and a timestamp indicating when it was uploaded. The user_id column establishes a relationship with the users table through a foreign key constraint. This table structure is commonly used for managing photos and their associated data in a database system.

Comments table

/*Comments*/

CREATE TABLE comments(

	id INT AUTO_INCREMENT PRIMARY KEY,

	comment_text VARCHAR(255) NOT NULL,

	user_id INT NOT NULL,

	photo_id INT NOT NULL,

	created_at TIMESTAMP DEFAULT NOW(),

	FOREIGN KEY(user_id) REFERENCES users(id),

	FOREIGN KEY(photo_id) REFERENCES photos(id)

);        

This SQL code creates a table named comments to store information about comments made by users on photos. Here's a breakdown of the different parts of the code:

  1. Table Name: comments

  • This is the name of the table you're creating to store comment-related information.

  1. Columns:

  • id INT AUTO_INCREMENT PRIMARY KEY: This column is used as the primary key of the table. It's an integer (INT) data type and is set to auto-increment, ensuring each new comment entry gets a unique ID value. This unique ID serves as the main identifier for each comment.
  • comment_text VARCHAR(255) NOT NULL: This column stores the text of the comment made by a user. It's a variable-length character (VARCHAR) data type with a maximum length of 255 characters. The NOT NULL constraint ensures that a comment text must be provided for every comment.
  • user_id INT NOT NULL: This column stores the ID of the user who made the comment. It's an integer (INT) data type and cannot be left empty (NOT NULL).
  • photo_id INT NOT NULL: This column stores the ID of the photo that the comment is related to. It's an integer (INT) data type and cannot be left empty (NOT NULL).
  • created_at TIMESTAMP DEFAULT NOW(): This column records the timestamp when the comment was made. It uses the TIMESTAMP data type and has a default value of the current timestamp (NOW()).

  1. Foreign Keys:

  • FOREIGN KEY(user_id) REFERENCES users(id): This establishes a connection between the user_id column in the comments table and the id column in the users table. It links comments to the users who made them.
  • FOREIGN KEY(photo_id) REFERENCES photos(id): This creates a connection between the photo_id column in the comments table and the id column in the photos table. It associates comments with the photos they are posted on.

In summary, the SQL code creates a comments table to store information about comments made by users on photos. Each comment has a unique ID, the text of the comment, the ID of the user who made it, the ID of the related photo, and a timestamp indicating when it was created. The foreign keys establish relationships with the users and photos tables. This structure is commonly used for managing user comments in a database system.

Like table

/*Likes*

CREATE TABLE likes(

	user_id INT NOT NULL,

	photo_id INT NOT NULL,

	created_at TIMESTAMP DEFAULT NOW(),

	FOREIGN KEY(user_id) REFERENCES users(id),

	FOREIGN KEY(photo_id) REFERENCES photos(id),

	PRIMARY KEY(user_id,photo_id)

);        

This SQL code creates a table named likes to store information about likes that users give to photos. Here's an explanation of each part of the code:

  1. Table Name: likes

  • This is the name of the table you're creating to store information about likes.

  1. Columns:

  • user_id INT NOT NULL: This column stores the ID of the user who liked a photo. It's an integer (INT) data type and cannot be left empty (NOT NULL).
  • photo_id INT NOT NULL: This column stores the ID of the photo that was liked. It's an integer (INT) data type and cannot be left empty (NOT NULL).
  • created_at TIMESTAMP DEFAULT NOW(): This column records the timestamp when the like was given. It uses the TIMESTAMP data type and has a default value of the current timestamp (NOW()).

  1. Foreign Keys:

  • FOREIGN KEY(user_id) REFERENCES users(id): This establishes a link between the user_id column in the likes table and the id column in the users table. It associates likes with the users who gave them.
  • FOREIGN KEY(photo_id) REFERENCES photos(id): This creates a relationship between the photo_id column in the likes table and the id column in the photos table. It connects likes with the photos that were liked.

  1. Primary Key:

  • PRIMARY KEY(user_id, photo_id): This designates a composite primary key consisting of both the user_id and photo_id columns. This combination ensures that each user can only like a particular photo once, avoiding duplicate likes.

In summary, the SQL code creates a likes table to store information about likes. Each like is associated with a user and a photo, along with the timestamp of when the like was given. The foreign keys link the likes table to the users and photos tables, and the composite primary key ensures the uniqueness of like relationships. This structure is commonly used for tracking likes or similar interactions in a database system.

Follows table

/*follows*/

CREATE TABLE follows(

	follower_id INT NOT NULL,

	followee_id INT NOT NULL,

	created_at TIMESTAMP DEFAULT NOW(),

	FOREIGN KEY (follower_id) REFERENCES users(id),

	FOREIGN KEY (followee_id) REFERENCES users(id),

	PRIMARY KEY(follower_id,followee_id)

);        

This SQL code snippet defines a table named follows that represents a relationship between users who follow each other on a platform. Here's a breakdown of what each part of the code does:

  1. Table Name: follows

  • This is the name of the table that you're creating to store information about user relationships.

  1. Columns:

  • follower_id INT NOT NULL: This column stores the ID of the user who is following someone. It's an integer (INT) data type and cannot be left empty (NOT NULL).
  • followee_id INT NOT NULL: This column stores the ID of the user who is being followed. Like the previous column, it's an integer (INT) data type and cannot be left empty (NOT NULL).
  • created_at TIMESTAMP DEFAULT NOW(): This column records the timestamp when the follow relationship was created. It uses the TIMESTAMP data type and has a default value of the current timestamp (NOW()).

  1. Foreign Keys:

  • FOREIGN KEY (follower_id) REFERENCES users(id): This indicates that the follower_id column is a foreign key that refers to the id column in the users table. It establishes a connection between the follows table and the users table based on the follower's ID.
  • FOREIGN KEY (followee_id) REFERENCES users(id): Similar to the previous foreign key, this establishes a connection between the follows table and the users table based on the followee's ID.

  1. Primary Key:

  • PRIMARY KEY(follower_id, followee_id): This declares a composite primary key that consists of both the follower_id and followee_id columns. This combination ensures that each follower can only follow a followee once, preventing duplicate follow relationships.

In summary, this SQL code creates a table that stores information about users following each other. The table captures the IDs of the follower and the followee, along with the timestamp of the follow action. Foreign keys link the user IDs to the users table, and a composite primary key ensures uniqueness of the follow relationships. This is a common way to model social network-like interactions in a relational database.

Tags table

/*Tags*/

CREATE TABLE tags(

	id INTEGER AUTO_INCREMENT PRIMARY KEY,

	tag_name VARCHAR(255) UNIQUE NOT NULL,

	created_at TIMESTAMP DEFAULT NOW()

);        

This SQL code creates a table named tags to store information about tags associated with photos or other entities. Here's an explanation of each part of the code:

  1. Table Name: tags

  • This is the name of the table you're creating to store tag-related information.

  1. Columns:

  • id INTEGER AUTO_INCREMENT PRIMARY KEY: This column is used as the primary key of the table. It's an integer (INTEGER) data type and is set to auto-increment, ensuring each new tag entry gets a unique ID value. This unique ID serves as the main identifier for each tag.
  • tag_name VARCHAR(255) UNIQUE NOT NULL: This column stores the name of the tag. It's a variable-length character (VARCHAR) data type with a maximum length of 255 characters. The NOT NULL constraint ensures that a tag name must be provided for every tag. The UNIQUE constraint ensures that each tag name is unique, preventing duplicates.
  • created_at TIMESTAMP DEFAULT NOW(): This column records the timestamp when the tag was created. It uses the TIMESTAMP data type and has a default value of the current timestamp (NOW()).

In summary, the SQL code creates a tags table to store information about tags. Each tag has a unique ID, a tag name, and a timestamp indicating when it was created. The tag_name column has a UNIQUE constraint to ensure uniqueness, preventing multiple tags with the same name. This table structure is commonly used to manage tags associated with various entities, such as photos, posts, or articles.

Junction table: photo_tags table

/*junction table: Photos - Tags*/

CREATE TABLE photo_tags(

	photo_id INT NOT NULL,

	tag_id INT NOT NULL,

	FOREIGN KEY(photo_id) REFERENCES photos(id),

	FOREIGN KEY(tag_id) REFERENCES tags(id),

	PRIMARY KEY(photo_id,tag_id)

);        

This SQL code creates a junction table named photo_tags to establish a many-to-many relationship between photos and tags. This junction table is used to associate tags with photos. Here's an explanation of each part of the code:

  1. Table Name: photo_tags

  • This is the name of the junction table you're creating to manage the relationship between photos and tags.

  1. Columns:

  • photo_id INT NOT NULL: This column stores the ID of the photo that is associated with a tag. It's an integer (INT) data type and cannot be left empty (NOT NULL).
  • tag_id INT NOT NULL: This column stores the ID of the tag that is associated with a photo. It's an integer (INT) data type and cannot be left empty (NOT NULL).

  1. Foreign Keys:

  • FOREIGN KEY(photo_id) REFERENCES photos(id): This establishes a relationship between the photo_id column in the photo_tags table and the id column in the photos table. It links the junction table to the photos that have been tagged.
  • FOREIGN KEY(tag_id) REFERENCES tags(id): This creates a relationship between the tag_id column in the photo_tags table and the id column in the tags table. It links the junction table to the tags that have been assigned to photos.

  1. Primary Key:

  • PRIMARY KEY(photo_id, tag_id): This designates a composite primary key consisting of both the photo_id and tag_id columns. This combination ensures that each photo can have a tag assigned only once and prevents duplicate associations.

In summary, the SQL code creates a photo_tags junction table to manage the many-to-many relationship between photos and tags. Each row in this table indicates that a specific photo is associated with a specific tag. The foreign keys connect the junction table to the photos and tags tables, and the composite primary key ensures the uniqueness of the associations. This type of junction table is commonly used to handle many-to-many relationships in a relational database.

Inserting Values into tables

Now we will insert values into all tables.

-- CHECKING TABLES
select * FROM users;
select * FROM photos;
select * FROM tags;
select * FROM photo_tags;
select * FROM likes;
select * FROM follows;
select * FROM comments;        

SQL Tasks :

A) Marketing Analysis:

  1. Loyal User Reward:?The marketing team wants to reward the most loyal users, i.e., those who have been using the platform for the longest time.

Your Task: Identify the five oldest users on Instagram from the provided database.

select * from users
order by created_at
limit 5;?        

The SQL query you provided retrieves rows from the users table, orders them by the created_at column in ascending order (earliest to latest), and limits the result to the first 5 rows.This query will return the earliest created users (based on the created_at timestamp) and show the details of the first 5 users in ascending order of their creation.

No alt text provided for this image

2. Inactive User Engagement:?The team wants to encourage inactive users to start posting by sending them promotional emails.

Your Task: Identify users who have never posted a single photo on Instagram.

-- Your Task: Identify users who have never posted a single photo on Instagram
select * from users 
where id not in
                (select user_id from photos);

--Another Method (Using left join) 
select u.id, u.username from? users u
left join photos p
on p.user_id = u.id
where p.user_id is null;        
No alt text provided for this image

3. Contest Winner Declaration:?The team has organized a contest where the user with the most likes on a single photo wins.

Your Task: Determine the winner of the contest and provide their details to the team.

-- 1. Identify the Photo with the Most Likes:

SELECT?
? ? photo_id
FROM
? ? likes
GROUP BY photo_id
ORDER BY COUNT(user_id) DESC
LIMIT 1;


--2.  Retrieve the Winner's Details:

with MostLikedPhoto as (SELECT?
? ? photo_id, COUNT(user_id) as total_likes
FROM
? ? likes
GROUP BY photo_id
ORDER BY COUNT(user_id) DESC
LIMIT 1)
select u.username, u.id, p.id as photo_id, MostLikedPhoto.total_likes from MostLikedPhoto
join photos p on MostLikedPhoto.photo_id = p.id
join users u on p.user_id = u.id;

        
No alt text provided for this image

In the first part, you're querying the likes table to identify the photo that has received the most likes. The steps are as follows:

  • The SELECT statement retrieves the photo_id from the likes table.
  • The GROUP BY clause groups the data by photo_id, so you're essentially counting the likes for each photo.
  • The ORDER BY clause arranges the grouped results in descending order of the count of user_id (which represents the likes). So, the most liked photo will be at the top.
  • The LIMIT 1 clause ensures that you only get the top result, which is the photo with the most likes.

In the second part, we are building upon the previous query to retrieve the details of the user who posted the most liked photo. Here's how it works:

  • The WITH clause creates a common table expression (CTE) named MostLikedPhoto. This CTE uses the same logic as the first query to find the most liked photo but also calculates the total number of likes for that photo and stores it in the total_likes column.
  • In the main query, you're joining the MostLikedPhoto CTE with the photos and users tables. This allows you to fetch details like the username (u.username), user ID (u.id), photo ID (p.id), and the total number of likes (MostLikedPhoto.total_likes).
  • The JOIN clauses connect the tables based on the relevant IDs: user_id in the photos table, and photo_id in the likes and photos tables.

Overall, this combination of SQL queries helps you identify the photo with the most likes, and then retrieve the associated user's details along with the photo's details.


4. Hashtag Research:?A partner brand wants to know the most popular hashtags to use in their posts to reach the most people.

Your Task: Identify and suggest the top five most commonly used hashtags on the platform.


/*Hashtag Research: A partner brand wants to know the most popular hashtags to use in their posts to reach the most people
Your Task: Identify and suggest the top five most commonly used hashtags on the platform.
*/


select * from tags;
with top_tags as
(select tag_id from photo_tags?
group by tag_id
order by count(tag_id) desc?
limit 5)
select t.tag_name from top_tags
join tags t on top_tags.tag_id = t.id;        
No alt text provided for this image

Here, we're using a common table expression (CTE) named top_tags to identify the top five most commonly used hashtags. The steps are as follows:

  • we're querying the photo_tags table.
  • The GROUP BY clause groups the data by tag_id, effectively counting how many times each tag has been used.
  • The ORDER BY clause arranges the groups in descending order of the count of tag_id occurrences, which means the most commonly used tags will be at the top.
  • The LIMIT 5 clause ensures you only retrieve the top five tags.

In the final part of the query, we're joining the top_tags CTE with the tags table to retrieve the names of the top tags. This allows us to present the actual hashtag names that are the most commonly used on the platform.

In summary, our query uses a CTE to identify the top five hashtags and then joins that information with the tags table to retrieve the names of those hashtags. This gives a clear and actionable suggestion for the partner brand looking to reach a wider audience through popular hashtags.


No alt text provided for this image

5. Ad Campaign Launch:?The team wants to know the best day of the week to launch ads.

Your Task: Determine the day of the week when most users register on Instagram. Provide insights on when to schedule an ad campaign.


/*
Ad Campaign Launch: The team wants to know the best day of the week to launch ads.
Your Task: Determine the day of the week when most users register on Instagram.?
Provide insights on when to schedule an ad campaign.
*/

select dayname(created_at) as days_of_week,
	? ?count(*) as num_of_users_resisters
from users
group by dayname(created_at)
order by num_of_users_resisters desc;?        

In this part of the query, we're gathering data on user registrations and organizing it by day of the week. The steps are as follows:

  • The SELECT statement retrieves the day of the week using the dayname() function applied to the created_at column (assuming it's a timestamp or date column).
  • The count(*) function calculates the number of users registered on each day of the week.
  • The GROUP BY clause groups the data by the day of the week to aggregate the registration counts.
  • The ORDER BY clause arranges the results in descending order based on the count of registered users. This will help you identify the day with the highest number of registrations first.

In essence, this query provides a breakdown of the number of users who registered on each day of the week.

The insights gained from this query will help us determine the best day of the week to launch ad campaigns, as we'll be able to identify the day when most users tend to register on Instagram. This information can guide your ad campaign scheduling strategy for maximum impact.

No alt text provided for this image

B) Investor Metrics:

  1. User Engagement:?Investors want to know if users are still active and posting on Instagram or if they are making fewer posts.

Your Task: Calculate the average number of posts per user on Instagram. Also, provide the total number of photos on Instagram divided by the total number of users.

B) Investor Metrics


User Engagement: Investors want to know if users are still active and posting on Instagram or if they are making fewer posts.
Your Task: Calculate the average number of posts per user on Instagram. Also, provide the?
total number of photos on Instagram divided by the total number of users.


*/


-- total number of users on instagram
SELECT COUNT(DISTINCT id) AS total_users_on_instagram
FROM users;


-- total number of photos on instagram
SELECT COUNT(*) AS total_photos_on_instagram
FROM photos;


-- the?total number of photos on Instagram divided by the total number of users

SELECT (SELECT Count(*)?
? ? ? ? FROM? ?photos) / (SELECT Count(*)?
? ? ? ? ? ? ? ? ? ? ? ? ? FROM? ?users) AS avg;?
? ? ? ? ? ? ? ? ? ? ? ? ??
--? Calculate the average number of posts per user on Instagram


-- post count by user


select user_id, count(*)? as posts_count from photos
group by user_id
order by posts_count desc;


-- average_post per user


SELECT AVG(posts_count) as avg_posts_per_user
FROM (
select user_id, count(*)? as posts_count from photos
group by user_id
order by posts_count desc) as user_posts;        

2. Bots & Fake Accounts:?Investors want to know if the platform is crowded with fake and dummy accounts.

Your Task: Identify users (potential bots) who have liked every single photo on the site, as this is not typically possible for a normal user.


/
Bots & Fake Accounts: Investors want to know if the platform is crowded with fake and dummy accounts.
Your Task: Identify users (potential bots) who have liked every single photo on the site,?
as this is not typically possible for a normal user.
*/


-- total number of photos?
select count(*) from photos;


-- Bots & Fake Accounts:?
select username, count(*) as num_likes
from users u?
join likes l
on u.id = l.user_id
group by l.user_id
having num_likes = (select count(*) from photos)*        


No alt text provided for this image
Bots & Fake Accounts

Results:

From this project, I got an idea about how as a business

or data analyst we work on real-time data to take any

data-driven decision.


One thing I infer about this project is the dataset

provided was very limited and small in respect of Rows

and columns, But still, it was a very good experience

working on such kind of project.


It helped me a lot to understand the analysis process

well, and to provide insights for the best decision

possible.


  • The Instagram user analytics project provided insights on marketing, User engagements, Bots and Fake accounts. Now these insight can be used by Instagram product team to launch new campaigns, track users engagement and improve user experience.
  • Here are the insights I found in these project

  1. Top 5 Oldest Users of Instagram?
  2. Users who never posted photos on Instagram
  3. Most liked photo on Instagram
  4. Top 5 most commonly used has-tags on Instagram
  5. Total number of users on Instagram
  6. Total number of photos on Instagram
  7. Average number of photos per user?
  8. Bots and fake? accounts on Instagram



Naomi Pereira

MSc Applied Economics with Banking & Financial Markets | Data-Driven Decision Making | Python and R for Data Analysis

1 年

Hey Pintu Kumar Kushwaha this is a great project! I am planning to mine Instagram data analysis project too ??

回复

要查看或添加评论,请登录

Pintu Kumar Kushwaha的更多文章

  • Data Manipulation in SQL

    Data Manipulation in SQL

    Basic CASE statements What is your favorite team? The European Soccer Database contains data about 12,800 matches from…

  • SQL Window Function

    SQL Window Function

    Window functions in SQL are a type of analytical function that perform calculations across a set of rows that are…

    11 条评论
  • SQL JOINS (Inner, Left, Right and Full Joins)

    SQL JOINS (Inner, Left, Right and Full Joins)

    SQL Join statement is used to combine data or rows from two or more tables based on a common field between them…

    2 条评论
  • Database

    Database

    A database is an organized collection of structured information, or data, typically stored electronically in a computer…

    12 条评论
  • SQL Subquery

    SQL Subquery

    A subquery is a query that is nested inside a SELECT, INSERT, UPDATE, or DELETE statement, or inside another subquery…

  • Primary and Foreign Key Constraints

    Primary and Foreign Key Constraints

    Primary keys are an important concept in SQL databases. They provide a unique ID for every row in a database table.

    7 条评论
  • SQL | DDL, DQL, DML, DCL and TCL Commands

    SQL | DDL, DQL, DML, DCL and TCL Commands

    Structured Query Language, or SQL, is a programming language for manipulating databases. It is the language used in…

    30 条评论
  • Road Accident Dashboard Project Using Excel

    Road Accident Dashboard Project Using Excel

    Introduction: A major issue that has an impact on communities all across the world is road safety. I'm excited to…

社区洞察

其他会员也浏览了