Insights from Online Retail Data
Dr. Fatma Ben Mesmia Chaabouni
Ph.D. in CS | MSc_B.Sc. in CS| NLP-AI and Data Analytics- Blockchain researcher | MBA mentor| Tunisian AI Society Member
A Comprehensive Analysis
Analyzing customer behavior and business performance is essential for informed decision-making. Recently, I undertook a project to analyze a large online retail dataset.
This article details the process of extracting valuable insights and visualizing them to support strategic decisions. Join me as we explore the journey, methodology, and outcomes of this analysis.
This project utilized a dataset of online retail transactions, encompassing invoice numbers, stock codes, product descriptions, quantities, invoice dates, unit prices, customer IDs, and countries. The main objective was to derive insights to aid in making data-driven decisions.
Data Preparation
The first step was to load and prepare the dataset for analysis by managing missing values, converting data types, and creating new columns.
Here is the dataset link: https://docs.google.com/spreadsheets/d/1_KjrsirMsHRK51uD1vSKOw6IvFIhs56h/edit?usp=drive_link&ouid=109162540903263315973&rtpof=true&sd=true
import pandas as pd
# Load the dataset
file_path = 'Online Retail.xlsx'
data = pd.read_excel(file_path)
# Basic cleaning and preparation
data.dropna(subset=['CustomerID'], inplace=True) # Remove rows with missing CustomerID
data['InvoiceDate'] = pd.to_datetime(data['InvoiceDate']) # Convert InvoiceDate to datetime
# Add a TotalPrice column
data['TotalPrice'] = data['Quantity'] * data['UnitPrice']
Generating Insights
With the data cleaned and prepared, the next step was to extract key insights. Here’s a summary of the findings:
# Function to generate insights
def generate_insights(df):
insights = {}
# Insight 1: Total Revenue
total_revenue = df['TotalPrice'].sum()
insights['Total Revenue'] = total_revenue
# Insight 2: Top 5 Countries by Revenue
top_countries = df.groupby('Country')['TotalPrice'].sum().sort_values(ascending=False).head(5)
insights['Top 5 Countries by Revenue'] = top_countries
# Insight 3: Monthly Revenue Trend
monthly_revenue = df.set_index('InvoiceDate').resample('M')['TotalPrice'].sum()
insights['Monthly Revenue Trend'] = monthly_revenue
# Insight 4: Top 5 Products by Revenue
top_products = df.groupby('Description')['TotalPrice'].sum().sort_values(ascending=False).head(5)
insights['Top 5 Products by Revenue'] = top_products
# Insight 5: Customer Retention (Number of repeat customers)
customer_purchase_counts = df.groupby('CustomerID').size()
repeat_customers = (customer_purchase_counts > 1).sum()
insights['Number of Repeat Customers'] = repeat_customers
# Insight 6: Average Order Value
avg_order_value = df.groupby('InvoiceNo')['TotalPrice'].sum().mean()
insights['Average Order Value'] = avg_order_value
return insights
# Generate insights
insights = generate_insights(data)
insights_df = pd.DataFrame(list(insights.items()), columns=['Insight', 'Value'])
print(insights_df)
Data Visualization
Visualizing insights is essential for their actionability and clarity. The following visualizations depict the key findings.
import matplotlib.pyplot as plt
# Visualization function
def visualize_insights(df):
# Plot 1: Total Revenue
plt.figure(figsize=(10, 6))
total_revenue = df['TotalPrice'].sum()
plt.bar(['Total Revenue'], [total_revenue])
plt.ylabel('Revenue')
plt.title('Total Revenue')
plt.show()
# Plot 2: Top 5 Countries by Revenue
top_countries = df.groupby('Country')['TotalPrice'].sum().sort_values(ascending=False).head(5)
plt.figure(figsize=(10, 6))
top_countries.plot(kind='bar', color='skyblue')
plt.ylabel('Revenue')
plt.title('Top 5 Countries by Revenue')
plt.show()
# Plot 3: Monthly Revenue Trend
monthly_revenue = df.set_index('InvoiceDate').resample('M')['TotalPrice'].sum()
plt.figure(figsize=(10, 6))
monthly_revenue.plot(kind='line', marker='o')
plt.ylabel('Revenue')
plt.title('Monthly Revenue Trend')
plt.show()
# Plot 4: Top 5 Products by Revenue
top_products = df.groupby('Description')['TotalPrice'].sum().sort_values(ascending=False).head(5)
plt.figure(figsize=(10, 6))
top_products.plot(kind='bar', color='lightgreen')
plt.ylabel('Revenue')
plt.title('Top 5 Products by Revenue')
plt.show()
# Plot 5: Number of Repeat Customers
customer_purchase_counts = df.groupby('CustomerID').size()
repeat_customers = (customer_purchase_counts > 1).sum()
single_purchase_customers = (customer_purchase_counts == 1).sum()
plt.figure(figsize=(10, 6))
plt.bar(['Repeat Customers', 'Single Purchase Customers'], [repeat_customers, single_purchase_customers], color=['blue', 'orange'])
plt.ylabel('Number of Customers')
plt.title('Customer Purchase Behavior')
plt.show()
# Plot 6: Average Order Value
avg_order_value = df.groupby('InvoiceNo')['TotalPrice'].sum().mean()
plt.figure(figsize=(10, 6))
plt.bar(['Average Order Value'], [avg_order_value], color='purple')
plt.ylabel('Order Value')
plt.title('Average Order Value')
plt.show()
# Visualize the insights
visualize_insights(data)
Detailed Insights and Visualizations
1. Total Revenue
The total revenue generated by the business was a substantial $89,173.09, highlighting the total business performance.
2. Top 5 Countries by Revenue
领英推荐
The United Kingdom topped the revenue charts, with Netherlands following, highlighting the business's strong presence.
3. Monthly Revenue Trend
Analyzing the monthly revenue trend helps identify seasonal patterns and plan for peak periods.
4. Top 5 Products by Revenue
Top sellers like the REGENCY CAKESTAND 3 TIER and WHITE HANGING HEART T-LIGHT HOLDER offered insights into product popularity.
5. Customer Purchase Behavior
Analysis of customer purchase behavior showed that more 4000 customers made repeat purchases, reflecting customer loyalty.
6. Average Order Value
The average order value exceeded $350, aiding in the establishment of pricing strategies and sales targets.
Explore the datasets to uncover valuable insights. Consider these additional questions for further analysis:
- What are the peak purchase hours and days?
- How does average order value differ by region?
- What effect do discounts or promotions have on sales volume?
- How do customer demographics influence buying behavior?
#DataScience #BusinessIntelligence #DataAnalysis #Python #MachineLearning #BusinessStrategy #RetailAnalytics #DataVisualization #CustomerInsights #BigData #AI #DataDriven #Analytics #TechTrends #Innovation
Data Analyst || Python || Microsoft Power BI || JavaScript || MSSQL Server 2016(SSIS/SSMS/SSAS/SSRS) || Microsoft Excel || TSQL || Market Research ||
1 个月Insightful????