Insights from Online Retail Data

Insights from Online Retail Data

A Comprehensive Analysis

Analyzing customer behavior and business performance is essential for informed decision-making. Recently, I undertook a project to analyze a large online retail dataset.

This article details the process of extracting valuable insights and visualizing them to support strategic decisions. Join me as we explore the journey, methodology, and outcomes of this analysis.

  • Project Overview

This project utilized a dataset of online retail transactions, encompassing invoice numbers, stock codes, product descriptions, quantities, invoice dates, unit prices, customer IDs, and countries. The main objective was to derive insights to aid in making data-driven decisions.

Data Preparation

The first step was to load and prepare the dataset for analysis by managing missing values, converting data types, and creating new columns.

Here is the dataset link: https://docs.google.com/spreadsheets/d/1_KjrsirMsHRK51uD1vSKOw6IvFIhs56h/edit?usp=drive_link&ouid=109162540903263315973&rtpof=true&sd=true

import pandas as pd

# Load the dataset
file_path = 'Online Retail.xlsx'
data = pd.read_excel(file_path)

# Basic cleaning and preparation
data.dropna(subset=['CustomerID'], inplace=True)  # Remove rows with missing CustomerID
data['InvoiceDate'] = pd.to_datetime(data['InvoiceDate'])  # Convert InvoiceDate to datetime

# Add a TotalPrice column
data['TotalPrice'] = data['Quantity'] * data['UnitPrice']
        

Generating Insights

With the data cleaned and prepared, the next step was to extract key insights. Here’s a summary of the findings:

  1. Total Revenue: The total revenue generated by the business.
  2. Top 5 Countries by Revenue: Identifying the leading revenue-generating countries.
  3. Monthly Revenue Trend: Analyzing revenue trends over time.
  4. Top 5 Products by Revenue: Identifying the best-selling products.
  5. Customer Retention: Assessing the number of repeat customers.
  6. Average Order Value: Calculating the average order value.

# Function to generate insights
def generate_insights(df):
    insights = {}
    
    # Insight 1: Total Revenue
    total_revenue = df['TotalPrice'].sum()
    insights['Total Revenue'] = total_revenue
    
    # Insight 2: Top 5 Countries by Revenue
    top_countries = df.groupby('Country')['TotalPrice'].sum().sort_values(ascending=False).head(5)
    insights['Top 5 Countries by Revenue'] = top_countries
    
    # Insight 3: Monthly Revenue Trend
    monthly_revenue = df.set_index('InvoiceDate').resample('M')['TotalPrice'].sum()
    insights['Monthly Revenue Trend'] = monthly_revenue
    
    # Insight 4: Top 5 Products by Revenue
    top_products = df.groupby('Description')['TotalPrice'].sum().sort_values(ascending=False).head(5)
    insights['Top 5 Products by Revenue'] = top_products
    
    # Insight 5: Customer Retention (Number of repeat customers)
    customer_purchase_counts = df.groupby('CustomerID').size()
    repeat_customers = (customer_purchase_counts > 1).sum()
    insights['Number of Repeat Customers'] = repeat_customers
    
    # Insight 6: Average Order Value
    avg_order_value = df.groupby('InvoiceNo')['TotalPrice'].sum().mean()
    insights['Average Order Value'] = avg_order_value
    
    return insights

# Generate insights
insights = generate_insights(data)
insights_df = pd.DataFrame(list(insights.items()), columns=['Insight', 'Value'])
print(insights_df)
        

Data Visualization

Visualizing insights is essential for their actionability and clarity. The following visualizations depict the key findings.

import matplotlib.pyplot as plt

# Visualization function
def visualize_insights(df):
    # Plot 1: Total Revenue
    plt.figure(figsize=(10, 6))
    total_revenue = df['TotalPrice'].sum()
    plt.bar(['Total Revenue'], [total_revenue])
    plt.ylabel('Revenue')
    plt.title('Total Revenue')
    plt.show()
    
    # Plot 2: Top 5 Countries by Revenue
    top_countries = df.groupby('Country')['TotalPrice'].sum().sort_values(ascending=False).head(5)
    plt.figure(figsize=(10, 6))
    top_countries.plot(kind='bar', color='skyblue')
    plt.ylabel('Revenue')
    plt.title('Top 5 Countries by Revenue')
    plt.show()
    
    # Plot 3: Monthly Revenue Trend
    monthly_revenue = df.set_index('InvoiceDate').resample('M')['TotalPrice'].sum()
    plt.figure(figsize=(10, 6))
    monthly_revenue.plot(kind='line', marker='o')
    plt.ylabel('Revenue')
    plt.title('Monthly Revenue Trend')
    plt.show()
    
    # Plot 4: Top 5 Products by Revenue
    top_products = df.groupby('Description')['TotalPrice'].sum().sort_values(ascending=False).head(5)
    plt.figure(figsize=(10, 6))
    top_products.plot(kind='bar', color='lightgreen')
    plt.ylabel('Revenue')
    plt.title('Top 5 Products by Revenue')
    plt.show()
    
    # Plot 5: Number of Repeat Customers
    customer_purchase_counts = df.groupby('CustomerID').size()
    repeat_customers = (customer_purchase_counts > 1).sum()
    single_purchase_customers = (customer_purchase_counts == 1).sum()
    plt.figure(figsize=(10, 6))
    plt.bar(['Repeat Customers', 'Single Purchase Customers'], [repeat_customers, single_purchase_customers], color=['blue', 'orange'])
    plt.ylabel('Number of Customers')
    plt.title('Customer Purchase Behavior')
    plt.show()
    
    # Plot 6: Average Order Value
    avg_order_value = df.groupby('InvoiceNo')['TotalPrice'].sum().mean()
    plt.figure(figsize=(10, 6))
    plt.bar(['Average Order Value'], [avg_order_value], color='purple')
    plt.ylabel('Order Value')
    plt.title('Average Order Value')
    plt.show()

# Visualize the insights
visualize_insights(data)
        

Detailed Insights and Visualizations

1. Total Revenue

Total Revenue

The total revenue generated by the business was a substantial $89,173.09, highlighting the total business performance.

2. Top 5 Countries by Revenue

Top 5 Revenue-Generating Countries

The United Kingdom topped the revenue charts, with Netherlands following, highlighting the business's strong presence.

3. Monthly Revenue Trend

Monthly Revenue Trend


Analyzing the monthly revenue trend helps identify seasonal patterns and plan for peak periods.

4. Top 5 Products by Revenue

Top 5 Products by Revenue

Top sellers like the REGENCY CAKESTAND 3 TIER and WHITE HANGING HEART T-LIGHT HOLDER offered insights into product popularity.

5. Customer Purchase Behavior

Customer Purchase Behavior

Analysis of customer purchase behavior showed that more 4000 customers made repeat purchases, reflecting customer loyalty.

6. Average Order Value

Average Order Value


The average order value exceeded $350, aiding in the establishment of pricing strategies and sales targets.

Explore the datasets to uncover valuable insights. Consider these additional questions for further analysis:

- What are the peak purchase hours and days?

- How does average order value differ by region?

- What effect do discounts or promotions have on sales volume?

- How do customer demographics influence buying behavior?

#DataScience #BusinessIntelligence #DataAnalysis #Python #MachineLearning #BusinessStrategy #RetailAnalytics #DataVisualization #CustomerInsights #BigData #AI #DataDriven #Analytics #TechTrends #Innovation

Dillip Singh

Data Analyst || Python || Microsoft Power BI || JavaScript || MSSQL Server 2016(SSIS/SSMS/SSAS/SSRS) || Microsoft Excel || TSQL || Market Research ||

1 个月

Insightful????

要查看或添加评论,请登录

社区洞察

其他会员也浏览了