From Practical to Playful: How to Animate E-commerce Data & Website Logs with Python and Matplotlib
Useful and absurdly pointless animations of your ecommerce data with Matplotlib. Source: Björn Thomsen

Sometimes, it's not enough to simply track and tabulate e-commerce or website KPIs. Marketers often face the challenge of visualizing and presenting their data effectively. Such visualizations can be created in Python using Matplotlib and its animation module.

Through a series of useful and not-so-useful animations, I will demonstrate how marketers without in-depth coding knowledge can animate e-commerce data and website logs in an engaging way.

Exporting and Analyzing E-commerce Key Metrics

In the first step, we need to export the data for analysis. For example, we could work with server logs or data from our web analytics platform, such as Adobe Analytics, GA4, or Matomo. For demonstration purposes, we will use the "E-Commerce Website Logs" dataset from Kaggle: https://www.kaggle.com/datasets/kzmontage/e-commerce-website-logs/data

Perfect for testing: the 'E-Commerce Website Logs' dataset, which can be downloaded from Kaggle.com.

This set covers the IP address, country, amount of sales, visits, date, and some more relevant data.

Animating a Line Chart to Visualize E-commerce Sales Trends

We start by creating an animated line chart to visualize daily e-commerce sales using Matplotlib, NumPy, and Pandas. First, we load the dataset from a CSV file into a DataFrame, convert 'accessed_date' to a datetime format, and extract the date. We aggregate daily sales and store the result in daily_sales.
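To make the aggregation step concrete, here is a minimal sketch on a tiny made-up table instead of the Kaggle file. Only the column names 'accessed_date' and 'sales' are taken from the scripts in this article; the rows and values are invented for illustration.

```python
import pandas as pd

# Invented sample rows standing in for the Kaggle CSV (illustrative values only)
sample = pd.DataFrame({
    "accessed_date": [
        "2017-03-14 17:43:57",
        "2017-03-14 18:02:11",
        "2017-03-15 09:15:00",
    ],
    "sales": [100.0, 50.0, 25.0],
})

# Convert the timestamp strings to datetimes, then keep only the calendar date
sample["accessed_date"] = pd.to_datetime(sample["accessed_date"])
sample["date"] = sample["accessed_date"].dt.date

# Sum sales per day; reset_index turns the result back into a two-column DataFrame
daily_sales = sample.groupby("date")["sales"].sum().reset_index()
print(daily_sales)
```

The two visits on 14 March collapse into a single row, so daily_sales ends up with one row per calendar day.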


Next, we set up a black-background figure and axis, plotting the initial line chart with cyan lines. We label the axes 'Date' and 'Total Sales', add a title, and configure white tick labels with a 45-degree rotation for readability. We also set the axis limits to encompass the full data range.

To animate, we define an update function that refreshes the line plot for each frame, gradually revealing more data points. Using animation.FuncAnimation, we create the animation with 100-millisecond intervals. Finally, we display the animation with plt.show(), illustrating the progression of daily e-commerce sales over time.

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import matplotlib.animation as animation

# Load the dataset
file_path = 'C:/User/Username/Data.csv'
ecommerce_data = pd.read_csv(file_path, low_memory=False)

# Parse the accessed_date to extract the date
ecommerce_data['accessed_date'] = pd.to_datetime(ecommerce_data['accessed_date'])
ecommerce_data['date'] = ecommerce_data['accessed_date'].dt.date

# Aggregate sales by date
daily_sales = ecommerce_data.groupby('date')['sales'].sum().reset_index()

# Create a figure and axis with a black background
fig, ax = plt.subplots(figsize=(12, 8), facecolor='black')

# Set axis background color to black
ax.set_facecolor('black')

# Initial plot
line, = ax.plot(daily_sales['date'], daily_sales['sales'], color='cyan')

# Set labels and title
ax.set_xlabel('Date', color='white')
ax.set_ylabel('Total Sales', color='white')
ax.set_title('E-commerce Sales Over Time', color='white')
ax.tick_params(colors='white', rotation=45)

# Set the limits for x and y axis
ax.set_xlim(daily_sales['date'].min(), daily_sales['date'].max())
ax.set_ylim(0, daily_sales['sales'].max())

# Function to update the animation
def update(num, daily_sales, line):
    line.set_data(daily_sales['date'][:num], daily_sales['sales'][:num])
    return line,

# Create the animation
ani = animation.FuncAnimation(fig, update, frames=len(daily_sales), fargs=[daily_sales, line], interval=100, blit=True)

# Display the animation
plt.show()        
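If you want to check the reveal logic without opening a window, you can call the frame function by hand. The following is a rough sketch under a few assumptions: it uses invented numbers instead of the dataset, a simplified update function with a single argument, and Matplotlib's non-interactive "Agg" backend.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no window is opened
import matplotlib.pyplot as plt
import numpy as np

# Invented data: ten "days" of sales
x = np.arange(10)
y = np.array([5, 7, 6, 9, 12, 11, 15, 14, 18, 20], dtype=float)

fig, ax = plt.subplots()
line, = ax.plot([], [], color="cyan")
ax.set_xlim(x.min(), x.max())
ax.set_ylim(0, y.max())

def update(num):
    # Show only the first `num` points, mirroring the slicing in the script above
    line.set_data(x[:num], y[:num])
    return line,

# Call the frame function by hand and inspect what the line would draw
update(4)
shown_x, shown_y = line.get_data()
print(len(shown_x), shown_y[-1])
plt.close(fig)
```

Because the slice [:num] is empty for num = 0, the very first frame of such an animation draws nothing; the line starts to appear from the second frame onward.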

Animating a Pie Chart to Showcase E-commerce Sales Distribution

The pie chart is one of the least popular forms of visualization because it ultimately only shows the distribution of a single variable. Despite this, we will boldly venture into animating it.

Playful, but not very useful: Animating sales distribution over time with a pie chart. Who needs this?!

In the following code, we create an animated pie chart to visualize daily e-commerce sales distribution using Matplotlib. We import necessary libraries, load the dataset from a CSV file into a Pandas DataFrame, convert 'accessed_date' to a datetime format, and aggregate sales by date. We then set up a figure with a black background and generate colors for the pie chart.

The update function dynamically refreshes the pie chart for each frame, showing the sales distribution up to the current date with labeled slices and white text for readability. Using animation.FuncAnimation, we animate the chart to update every 500 milliseconds. Finally, plt.show() displays the animation, visually representing the evolving sales distribution.
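To see what each frame of the pie chart represents, here is a small sketch with invented daily totals. By default, ax.pie scales the values it receives so that they sum to 100%, which means the share of every earlier day shrinks as new days are added. The helper name shares_up_to is hypothetical and only used for this illustration.

```python
import numpy as np

# Invented daily sales totals
sales = np.array([200.0, 100.0, 100.0, 400.0])

def shares_up_to(num):
    """Percentage share of each day among the first num + 1 days."""
    visible = sales[: num + 1]
    return 100 * visible / visible.sum()

# One line of output per animation frame
for frame in range(len(sales)):
    print(frame, np.round(shares_up_to(frame), 1))
```

The first frame is always a full circle for a single day, which is one reason this animation is more playful than informative.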

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import matplotlib.animation as animation

# Load the dataset
file_path = 'C:/User/Username/Data.csv'
ecommerce_data = pd.read_csv(file_path, low_memory=False)

# Parse the accessed_date to extract the date
ecommerce_data['accessed_date'] = pd.to_datetime(ecommerce_data['accessed_date'])
ecommerce_data['date'] = ecommerce_data['accessed_date'].dt.date

# Aggregate sales by date
daily_sales = ecommerce_data.groupby('date')['sales'].sum().reset_index()

# Create a figure and axis with a black background
fig, ax = plt.subplots(figsize=(12, 8), facecolor='black')

# Set the figure background color to black
fig.patch.set_facecolor('black')

# Colors for the pie chart
colors = plt.cm.tab20(np.linspace(0, 1, len(daily_sales)))

# Function to update the pie chart
def update(num):
    ax.clear()
    ax.set_facecolor('black')
    wedges, texts, autotexts = ax.pie(
        daily_sales['sales'][:num+1], 
        labels=daily_sales['date'][:num+1], 
        colors=colors[:num+1], 
        autopct='%1.1f%%',
        startangle=140
    )
    
    # Set the properties of texts and autotexts
    for text in texts:
        text.set_color('white')
    for autotext in autotexts:
        autotext.set_color('white')
    
    ax.set_title('E-commerce Sales Distribution by Date', color='white')

# Create the animation
ani = animation.FuncAnimation(fig, update, frames=len(daily_sales), interval=500, repeat=False)

# Display the animation
plt.show()

Animating Box Plots to Visualize E-commerce Sales Development

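Next, we animate box plots to visualize the distribution of our e-commerce sales data. Box plots display the median and the first and third quartiles (Q1 and Q3), with whiskers reaching toward the minimum and maximum values. By default, Matplotlib draws points that lie more than 1.5 times the interquartile range beyond the box as individual outliers. This highlights data spread, skewness, and outliers, which makes box plots useful for comparing distributions.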

Box plots show the distribution of a dataset (here: e-commerce sales) by displaying its minimum, first quartile (Q1), median, third quartile (Q3), and maximum values, highlighting the spread, skewness, and potential outliers.
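The numbers behind a single box can be computed by hand. This is a minimal sketch with invented daily totals, using np.percentile with its default linear interpolation and the common 1.5 × IQR rule for outliers; the variable names are my own.

```python
import numpy as np

# Invented daily totals, with one unusually strong day
sales = np.array([120.0, 135.0, 128.0, 150.0, 142.0, 131.0, 400.0])

# Quartiles and interquartile range (the height of the box)
q1, median, q3 = np.percentile(sales, [25, 50, 75])
iqr = q3 - q1

# Points beyond 1.5 * IQR from the box are treated as outliers
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = sales[(sales < lower_fence) | (sales > upper_fence)]

print(q1, median, q3, outliers)
```

Here the day with 400 in sales falls above the upper fence, so it would appear as a separate point rather than stretching the whisker.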

We start by importing essential libraries and loading the e-commerce data from a CSV file into a DataFrame, converting 'accessed_date' to datetime, and extracting the date. We aggregate sales by date to get daily totals and set up a figure and axis with a black background for visual appeal.

The update function dynamically refreshes the box plot for each frame, showing the cumulative distribution of sales over time with cyan boxes and red medians. X-axis labels are rotated and colored white for readability.

Using animation.FuncAnimation, we create an animation that updates every 500 milliseconds. Finally, plt.show() is called to display the animation, effectively illustrating the evolving distribution of daily e-commerce sales.

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import matplotlib.animation as animation

# Load the dataset
file_path = 'C:/User/Username/Data.csv'
ecommerce_data = pd.read_csv(file_path, low_memory=False)

# Parse the accessed_date to extract the date
ecommerce_data['accessed_date'] = pd.to_datetime(ecommerce_data['accessed_date'])
ecommerce_data['date'] = ecommerce_data['accessed_date'].dt.date

# Aggregate sales by date
daily_sales = ecommerce_data.groupby('date')['sales'].sum().reset_index()

# Create a figure and axis with a black background
fig, ax = plt.subplots(figsize=(12, 8), facecolor='black')

# Set the figure background color to black
fig.patch.set_facecolor('black')

# Function to update the box plot
def update(num):
    ax.clear()
    ax.set_facecolor('black')
    
    # Create boxplot
    data_to_plot = [daily_sales['sales'][:i+1] for i in range(num+1)]
    ax.boxplot(data_to_plot, patch_artist=True,
               boxprops=dict(facecolor="cyan", color="cyan"),
               whiskerprops=dict(color="cyan"),
               capprops=dict(color="cyan"),
               medianprops=dict(color="red"))
    
    ax.set_xticks(range(1, num+2))
    ax.set_xticklabels(daily_sales['date'][:num+1], rotation=45, ha='right', color='white')
    ax.set_ylabel('Total Sales', color='white')
    ax.set_title('Daily E-commerce Sales Distribution', color='white')

# Create the animation
ani = animation.FuncAnimation(fig, update, frames=len(daily_sales), interval=500, repeat=False)

# Display the animation
plt.show()        

Creating a Spiral Animation for Daily Website Visits

Next, we will generate a spiral animation. You might recognize this type of visualization from climate change diagrams, where the temperature rises alarmingly fast with each rotation (representing one year). In our case, we hope to see an increase in website visits with each rotation (representing one day).

The animated spiral chart allows for the comparison of data over multiple cycles, such as days or years.

First, we load and process the e-commerce website log data from our CSV file, convert the 'accessed_date' column to datetime format, and extract the hour and day information for each access.

We group the data by day and hour, count the number of visits for each combination, and normalize these visit counts for better visualization. The spiral data is created by generating angles and radii to represent the visits per hour in a spiral format, ensuring a continuous spiral representation over the entire dataset. Unique colors are assigned to each day using a colormap to differentiate the days in the visualization.
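The normalization and the angle calculation can be tried out in isolation. The sketch below assumes two complete days of invented hourly visit counts; the formulas follow the ones used in the script in this section.

```python
import numpy as np

# Invented visit counts for two full days (48 hourly buckets)
visits = 100 + 50 * (np.arange(48) % 24)

# Min-max normalization maps the counts onto the range 0..1
normalized = (visits - visits.min()) / (visits.max() - visits.min())

# One full turn (2 * pi) per 24 buckets, so two days give two turns
total_hours = len(visits)
angles = np.linspace(0, (total_hours / 24) * 2 * np.pi, total_hours)
radii = normalized * 10

print(angles[-1] / (2 * np.pi))  # number of turns covered
print(radii.min(), radii.max())
```

Because np.linspace includes the endpoint by default, consecutive hours end up slightly more than 15 degrees apart. Passing endpoint=False would place the same hour of every day on exactly the same angle, which may be preferable if you want to compare days hour by hour.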

We create and animate a polar (spiral) plot with a black background and neon colors. The animation function updates the plot frame by frame to show the progression of visits over time, with each frame representing an hour of data.

import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.animation as animation
import numpy as np

# Load the dataset
file_path = 'C:/User/Username/Data.csv'
data = pd.read_csv(file_path, low_memory=False)

# Convert 'accessed_date' to datetime and extract the hour and day
data['accessed_date'] = pd.to_datetime(data['accessed_date'])
data['hour'] = data['accessed_date'].dt.hour
data['day'] = data['accessed_date'].dt.date

# Group by day and hour and count visits
visits_per_hour = data.groupby(['day', 'hour']).size().reset_index(name='visits')

# Normalize visits for better visualization in the spiral
visits_per_hour['normalized_visits'] = (visits_per_hour['visits'] - visits_per_hour['visits'].min()) / \
                                        (visits_per_hour['visits'].max() - visits_per_hour['visits'].min())

# Create spiral data
total_hours = len(visits_per_hour)
angles = np.linspace(0, (total_hours / 24) * 2 * np.pi, total_hours)
radii = visits_per_hour['normalized_visits'] * 10

# Colors for each day
days = visits_per_hour['day'].unique()
colors = plt.cm.viridis(np.linspace(0, 1, len(days)))

# Create the spiral plot
fig, ax = plt.subplots(subplot_kw={'projection': 'polar'}, facecolor='black')
ax.set_ylim(0, 10)
ax.set_yticklabels([])  # Remove radial labels
ax.set_xticklabels([])  # Remove angular labels

# Animation function
def animate(i):
    ax.clear()
    ax.set_ylim(0, 10)
    ax.set_facecolor('black')
    ax.set_yticklabels([])  # Remove radial labels
    ax.set_xticklabels([])  # Remove angular labels

    day_index = i // 24
    hour_index = i % 24

    for j in range(day_index + 1):
        day_data = visits_per_hour[visits_per_hour['day'] == days[j]]
        day_angles = angles[day_data.index]
        day_radii = radii[day_data.index]
        if j < day_index:
            ax.plot(day_angles, day_radii, color=colors[j])
            ax.fill(day_angles, day_radii, color=colors[j], alpha=0.1)
        else:
            ax.plot(day_angles[:hour_index + 1], day_radii[:hour_index + 1], color=colors[j])
            ax.fill(day_angles[:hour_index + 1], day_radii[:hour_index + 1], color=colors[j], alpha=0.9)

    ax.set_title(f'Hour {hour_index} of Day {day_index}', color='white')

# Add labels
plt.figtext(0.1, 0.9, 'Visits', fontsize=12, color='white')
plt.figtext(0.1, 0.85, 'Hours', fontsize=12, color='white')
plt.figtext(0.1, 0.8, 'Days', fontsize=12, color='white')

ani = animation.FuncAnimation(fig, animate, frames=total_hours, interval=50)

plt.show()        

Wave Animation: Simulating Staggered E-commerce Sales Across Regions

Lastly, I want to showcase a wave animation that simulates our e-commerce sales throughout the day as waves, appearing staggered due to regional time differences. Though not particularly practical in marketing, this wave diagram offers a unique visualization akin to a seismogram for earthquakes.

Earthquake Detector? This abomination of an animated wave chart is something for creative minds.

We begin by importing essential libraries and loading the dataset from a CSV file into a Pandas DataFrame, converting 'accessed_date' to datetime, and extracting the date. We aggregate sales by date and country, normalize the data, and set up a figure with a black background.

The update function dynamically refreshes the wave chart for each frame, shifting data and filling in new values with actual sales data. It adjusts wave amplitude, y-axis data, and date text for each frame. Using animation.FuncAnimation, we create the animation, updating every 400 milliseconds. Finally, we display it with plt.show(), illustrating the staggered sales waves across different regions.
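The shifting step is easier to follow on a tiny array. In this sketch, the three regions and their values are invented, and the buffer starts with zeros instead of random numbers so that the movement is visible.

```python
import numpy as np

# Three regions (rows), five time slots (columns); zeros make the shift easy to follow
data = np.zeros((3, 5))

# Invented normalized sales: each row is one day, each column one region
new_values = np.array([
    [0.2, 0.9, 0.5],
    [0.4, 0.1, 0.7],
    [1.0, 0.3, 0.6],
])

for day in new_values:
    # Shift every row one slot to the right; copy() makes the overlap explicit
    data[:, 1:] = data[:, :-1].copy()
    # The newest values enter on the left
    data[:, 0] = day

print(data)
```

After three frames, the most recent day sits in the first column and older days have travelled to the right, which is what makes the lines appear to roll like waves.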

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import matplotlib.animation as animation

# Load the dataset
file_path = 'C:/User/Username/Data.csv'
ecommerce_data = pd.read_csv(file_path, low_memory=False)

# Parse the accessed_date to extract the date
ecommerce_data['accessed_date'] = pd.to_datetime(ecommerce_data['accessed_date'])
ecommerce_data['date'] = ecommerce_data['accessed_date'].dt.date

# Aggregate sales by date and country
daily_sales = ecommerce_data.groupby(['date', 'country'])['sales'].sum().unstack(fill_value=0).reset_index()

# Normalize the sales data for better visualization
normalized_sales = daily_sales.set_index('date')
normalized_sales = normalized_sales / normalized_sales.max()

# Create new Figure with black background
fig = plt.figure(figsize=(14, 8), facecolor='black')

# Add a subplot with no frame
ax = plt.subplot(frameon=False)

# Define the colors for different regions
colors = plt.cm.viridis(np.linspace(0, 1, len(normalized_sales.columns)))

# Generate initial wave data with the correct shape
num_regions = len(normalized_sales.columns)
data = np.random.uniform(0, 1, (num_regions, len(normalized_sales)))
X = np.linspace(-1, 1, len(normalized_sales))
G = 3.5 * np.exp(-4 * X ** 2)  # Increase the amplitude of the waves

# Generate line plots for each region
lines = []
for i, country in enumerate(normalized_sales.columns):
    # Small reduction of the X extents to get a cheap perspective effect
    xscale = 1 - i / 200.
    # Same for linewidth (thicker strokes on bottom)
    lw = 1.5 - i / 100.0
    line, = ax.plot(xscale * X, i + G * data[i], color=colors[i], lw=lw, label=country)
    lines.append(line)

# Set y limit (or first line is cropped because of thickness)
ax.set_ylim(-1, num_regions + 5)

# No ticks
ax.set_xticks([])
ax.set_yticks([])

# Set titles
title = ax.text(0.5, 1.0, "E-COMMERCE SALES WAVES BY REGION", transform=ax.transAxes,
        ha="center", va="bottom", color="w",
        family="sans-serif", fontweight="bold", fontsize=16)

date_text = ax.text(0.5, 0.95, "", transform=ax.transAxes,
                    ha="center", va="bottom", color="w",
                    family="sans-serif", fontweight="light", fontsize=12)

# Add legend
ax.legend(loc='upper right', fontsize='small', frameon=False, facecolor='black', labelcolor='white')

def update(frame):
    # Shift all data to the right
    data[:, 1:] = data[:, :-1]

    # Fill-in new values with actual sales data
    if frame < len(normalized_sales):
        data[:, 0] = normalized_sales.iloc[frame].values
    else:
        data[:, 0] = 0

    # Update data
    for i in range(num_regions):
        lines[i].set_ydata(i + G * data[i])

    # Update date text
    date_text.set_text(f"Date: {normalized_sales.index[frame]}")

    # Return modified artists
    return lines + [date_text]

# Construct the animation, using the update function as the animation director.
anim = animation.FuncAnimation(fig, update, frames=len(normalized_sales), interval=400, blit=True)

plt.show()        

Conclusion

I firmly believe that every marketing specialist responsible for a specific channel should know and be able to report key metrics. Tools like Qlik Sense, Salesforce Tableau, Microsoft Power BI, and Google Looker Studio offer user-friendly interfaces for quickly visualizing data. For those willing to get their hands a bit dirtier and create fully customizable animations, Python with Matplotlib is the way to go.

Matplotlib makes it easy to create animations. These can be shown not only in a dashboard, such as with Plotly, but also saved as GIF animations and embedded in a presentation. The following line does this for an animation stored in the variable ani (the wave script above names it anim instead) and requires the Pillow package:

ani.save('C:/User/Username/File.gif', writer='pillow')        
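For a complete example that runs end to end, here is a rough sketch that builds a small sine-wave animation and writes it to the system's temporary folder. The file name demo_animation.gif, the sine data, and the frame rate are assumptions made for this illustration only.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # render off-screen; no window is needed for saving
import matplotlib.pyplot as plt
import matplotlib.animation as animation
import numpy as np

x = np.linspace(0, 2 * np.pi, 50)
fig, ax = plt.subplots(figsize=(4, 3))
line, = ax.plot(x, np.sin(x))

def update(frame):
    # Shift the sine wave a little on every frame
    line.set_ydata(np.sin(x + frame / 5))
    return line,

ani = animation.FuncAnimation(fig, update, frames=10, interval=100)

# PillowWriter needs the Pillow package; fps sets the playback speed of the GIF
out_path = os.path.join(tempfile.gettempdir(), "demo_animation.gif")
ani.save(out_path, writer=animation.PillowWriter(fps=10))
plt.close(fig)
print(out_path, os.path.getsize(out_path) > 0)
```

Keeping the figure small and the number of frames low helps to keep the GIF file size manageable for slides.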
