Hands-On Example: Google Trends Visualization with Pytrends API – Keyword Volumes, Regional Insights, and Correlations

When it comes to keyword research, spotting trends, identifying anomalies, or uncovering correlations between search terms, you don’t necessarily need a bloated SEO tool. While Google Trends is a useful platform, its interface is quite limited. So why not take matters into your own hands by pulling Google Trends data directly through an API and analyzing it in Python?

In this article, I'll walk you through a practical project using Pytrends, a Python library that connects to Google Trends. We'll combine it with popular data science tools like Matplotlib and Seaborn to create engaging visualizations. For this project, we'll focus on comparing keyword search volumes, analyzing regional search data, and uncovering correlations in search trends over time. I'll also share some code snippets to help you build a simple analysis tool from scratch.

To make things even more interesting, I've included an example at the end that integrates the data with my favorite library, Pygwalker. If you're unfamiliar, Pygwalker is a lightweight, Python-friendly alternative to tools like Power BI or Tableau. It allows you to create quick and effective visualizations, making exploratory data analysis not only efficient but also visually compelling.

Installation and Documentation

I'll assume you already know Google Trends (https://trends.google.com/trends/). To analyze its data effectively, we will use Pytrends. It allows users to connect to Google Trends, work around rate limits with proxies, and retrieve various datasets like interest over time, region-specific trends, related topics, and trending searches.

Advanced functionalities include historical hourly data, real-time trends, and keyword suggestions, all with customizable parameters like language, region, and timeframe: https://pypi.org/project/pytrends/

Our first action will be to install this library by using the command-line tool:

pip install pytrends        
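
Beyond what we build below, here is a minimal sketch of two of those extra capabilities, keyword suggestions and related queries. Both calls exist in current pytrends, but since the library talks to an unofficial endpoint, response shapes can change without notice:

from pytrends.request import TrendReq

pytrends = TrendReq(hl='en-US', tz=360)

# Keyword suggestions: a list of dicts with 'title' and 'type' metadata
for s in pytrends.suggestions(keyword="DevOps"):
    print(s.get("title"), "-", s.get("type"))

# Related queries: build a payload first, then fetch a dict of DataFrames
pytrends.build_payload(["DevOps"], timeframe="today 12-m", geo="")
related = pytrends.related_queries()
print(related["DevOps"]["top"])  # top related queries for the keyword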

Initializing Libraries & Creating GUI

Typically, I don't pay much attention to user interfaces, but in this case, we'll create a straightforward GUI using Tkinter to allow users to input keywords and a timeframe. This approach simplifies interaction.

I am curious about the terms "GitOps", "DevOps", and "Platform Engineering".
How did those search terms evolve from 2021-01-01 to 2024-12-31?

Using the Pytrends library, we'll connect to Google Trends and validate the user-provided data to ensure that neither the keywords nor the timeframe are left blank. If any invalid input is detected, the program will display an error message via a message box and terminate gracefully. Once validated, the keywords are cleaned and processed into a list for further use.

This setup not only makes the tool more user-friendly but also eliminates the need to hardcode parameters in the script. Of course, we’ll also use essential libraries like Pandas for processing the data and Matplotlib and Seaborn for visualization to round out the functionality.

import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from pytrends.request import TrendReq
import time
import random
import tkinter as tk
from tkinter import simpledialog, messagebox

# Initialize pytrends (hl = host language, tz = timezone offset in minutes)
pytrends = TrendReq(hl='en-US', tz=360)

# Create a GUI to get keywords and timeframe
root = tk.Tk()
root.withdraw()  # Hide the root window

try:
    keywords = simpledialog.askstring("Input", "Enter keywords separated by commas:", parent=root)
    if not keywords:
        raise ValueError("No keywords entered.")
    keywords = [keyword.strip() for keyword in keywords.split(',') if keyword.strip()]
    if not keywords:
        raise ValueError("Keywords cannot be empty or just commas.")

    timeframe = simpledialog.askstring("Input", "Enter the time frame (e.g., 2023-11-01 2023-12-31):", parent=root)
    if not timeframe:
        raise ValueError("No timeframe entered.")
except ValueError as e:
    messagebox.showerror("Input Error", str(e))
    raise SystemExit(1)
        

Fetching Data & API Retry Logic

Before analyzing the data, it is crucial to structure and prepare it properly, as our goal is to compare keyword volumes, including regional differences, and identify correlations. In doing so, we need to be cautious about the frequency of API requests to avoid hitting rate limits imposed by Google Trends:

  1. Top 10 Largest Economies: We define a list of the 10 largest economies, identified by the country names Google Trends uses. These countries are relevant for regional analysis as they represent major markets, making keyword comparisons in these regions particularly valuable.
  2. Retry Logic for API Requests: To handle potential API failures due to rate limits or network issues, a fetch_data_with_retry function is introduced. This function retries a given API method up to a specified number of attempts (retries). After each failed attempt, the function waits for a random delay (to mimic human-like behavior) before retrying. If all attempts fail, the function gracefully handles the error and returns None. This approach ensures robust and resilient data retrieval.
  3. Building the Payload: The 'build_payload()' method is used to prepare the API request with the keywords, timeframe, and geographic parameters. This step is necessary to inform Pytrends about what data we want to fetch. In this example, no specific country is specified for the initial payload (geo=""), meaning it defaults to worldwide data.
  4. Fetching Interest Over Time: The interest_over_time data, which provides trends for the keywords over the specified timeframe, is fetched using the retry function. If the data cannot be retrieved after multiple attempts, an empty DataFrame is created to avoid breaking downstream processes.
  5. Fetching Regional Interest: The interest_by_region method is used to retrieve keyword popularity for the specified countries. The retry logic ensures that we handle failures gracefully. The resulting data is filtered to include only the top 10 largest economies. If the request fails entirely, an empty DataFrame is used as a fallback.
  6. Visualization: Once the data is successfully retrieved, we can use visualization libraries like Matplotlib and Seaborn to analyze and present the results. A figure is initialized with a specified size, and a visually appealing color palette (coolwarm, simply because I think it looks nice) is set for consistency in the plots.

# Top 10 largest economies (as named in Google Trends region data)
largest_economies = ["United States", "China", "Japan", "Germany", "India", "United Kingdom", "France", "Italy", "Brazil", "Canada"]

# Retry logic for API requests
def fetch_data_with_retry(pytrends_method, retries=5, delay=5, *args, **kwargs):
    for attempt in range(retries):
        try:
            return pytrends_method(*args, **kwargs)
        except Exception as e:
            print(f"Attempt {attempt + 1} failed: {e}")
            if attempt < retries - 1:
                sleep_time = delay + random.uniform(0, 2)
                print(f"Retrying after {sleep_time:.2f} seconds...")
                time.sleep(sleep_time)
            else:
                print("Max retries reached. Skipping this request.")
                return None

# Build payload and fetch interest over time data
pytrends.build_payload(keywords, timeframe=timeframe, geo="")
interest_over_time = fetch_data_with_retry(pytrends.interest_over_time)
if interest_over_time is None:
    print("Failed to fetch interest over time data.")
    interest_over_time = pd.DataFrame()

# Fetch regional interest for the top 10 largest economies
top_regions = fetch_data_with_retry(pytrends.interest_by_region, resolution='COUNTRY', inc_low_vol=True, inc_geo_code=False)
if top_regions is not None:
    top_regions = top_regions[top_regions.index.isin(largest_economies)]
else:
    print("Failed to fetch regional interest data.")
    top_regions = pd.DataFrame()

# Data Visualization
plt.figure(figsize=(10, 10))
sns.set_palette("coolwarm")        

Generating & Displaying Search Volume Plots

  1. Interest Over Time - Line Plot: If data is available, a line plot is created for each keyword using a colormap. The x-axis represents time (dates), and the y-axis shows search interest values. Grid lines and legends are included to enhance readability. As you can see in the code below, I have added a fallback in case data is missing.
  2. Regional Interest - Bar Plot: Displays the keyword search interest across the top 10 largest economies. The data is sorted to show the top 10 regions by search interest for the selected keywords. A bar plot is used, with the x-axis representing regions (countries) and the y-axis showing search volume.
  3. Seasonal Trends - Combined Line Plot: Smooths each keyword's search interest with a rolling average to reveal longer-term patterns. Note that the window counts data points, and Google Trends returns weekly values for multi-year timeframes, so rolling(12) spans roughly 12 weeks rather than 12 months. Similar to the first plot, this includes grid lines and a legend for better readability.
  4. Correlation Analysis - Heatmap: My favorite one. Analyzes and visualizes the correlation between search interest for the selected keywords. Pearson correlation coefficients are displayed in each cell, with a color gradient (via the "cool" colormap) indicating the strength of the relationship.

# Interest Over Time: Line Plot
plt.subplot(2, 2, 1)
if not interest_over_time.empty:
    interest_over_time[keywords].plot(ax=plt.gca(), colormap="cool")
    plt.title("Interest Over Time", fontsize=12, fontweight='bold')
    plt.xlabel("Date", fontsize=12)
    plt.ylabel("Search Interest", fontsize=12)
    plt.legend(keywords)
    plt.grid(True, linestyle="--", alpha=0.7)
else:
    plt.text(0.5, 0.5, "No data available", horizontalalignment='center', verticalalignment='center', transform=plt.gca().transAxes, fontsize=12)
    plt.title("Interest Over Time", fontsize=12, fontweight='bold')

# Regional Interest: Bar Plot
plt.subplot(2, 2, 2)
if not top_regions.empty:
    top_regions_sorted = top_regions.sort_values(by=keywords, ascending=False).head(10)
    top_regions_sorted.plot(kind="bar", ax=plt.gca(), colormap="cool")
    plt.title("Regional Interest Top 10 Economies", fontsize=12, fontweight='bold')
    plt.xlabel("Region", fontsize=12)
    plt.ylabel("Search Interest", fontsize=12)
    plt.xticks(rotation=45)
else:
    plt.text(0.5, 0.5, "No data available", horizontalalignment='center', verticalalignment='center', transform=plt.gca().transAxes, fontsize=12)
    plt.title("Regional Interest Top 10 Economies", fontsize=12, fontweight='bold')

# Seasonal Trends: Combined Line Plot
plt.subplot(2, 2, 3)
if not interest_over_time.empty:
    # Window counts data points (weekly for multi-year timeframes),
    # so 12 points correspond to roughly 12 weeks of smoothing
    interest_over_time[keywords].rolling(12).mean().plot(ax=plt.gca(), colormap="cool")
    plt.title("Seasonal Trends (Rolling Avg.)", fontsize=12, fontweight='bold')
    plt.xlabel("Date", fontsize=12)
    plt.ylabel("Search Interest", fontsize=12)
    plt.legend(keywords)
    plt.grid(True, linestyle="--", alpha=0.7)
else:
    plt.text(0.5, 0.5, "No data available", horizontalalignment='center', verticalalignment='center', transform=plt.gca().transAxes, fontsize=12)
    plt.title("Seasonal Trends (Rolling Avg.)", fontsize=12, fontweight='bold')

# Correlation Analysis: Heatmap
plt.subplot(2, 2, 4)
if not interest_over_time.empty:
    correlation = interest_over_time[keywords].corr()
    sns.heatmap(correlation, annot=True, cmap="cool", ax=plt.gca(), cbar_kws={'shrink': 0.8})
    plt.title("Correlation", fontsize=12, fontweight='bold')
else:
    plt.text(0.5, 0.5, "No data available", horizontalalignment='center', verticalalignment='center', transform=plt.gca().transAxes, fontsize=12)
    plt.title("Correlation", fontsize=12, fontweight='bold')

# Adjust layout and display
plt.tight_layout()
plt.show()        

The plt.tight_layout() call at the end ensures that all plots fit neatly within the figure without overlapping labels or titles. Finally, plt.show() displays the completed visualizations. Together, these plots provide a simple analysis of search trends for marketing, SEO, or general research.

The term "GitOps" shows a gradual upward trend over time, indicating its growing popularity. Additionally, it exhibits a weak positive correlation with "Platform Engineering," suggesting a potential connection between the two concepts in search behavior.
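
If you want the exact coefficient rather than eyeballing the heatmap, you can read it straight out of the correlation matrix. A minimal sketch, assuming both keywords were among your inputs so the columns exist in interest_over_time:

# Pull a single Pearson coefficient out of the interest-over-time data
pair = interest_over_time[["GitOps", "Platform Engineering"]].corr(method="pearson")
print(f"GitOps vs. Platform Engineering: {pair.iloc[0, 1]:.2f}")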

Pygwalker: A Tableau-like Alternative

I’ve written extensively about Pygwalker, a free and lightweight alternative to tools like Power BI and Tableau. With Pygwalker, you can save yourself a lot of coding effort by handing over your data directly to an intuitive user interface for visualization and analysis. We could have easily used it with our dataset right from the start—or even passed the data to Google Looker Studio for similar results.

import pygwalker as pyg
import webbrowser
import os

# Render Pygwalker visualization and open it in the browser
if not interest_over_time.empty:
    # Create a PygWalker object
    pygwalker_obj = pyg.walk(interest_over_time)

    
    # Export the PygWalker object to an HTML string
    pyg_html = pygwalker_obj.to_html()
    
    # Save the HTML string to a file
    file_path = os.path.abspath("pygwalker_visualization.html")
    with open(file_path, "w", encoding="utf-8") as f:
        f.write(pyg_html)

    # Open the HTML file in the default web browser
    webbrowser.open(f"file://{file_path}")
    print(f"Pygwalker visualization opened in browser: {file_path}")
else:
    print("No data available for Pygwalker visualization.")        

This visualization is still a bit basic, but Pygwalker provides an intuitive drag-and-drop interface that makes creating polished and interactive visualizations effortless.

In previous articles, I've demonstrated how to use Pygwalker to analyze data, visualize geospatial information, and even save views for future reference. If you're interested in more details, I recommend checking the Kanaries website: https://kanaries.net/pygwalker
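
As a quick taste of the view-saving feature: pyg.walk() accepts a spec parameter pointing at a local JSON file where chart configurations are persisted, so a layout built in the UI can be restored later. A minimal sketch, assuming a reasonably recent pygwalker version (the file name here is just an example):

import pygwalker as pyg

# Chart state is read from and written to this JSON file, so views
# built via drag and drop survive a restart of the script
walker = pyg.walk(interest_over_time, spec="./trends_charts.json")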

Conclusion

Admittedly, tools like Semrush, Ahrefs, Sistrix, Screaming Frog, Google Trends, and Google Search Console are essential in any marketing team's toolkit. However, Pytrends pulls actual Google data directly and offers the flexibility to generate custom views and insights that can be challenging to achieve with those tools. This makes it particularly useful for automating repetitive tasks with small, tailored programs.

How practical this approach is may be debatable, but the experiment itself felt insightful. That said, working with the unofficial Google Trends API can be finicky: it's crucial to incorporate delays and randomize timings to avoid hitting rate limits. This is why my fetching function ended up being a bit more elaborate, but it ensures a reliable workflow.
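
Incidentally, pytrends also exposes retry knobs directly on the TrendReq constructor, which can complement a hand-rolled retry loop. A minimal sketch, assuming a pytrends version whose documented retries/backoff_factor parameters still work with your installed urllib3:

from pytrends.request import TrendReq

# timeout is (connect, read) in seconds; retries and backoff_factor
# are handed to the underlying requests/urllib3 retry machinery
pytrends = TrendReq(
    hl='en-US',
    tz=360,
    timeout=(10, 25),
    retries=2,
    backoff_factor=0.5,
)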
