How to create an SEO tool using Python?
Hikari Sohma
Adjunct Researcher of Waseda University, AI Engineer, Marketing Analyst, Master's Degree in Sports Sciences, Specializing in Consumer Behavior and Psychology
In this blog, I will introduce how to automate SEO tasks with Python. First, we'll review what SEO is, and then I'll walk you through some hands-on Python code. SEO is essential in digital marketing, and automating these tasks lets you focus more on improving your content. I hope you find this blog helpful for improving your work efficiency!
What is SEO?
SEO (Search Engine Optimization) is crucial for a website's success because it drives organic search traffic, which in turn builds credibility and trust. It is cost-effective compared to paid advertising and keeps delivering benefits long after a page has been optimized. SEO improves user experience through better navigation and faster load times, and it attracts targeted traffic that converts more effectively. It offers a competitive advantage by helping businesses stay ahead of rivals and adapt to algorithm changes, while keyword research and analytics provide insights into customer behavior and boost brand awareness. Ultimately, investing in SEO leads to sustainable growth and increased revenue.
Steps for Developing an SEO Tool
# Importing Libraries
import requests
from bs4 import BeautifulSoup
import pandas as pd
import matplotlib.pyplot as plt
from nltk.tokenize import word_tokenize
from nltk.corpus import stopwords
from collections import Counter
import seaborn as sns
from sklearn.feature_extraction.text import TfidfVectorizer
# Downloading NLTK
import nltk
nltk.download('punkt')
nltk.download('stopwords')
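Depending on your NLTK version, word_tokenize may also ask for the separate punkt_tab resource (newer releases split the Punkt models out); if you hit a LookupError, the extra download below should resolve it.
# Newer NLTK releases also need the punkt_tab resource for word_tokenize
nltk.download('punkt_tab')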
# Class Initialization
class SEOAnalyzer:
    def __init__(self, url):
        self.url = url
        self.soup = None
        self.text = ""
        self.word_freq = None
        self.stop_words = set(stopwords.words('english'))
__init__ method:
Initializes the class with the URL of the webpage to analyze. It sets up initial variables such as the URL, BeautifulSoup object (soup), the webpage text, word frequency counter, and a set of stopwords (common words to ignore in text analysis).
# Fetch Content
    def fetch_content(self):
        response = requests.get(self.url)
        self.soup = BeautifulSoup(response.content, 'html.parser')
        self.text = self.soup.get_text()
fetch_content method:
Sends a request to the provided URL, parses the HTML content using BeautifulSoup, and extracts all the text from the webpage.
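In practice, some sites block requests that carry no User-Agent header or respond slowly. A slightly more defensive version of fetch_content (my own sketch, not part of the original tool; the User-Agent string is just a placeholder) adds a header, a timeout, and an explicit status check:
    # Optional, more defensive version of fetch_content (sketch)
    def fetch_content(self):
        headers = {'User-Agent': 'Mozilla/5.0 (compatible; SEOAnalyzer script)'}  # placeholder UA
        response = requests.get(self.url, headers=headers, timeout=10)
        response.raise_for_status()  # raise an error on 4xx/5xx responses
        self.soup = BeautifulSoup(response.content, 'html.parser')
        self.text = self.soup.get_text()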
# Analyze Meta Tags
    def analyze_meta_tags(self):
        title = self.soup.find('title').string if self.soup.find('title') else "No title found"
        meta_description = self.soup.find('meta', attrs={'name': 'description'})
        description = meta_description['content'] if meta_description else "No meta description found"
        return {'title': title, 'meta_description': description}
analyze_meta_tags method:
Extracts the title and meta description of the webpage. If the title or meta description is missing, it returns a default message.
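If you also want to check the canonical URL and the robots meta tag, two other common SEO checkpoints, you could add an optional helper like the one below (analyze_extra_meta is my own name for it; it reuses the same soup object):
    # Optional helper: canonical link and robots meta tag (sketch)
    def analyze_extra_meta(self):
        canonical = self.soup.find('link', attrs={'rel': 'canonical'})
        robots = self.soup.find('meta', attrs={'name': 'robots'})
        return {
            'canonical': canonical.get('href') if canonical else "No canonical link found",
            'robots': robots.get('content') if robots else "No robots meta tag found"
        }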
# Analyze Headings
    def analyze_headings(self):
        headings = {'h1': [], 'h2': [], 'h3': []}
        for tag in ['h1', 'h2', 'h3']:
            headings[tag] = [h.text for h in self.soup.find_all(tag)]
        return headings
analyze_headings method:
Finds and returns all h1, h2, and h3 headings on the webpage.
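Since most SEO guidelines recommend a single H1 per page, a small helper (check_h1_count is my own addition, not in the original class) can flag pages that have none or several:
    # Optional helper: flag missing or duplicate H1 tags (sketch)
    def check_h1_count(self):
        h1_count = len(self.soup.find_all('h1'))
        if h1_count == 1:
            return "H1 check: exactly one H1 tag (recommended)"
        return f"H1 check: {h1_count} H1 tags found (exactly one is usually recommended)"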
# Analyze Word Frequency
    def analyze_word_frequency(self):
        words = word_tokenize(self.text.lower())
        words = [word for word in words if word.isalnum() and word not in self.stop_words]
        self.word_freq = Counter(words)
        return self.word_freq.most_common(20)
analyze_word_frequency method:
Tokenizes the text into words, filters out stopwords and non-alphanumeric tokens, counts the frequency of each word, and returns the 20 most common words.
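Single-word counts can miss multi-word keywords such as "mouse brain". If phrase-level data is useful to you, a bigram counter over the same filtered tokens could look like this (analyze_bigram_frequency is my own sketch, not part of the original tool):
    # Optional helper: count two-word phrases (sketch)
    def analyze_bigram_frequency(self, top_n=10):
        words = word_tokenize(self.text.lower())
        words = [w for w in words if w.isalnum() and w not in self.stop_words]
        # Join each adjacent pair of words into a phrase, e.g. "mouse brain"
        bigram_freq = Counter(f"{a} {b}" for a, b in zip(words, words[1:]))
        return bigram_freq.most_common(top_n)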
# Visualize Word Frequency
    def visualize_word_frequency(self):
        plt.figure(figsize=(12, 6))
        words, counts = zip(*self.word_freq.most_common(20))
        sns.barplot(x=list(words), y=list(counts))
        plt.title('Top 20 Words Frequency')
        plt.xticks(rotation=45, ha='right')
        plt.tight_layout()
        plt.show()
visualize_word_frequency method:
Uses Seaborn and Matplotlib to create a bar plot of the top 20 most frequent words on the webpage.
# Analyze Keyword Density
    def analyze_keyword_density(self, keyword):
        total_words = sum(self.word_freq.values())
        keyword_count = self.word_freq[keyword.lower()]
        density = (keyword_count / total_words) * 100
        return f"Keyword '{keyword}' density: {density:.2f}%"
analyze_keyword_density method:
Calculates the density of a specific keyword in the text as a percentage of the total word count.
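Note that the method above divides by the total word count, so it would fail on a page with no countable words, and it handles one keyword at a time. A variant that guards against that edge case and accepts several keywords might look like this (analyze_keyword_densities is my own sketch):
    # Optional variant: density for several keywords, with a zero-word guard (sketch)
    def analyze_keyword_densities(self, keywords):
        total_words = sum(self.word_freq.values())
        if total_words == 0:
            return {kw: 0.0 for kw in keywords}  # avoid division by zero on empty pages
        return {kw: self.word_freq[kw.lower()] / total_words * 100 for kw in keywords}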
# Analyze Content Length
    def analyze_content_length(self):
        return f"Content length: {len(self.text)} characters"
analyze_content_length method:
Returns the total length of the text in characters.
# Analyze Readability
    def analyze_readability(self):
        sentences = self.text.split('.')
        words = word_tokenize(self.text)
        avg_sentence_length = len(words) / len(sentences)
        return f"Average sentence length: {avg_sentence_length:.2f} words"
analyze_readability method:
Calculates the average sentence length by dividing the total number of words by the total number of sentences.
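Splitting on periods alone miscounts sentences that end with '?' or '!' and breaks on abbreviations. Since NLTK is already loaded, a variant using sent_tokenize (my own adjustment to the method above) gives a more reliable sentence count with the same average-length formula:
    # Optional variant of analyze_readability using NLTK's sentence splitter (sketch);
    # requires "from nltk.tokenize import sent_tokenize" among the imports at the top
    def analyze_readability(self):
        sentences = sent_tokenize(self.text)  # handles '.', '?', '!' and common abbreviations
        words = word_tokenize(self.text)
        avg_sentence_length = len(words) / max(len(sentences), 1)
        return f"Average sentence length: {avg_sentence_length:.2f} words"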
# Analyze Internal Links
    def analyze_internal_links(self):
        internal_links = [a['href'] for a in self.soup.find_all('a', href=True) if self.url in a['href']]
        return f"Number of internal links: {len(internal_links)}"
analyze_internal_links method: Counts the number of internal links (links that point to the same domain) on the webpage.
# Analyze External Links
    def analyze_external_links(self):
        external_links = [a['href'] for a in self.soup.find_all('a', href=True) if self.url not in a['href'] and a['href'].startswith('http')]
        return f"Number of external links: {len(external_links)}"
analyze_external_links method:
Counts the number of external links (links that point to different domains) on the webpage.
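One limitation of the substring check above is that it misses relative links such as "/about" and can misclassify URLs that merely contain the domain somewhere in the string. A sketch that resolves relative links and compares domains with urllib.parse (classify_links is my own addition) handles both cases:
    # Optional helper: classify links by domain instead of substring matching (sketch);
    # requires "from urllib.parse import urljoin, urlparse" among the imports at the top
    def classify_links(self):
        base_domain = urlparse(self.url).netloc
        internal, external = [], []
        for a in self.soup.find_all('a', href=True):
            href = urljoin(self.url, a['href'])  # resolve relative links against the page URL
            if urlparse(href).netloc == base_domain:
                internal.append(href)
            else:
                external.append(href)
        return f"Internal links: {len(internal)}, external links: {len(external)}"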
# Analyze Image Alt Tags
    def analyze_image_alt_tags(self):
        images = self.soup.find_all('img')
        images_with_alt = [img for img in images if img.get('alt')]
        return f"Images with alt tags: {len(images_with_alt)} out of {len(images)}"
analyze_image_alt_tags method: Counts the number of images with alt attributes and compares it to the total number of images on the page.
# Run Analysis
    def run_analysis(self):
        self.fetch_content()
        meta_tags = self.analyze_meta_tags()
        headings = self.analyze_headings()
        word_freq = self.analyze_word_frequency()
        content_length = self.analyze_content_length()
        readability = self.analyze_readability()
        internal_links = self.analyze_internal_links()
        external_links = self.analyze_external_links()
        image_alt_tags = self.analyze_image_alt_tags()

        print("SEO Analysis Results:")
        print(f"Title: {meta_tags['title']}")
        print(f"Meta Description: {meta_tags['meta_description']}")
        print(f"H1 Tags: {', '.join(headings['h1'])}")
        print(f"Top 5 frequent words: {word_freq[:5]}")
        print(content_length)
        print(readability)
        print(internal_links)
        print(external_links)
        print(image_alt_tags)

        self.visualize_word_frequency()
run_analysis method:
Orchestrates the entire analysis process by calling each method in sequence and printing the results. Finally, it visualizes the word frequency data.
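Since pandas is already imported, you could also gather the key metrics into a DataFrame and save them to CSV, which makes it easy to track several pages over time. The export_results helper and its column names below are my own sketch, not part of the original tool; call it after run_analysis so the page content has already been fetched.
    # Optional helper: save key metrics to a CSV report (sketch)
    def export_results(self, filename='seo_report.csv'):
        meta_tags = self.analyze_meta_tags()
        report = pd.DataFrame([{
            'url': self.url,
            'title': meta_tags['title'],
            'meta_description': meta_tags['meta_description'],
            'content_length': len(self.text),
            'internal_links': self.analyze_internal_links(),
            'external_links': self.analyze_external_links()
        }])
        report.to_csv(filename, index=False)  # one row per analyzed page
        return report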
Usage Example
Let's use this tool to perform an SEO analysis. I'll analyze a blog post that Google published in the past.
【Title】 Mouse brain research is helping us better understand human minds.
【URL】 https://blog.google/technology/research/mouse-brain-research/
# Usage Example
url = "https://blog.google/technology/research/mouse-brain-research/"
analyzer = SEOAnalyzer(url)
analyzer.run_analysis()
# Keyword Density Analysis
# Specify the Keyword to Analyze
keyword = "AI"
print(analyzer.analyze_keyword_density(keyword))
The SEO analysis of the provided content reveals several important insights. The title of the blog post is "Mouse brain research is helping us better understand human minds," which is both relevant and descriptive, likely to attract users interested in neuroscience and AI. The meta description, "Researchers on our Connectomics team have completed the largest ever AI-assisted digital reconstruction of human brain tissue. Here's why they're taking on the mouse brain next," provides a concise summary of the research focus and highlights the significance of AI-assisted reconstruction.
The H1 tag matches the title, reinforcing the main topic of the article. The top five frequent words in the content are 'google' (46 times), 'brain' (40 times), 'mouse' (20 times), 'human' (20 times), and 'see' (18 times). These words are highly relevant to the article's topic, indicating good keyword usage. The content length is substantial at 23,322 characters, providing in-depth coverage of the topic.
The average sentence length is 28.45 words, which indicates a complex sentence structure. This might be appropriate for a professional or academic audience but should be balanced for readability. The article contains 12 internal links, which help with site navigation and can improve SEO. There are 32 external links, which can be beneficial if they link to credible sources, enhancing the content's authority. Out of 11 images, 10 have alt tags, which are important for both SEO and accessibility. Alt tags help search engines understand image content and improve the experience for visually impaired users.
The bar chart visualization shows the frequency of the top 20 words used in the content, with "google" being the most frequent, followed by "brain," "mouse," "human," and "see." This confirms that the content is focused on its main topics. The keyword "AI" has a density of 1.15%, indicating that it is used appropriately without keyword stuffing, maintaining relevance to the content.
In summary, this SEO analysis shows that the blog post is well-optimized for its main topics. The title and meta description are clear and engaging, and keyword usage is appropriate. However, readability can be improved by adjusting sentence length, and ensuring all images have alt tags can further enhance SEO and accessibility. Adding more internal links can also improve site navigation and SEO.
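Finally, the same class can be pointed at several pages in a loop if you want to compare posts side by side; the URL list below is just an example, and the second entry is a placeholder.
# Batch analysis over several pages (the second URL is a placeholder)
urls = [
    "https://blog.google/technology/research/mouse-brain-research/",
    "https://example.com/another-post/",
]
for page_url in urls:
    analyzer = SEOAnalyzer(page_url)
    analyzer.run_analysis()
    print(analyzer.analyze_keyword_density("AI"))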