Simple Python Script for Clustering Keywords [ Script Included ]
Venkata Pagadala
SEO AI Product Manager | Gen Ai | RAG | Agentic Ai - Ai Agents | Programmatic SEO (PSEO) | Enterprise & Technical SEO
This Python script clusters keywords using TF-IDF vectorization and the Agglomerative Clustering algorithm. Here is a brief overview of the functions in the code:
- read_keywords(file_path): reads the keywords from the first column of the CSV file at file_path and returns them as a list.
- write_clusters_to_csv(file_path, clusters, keywords): writes the results to the CSV file at file_path, under a "Keyword" and "Cluster" header row. Each row holds a keyword in the first column and its integer cluster label in the second.
- text_similarity(keywords): vectorizes the keywords with TF-IDF and returns the resulting matrix, one row per keyword. Despite the name, this is a TF-IDF feature matrix rather than a pairwise similarity matrix.
- cluster_keywords(similarity_matrix, num_clusters): runs agglomerative clustering on that matrix with num_clusters clusters and returns the cluster label for each keyword.
- main(): sets the input file, output file, and number of clusters, then reads the keywords, builds the TF-IDF matrix, performs the clustering, and writes the clusters to the output file.
Overall, the script groups keywords that share similar wording, based on their TF-IDF vectors and the Agglomerative Clustering algorithm, and writes the resulting clusters to a CSV file.
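To make the TF-IDF step more concrete, here is a small sketch of what the vectorizer produces for a handful of keywords. The sample keywords are made up for illustration and are not from the original input file. Because TfidfVectorizer L2-normalizes each row by default, multiplying the matrix by its transpose gives cosine similarities between keywords.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer

# Made-up sample keywords, for illustration only.
sample_keywords = [
    "running shoes",
    "running shoes sale",
    "coffee maker",
    "coffee maker reviews",
]

vectorizer = TfidfVectorizer()
keyword_matrix = vectorizer.fit_transform(sample_keywords)

# One row per keyword, one column per distinct term.
print(keyword_matrix.shape)
print(sorted(vectorizer.vocabulary_))

# Rows are L2-normalized by default, so every row has length 1.
row_norms = np.linalg.norm(keyword_matrix.toarray(), axis=1)

# Dot products of unit-length rows are cosine similarities.
similarity = (keyword_matrix @ keyword_matrix.T).toarray()
print(np.round(similarity, 2))
```

Keywords that share words ("running shoes" and "running shoes sale") get a high similarity, while keywords with no words in common score zero.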
Script Included
import csv

from sklearn.cluster import AgglomerativeClustering
from sklearn.feature_extraction.text import TfidfVectorizer


# Read keywords from the first column of the input file
def read_keywords(file_path):
    keywords = []
    with open(file_path, "r", newline="") as f:
        reader = csv.reader(f)
        for row in reader:
            if row:  # skip blank lines, which would otherwise raise IndexError
                keywords.append(row[0])
    return keywords


# Write clustered keywords to output file
def write_clusters_to_csv(file_path, clusters, keywords):
    with open(file_path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["Keyword", "Cluster"])
        for keyword, cluster in zip(keywords, clusters):
            writer.writerow([keyword, cluster])


# Vectorize the keywords using TF-IDF (returns a sparse matrix, one row per keyword)
def text_similarity(keywords):
    vectorizer = TfidfVectorizer()
    keyword_matrix = vectorizer.fit_transform(keywords)
    return keyword_matrix


# Perform clustering
def cluster_keywords(similarity_matrix, num_clusters):
    clustering = AgglomerativeClustering(n_clusters=num_clusters)
    # AgglomerativeClustering needs a dense array, so convert the sparse matrix
    clusters = clustering.fit_predict(similarity_matrix.toarray())
    return clusters


# Main function
def main():
    input_file = "keywordsinput.csv"
    output_file = "Cluster.csv"
    num_clusters = 5
    keywords = read_keywords(input_file)
    similarity_matrix = text_similarity(keywords)
    clusters = cluster_keywords(similarity_matrix, num_clusters)
    write_clusters_to_csv(output_file, clusters, keywords)


if __name__ == "__main__":
    main()
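The script fixes the number of clusters at 5, which may not suit every keyword list. One possible variation, not part of the original script, is to let scikit-learn decide the number of clusters from a distance threshold instead. The sketch below assumes a hypothetical helper named cluster_by_threshold and made-up sample keywords; the threshold of 1.0 is only an illustrative value and would need tuning on real data.

```python
from sklearn.cluster import AgglomerativeClustering
from sklearn.feature_extraction.text import TfidfVectorizer


def cluster_by_threshold(keywords, distance_threshold):
    """Cluster keywords without fixing the cluster count (hypothetical helper)."""
    matrix = TfidfVectorizer().fit_transform(keywords)
    # n_clusters must be None when distance_threshold is set.
    clustering = AgglomerativeClustering(
        n_clusters=None, distance_threshold=distance_threshold
    )
    labels = clustering.fit_predict(matrix.toarray())
    return labels, clustering.n_clusters_


# Made-up sample keywords, for illustration only.
sample_keywords = [
    "running shoes",
    "running shoes sale",
    "coffee maker",
    "coffee maker reviews",
]

labels, n_found = cluster_by_threshold(sample_keywords, distance_threshold=1.0)
for keyword, label in zip(sample_keywords, labels):
    print(label, keyword)
print("clusters found:", n_found)
```

A lower threshold yields more, tighter clusters; a higher one merges more keywords together.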
Input File
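Judging from read_keywords, the script treats the first column of every row as a keyword, so the input file should hold one keyword per row with no header row (a header would be clustered as if it were a keyword). Here is a minimal sketch that writes such a file. The helper write_sample_input and the sample keywords are hypothetical, made up for illustration.

```python
import csv
import os
import tempfile


def write_sample_input(file_path, keywords):
    """Write one keyword per row, first column only, with no header row."""
    with open(file_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        for keyword in keywords:
            writer.writerow([keyword])


# Made-up sample keywords, for illustration only.
sample_keywords = [
    "running shoes",
    "running shoes sale",
    "coffee maker",
    "coffee maker reviews",
]

# Written to a temporary directory here; the script expects "keywordsinput.csv".
sample_path = os.path.join(tempfile.mkdtemp(), "keywordsinput.csv")
write_sample_input(sample_path, sample_keywords)

# Read the file back the same way read_keywords does.
with open(sample_path, newline="", encoding="utf-8") as f:
    rows_read = [row[0] for row in csv.reader(f) if row]
print(rows_read)
```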
Output
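The output file starts with a "Keyword" and "Cluster" header row, followed by one row per keyword. To review the result cluster by cluster, the rows can be regrouped by label. The sketch below assumes that layout; group_keywords_by_cluster is a hypothetical helper, and the demo file with made-up rows stands in for a real Cluster.csv.

```python
import csv
import os
import tempfile
from collections import defaultdict


def group_keywords_by_cluster(file_path):
    """Return {cluster label: [keywords]} from the script's output CSV."""
    groups = defaultdict(list)
    with open(file_path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            groups[int(row["Cluster"])].append(row["Keyword"])
    return dict(groups)


# Stand-in output file with made-up rows, for illustration only.
demo_path = os.path.join(tempfile.mkdtemp(), "Cluster.csv")
with open(demo_path, "w", newline="", encoding="utf-8") as f:
    writer = csv.writer(f)
    writer.writerow(["Keyword", "Cluster"])
    writer.writerows([
        ["running shoes", 0],
        ["coffee maker", 1],
        ["running shoes sale", 0],
    ])

groups = group_keywords_by_cluster(demo_path)
for label in sorted(groups):
    print(label, groups[label])
```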