Simple Python Script for Clustering Keywords [ Script Included ]

This Python script clusters keywords using TF-IDF vectorization and the Agglomerative Clustering algorithm. Here is a brief overview of the functions in the code:

- read_keywords(file_path) - reads the keywords from the first column of the CSV file specified by file_path and returns them as a list.

- write_clusters_to_csv(file_path, clusters, keywords) - writes the clustered keywords to the CSV file specified by file_path. After a header row (Keyword, Cluster), each keyword goes in the first column and its integer cluster label in the second.

- text_similarity(keywords) - builds the TF-IDF matrix for the input keywords (one row per keyword, one column per term) and returns it. Despite the function's name, no pairwise similarity is computed at this step.

- cluster_keywords(similarity_matrix, num_clusters) - converts that matrix to a dense array, performs agglomerative clustering on it with num_clusters clusters, and returns the cluster labels.

- main() - defines the input file, output file, and number of clusters, then reads the keywords, builds the TF-IDF matrix, performs clustering, and writes the clusters to the output file.

Overall, this code can be used to cluster a set of keywords based on their similarity using TF-IDF vectorization and the Agglomerative Clustering algorithm, and write the resulting clusters to a CSV file.
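
To make the overview concrete, here is a minimal sketch of the same two steps (TF-IDF vectorization, then agglomerative clustering) on an in-memory list instead of a CSV file. The sample keywords are made up for illustration, and two clusters is an arbitrary choice for this toy data.

```python
from sklearn.cluster import AgglomerativeClustering
from sklearn.feature_extraction.text import TfidfVectorizer

# Made-up sample keywords: three about shoes, three about Python.
keywords = [
    "running shoes",
    "running shoes sale",
    "cheap running shoes",
    "python csv",
    "python csv tutorial",
    "python csv reader",
]

# One row per keyword, one column per distinct term.
tfidf = TfidfVectorizer().fit_transform(keywords)

# AgglomerativeClustering expects a dense array, so convert the sparse matrix.
model = AgglomerativeClustering(n_clusters=2)
labels = model.fit_predict(tfidf.toarray())

# Print each keyword next to its cluster label.
for keyword, label in zip(keywords, labels):
    print(f"{label}\t{keyword}")
```

Keywords that share terms end up with the same label. The label numbers themselves are arbitrary identifiers, not a ranking.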


Script Included


import csv

from sklearn.cluster import AgglomerativeClustering
from sklearn.feature_extraction.text import TfidfVectorizer


# Read keywords from the first column of the input file
def read_keywords(file_path):
    keywords = []
    with open(file_path, "r", newline="") as f:
        reader = csv.reader(f)
        for row in reader:
            if row:  # skip blank lines, which would otherwise raise IndexError
                keywords.append(row[0])
    return keywords


# Write clustered keywords to output file
def write_clusters_to_csv(file_path, clusters, keywords):
    with open(file_path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["Keyword", "Cluster"])
        for keyword, cluster in zip(keywords, clusters):
            writer.writerow([keyword, cluster])


# Build the TF-IDF matrix (one row per keyword, one column per term).
# Despite the name, this returns TF-IDF vectors, not pairwise similarities.
def text_similarity(keywords):
    vectorizer = TfidfVectorizer()
    keyword_matrix = vectorizer.fit_transform(keywords)
    return keyword_matrix


# Perform clustering (default settings: Ward linkage on Euclidean distance)
def cluster_keywords(similarity_matrix, num_clusters):
    clustering = AgglomerativeClustering(n_clusters=num_clusters)
    # AgglomerativeClustering needs a dense array, hence toarray()
    clusters = clustering.fit_predict(similarity_matrix.toarray())
    return clusters


# Main function
def main():
    input_file = "keywordsinput.csv"
    output_file = "Cluster.csv"
    num_clusters = 5

    keywords = read_keywords(input_file)
    similarity_matrix = text_similarity(keywords)
    clusters = cluster_keywords(similarity_matrix, num_clusters)
    write_clusters_to_csv(output_file, clusters, keywords)


if __name__ == "__main__":
    main()
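
The script fixes num_clusters at 5. As a possible variation (not part of the original script), scikit-learn's AgglomerativeClustering can instead stop merging at a distance threshold and report how many clusters it found. The sketch below assumes a hypothetical helper name, cluster_by_threshold, made-up sample keywords, and a threshold of 1.5 picked by hand for this toy data; a useful threshold depends on your own keywords.

```python
from sklearn.cluster import AgglomerativeClustering
from sklearn.feature_extraction.text import TfidfVectorizer


def cluster_by_threshold(keywords, distance_threshold):
    """Hypothetical helper: cluster without fixing the number of clusters."""
    vectors = TfidfVectorizer().fit_transform(keywords).toarray()
    model = AgglomerativeClustering(
        n_clusters=None,  # must be None when distance_threshold is set
        distance_threshold=distance_threshold,
    )
    labels = model.fit_predict(vectors)
    return labels, model.n_clusters_


# Made-up sample keywords for illustration.
sample = [
    "running shoes",
    "running shoes sale",
    "cheap running shoes",
    "python csv",
    "python csv tutorial",
    "python csv reader",
]

labels, found = cluster_by_threshold(sample, distance_threshold=1.5)
print(found, list(labels))
```

A lower threshold yields more, tighter clusters; a higher one yields fewer, broader clusters.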

Input File

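The input file, keywordsinput.csv, holds one keyword per row in its first column. The script reads every row, so a header row would be treated as a keyword.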
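
The layout that read_keywords expects can be reproduced with a short sketch. The sample keywords are made up, and a temporary directory is used here only so the example cleans up after itself.

```python
import csv
import os
import tempfile

# Made-up sample keywords; the real file would hold your own list.
sample_keywords = ["running shoes", "cheap running shoes", "python csv"]

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "keywordsinput.csv")

    # Write one keyword per row, with no header row.
    with open(path, "w", newline="") as f:
        csv.writer(f).writerows([kw] for kw in sample_keywords)

    # Read the first column back, the same way read_keywords does.
    with open(path, "r", newline="") as f:
        loaded = [row[0] for row in csv.reader(f) if row]

print(loaded)
```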

Output

The output file, Cluster.csv, has a header row (Keyword, Cluster) followed by one row per keyword with its integer cluster label.
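
Since the output lists keywords in their original order rather than grouped, it can help to collect them per cluster afterwards. A minimal sketch, assuming the Keyword and Cluster columns written by write_clusters_to_csv; the helper name group_by_cluster and the sample rows are made up, and an in-memory string stands in for the real file.

```python
import csv
import io
from collections import defaultdict

# Made-up stand-in for the contents of Cluster.csv.
sample_output = """Keyword,Cluster
running shoes,0
python csv,1
cheap running shoes,0
python csv reader,1
"""


def group_by_cluster(csv_file):
    """Hypothetical helper: map each cluster label to its keywords."""
    groups = defaultdict(list)
    for row in csv.DictReader(csv_file):
        groups[int(row["Cluster"])].append(row["Keyword"])
    return dict(groups)


groups = group_by_cluster(io.StringIO(sample_output))
for label in sorted(groups):
    print(label, groups[label])
```

To use it on the real output, pass an open file handle, for example open("Cluster.csv", newline=""), in place of the StringIO object.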

Author: Venkata Pagadala