
Text Analysis - Word Cloud

Raja Saurabh Tiwari

Vice President @ Citi | Java , Cloud, ML Solutions | Gen AI enthusiast | Wildlife Photography

Published: November 30, 2020

Text Analysis:

Text analysis is one of the richest areas in the Machine Learning space. It is the process of deriving meaningful insight from a sentence, a document, or a collection of documents, also known as a corpus.

More formally:

Text mining, also referred to as text data mining, similar to text analytics, is the process of deriving high-quality information from text. It involves "the discovery by computer of new, previously unknown information, by automatically extracting information from different written resources."

Courtesy: Wikipedia

There are many use cases for text analysis. You may want to analyze people's tweets, the messages exchanged on WhatsApp, or the kinds of posts people share on Facebook and other social platforms.

Such analysis can give you a lot of opportunities to cluster people into groups, which can eventually help you target the correct audience for your business or make policies accordingly.

From what is trending on Twitter to what people are discussing, all of it can be analyzed using various tools available in the market.

Today we'll talk about one such method, the word cloud. Look at the image below. It shows the words used most frequently in a chat group's discussion. I took this text from one of my WhatsApp groups.

Creating Word Cloud:

Let's understand how to create such a cloud using Python libraries.

Pre-requisite: Dump your data into a tab-delimited file. For this example, I have exported WhatsApp data into a text file for demonstration purposes.

The Natural Language Toolkit (NLTK) is one of the most widely used libraries in the Machine Learning domain for text analysis. Let's import it:

import nltk

Let's import modules from nltk to help clean the text and remove the noise in it.

from nltk.tokenize import word_tokenize

from nltk.corpus import stopwords

Let's import pandas for various operations related to data import and manipulation.

import pandas as pd

Read the data into memory as a DataFrame:

df = pd.read_csv("whatsppmessages.csv", sep="\t")

After reading the text into memory, you need to clean it. (The high-level steps are described below, but it can get really complex when you drill down into the details and do real text analysis.)

The text in WhatsApp will be in the format: 22/11/2020, 23:16 - "$username": "the text"
Split the text and get the message you want to analyze.
Remove stop words ('is', 'the', 'a', etc.).
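
The cleaning steps above can be sketched roughly as follows. This is a minimal illustration, not the exact code used for this article: the regular expression assumes every message line follows the date, time, username, text layout shown above, the sample lines are made up, and the small STOP_WORDS set is a hand-written stand-in for NLTK's stopwords.words("english") so that the sketch runs without downloading any NLTK data.

```python
import re

# Assumed layout of one exported line: 22/11/2020, 23:16 - username: the text
LINE_RE = re.compile(r"^(\d{1,2}/\d{1,2}/\d{4}), (\d{1,2}:\d{2}) - ([^:]+): (.*)$")

# Tiny hand-written stand-in for nltk's stopwords.words("english") (assumption)
STOP_WORDS = {"is", "the", "a", "an", "and", "in", "of", "to", "was", "no"}


def extract_message(line):
    """Return the message part of one chat line, or None if the line does not match."""
    match = LINE_RE.match(line.strip())
    return match.group(4) if match else None


def clean_words(message):
    """Lowercase the message, keep alphabetic tokens, and drop stop words."""
    tokens = re.findall(r"[a-z']+", message.lower())
    return [token for token in tokens if token not in STOP_WORDS]


# Hypothetical sample lines, not taken from the real chat
sample_lines = [
    "22/11/2020, 23:16 - Asha: The power is back in the evening",
    "22/11/2020, 23:18 - Ravi: Power cut again today",
    "Messages and calls are end-to-end encrypted.",  # system notice, skipped
]

messages = [m for m in map(extract_message, sample_lines) if m is not None]
words = [word for message in messages for word in clean_words(message)]

# A single string of cleaned words is what WordCloud.generate() expects
text = " ".join(words)
print(text)
```

In a real run you would likely swap STOP_WORDS for the NLTK stop word list imported earlier and extend the pattern to cope with multi-line messages and system notices.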

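Create the word cloud using the code below:

from wordcloud import WordCloud

import matplotlib.pyplot as plt

# text is the cleaned chat text as a single string
wordcloud = WordCloud(max_font_size=40).generate(text)

plt.figure()

plt.imshow(wordcloud, interpolation="bilinear")

plt.axis("off")

plt.show()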

And there you go. That's it.

Now you can easily analyze it. "power" is the most frequently used word in the chat, followed by "done" and "today".

We had been discussing the electricity problem in Pune, so the word "power" was the most frequent.
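
If you want to confirm a reading like this with numbers rather than by eye, a plain frequency count is one option. The sketch below is only an illustration: top_words is a hypothetical helper, and sample_text is made-up cleaned text rather than the real chat data.

```python
from collections import Counter


def top_words(text, n=3):
    """Return the n most frequent whitespace-separated words with their counts."""
    return Counter(text.split()).most_common(n)


# Made-up cleaned text standing in for the real chat
sample_text = "power power power done done today"

print(top_words(sample_text))
# [('power', 3), ('done', 2), ('today', 1)]
```

The larger a word appears in the cloud, the higher it sits in this list.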

In Machine Learning, one of the most important parts is presentation: how easily you can convey the summary or outcome of the analysis to the stakeholders. These kinds of diagrams and summary charts play a key role in telling the story and convincing the client.

Let's meet again with another interesting topic or application of Machine Learning.
