
Auto Generated Insights of 2019 HR Tech Conference Twitter - Part 2 (Topic Modeling)

Peng Wang

Engineering Leader | Volunteer Board Director | Community Builder

Published: October 23, 2019

In our last post, we extracted #HRTechConf tweets, cleaned up the text, and generated a word cloud that highlights some of the buzzwords from the conference. But what are the tweets talking about? Without reviewing each of the 7,000 tweets, how could we find the popular topics? Let's see whether tweet topics can be detected automatically by training a Latent Dirichlet Allocation (LDA) model.

Topic Modeling Via LDA

LDA is a popular probabilistic topic modeling algorithm. It is an unsupervised, generative statistical model that takes documents as input and finds topics as output. Each document is treated as a mixture of topics, and each topic is characterized by a probability distribution over words, so the words most strongly associated with a topic carry the highest weights.

We use the Gensim Python package to implement LDA, with bigrams (i.e. two-word sequences) instead of single words as training features. Using the 7,000 tweets, we set LDA to generate 30 topics. Each topic is a set of keywords, each contributing a certain weight (i.e. importance) to the topic.
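To make the bigram features concrete, here is a minimal sketch in plain Python. The `to_bigrams` helper and the sample tweets are hypothetical and only illustrate the idea; the actual preprocessing in the GitHub code may differ. With Gensim, token lists like `docs` would typically be passed to `gensim.corpora.Dictionary` and converted with `doc2bow` before training `gensim.models.LdaModel` with `num_topics=30`.

```python
from collections import Counter


def to_bigrams(tokens):
    """Join each pair of adjacent tokens into one space-separated bigram."""
    return [f"{a} {b}" for a, b in zip(tokens, tokens[1:])]


# Hypothetical cleaned tweets, already tokenized (not taken from the real dataset).
tweets = [
    ["josh", "bersin", "opening", "keynote"],
    ["marcus", "buckingham", "book", "signing"],
    ["book", "signing", "free", "copy"],
]

# Each tweet becomes a document made of bigram features.
docs = [to_bigrams(tokens) for tokens in tweets]

# Bag-of-bigrams: how often each bigram occurs across all tweets.
bigram_counts = Counter(bigram for doc in docs for bigram in doc)

print(docs[0])
print(bigram_counts.most_common(1))
```

A tweet with fewer than two tokens produces no bigrams, so very short tweets contribute nothing to training under this scheme.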

Below are some example topics with their top 10 keywords. The higher the number, the more weight (importance) the bigram contributes to the topic.

Topic: 20 
Words: 0.823*"josh bersin" + 0.055*"opening keynote" + 0.018*"woman opening" + 0.012*"opportunity connect" + 0.007*"amaze event" + 0.006*"rebeccahrexec katieachille" + 0.006*"jeanneachille rebeccahrexec" + 0.006*"steveboese jeanneachille" + 0.005*"event feel" + 0.005*"lucky opportunity"

This topic appears to be about "josh bersin" giving the "opening keynote", along with "woman opening". The HR Tech Conf website confirms that Josh Bersin was a keynote speaker and that the Women in HR Tech opening took place at the same venue. Here is an example tweet:

Checking in from @Josh_Bersin's opening keynote at #HRTechConf today! Our team loved hearing Josh share his insights on how technology is shaping and creating new opportunities for HR. pic.twitter.com/ku27LKzmcb

Topic: 3 
Words: 0.259*"kronos ceo" + 0.059*"rule follow" + 0.041*"kronos highly" + 0.041*"surprisingly simple" + 0.041*"simple rule" + 0.041*"reveals surprisingly" + 0.041*"replicate success" + 0.041*"workinspired kronos" + 0.041*"follow replicate" + 0.041*"culture reveals"

This topic appears to be about the Kronos CEO revealing some surprisingly simple rules (about culture? for success?). Here is a tweet:

#hrconfes RT KronosInc: In #WorkInspired, Kronos CEO Aron Ain takes you inside Kronos highly admired culture — and reveals the surprisingly simple rules you can follow to start replicating that success. #HRTechConf

Topic: 7
Words: 0.187*"marcus buckingham" + 0.177*"book signing" + 0.138*"free copy" + 0.124*"nine lie" + 0.079*"present lie" + 0.079*"buckingham present" + 0.001*"peopledoc inc" + 0.001*"drive engagement" + 0.001*"around world" + 0.001*"answer question"

This topic appears to be about Marcus Buckingham holding a book signing and giving away free copies (of "Nine Lies", presumably the title of the book). Here is a tweet:

#hrtechconf book signing with Marcus Buckingham #FreeThinkingCoalition #9liesaboutwork https://www.instagram.com/p/B3KhdHWlRgRIFxreqnoIXDaZNthcdr-_TJ8aeo0/?igshid=19rvtzygznze1 …
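The topic strings above use a `weight*"bigram"` format joined by `+`. As a rough sketch, such a string can be parsed into (bigram, weight) pairs to separate the keywords that really define a topic from the low-weight tail. The `parse_topic` helper and the 0.01 cutoff are illustrative assumptions, not part of the original code.

```python
import re

# Matches one term of the form 0.187*"marcus buckingham".
TERM_PATTERN = re.compile(r'(\d*\.?\d+)\*"([^"]+)"')


def parse_topic(topic_string):
    """Turn a printed topic string into a list of (bigram, weight) pairs."""
    return [(term, float(weight)) for weight, term in TERM_PATTERN.findall(topic_string)]


# Topic 7, copied from the output shown above.
topic_7 = (
    '0.187*"marcus buckingham" + 0.177*"book signing" + 0.138*"free copy" + '
    '0.124*"nine lie" + 0.079*"present lie" + 0.079*"buckingham present" + '
    '0.001*"peopledoc inc" + 0.001*"drive engagement" + 0.001*"around world" + '
    '0.001*"answer question"'
)

terms = parse_topic(topic_7)

# Keep only bigrams above an arbitrary, illustrative weight cutoff.
strong_terms = [term for term, weight in terms if weight >= 0.01]
print(strong_terms)
```

For Topic 7, this keeps the six bigrams related to Marcus Buckingham's book signing and drops the four unrelated bigrams with a weight of 0.001.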

Topic Visualization

To make it more interesting, we use pyLDAvis, an interactive LDA visualization package, to plot all generated topics and their keywords. pyLDAvis calculates the semantic distance between topics and projects the topics onto a 2D plane.

Here is the link to the interactive page.

Each bubble on the left represents a topic. The size of the bubble represents prevalence of the topic. The distance between the bubbles reflects the similarity between topics. The closer the two circles are, the more similar the topics are.
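To illustrate what "distance between topics" can mean, here is a small sketch of the Jensen-Shannon divergence between topic-word distributions, the measure behind pyLDAvis's default `js_PCoA` projection. The three toy distributions are made up for illustration, and the log base 2 is our choice so that the result falls between 0 (identical topics) and 1 (no shared words).

```python
import math


def kl_divergence(p, q):
    """Kullback-Leibler divergence in bits; terms with p_i == 0 contribute nothing."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p, q):
    """Jensen-Shannon divergence between two distributions over the same vocabulary."""
    midpoint = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, midpoint) + 0.5 * kl_divergence(q, midpoint)


# Toy topic-word distributions over a shared 4-bigram vocabulary (hypothetical values).
topic_a = [0.7, 0.2, 0.1, 0.0]
topic_b = [0.6, 0.3, 0.1, 0.0]
topic_c = [0.0, 0.1, 0.2, 0.7]

print(js_divergence(topic_a, topic_b))  # small: the topics share their main bigrams
print(js_divergence(topic_a, topic_c))  # large: the topics barely overlap
```

In the plot, topics like `topic_a` and `topic_b` would sit close together, while `topic_c` would be drawn farther away.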

The right-hand side shows the top 30 most important bigrams of the selected topic. When you hover over a bubble, the list of words on the right updates. Also, if you select a word from the list, the circle(s) of the topic(s) in which that word appears are highlighted.

Closing Notes

We scraped #HRTechConf tweets, generated a word cloud to show the buzzwords, and built an LDA model to learn the topics of the tweets.

LDA is difficult to train, and its results need human interpretation. However, it is powerful and intuitive. Our experiment shows that the keywords of a learned topic are not perfectly similar or coherent, but they are clearly relevant to the topic.

Hopefully, this is useful and interesting to people who could not make the trip to the event.

All the code can be found on GitHub. Read my blog for more.

François Bourdeau

Business Development Director @ RSM Canada

5 years ago

Thanks for sharing Peng Wang, CPA, CMA. I am sure Perry Longinotti would have his thoughts on why Kronos Incorporated has such a highly admired culture.
