Auto Generated Insights of 2019 HR Tech Conference Twitter - Part 2 (Topic Modeling)

In our last post, we extracted #HRTechConf tweets, cleaned up the text, and generated a word cloud that highlights some of the buzzwords from the conference. But what are the tweets actually talking about? Without reading each of the 7,000 tweets, how can we find out the popular topics? Let's explore whether tweet topics can be detected automatically by developing a Latent Dirichlet Allocation (LDA) model.

Topic Modeling Via LDA

LDA is an unsupervised machine learning algorithm. It is a generative statistical model that takes documents as input and finds topics as output. Each document is modeled as a mixture of several topics. Each topic is characterized by the words that tend to appear in it, that is, a probability distribution over words in which frequent words get high weights. LDA is one of the most popular probabilistic topic modeling algorithms.
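The sketch below makes the "generative" part concrete. It samples a toy corpus the way LDA assumes documents are written. First it draws a topic mixture for each document. Then, for every word slot, it draws a topic and a word from that topic. It uses NumPy. The vocabulary and the Dirichlet parameters `alpha` and `eta` are illustrative assumptions, not settings from our model. Training runs this story in reverse: it infers the topic-word table `phi` and each tweet's topic mixture from the observed words alone.

```python
import numpy as np

def generate_corpus(vocab, n_topics, n_docs, doc_len, alpha=0.5, eta=0.5, seed=0):
    """Sample a toy corpus from LDA's generative story (illustrative values only)."""
    rng = np.random.default_rng(seed)
    # Each topic is a probability distribution over the vocabulary: phi_k ~ Dirichlet(eta)
    phi = rng.dirichlet(np.full(len(vocab), eta), size=n_topics)
    docs = []
    for _ in range(n_docs):
        # Each document has its own topic proportions: theta_d ~ Dirichlet(alpha)
        theta = rng.dirichlet(np.full(n_topics, alpha))
        words = []
        for _ in range(doc_len):
            z = rng.choice(n_topics, p=theta)     # topic for this word slot
            w = rng.choice(len(vocab), p=phi[z])  # word drawn from that topic
            words.append(vocab[w])
        docs.append(words)
    return phi, docs

# Hypothetical toy vocabulary of bigrams
vocab = ["josh bersin", "opening keynote", "kronos ceo", "book signing", "free copy"]
phi, docs = generate_corpus(vocab, n_topics=3, n_docs=4, doc_len=6)
```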

We use the Gensim Python package to implement LDA. For training features we use bigrams (i.e. two-word sequences) instead of single words. We train the model on the 7,000 tweets and ask it for 30 topics; the number of topics is a setting we choose, not something LDA discovers on its own. Each topic is a set of keywords, and each keyword contributes a certain weight (i.e. importance) to the topic.
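To illustrate the bigram features, here is a minimal pure-Python sketch. It turns an already-cleaned tweet (a list of tokens) into space-joined bigrams and counts them. The function names are hypothetical, and we assume the tokens come from the cleaning step in the previous post.

```python
from collections import Counter

def to_bigrams(tokens):
    """Turn a cleaned token list into space-joined bigram features."""
    return [f"{a} {b}" for a, b in zip(tokens, tokens[1:])]

def bag_of_bigrams(tweets):
    """Map each tweet (a list of cleaned tokens) to its bigram counts."""
    return [Counter(to_bigrams(tokens)) for tokens in tweets]

# Hypothetical cleaned tweet
print(to_bigrams(["josh", "bersin", "opening", "keynote"]))
# ['josh bersin', 'bersin opening', 'opening keynote']
```

With Gensim, bigram lists like these would typically go through three steps:

1. Build a vocabulary with `gensim.corpora.Dictionary`.
2. Convert each tweet with `Dictionary.doc2bow`.
3. Train with `gensim.models.LdaModel(corpus=..., id2word=dictionary, num_topics=30)`.

Note that plain bigrams also produce pairs that cross phrase boundaries, such as "bersin opening".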

Below are some example topics with their top 10 keywords. The higher the number, the more weight (importance) the bigram carries in the topic.
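The topic listings below use the `weight*"bigram"` format that Gensim's `print_topics()` produces. A small regex-based parser, sketched here as a hypothetical helper, turns such a string into (bigram, weight) pairs. Assume the weights are the per-topic word probabilities that Gensim reports. Summing them then shows how concentrated a topic is. For Topic 20 below, the top 10 bigrams carry about 0.943 of the probability mass, and 0.823 of that comes from "josh bersin" alone.

```python
import re

# One 'weight*"term"' pair, as printed by Gensim's print_topics()
TERM_RE = re.compile(r'([\d.]+)\*"([^"]+)"')

def parse_topic(topic_str):
    """Split a printed topic into (bigram, weight) pairs, in the order printed."""
    return [(term, float(weight)) for weight, term in TERM_RE.findall(topic_str)]

topic_20 = ('0.823*"josh bersin" + 0.055*"opening keynote" + 0.018*"woman opening" + '
            '0.012*"opportunity connect" + 0.007*"amaze event" + '
            '0.006*"rebeccahrexec katieachille" + 0.006*"jeanneachille rebeccahrexec" + '
            '0.006*"steveboese jeanneachille" + 0.005*"event feel" + 0.005*"lucky opportunity"')

pairs = parse_topic(topic_20)
print(pairs[0])                            # ('josh bersin', 0.823)
print(round(sum(w for _, w in pairs), 3))  # 0.943
```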

Topic: 20 
Words: 0.823*"josh bersin" + 0.055*"opening keynote" + 0.018*"woman opening" + 0.012*"opportunity connect" + 0.007*"amaze event" + 0.006*"rebeccahrexec katieachille" + 0.006*"jeanneachille rebeccahrexec" + 0.006*"steveboese jeanneachille" + 0.005*"event feel" + 0.005*"lucky opportunity"

This topic appears to be about "josh bersin" giving the "opening keynote", along with a "woman opening". The HR Tech Conf website confirms that Josh Bersin is a keynote speaker and that the Women in HR Tech opening takes place at the same venue. Here is an example tweet:

Checking in from @Josh_Bersin's opening keynote at #HRTechConf today! Our team loved hearing Josh share his insights on how technology is shaping and creating new opportunities for HR. pic.twitter.com/ku27LKzmcb

Topic: 3
Words: 0.259*"kronos ceo" + 0.059*"rule follow" + 0.041*"kronos highly" + 0.041*"surprisingly simple" + 0.041*"simple rule" + 0.041*"reveals surprisingly" + 0.041*"replicate success" + 0.041*"workinspired kronos" + 0.041*"follow replicate" + 0.041*"culture reveals"

This topic appears to be about the Kronos CEO revealing some surprisingly simple rules (of culture? for success?). Here is an example tweet:

#hrconfes RT KronosInc: In #WorkInspired, Kronos CEO Aron Ain takes you inside Kronos highly admired culture — and reveals the surprisingly simple rules you can follow to start replicating that success. #HRTechConf

Topic: 7
Words: 0.187*"marcus buckingham" + 0.177*"book signing" + 0.138*"free copy" + 0.124*"nine lie" + 0.079*"present lie" + 0.079*"buckingham present" + 0.001*"peopledoc inc" + 0.001*"drive engagement" + 0.001*"around world" + 0.001*"answer question"

This topic appears to be about Marcus Buckingham holding a book signing and giving away free copies (of a book called Nine Lies, perhaps?). Here is an example tweet:

#hrtechconf book signing with Marcus Buckingham #FreeThinkingCoalition #9liesaboutwork https://www.instagram.com/p/B3KhdHWlRgRIFxreqnoIXDaZNthcdr-_TJ8aeo0/?igshid=19rvtzygznze1 …

Topic Visualization

To make the results easier to explore, we use pyLDAvis, an interactive LDA visualization package, to plot all of the generated topics and their keywords. pyLDAvis computes the distance between each pair of topics based on their word distributions, then projects the topics onto a 2D plane.
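Our understanding is that pyLDAvis works in two steps by default:

1. It measures the distance between two topics as the Jensen-Shannon divergence of their word distributions.
2. It lays the topics out in 2D with principal coordinate analysis (classical multidimensional scaling).

The NumPy sketch below reproduces that idea on a hypothetical 3-topic, 4-word matrix; it is not pyLDAvis's own code. With a real Gensim model, pyLDAvis does all of this for you, e.g. via `pyLDAvis.gensim_models.prepare(lda_model, corpus, dictionary)`. In older releases the module is `pyLDAvis.gensim`.

```python
import numpy as np

def jensen_shannon(p, q):
    """Jensen-Shannon divergence (natural log) between two probability vectors."""
    m = 0.5 * (p + q)

    def kl(a, b):
        mask = a > 0  # 0 * log(0) terms contribute nothing; b > 0 wherever a > 0
        return np.sum(a[mask] * np.log(a[mask] / b[mask]))

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def project_topics(topic_word, dims=2):
    """Classical MDS (PCoA) of topics using pairwise Jensen-Shannon divergences."""
    k = topic_word.shape[0]
    dist = np.array([[jensen_shannon(topic_word[i], topic_word[j]) for j in range(k)]
                     for i in range(k)])
    center = np.eye(k) - np.ones((k, k)) / k
    b = -0.5 * center @ (dist ** 2) @ center   # double-centred squared distances
    eigvals, eigvecs = np.linalg.eigh(b)       # symmetric matrix, ascending eigenvalues
    top = np.argsort(eigvals)[::-1][:dims]     # keep the largest eigenvalues
    return eigvecs[:, top] * np.sqrt(np.maximum(eigvals[top], 0.0))

# Hypothetical topic-word matrix: topics 0 and 1 share vocabulary, topic 2 does not
topics = np.array([[0.7, 0.2, 0.1, 0.0],
                   [0.6, 0.3, 0.1, 0.0],
                   [0.0, 0.1, 0.2, 0.7]])
coords = project_topics(topics)  # one (x, y) point per topic
```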

Here is the link to the interactive page.

Each bubble on the left represents a topic. The size of a bubble represents the prevalence of the topic. The distance between bubbles reflects how similar the topics are: the closer two bubbles are, the more similar the topics.
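Bubble size is a topic's share of the whole corpus. As far as we understand, pyLDAvis estimates it by weighting each document's topic mixture by the document's length in tokens. Here is a NumPy sketch of that calculation with a hypothetical two-tweet, two-topic input. In practice the documents-by-topics matrix would come from the trained model.

```python
import numpy as np

def topic_prevalence(doc_topic, doc_lengths):
    """Fraction of all tokens attributed to each topic (what bubble size encodes)."""
    # Expected number of tokens each document contributes to each topic
    token_mass = (doc_topic * doc_lengths[:, None]).sum(axis=0)
    return token_mass / token_mass.sum()

# Hypothetical: tweet A (10 bigrams) is mostly topic 0, tweet B (30 bigrams) mostly topic 1
doc_topic = np.array([[0.9, 0.1],
                      [0.2, 0.8]])
doc_lengths = np.array([10, 30])
print(topic_prevalence(doc_topic, doc_lengths))  # [0.375 0.625]
```

The longer tweet dominates here even though each tweet leans strongly toward its own topic. That is why a few long, on-topic tweets can make a bubble noticeably larger.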

The right-hand side shows the top 30 most important bigrams for the selected topic. Hovering over a bubble updates the list of words on the right. Conversely, selecting a word from the list highlights the bubble(s) of the topics in which that word appears.

HR Tech Conf 2019 Topics by LDA
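The ranking of the word list on the right can be tuned. To our understanding, pyLDAvis shows R = 30 terms by default and offers a relevance slider λ, defined as relevance(w, t) = λ·log p(w|t) + (1 - λ)·log(p(w|t) / p(w)). With λ = 1 the list is simply sorted by probability within the topic. Lower λ promotes bigrams that are distinctive to the topic over ones that are common everywhere. The NumPy sketch below implements that formula on hypothetical numbers.

```python
import numpy as np

def top_terms(topic_word, term_freq, terms, topic, n=30, lam=1.0):
    """Rank one topic's terms by relevance = lam*log p(w|t) + (1-lam)*log(p(w|t)/p(w)).

    Assumes strictly positive probabilities (LDA's smoothed topics usually are).
    """
    p_wt = topic_word[topic]                 # p(w | topic)
    p_w = term_freq / term_freq.sum()        # overall p(w) in the corpus
    relevance = lam * np.log(p_wt) + (1 - lam) * np.log(p_wt / p_w)
    order = np.argsort(relevance)[::-1][:n]  # most relevant first
    return [terms[i] for i in order]

# Hypothetical numbers: "hr tech" is frequent everywhere, "josh bersin" is topic-specific
terms = ["josh bersin", "opening keynote", "hr tech"]
topic_word = np.array([[0.2, 0.1, 0.7]])
term_freq = np.array([10.0, 10.0, 80.0])
print(top_terms(topic_word, term_freq, terms, topic=0, n=2, lam=1.0))  # ['hr tech', 'josh bersin']
print(top_terms(topic_word, term_freq, terms, topic=0, n=2, lam=0.0))  # ['josh bersin', 'opening keynote']
```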

Closing Notes

We scraped #HRTechConf tweets, generated a word cloud to show the buzzwords, and built an LDA model to learn the topics of the tweets.

LDA is difficult to train, and its results need human interpretation. However, it is powerful and yet intuitive. Our experiment shows that the keywords within each learned topic are not always perfectly coherent, but they are clearly relevant to the topic.
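One way to put a number on "coherent" is a topic coherence score. Below is a minimal pure-Python sketch of the UMass measure. It rewards a topic when its top terms tend to appear in the same documents. The score is the sum over ranked term pairs of log((D(w_i, w_j) + 1) / D(w_j)), where D counts documents and w_j is the higher-ranked term. The toy corpus is hypothetical. Gensim also provides a `CoherenceModel` class with a `coherence="u_mass"` option, whose smoothing and averaging may differ from this sketch.

```python
import math

def umass_coherence(top_terms, documents):
    """UMass coherence of a topic's ranked top terms over tokenized documents.

    Assumes every term in top_terms occurs in at least one document.
    """
    doc_sets = [set(doc) for doc in documents]

    def doc_freq(*words):
        # Number of documents containing all of the given words
        return sum(all(w in d for w in words) for d in doc_sets)

    score = 0.0
    for i in range(1, len(top_terms)):
        for j in range(i):  # pair each term with every higher-ranked term
            co = doc_freq(top_terms[i], top_terms[j])
            score += math.log((co + 1) / doc_freq(top_terms[j]))
    return score

# Hypothetical toy corpus of bigram-tokenized tweets
docs = [["josh bersin", "opening keynote"],
        ["josh bersin", "opening keynote"],
        ["book signing"]]
print(umass_coherence(["josh bersin", "opening keynote"], docs))  # log(3/2), about 0.405
print(umass_coherence(["josh bersin", "book signing"], docs))     # log(1/2), about -0.693
```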

Hopefully this is useful and interesting for people who could not make the trip to the event.

All of the code can be found on GitHub. Read my blog for more.

François Bourdeau

Business Development Director @ RSM Canada

5 years ago

Thanks for sharing Peng Wang, CPA, CMA. I am sure Perry Longinotti would have his thoughts on why Kronos Incorporated has such a highly admired culture.
