Detecting Semantic Change using Word Embedding

Word embeddings capture lexical properties from the co-occurrence statistics of large text corpora. They act as high-quality semantic vectors that can be employed in various NLP tasks such as syntactic parsing, sentiment analysis, and semantic relatedness. The representation captured via embeddings is distributional in nature, and it has applications beyond serving as features for everyday natural language processing tasks. One such novel application is detecting changes in the meaning of words.

Languages evolve. This evolution can take the form of lexical, phonological, syntactic, and semantic change. Lexical change reflects the ongoing influx of new words or word forms into the language. Phonological change is associated with the concept of sound change, covering both phonetic and phonological developments, whereas syntactic change refers to structural changes in the use of language. Semantic change refers to the evolution of word usage; more specifically, it is a change in one of the meanings of a word. The marked semantic distance between the new meanings a word takes on reflects this change. For instance, the word 'Android' was more synonymous with a humanoid talking robot in the late 90s, but now it is commonly associated with a mobile phone operating system. One may attribute this phenomenon to the emergence of popular topics/senses that persist in an era. However, we believe every word possesses a variety of senses and connotations, which can be added, removed, or altered over time, often to the extent that cognates across space and time come to have very different meanings.

We studied semantic changes that occur over years and decades in selected text corpora. The distributional hypothesis is used to capture semantic change at the lexical level over a time period. To put it differently, we test whether a word has changed its meaning by observing `the company it keeps' (as per Firth) using its word vectors. To do this, we first created word vector models from a large news and magazine corpus, partitioned by era of publication. Thereafter, we introduce a simple yet effective method to project one word vector model onto another for comparison. We tracked the semantic evolution of the same word using these projected models.
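The article does not spell out the projection step, but one common way to realize such a projection is orthogonal Procrustes alignment: find the rotation that best maps the shared vocabulary of one model onto the other. The sketch below is a minimal illustration of that idea, not necessarily the exact method used in this work; the function name and the use of NumPy are assumptions.

```python
import numpy as np

def align_embeddings(source, target):
    """Rotate `source` (n x d) onto `target` (n x d).

    Rows of both matrices must correspond to the same shared-vocabulary
    words. Returns the rotated copy of `source`.
    """
    # Orthogonal Procrustes: argmin_R ||source @ R - target||_F with R
    # orthogonal, solved via the SVD of source^T @ target.
    u, _, vt = np.linalg.svd(source.T @ target)
    rotation = u @ vt
    return source @ rotation
```

Restricting the projection to an orthogonal rotation preserves cosine similarities within each model, so any residual deviation between a word's aligned vectors can be attributed to a shift in usage rather than to the alignment step itself.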

The representation of a word depends on the format and domain of the text from which the co-occurrence statistics are built. Canberra Times (https://www.canberratimes.com.au/) was crawled for data between 1980 and 2010, accounting for 2.42 million words. To make the co-occurrence statistics more robust, data from Wikipedia (https://en.wikipedia.org/, accessed on 4 September 2013), containing 1.6 billion words, was added to the Canberra Times corpus.

We divided the Canberra Times dataset into three epochs, each spanning a decade. We trained word embeddings for each epoch using Mikolov's word vector package (https://code.google.com/p/word2vec/) and projected the models from one epoch onto another. Words deviating by a wide margin were selected and inspected for semantic change.
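As a concrete illustration, the snippet below trains one model per epoch using gensim's reimplementation of word2vec (the original work used Mikolov's C package; the file names and hyperparameters here are illustrative assumptions, not the settings used in this study).

```python
from gensim.models import Word2Vec
from gensim.models.word2vec import LineSentence

# One plain-text file per decade, one tokenized sentence per line.
epochs = {"1980s": "canberra_1980s.txt",
          "1990s": "canberra_1990s.txt",
          "2000s": "canberra_2000s.txt"}

models = {}
for name, path in epochs.items():
    sentences = LineSentence(path)  # streams the file, so large corpora fit
    models[name] = Word2Vec(sentences, vector_size=200, window=5,
                            min_count=10, workers=4)
```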

Deviation is measured in terms of cosine similarity, and this deviation represents the shift in meaning. Words similar to the word in question, drawn from word vector models belonging to different epochs, are also captured. These neighbours give a crude sense of what the word actually means in each epoch. The figure below shows some of the words that the system detected with large deviation.
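Building on the earlier sketches (the `models` dictionary and `align_embeddings` are assumptions carried over from them), the deviation scoring and neighbour look-up might look like this:

```python
import numpy as np

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

old, new = models["1980s"], models["2000s"]
shared = [w for w in old.wv.index_to_key if w in new.wv]

# Stack the shared-vocabulary vectors and align the old space to the new.
source = np.stack([old.wv[w] for w in shared])
target = np.stack([new.wv[w] for w in shared])
aligned = align_embeddings(source, target)

# Score each word by how far its aligned vector drifted across epochs.
deviation = {w: 1.0 - cosine(aligned[i], target[i])
             for i, w in enumerate(shared)}

for w in sorted(deviation, key=deviation.get, reverse=True)[:20]:
    # Nearest neighbours in each epoch give a crude gloss of the meaning.
    print(w,
          old.wv.most_similar(w, topn=5),
          new.wv.most_similar(w, topn=5))
```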

At present, these words are detected when the divergence between their representations from different eras exceeds a threshold. Along with true examples of words exhibiting semantic change, many words that do not show the phenomenon are also captured. The false positive rate is high because detecting semantic change from raw divergence is noisy. More reliable methods, such as change point detection algorithms, could be employed to reduce false positive detections.
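A hedged sketch of what such a filter could look like, using the `ruptures` change point detection library: it assumes finer-grained (e.g., yearly) models that yield a per-word series of cross-epoch similarities, which goes beyond the three decade-sized epochs actually used in this work.

```python
import numpy as np
import ruptures as rpt

def has_change_point(similarity_series, penalty=3.0):
    """Return True if a word's yearly cross-epoch similarity series
    contains at least one detected change point."""
    signal = np.asarray(similarity_series).reshape(-1, 1)
    algo = rpt.Pelt(model="rbf", min_size=2).fit(signal)
    breakpoints = algo.predict(pen=penalty)
    # `predict` always returns the series length as the final index,
    # so more than one entry means a real change point was found.
    return len(breakpoints) > 1
```

A word whose similarity series drifts noisily around a constant level would then be discarded, while a word with a sustained level shift in its series would be kept as a genuine candidate for semantic change.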

The distributional representation behind word embeddings has a multitude of applications that go beyond feature engineering for deep learning. At present, a lot of research is being conducted in this direction.

Acknowledgment: This is part of joint work with Prachetos Sadhukhan, Vasudevan N, Benoit Favre, and Fredric Bechet, carried out by the author during his time at the LIF Lab, CNRS, Marseille.
