Word2vec - Word embeddings used in NLP
This isn't a "This Week in Machine Learning" (TWiML) article, but one to demystify, for myself (a complete novice in this space) and hopefully a few others, some of the jargon floating around. Of late, alongside the commonly used phrases like "neural network", "convolutional neural net", "natural language processing", etc…. I came across the phrase "word2vec/word embedding".
What is "Word2vec" and how is it applicable in the world of AI? This is super cool to begin with.
As I understand it, the original idea comes from linguists (credit to J. R. Firth, back in the 1950s), although "Word2vec" itself is fairly new (credit to Google, in 2013). The basic idea driving it is that "you shall know a word by the company it keeps"!
If you want to know what a word means, look at the context… simple :)… it gives the clue… well, this is how I teach my son… simple for a human… but not for a computer.
Word2vec takes advantage of this by saying the "embedding" of a word is defined by the context it appears in. Words appearing in similar contexts are related, and so will end up with similar vectors across a corpus.
For example (try filling in the blank):
I went for a walk ______
The answer could be "yesterday" or "outside"… but the main takeaway is that it's the context which drives it here :)
Could this also help with guessing when to use singular vs plural, or present vs past tense?
Step back, why vectors?
OK, everything we have talked about so far is around words, and we want computers to extract meaning from a huge text blob. For a computer, we need a numerical representation of this input data, as we know computers work with numbers. So, effectively, word embedding converts words into vectors: a word such as "yesterday" might be represented as, say, 64 numbers (the number of dimensions is a design choice). Each word is reduced to a vector, and for word embeddings to work we need related words to be close to each other, for example "yesterday" and "today".
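Just to make "close to each other" concrete, here is a tiny sketch. The numbers below are made up purely for illustration (real embeddings are learned and usually have 50-300 dimensions); "close" is usually measured with cosine similarity:

```python
import numpy as np

# Made-up 4-dimensional "embeddings", purely for illustration;
# real word2vec vectors are learned and typically much longer.
vectors = {
    "yesterday": np.array([0.9, 0.1, 0.4, 0.8]),
    "today":     np.array([0.8, 0.2, 0.5, 0.7]),
    "banana":    np.array([0.1, 0.9, 0.2, 0.1]),
}

def cosine_similarity(a, b):
    # Close to 1.0 means the vectors point the same way; lower means less related.
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

print(cosine_similarity(vectors["yesterday"], vectors["today"]))   # high
print(cosine_similarity(vectors["yesterday"], vectors["banana"]))  # lower
```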
To take this to the next level: if we take the vector for man, subtract the vector for woman, and add the vector for queen, would it land on the vector for king?

MAN - WOMAN + QUEEN ≈ KING
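If you want to try this yourself, the gensim library can download pre-trained word vectors for you. A rough sketch, assuming you're happy to use one of its smaller downloadable sets (the exact top result depends on which vectors you load):

```python
import gensim.downloader as api

# Downloads a small set of pre-trained GloVe vectors on first run;
# the larger "word2vec-google-news-300" set is also available.
vectors = api.load("glove-wiki-gigaword-50")

# man - woman + queen ~= king, rearranged into gensim's
# "positive" and "negative" terms.
result = vectors.most_similar(positive=["man", "queen"], negative=["woman"], topn=1)
print(result)  # with most pre-trained sets, expect something like [('king', ...)]
```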
This is where word2vec comes in, as one way of learning word embeddings. It's essentially a shallow neural network.
At a high level, you feed in a word, which gets mapped to a vector (the word embedding), and the output the network is trained to produce is a context word.
For instance, if we feed "walk" from our example above into word2vec, the output we would expect is "yesterday" or "today" or "outside", depending on the corpus it was trained on.
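To make that concrete, the skip-gram flavour of word2vec literally turns sentences into (input word, context word) pairs with a small sliding window. A minimal sketch, assuming our example sentence was completed as "I went for a walk outside yesterday":

```python
# Build (target, context) training pairs the way skip-gram does,
# using a window of 2 words on either side of the target.
sentence = "i went for a walk outside yesterday".split()
window = 2

pairs = []
for i, target in enumerate(sentence):
    for j in range(max(0, i - window), min(len(sentence), i + window + 1)):
        if j != i:
            pairs.append((target, sentence[j]))

# Pairs where the input word is "walk": the network is trained to
# predict each of these context words when it sees "walk".
print([p for p in pairs if p[0] == "walk"])
# [('walk', 'for'), ('walk', 'a'), ('walk', 'outside'), ('walk', 'yesterday')]
```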
Effectively, Word2vec looks for word embeddings with similar values (i.e. vectors that are close together) to find the output context word.
The beauty of this is that you needn't label anything in your text to get a result, so it's basically unsupervised.
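In practice that means you hand a library raw, unlabelled sentences and it learns the embeddings on its own. A minimal sketch with gensim (the toy corpus here is far too small to learn anything meaningful; real training needs lots of text):

```python
from gensim.models import Word2Vec

# Toy corpus: just tokenised sentences, no labels of any kind.
sentences = [
    "i went for a walk yesterday".split(),
    "i went for a walk outside".split(),
    "she went for a run today".split(),
]

# vector_size is the embedding dimension (the 64 mentioned above),
# window is the context size, min_count=1 keeps every word.
model = Word2Vec(sentences, vector_size=64, window=2, min_count=1, epochs=50)

print(model.wv["yesterday"].shape)          # (64,) - one vector per word
print(model.wv.most_similar("yesterday"))   # neighbours by cosine similarity
```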
Just to add, besides "Word2vec" there are other ways to generate word embeddings, many of them based on a "co-occurrence matrix" (GloVe, linked below, is a well-known example). OK, this is where I personally needed to have paid more attention at school to linear algebra and matrices. Not going into it… duh!… but matrix decomposition is at the heart of it: Singular Value Decomposition, gradient descent, etc. to get the word vectors.
Each word is represented as a row and each context word as a column of the matrix. Recommender systems use the same mathematical model (items as rows vs users as columns, with ratings as the entries).
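For the curious, here is roughly what that count-based route can look like, as a sketch only on a toy corpus (GloVe itself does something more sophisticated than a plain SVD):

```python
import numpy as np

sentences = [
    "i went for a walk yesterday".split(),
    "i went for a walk outside".split(),
]

# Word-by-context co-occurrence matrix: rows are words, columns are
# context words, entries count appearances within a +/-2 word window.
vocab = sorted({w for s in sentences for w in s})
index = {w: i for i, w in enumerate(vocab)}
M = np.zeros((len(vocab), len(vocab)))

window = 2
for s in sentences:
    for i, w in enumerate(s):
        for j in range(max(0, i - window), min(len(s), i + window + 1)):
            if j != i:
                M[index[w], index[s[j]]] += 1

# Truncated SVD: keep the top-k singular vectors as dense word vectors.
k = 3
U, S, Vt = np.linalg.svd(M)
word_vectors = U[:, :k] * S[:k]
print(word_vectors[index["yesterday"]])  # a 3-dimensional vector for "yesterday"
```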
I hope this has helped someone out there; going any further would turn this into a technical/mathematical article, and I am not the right person to write that one. Below are some links.
Word2vec paper: https://arxiv.org/abs/1301.3781
GloVe paper: https://nlp.stanford.edu/pubs/glove.pdf
GloVe webpage: https://nlp.stanford.edu/projects/glove/