Word Embedding and Word vectors - MathX explained

MathX publishes this series to discuss the mathematical fundamentals behind interesting concepts in Artificial Intelligence, Machine Learning, and other topics. Please subscribe to the newsletter to never miss an article.

To understand what word embeddings are, let’s begin with a simple problem statement.

Consider the following three sentences:

  1. The cat chased the mouse.
  2. The dog pursued the rat.
  3. He plays guitar very well.

Let’s ask ourselves a few questions.

  • Are the above 3 sentences identical?
  • Do the sentences convey similar or different meanings?

While these sentences are not identical, the first two convey a similar meaning that is quite different from the third. As human beings, we had little trouble coming to this conclusion. The question is how to do the same on a computer.

Word embeddings can be used to represent these sentences as vectors (lists of numbers) in a high-dimensional space, where the distance between the vectors corresponds to the degree of similarity or difference between the sentences. By comparing these vectors, we can determine whether the sentences are semantically similar or different.
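To make the idea of comparing vectors concrete, here is a minimal Python sketch. The three-dimensional vectors are made-up values chosen by hand for illustration, not the output of any real embedding model, and cosine similarity is assumed here as one common way to compare two vectors.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Hypothetical, hand-picked sentence vectors (NOT produced by a real model).
sentence_vectors = {
    "The cat chased the mouse.": [0.9, 0.8, 0.1],
    "The dog pursued the rat.": [0.8, 0.9, 0.2],
    "He plays guitar very well.": [0.1, 0.2, 0.9],
}

s1, s2, s3 = sentence_vectors.values()
sim_12 = cosine_similarity(s1, s2)  # sentence 1 vs sentence 2
sim_13 = cosine_similarity(s1, s3)  # sentence 1 vs sentence 3

print(f"sentence 1 vs 2: {sim_12:.3f}")
print(f"sentence 1 vs 3: {sim_13:.3f}")
```

With these illustrative values, the first two sentences score much closer to 1 than the first and third do, which is the behaviour we would like real embeddings to have.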

Now, let’s see how.

To understand the next part, let’s ask ourselves a few more questions.

  1. How can I convert the above sentences into numbers or vectors (arrays of numbers)?
  2. Or, more fundamentally, how can I represent a word as a number or a vector?
  3. Is the representation useful in some context?
  4. How many numbers should be used to represent a word?

If we can answer the above questions, that will be the basis for word embeddings.

Let’s continue. Say I have these 5 words in my repository:

“I am a Data Scientist”.

If you have to represent this sentence word by word as numbers, a common way is to apply one-hot encoding.

Figure 1: One-hot encoding of the words in “I am a Data Scientist”.

From the above, each word can now be represented by a vector.

I = [1,0,0,0,0] = v1

am = [0,1,0,0,0] = v2

a = [0,0,1,0,0] = v3

Data = [0,0,0,1,0] = v4

Scientist = [0,0,0,0,1] = v5

A 5-word sentence yields a 5x5 matrix.
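The encoding above can be sketched in a few lines of Python. This is a minimal illustration rather than a library implementation. The function name `one_hot_encode` is made up for this example, and it assumes that the vocabulary is simply the distinct words of the sentence in order of first appearance.

```python
def one_hot_encode(words):
    """Map each distinct word to a one-hot vector, in order of first appearance."""
    vocab = list(dict.fromkeys(words))  # de-duplicate while keeping order
    size = len(vocab)
    return {
        word: [1 if i == index else 0 for i in range(size)]
        for index, word in enumerate(vocab)
    }

sentence = "I am a Data Scientist"
vectors = one_hot_encode(sentence.split())

# Each word gets a vector with a single 1 at its own position.
for word, vector in vectors.items():
    print(word, vector)
```

Stacking the five vectors row by row gives the 5x5 matrix described above.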

It is very easy to implement. However, it loses the inner meaning of each word in the sentence, and the relationships among the words are lost as well, so the context or meaning of the sentence is lost. Another problem is that as the number of words increases, the matrix keeps growing as:

n x n

where n is the number of words in the repository. If we pass this to a neural network, there will be challenges such as vanishing gradients and a sparse matrix in which most elements are zero (as you can see in the matrix above).
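Both drawbacks can be checked with a short Python sketch. It assumes that a repository of n distinct words gives an n x n one-hot matrix, which matches the 5x5 case above. The helper names are illustrative only.

```python
def one_hot_matrix(n):
    """n x n identity matrix: row i is the one-hot vector of word i."""
    return [[1 if i == j else 0 for j in range(n)] for i in range(n)]

def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def zero_fraction(n):
    """Share of zero entries: n * n cells, of which only n hold a 1."""
    return (n * n - n) / (n * n)

matrix = one_hot_matrix(5)

# Collect the dot products of every pair of different words.
# They are all 0, so no pair of words looks more related than any other.
distinct_dots = {
    dot(matrix[i], matrix[j])
    for i in range(5)
    for j in range(5)
    if i != j
}
print(distinct_dots)

# The number of cells grows quadratically, and almost all of them are zero.
for n in (5, 1_000, 50_000):
    print(n, n * n, round(zero_fraction(n), 5))
```

Because every pair of distinct one-hot vectors has the same dot product, the encoding cannot express that “cat” is closer to “dog” than to “guitar”. The share of zeros, 1 - 1/n, approaches 1 as the vocabulary grows.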

Because of this, one-hot encoding is not widely used in many natural language processing applications.

So what do we do now?

You can continue reading the complete article here: https://mathx.substack.com/p/word-embedding-and-word-vectors-mathx


More articles by Bikash Debnath

