Self/cross hard/soft attention

NEW LECTURE

“Set to set” and “set to vector” mappings using self/cross hard/soft attention. We combined one (two) attention module(s) with a k=1 1D convolution to get a transformer encoder (decoder).
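For those who want to play before opening the notebook, here is a minimal sketch of such an encoder block: one self-attention module followed by a k=1 1D convolution. This is not the notebook's code; the class name and sizes are made up for illustration, and I'm using PyTorch's built-in nn.MultiheadAttention with a single head.

import torch
from torch import nn

class EncoderBlock(nn.Module):
    # one self-attention module + one k=1 1D convolution (illustrative sketch)
    def __init__(self, d):
        super().__init__()
        self.attn = nn.MultiheadAttention(embed_dim=d, num_heads=1)
        self.conv = nn.Conv1d(d, d, kernel_size=1)    # per-symbol dim. adapter

    def forward(self, x):                  # x: (t, batch, d), a *set* of t symbols
        h, _ = self.attn(x, x, x)          # query = key = value = x → self-attention
        h = self.conv(h.permute(1, 2, 0))  # (batch, d, t): mixes channels, not positions
        return h.permute(2, 0, 1)          # back to (t, batch, d): set-to-set mapping

x = torch.randn(10, 2, 64)                 # a set of t=10 symbols, batch of 2, d=64
y = EncoderBlock(64)(x)                    # same shape out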

Slides: https://github.com/Atcold/pytorch-Deep-Learning/blob/master/slides/10%20-%20Attention%20%26%20transformer.pdf

Notebook: https://github.com/Atcold/pytorch-Deep-Learning/blob/master/15-transformer.ipynb

This week's slides were quite dense, but we've been building up momentum since the beginning of class, 3 months ago.

We recalled concepts from:

• Linear Algebra (Ax as a lin. comb. of A's columns weighted by x's components, or as scalar products of A's rows against x)

• Recurrent Nets (stacking x[t] with h[t-1] and concatenating W_x and W_h)

• Autoencoders (encoder-decoder architecture)

• k=1 1D convolutions (which do not assume correlation between neighbouring features and act as a dimension adapter)

and put them into practice with PyTorch (quick check below).
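Two of these, as quick PyTorch sanity checks (toy tensors, made-up sizes, not the notebook's code): Ax as a linear combination of A's columns, and a k=1 Conv1d behaving as a per-position Linear layer, i.e. a dimension adapter that never mixes neighbouring positions.

import torch
from torch import nn

# Ax = Σ_j x[j] · A[:, j]  (lin. comb. of A's columns, weighted by x's components)
A, x = torch.randn(3, 4), torch.randn(4)
assert torch.allclose(A @ x, sum(x[j] * A[:, j] for j in range(4)), atol=1e-6)

# k=1 1D conv = the same linear map applied independently at every position
conv = nn.Conv1d(in_channels=8, out_channels=16, kernel_size=1)
lin = nn.Linear(8, 16)
lin.weight.data = conv.weight.data.squeeze(-1)    # share the parameters
lin.bias.data = conv.bias.data
X = torch.randn(1, 8, 5)                          # (batch, features, positions)
assert torch.allclose(conv(X), lin(X.transpose(1, 2)).transpose(1, 2), atol=1e-6)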

Notice how you can smoothly go from hard to soft attention by switching between argmax and softargmax (which most of you still call “softmax”). Hard attention implies a one-hot vector a, while soft attention gives you pseudo-probabilities.
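In code, the switch is literally one line (toy scores, just to illustrate):

import torch

scores = torch.tensor([1.0, 3.0, 0.5])   # similarity of one query to three keys

a_hard = torch.zeros_like(scores)        # hard attention: a is one-hot
a_hard[scores.argmax()] = 1.0            # → tensor([0., 1., 0.])

a_soft = scores.softmax(dim=0)           # soft attention (softargmax):
                                         # pseudo-probabilities summing to 1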

Once again, this architecture deals with *sets* of symbols!

There is no order. Therefore, computations can be massively parallelised (they are just a bunch of matrix products, after all).

Just be aware of that t × t A matrix, which could blow up if t (your set length) is large.
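Concretely (made-up sizes): for self-attention over a set of t symbols of dimension d, the score matrix A has t² entries, so its memory footprint grows quadratically with the set length.

import torch

t, d = 1_000, 64
X = torch.randn(t, d)                     # a set of t symbols
A = (X @ X.T / d**0.5).softmax(dim=-1)    # the t × t attention matrix
print(A.shape, A.numel())                 # torch.Size([1000, 1000]) 1000000
# doubling t quadruples A: be careful with long sets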

One final recap: there is *one* and *only one* _query_ (I'd like to cook a lasagna) that I'm going to check against *all* _keys_ (recipe titles) in order to retrieve *one* (if hard) or *a mixed* (if soft) _value_ (the recipe to prepare my dinner with).
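The same story in tensors (made-up sizes, not the notebook's code): one query checked against all keys gives the attention vector a; hard attention retrieves exactly one value, soft attention returns a mixture of values.

import torch

d_k, t, d_v = 8, 5, 16
q = torch.randn(d_k)                      # my one query: "lasagna"
K = torch.randn(t, d_k)                   # all recipe titles (keys)
V = torch.randn(t, d_v)                   # all recipe preparations (values)

scores = K @ q                            # how well each title matches the query
a = scores.softmax(dim=0)                 # soft: pseudo-probabilities over recipes
dinner_soft = a @ V                       # a *mixed* value (blend of recipes)
dinner_hard = V[scores.argmax()]          # hard: exactly *one* recipe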

Me, hungry, during class = decoder.

My granny, knowing all recipe names (keys) and preparations (values) = encoder.

Me, figuring out what I want = self-attention.

Me, asking granny = cross-attention.

Dinner = yay!

I'm done.


Next week: Graph Neural Nets (if it takes me less than a week to learn about them).


https://twitter.com/alfcnz/status/1252802274080022528
