Course: TensorFlow: Working with NLP

Multi-head attention and feedforward network

- [Instructor] Earlier, we looked at how self-attention can help us provide context for a word, but what if we could have multiple instances of the self-attention mechanism, so that each can perform a different task? One could make a link between nouns and adjectives, another could connect pronouns to their subjects. This is called multi-head attention, and BERT has 12 such heads. Each multi-head attention block gets three inputs: the query, the key, and the value. These are put through linear, or dense, layers before the multi-head attention function. The query, key, and value are then passed through separate, fully connected linear layers for each attention head. The model can jointly attend to information from different representations and at different positions, allowing it to make richer connections between words.
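To make this concrete, here is a minimal sketch of a multi-head attention block followed by a feed-forward network in TensorFlow, using the built-in tf.keras.layers.MultiHeadAttention layer. The sizes (12 heads, 768-dimensional embeddings, 3072-unit feed-forward layer) follow BERT-base; the variable names and dummy inputs are illustrative, not taken from the course.

```python
import tensorflow as tf

embed_dim = 768   # BERT-base hidden size
num_heads = 12    # BERT-base uses 12 attention heads
ff_dim = 3072     # inner size of the feed-forward network

# Dummy batch: 2 sequences of 8 token embeddings each
x = tf.random.uniform((2, 8, embed_dim))

# Multi-head attention: the layer projects query, key, and value
# through separate dense layers for each head internally.
mha = tf.keras.layers.MultiHeadAttention(
    num_heads=num_heads,
    key_dim=embed_dim // num_heads,  # per-head dimension
)
attn_output = mha(query=x, value=x, key=x)  # self-attention: all three inputs are x

# Position-wise feed-forward network applied after attention
ffn = tf.keras.Sequential([
    tf.keras.layers.Dense(ff_dim, activation="gelu"),
    tf.keras.layers.Dense(embed_dim),
])
output = ffn(attn_output)

print(output.shape)  # (2, 8, 768)
```

A full encoder layer would also add residual connections and layer normalization around both sub-blocks; this sketch only shows the two components named in this video.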
