Course: TensorFlow: Working with NLP
Multi-head attention and feedforward network
- [Instructor] Earlier, we looked at how self-attention can help us provide context for a word, but what if we could get multiple instances of the self-attention mechanism so that each can perform a different task? One could make a link between nouns and adjectives, another could connect pronouns to their subjects. This is called multi-head attention, and BERT has 12 such heads. Each multi-head attention block gets three inputs: the query, the key, and the value. The query, key, and value are each passed through separate, fully-connected linear (dense) layers for each attention head before the attention function is applied. This lets the model jointly attend to information from different representations and at different positions, allowing it to make richer connections between words.
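To make this concrete, here is a minimal sketch using Keras's built-in MultiHeadAttention layer. The 12 heads and 768-dimensional hidden state mirror BERT-base; the batch size of 2 and sequence length of 8 are illustrative assumptions, not values from the course.

```python
import tensorflow as tf

# BERT-base style settings (illustrative): 12 heads over a 768-dim hidden state,
# so each head works with a 768 / 12 = 64-dim query/key/value slice.
num_heads = 12
hidden_size = 768

mha = tf.keras.layers.MultiHeadAttention(
    num_heads=num_heads,
    key_dim=hidden_size // num_heads,
)

# Dummy batch: 2 sequences of 8 token embeddings each (hypothetical shapes).
x = tf.random.uniform((2, 8, hidden_size))

# Self-attention: the same tensor supplies the query, key, and value.
# The layer projects each through its own dense layer per head, attends,
# then concatenates the heads and applies a final output projection.
output, attention_scores = mha(
    query=x, value=x, key=x, return_attention_scores=True
)

print(output.shape)            # (2, 8, 768): one contextualized vector per token
print(attention_scores.shape)  # (2, 12, 8, 8): one 8x8 attention map per head
```

The returned attention scores have one 8x8 map per head, which is what lets each head specialize, for example one linking nouns and adjectives and another linking pronouns to their subjects.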