It is my understanding that each encoder block takes the output from the previous encoder, and that the output is the attended representation (Z) of the sequence (aka sentence). My question is, how does the last encoder block produce K and V from Z (to be used in the encoder-decoder attention sublayer of the decoder)?

Are we simply taking W_K and W_V from the last encoder layer?

http://jalammar.github.io/illustrated-transformer/
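For reference, here is a minimal single-head sketch of the mechanism being asked about, assuming the standard Transformer formulation: K and V are obtained by projecting the encoder output Z, while Q comes from the decoder's own hidden states. The names (`W_q`, `W_k`, `W_v`, `d_model`) and the single-head simplification are illustrative, not taken from the linked article.

```python
import torch
import torch.nn as nn

d_model = 512  # illustrative hidden size

# Projection matrices for the encoder-decoder (cross-) attention sublayer
W_q = nn.Linear(d_model, d_model)  # projects the decoder's hidden states
W_k = nn.Linear(d_model, d_model)  # projects the encoder output Z
W_v = nn.Linear(d_model, d_model)  # projects the encoder output Z

def cross_attention(decoder_states, Z):
    """decoder_states: (tgt_len, d_model); Z: (src_len, d_model)."""
    Q = W_q(decoder_states)               # queries from the decoder
    K = W_k(Z)                            # keys from the encoder output
    V = W_v(Z)                            # values from the encoder output
    scores = Q @ K.T / d_model ** 0.5     # scaled dot-product attention
    weights = torch.softmax(scores, dim=-1)
    return weights @ V                    # attended representation for the decoder
```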

1 Answer