Solve all parts of this question
2 Transformers - Self Attention [20 Points]

In this question we will compute the transformer self-attention covered in the lecture. You will be required to compute the values of the different matrices described below and in the lecture notes. For all questions in this section, unless otherwise stated, work must be shown in the form of matrix multiplications to receive full credit (e.g., $C = AB^T$). For the computations, using NumPy, Excel, or other software is recommended to avoid arithmetic errors. When writing your answers, please round to 2 decimal places. You may use scientific notation to represent your answers if necessary.

Table 1: Word Embeddings

Wq=[1?2?11?], Wk=[?1?2?1?1?], Wv=[11?0?2?]

2.1 In this question we will consider a single attention head. Given the set of word embeddings (Table 1), the projection matrices ($W_q$, $W_k$, $W_v$), and a normalization factor of 3 instead of $\sqrt{d_k}$, fill out this table with the normalized query-key score for each possible pair of words. Hint: compute the value of $\frac{QK^T}{\sqrt{d_k}}$, with $\sqrt{d_k}$ replaced by 3.

2.2 Given the word embeddings, the previously calculated query-key scores, and the value projection matrix, calculate the output embeddings of this attention head. The output embeddings are computed using the Attention formula discussed in the lecture slides. Fill in the table with your results.
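Since the problem recommends NumPy for the arithmetic, below is a minimal sketch of how parts 2.1 and 2.2 could be computed. The embedding matrix `X` and the three projection matrices here are placeholder values (the real numbers come from Table 1 and the $W_q$, $W_k$, $W_v$ given above), and the sketch assumes the standard scaled dot-product attention, $\mathrm{softmax}(QK^T / 3)\,V$, with the softmax taken row-wise over the keys.

```python
import numpy as np

# Placeholder inputs: replace with the actual values from Table 1 and the
# projection matrices given in the problem statement.
X = np.array([[1.0, 0.0],    # word 1 embedding (placeholder)
              [0.0, 1.0],    # word 2 embedding (placeholder)
              [1.0, 1.0]])   # word 3 embedding (placeholder)
Wq = np.array([[1.0, 2.0], [1.0, 1.0]])   # placeholder W_q
Wk = np.array([[1.0, 2.0], [1.0, 1.0]])   # placeholder W_k
Wv = np.array([[1.0, 1.0], [0.0, 2.0]])   # placeholder W_v

# Project the embeddings into query, key, and value spaces.
Q = X @ Wq
K = X @ Wk
V = X @ Wv

# 2.1: normalized query-key scores, QK^T divided by the given factor of 3
# (the problem uses 3 in place of sqrt(d_k)).
scores = (Q @ K.T) / 3.0
print("Normalized query-key scores:\n", np.round(scores, 2))

# 2.2: attention weights (row-wise softmax over keys) and output embeddings.
weights = np.exp(scores - scores.max(axis=1, keepdims=True))
weights = weights / weights.sum(axis=1, keepdims=True)
output = weights @ V
print("Output embeddings:\n", np.round(output, 2))
```

Rounding the printed matrices to two decimal places matches the answer format requested in the problem; the two printed arrays correspond to the tables to be filled in for parts 2.1 and 2.2.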