Math Problem Statement

You are given the following input for a transformer encoder: {flying, arrows}. The input embeddings for these two words are [0, 1, 1, 1, 1, 0] and [1, 1, 0, -1, -1, 1]. You are trying to represent the first word, 'flying', with the help of self-attention in the first encoder. For the first attention head, the query, key, and value matrices each simply select two dimensions from the input: the first two dimensions define the query vector, and so on. What will be the self-attention output for the word 'flying' corresponding to this attention head? Use scaled dot-product attention.

Solution
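
A minimal NumPy sketch of the computation, assuming the slicing convention implied by the problem ("the first 2 dimensions define the query vector, and so on", i.e. dimensions 1-2 → query, 3-4 → key, 5-6 → value):

```python
import numpy as np

# Input embeddings for the two tokens.
x_flying = np.array([0, 1, 1, 1, 1, 0], dtype=float)
x_arrows = np.array([1, 1, 0, -1, -1, 1], dtype=float)
X = np.stack([x_flying, x_arrows])           # shape (2, 6)

# Assumed slicing per the problem statement:
# dims 1-2 -> query, dims 3-4 -> key, dims 5-6 -> value.
Q = X[:, 0:2]    # queries: flying -> [0, 1], arrows -> [1,  1]
K = X[:, 2:4]    # keys:    flying -> [1, 1], arrows -> [0, -1]
V = X[:, 4:6]    # values:  flying -> [1, 0], arrows -> [-1, 1]
d_k = K.shape[1]                             # key dimension = 2

# Scaled dot-product scores for the query of 'flying'.
q_flying = Q[0]
scores = q_flying @ K.T / np.sqrt(d_k)       # [ 1/sqrt(2), -1/sqrt(2) ]

# Softmax over the scores gives the attention weights.
weights = np.exp(scores) / np.exp(scores).sum()   # approx [0.804, 0.196]

# Output for 'flying' is the attention-weighted sum of the value vectors.
output = weights @ V                         # approx [0.609, 0.196]
print(weights, output)
```

Under that slicing, the attention weights for 'flying' come out to roughly 0.80 (on 'flying') and 0.20 (on 'arrows'), giving a self-attention output of approximately [0.61, 0.20].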

Math Problem Analysis

Mathematical Concepts

Linear Algebra
Vector Dot Product
Softmax Function

Formulas

Dot product: q · k^T
Scaled attention: score = (q · k^T) / sqrt(d_k)
Softmax: softmax(x_i) = e^(x_i) / sum(e^(x_j))
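
Taken together, these pieces give the standard scaled dot-product attention for a single query:

Attention(q, K, V) = softmax((q · K^T) / sqrt(d_k)) · V

where K and V stack the key and value vectors of all input tokens.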

Theorems

Dot product theorem
Softmax transformation in attention

Suitable Grade Level

Graduate-level (Machine Learning, Deep Learning)