Multi-Head Attention
continuiti.networks.multi_head_attention
Multi-Head Attention in continuiti.
MultiHeadAttention(hidden_dim, n_heads, attention=None, dropout_p=0, bias=True)
Bases: Attention
Multi-Head Attention module.
Module as described in the paper Attention Is All You Need, with optional bias for the projections. This implementation allows using attention implementations other than the standard scaled dot product attention provided by PyTorch's MultiheadAttention module.
$$\operatorname{MultiHead}(Q, K, V) = \operatorname{Concat}(\operatorname{head}_1, \dots, \operatorname{head}_h)\, W^O,$$

where

$$\operatorname{head}_i = \operatorname{Attention}(Q W_i^Q,\ K W_i^K,\ V W_i^V).$$
PARAMETER | DESCRIPTION | TYPE
---|---|---
`hidden_dim` | Dimension of the hidden layers (embedding dimension). | `int`
`n_heads` | Number of attention heads. | `int`
`attention` | Implementation of attention (defaults to scaled dot product attention). Needs to accept the same arguments as `forward` (`query`, `key`, `value`, `attn_mask`). | `Attention`
`dropout_p` | Dropout probability. | `float`
`bias` | If True, the projection onto the different heads is performed with bias. | `bool`
Source code in src/continuiti/networks/multi_head_attention.py
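A minimal usage sketch for orientation (not taken from the library's own examples): it assumes the class can be imported from the module path shown above and called like any other PyTorch module; all tensor sizes are illustrative.

```python
import torch

from continuiti.networks.multi_head_attention import MultiHeadAttention

# Hypothetical sizes chosen for illustration only.
batch_size, seq_len, hidden_dim, n_heads = 8, 16, 64, 4

attn = MultiHeadAttention(hidden_dim=hidden_dim, n_heads=n_heads, dropout_p=0.1)

x = torch.rand(batch_size, seq_len, hidden_dim)

# Self-attention: query, key, and value are the same tensor.
out = attn(x, x, x)
assert out.shape == (batch_size, seq_len, hidden_dim)
```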
forward(query, key, value, attn_mask=None)
Compute the attention scores.
PARAMETER | DESCRIPTION | TYPE
---|---|---
`query` | Query tensor of shape (batch_size, target_sequence_length, hidden_dim). | `Tensor`
`key` | Key tensor of shape (batch_size, source_sequence_length, hidden_dim). | `Tensor`
`value` | Value tensor of shape (batch_size, source_sequence_length, hidden_dim). | `Tensor`
`attn_mask` | Attention mask of shape (batch_size, target_sequence_length, source_sequence_length). | `Tensor`

RETURNS | DESCRIPTION
---|---
`Tensor` | Attention scores of shape (batch_size, target_sequence_length, hidden_dim).
Source code in src/continuiti/networks/multi_head_attention.py
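A sketch of a cross-attention call with a mask, using the shapes documented above (sizes are illustrative; the boolean-mask convention, where True marks positions that may be attended to, follows PyTorch's scaled dot product attention and should be checked against the linked source):

```python
import torch

from continuiti.networks.multi_head_attention import MultiHeadAttention

batch_size, tgt_len, src_len, hidden_dim = 4, 10, 20, 32
attn = MultiHeadAttention(hidden_dim=hidden_dim, n_heads=8)

query = torch.rand(batch_size, tgt_len, hidden_dim)
key = torch.rand(batch_size, src_len, hidden_dim)
value = torch.rand(batch_size, src_len, hidden_dim)

# Boolean mask of shape (batch_size, target_sequence_length, source_sequence_length);
# here every query position may attend to every key position.
attn_mask = torch.ones(batch_size, tgt_len, src_len, dtype=torch.bool)

out = attn(query, key, value, attn_mask=attn_mask)
assert out.shape == (batch_size, tgt_len, hidden_dim)
```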