
Transformers in NLP

15 Flashcards


Layer Normalization

Layer Normalization is a technique to stabilize the output of a neural net, applied inside the Transformer model. It normalizes the outputs across the features instead of the batch, which is important for training deep architectures.
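
As a rough illustration, here is a minimal NumPy sketch of normalizing over the feature axis; gamma and beta stand in for the learned scale and shift parameters (the names are illustrative, not taken from any particular library):

    import numpy as np

    def layer_norm(x, gamma, beta, eps=1e-5):
        # Normalize each token's features (last axis), not the batch axis.
        mean = x.mean(axis=-1, keepdims=True)
        var = x.var(axis=-1, keepdims=True)
        x_hat = (x - mean) / np.sqrt(var + eps)
        return gamma * x_hat + beta           # learned scale and shift

    x = np.random.randn(2, 4, 8)              # (batch, tokens, features)
    out = layer_norm(x, np.ones(8), np.zeros(8))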

Self-Attention

Self-Attention, sometimes called intra-attention, is an attention mechanism relating different positions of a single sequence in order to compute a representation of the sequence. It allows the model to integrate information from different parts of the sequence.
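
A quick PyTorch sketch (assuming a recent version with batch_first support): self-attention simply passes the same sequence as query, key, and value.

    import torch
    import torch.nn as nn

    attn = nn.MultiheadAttention(embed_dim=16, num_heads=1, batch_first=True)
    x = torch.randn(2, 5, 16)        # (batch, tokens, features): one sequence per batch item
    out, weights = attn(x, x, x)     # query = key = value = the same sequence
    print(out.shape)                 # (2, 5, 16): a new representation for every position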

Scaled Dot-Product Attention

This attention function computes the dot products of the query with all keys, divides each by \sqrt{d_k}, and applies a softmax function to obtain the weights on the values. It is used in the Transformer model's attention mechanisms.
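
In formula form, Attention(Q, K, V) = softmax(QK^T / \sqrt{d_k}) V. A minimal NumPy sketch (single sequence, no masking):

    import numpy as np

    def scaled_dot_product_attention(Q, K, V):
        d_k = Q.shape[-1]
        scores = Q @ K.T / np.sqrt(d_k)       # query-key dot products, scaled by sqrt(d_k)
        weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
        weights = weights / weights.sum(axis=-1, keepdims=True)   # softmax over the keys
        return weights @ V                    # weighted sum of the values

    Q = K = V = np.random.randn(5, 8)         # 5 tokens, d_k = 8
    out = scaled_dot_product_attention(Q, K, V)   # shape (5, 8)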

Query, Key, and Value Vectors

In the attention mechanism, the input is transformed into query, key, and value vectors using learned weights. These vectors are used in computing the attention scores, reflecting how much attention is paid to other tokens when producing a representation of the current token.
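
A brief PyTorch sketch of those learned projections; the dimension and weight names are illustrative:

    import torch
    import torch.nn as nn

    d_model = 16
    W_q = nn.Linear(d_model, d_model, bias=False)   # learned projection weights
    W_k = nn.Linear(d_model, d_model, bias=False)
    W_v = nn.Linear(d_model, d_model, bias=False)

    x = torch.randn(5, d_model)                     # 5 token embeddings
    Q, K, V = W_q(x), W_k(x), W_v(x)                # one query/key/value vector per token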

Feed-Forward Neural Networks in Transformers

The feed-forward neural networks in the Transformer model are fully connected layers applied to each position separately and identically. They consist of two linear transformations with a ReLU activation in between.
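
A minimal PyTorch sketch using the dimensions from the original paper (d_model = 512, d_ff = 2048):

    import torch.nn as nn

    d_model, d_ff = 512, 2048
    ffn = nn.Sequential(
        nn.Linear(d_model, d_ff),
        nn.ReLU(),
        nn.Linear(d_ff, d_model),
    )
    # nn.Linear acts on the last axis of a (batch, tokens, d_model) tensor,
    # so every position is transformed separately and identically.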

Positional Encoding

Positional encodings are added to the input embeddings at the bottoms of the encoder and decoder stacks to provide a notion of token order, which the model would otherwise lack due to its non-recurrent nature.
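
A NumPy sketch of the sinusoidal encodings used in the original Transformer (assuming an even d_model):

    import numpy as np

    def positional_encoding(seq_len, d_model):
        # PE(pos, 2i) = sin(pos / 10000^(2i/d_model)), PE(pos, 2i+1) = cos(pos / 10000^(2i/d_model))
        pos = np.arange(seq_len)[:, None]
        i = np.arange(d_model // 2)[None, :]
        angles = pos / np.power(10000.0, 2 * i / d_model)
        pe = np.zeros((seq_len, d_model))
        pe[:, 0::2] = np.sin(angles)
        pe[:, 1::2] = np.cos(angles)
        return pe

    embeddings = np.random.randn(10, 512)
    embeddings = embeddings + positional_encoding(10, 512)   # added, not concatenated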

Transformer Model

The Transformer model is an architecture for transforming one sequence into another using two parts, an encoder and a decoder, without any recurrence. It uses self-attention mechanisms to weigh the importance of different parts of the input data.
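
For orientation, PyTorch ships a reference implementation of this encoder-decoder architecture; a brief usage sketch (sequence-first tensors, which is the module's default layout):

    import torch
    import torch.nn as nn

    model = nn.Transformer(d_model=512, nhead=8,
                           num_encoder_layers=6, num_decoder_layers=6)
    src = torch.randn(10, 2, 512)   # (source length, batch, d_model)
    tgt = torch.randn(7, 2, 512)    # (target length, batch, d_model)
    out = model(src, tgt)           # (7, 2, 512): one output vector per target position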

Attention Mechanism

Attention mechanisms in NLP allow models to focus on specific parts of the input sequence when generating a particular part of the output, improving the ability to remember long-range dependencies. They are key to transformer architectures.

Cross-Attention

Cross-Attention is an attention mechanism where the queries come from the previous decoder layer, and the memory keys and values come from the output of the encoder. It helps the decoder focus on relevant parts of the input sequence.
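
A PyTorch sketch: the only difference from self-attention is where the query and the key/value tensors come from.

    import torch
    import torch.nn as nn

    cross_attn = nn.MultiheadAttention(embed_dim=16, num_heads=1, batch_first=True)
    decoder_states = torch.randn(2, 7, 16)    # queries: from the previous decoder layer
    encoder_output = torch.randn(2, 10, 16)   # keys and values: from the encoder output
    out, weights = cross_attn(decoder_states, encoder_output, encoder_output)
    print(weights.shape)   # (2, 7, 10): each decoder position's weights over the input tokens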

Transformer Fine-Tuning

Transformer fine-tuning involves adjusting a pre-trained Transformer model on a specific task by training it further on a smaller dataset relevant to the task. This is crucial for adapting the general capabilities of Transformer models to more specialized tasks.
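
A hedged sketch using the Hugging Face transformers library; the model name, toy batch, and learning rate are placeholders rather than a prescribed recipe:

    import torch
    from transformers import AutoTokenizer, AutoModelForSequenceClassification

    tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
    model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)

    optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)   # small learning rate
    batch = tokenizer(["great movie", "terrible movie"], return_tensors="pt", padding=True)
    labels = torch.tensor([1, 0])

    outputs = model(**batch, labels=labels)   # pre-trained body plus a task-specific head
    outputs.loss.backward()                   # further training on the task's smaller dataset
    optimizer.step()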

Masked Self-Attention

Masked self-attention in the Transformer's decoder ensures that the prediction for a certain position does not depend on the future tokens in the sequence. A mask is applied to the input of the softmax step of the self-attention to prevent future positions from being accessed.
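
A NumPy sketch of the causal mask, added on top of the scaled dot-product attention shown earlier:

    import numpy as np

    def causal_self_attention(Q, K, V):
        d_k = Q.shape[-1]
        scores = Q @ K.T / np.sqrt(d_k)
        # Set scores for future positions to -inf before the softmax,
        # so their attention weights become exactly zero.
        future = np.triu(np.ones_like(scores, dtype=bool), k=1)
        scores = np.where(future, -np.inf, scores)
        weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
        weights = weights / weights.sum(axis=-1, keepdims=True)
        return weights @ V

    Q = K = V = np.random.randn(5, 8)
    out = causal_self_attention(Q, K, V)   # position i only attends to positions <= i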

Multi-Head Attention

Multi-head attention consists of several attention layers running in parallel. This enables the model to jointly attend to information from different representation subspaces at different positions, capturing various aspects of information.
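
A simplified NumPy sketch of the head-splitting idea; real implementations also apply learned per-head projections and a final output projection:

    import numpy as np

    def multi_head_attention(x, num_heads):
        seq_len, d_model = x.shape
        d_head = d_model // num_heads
        heads = []
        for h in range(num_heads):
            # Each head attends over its own slice of the feature dimension.
            Q = K = V = x[:, h * d_head:(h + 1) * d_head]
            scores = Q @ K.T / np.sqrt(d_head)
            w = np.exp(scores - scores.max(axis=-1, keepdims=True))
            w = w / w.sum(axis=-1, keepdims=True)
            heads.append(w @ V)
        return np.concatenate(heads, axis=-1)    # concatenate back to (seq_len, d_model)

    out = multi_head_attention(np.random.randn(5, 16), num_heads=4)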

Transformer-Based Models

Transformer-based models such as BERT (Bidirectional Encoder Representations from Transformers) and GPT (Generative Pre-trained Transformer) have extended the Transformer architecture to create models that set new standards in NLP tasks like translation, summarization, and question answering.

Encoder

The encoder's role in the Transformer is to process the input sequence and map it into an abstract continuous representation that holds the semantic meaning of the input, which the decoder can then use.
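
A PyTorch sketch of an encoder stack built from identical layers:

    import torch
    import torch.nn as nn

    layer = nn.TransformerEncoderLayer(d_model=512, nhead=8)
    encoder = nn.TransformerEncoder(layer, num_layers=6)

    src = torch.randn(10, 2, 512)   # (source length, batch, d_model)
    memory = encoder(src)           # continuous representation handed to the decoder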

Decoder

The decoder in the Transformer model takes the continuous representation from the encoder and generates the output sequence. It also features a stack of identical layers, but with an additional third sub-layer that performs multi-head attention over the encoder's output.
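
A matching PyTorch sketch of a decoder stack consuming the encoder's output:

    import torch
    import torch.nn as nn

    layer = nn.TransformerDecoderLayer(d_model=512, nhead=8)
    decoder = nn.TransformerDecoder(layer, num_layers=6)

    tgt = torch.randn(7, 2, 512)       # target sequence so far
    memory = torch.randn(10, 2, 512)   # encoder output
    out = decoder(tgt, memory)         # the extra sub-layer attends to memory (cross-attention)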
