Natural Language Processing Terminology
Named Entity Recognition (NER)
The process of identifying and classifying named entities in text into predefined categories such as persons, organizations, and locations.
Sequence to Sequence (Seq2Seq) Model
A type of model in NLP that transforms a given sequence of elements, such as words in one language, to another sequence, such as a translation in another language.
Transformer
An NLP model architecture that relies on self-attention mechanisms instead of sequence-aligned recurrent layers like RNNs for handling sequences of data.
Corpus
A large and structured set of texts used in NLP. It serves as data for modeling language and extracting statistical information.
Attention Mechanism
A technique in sequence learning that allows the model to focus on different parts of the input while generating each word of the output sequence.
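A minimal pure-Python sketch of dot-product attention over toy vectors (the function names and the 2-d example vectors are illustrative, not from any library):

```python
import math

def softmax(scores):
    """Normalize raw scores into a probability distribution."""
    exps = [math.exp(s) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(query, keys, values):
    """Dot-product attention: weight each value by how well its key matches the query."""
    scores = [sum(q * k for q, k in zip(query, key)) for key in keys]
    weights = softmax(scores)
    dim = len(values[0])
    context = [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]
    return context, weights

# Toy example: the query aligns with the second key, so the second value dominates.
context, weights = attend([1.0, 0.0],
                          [[0.0, 1.0], [1.0, 0.0]],
                          [[10.0, 0.0], [0.0, 10.0]])
```

The weights always sum to 1, so the output is a convex blend of the values, focused where the query–key match is strongest.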
Latent Semantic Analysis (LSA)
A technique for analyzing relationships between a set of documents and the terms they contain by producing a set of concepts related to the documents and terms.
POS Tagging
Short for Part-of-Speech Tagging, it involves assigning word types (like noun, verb, adjective) to each token. Useful for syntactic parsing and word sense disambiguation.
Word Embeddings
Numerical vector representations of words that capture their meanings, syntactic properties, and relation with other words.
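One way to see what embeddings buy you is cosine similarity between vectors. A sketch with hand-made 3-d vectors (real embeddings have hundreds of dimensions; these toy values only illustrate the idea):

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two word vectors: near 1 = similar direction."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

# Hypothetical embeddings chosen so related words point in similar directions.
embeddings = {
    "king":  [0.90, 0.80, 0.10],
    "queen": [0.85, 0.82, 0.15],
    "apple": [0.10, 0.20, 0.90],
}
sim_royal = cosine_similarity(embeddings["king"], embeddings["queen"])
sim_fruit = cosine_similarity(embeddings["king"], embeddings["apple"])
```

With trained embeddings (e.g. word2vec or GloVe), the same comparison surfaces semantic neighbors.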
Bag of Words (BoW)
A simple feature extraction technique that describes the occurrence of words within a document, disregarding grammar and word order.
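A bag of words is essentially a word-count dictionary, which makes it a one-liner with the standard library:

```python
from collections import Counter

def bag_of_words(text):
    """Count word occurrences, ignoring grammar and word order."""
    return Counter(text.lower().split())

bow = bag_of_words("the cat sat on the mat")
# bow["the"] is 2; "the cat sat on the mat" and "the mat sat on the cat"
# produce identical bags, which is exactly the information BoW discards.
```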
BERT
Bidirectional Encoder Representations from Transformers, a pre-training technique for NLP that considers both left and right context in all layers.
Lemmatization
The process of reducing words to their lemma or dictionary form. It is more sophisticated than stemming and uses lexical knowledge bases like WordNet.
n-gram
A contiguous sequence of n items (typically words or letters) from a given text or speech. Useful in text modeling and prediction.
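Word-level n-grams are contiguous slices of the token list, e.g.:

```python
def ngrams(tokens, n):
    """Return every contiguous n-length slice of the token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

bigrams = ngrams("to be or not to be".split(), 2)
# [('to', 'be'), ('be', 'or'), ('or', 'not'), ('not', 'to'), ('to', 'be')]
```

The repeated ('to', 'be') bigram is the kind of frequency signal n-gram models exploit for prediction.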
Language Model
A statistical model that predicts the likelihood of a sequence of words. Used in various applications such as speech recognition, typing suggestions, and machine translation.
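The simplest statistical language model is a bigram model: estimate the probability of the next word from counts. A maximum-likelihood sketch, with illustrative function names:

```python
from collections import Counter

def bigram_probs(tokens):
    """Estimate P(next | current) as count(current, next) / count(current)."""
    unigrams = Counter(tokens[:-1])          # every token that has a successor
    bigrams = Counter(zip(tokens, tokens[1:]))
    return {(w1, w2): count / unigrams[w1] for (w1, w2), count in bigrams.items()}

probs = bigram_probs("i like tea i like coffee".split())
# "i" is always followed by "like"; "like" splits evenly between "tea" and "coffee".
```

Real language models smooth these estimates (or use neural networks) so unseen sequences do not get zero probability.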
Coreference Resolution
Determining whether and how words or phrases (such as pronouns) refer to the same entity within a text. It is crucial for understanding context and meaning.
Stop Words
Commonly used words in a language (like 'the', 'is', 'in') that are often removed in NLP tasks to reduce noise and focus on meaningful words.
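Stop-word removal is a simple filter. A sketch with a tiny hand-picked stop list (real lists, such as NLTK's, contain over a hundred entries):

```python
# Illustrative stop list; production systems use a curated, language-specific one.
STOP_WORDS = {"the", "is", "in", "a", "an", "of", "and"}

def remove_stop_words(tokens):
    """Drop high-frequency function words, keeping content-bearing ones."""
    return [t for t in tokens if t.lower() not in STOP_WORDS]

filtered = remove_stop_words("the cat is in the garden".split())
# Only "cat" and "garden" survive.
```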
Stemming
The process of reducing words to their base or root form. Generally simpler than lemmatization and often uses heuristic rules.
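A crude suffix-stripping stemmer shows both the idea and its limits (this is a toy heuristic, not the Porter algorithm, which applies a much larger ordered rule set):

```python
def crude_stem(word):
    """Strip a few common English suffixes with a simple heuristic rule."""
    for suffix in ("ing", "ed", "es", "s"):
        # Keep at least a 3-letter stem so "is" and "as" are left alone.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

stems = [crude_stem(w) for w in ["running", "jumped", "cats"]]
# "running" becomes "runn", not "run": heuristics overshoot, which is
# why lemmatization with a dictionary is more accurate.
```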
TF-IDF
Short for Term Frequency-Inverse Document Frequency. A statistical measure used to evaluate the importance of a word to a document in a corpus.
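The measure is the product of two factors, which a few lines of Python make concrete (function name and toy corpus are illustrative):

```python
import math

def tf_idf(term, doc, docs):
    """TF-IDF = (frequency of term in doc) * log(N / number of docs containing term)."""
    tf = doc.count(term) / len(doc)
    df = sum(1 for d in docs if term in d)
    idf = math.log(len(docs) / df)
    return tf * idf

docs = [
    "the cat sat".split(),
    "the dog barked".split(),
    "the cat purred".split(),
]
score_cat = tf_idf("cat", docs[0], docs)
score_the = tf_idf("the", docs[0], docs)
# "the" appears in every document, so its IDF (and score) is zero;
# "cat" is rarer across the corpus, so it scores higher.
```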
Dependency Parsing
The process of analyzing the grammatical structure of a sentence by establishing head-dependent relationships between its words.
Chunking
Also known as shallow parsing, it is the process of grouping the words of a text into syntactically related segments, such as noun or verb phrases.
Tokenization
The process of splitting text into individual terms or phrases. It allows NLP systems to work with words and concepts rather than strings of characters.
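A minimal regex tokenizer that separates punctuation from words (real tokenizers handle contractions, URLs, and Unicode far more carefully):

```python
import re

def tokenize(text):
    """Split text into word tokens and standalone punctuation marks."""
    return re.findall(r"\w+|[^\w\s]", text)

tokens = tokenize("Hello, world!")
# ['Hello', ',', 'world', '!'] -- punctuation becomes its own token.
```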
© Hypatia.Tech. 2024 All rights reserved.