NLP Basics
NLP
Natural Language Processing, the branch of computer science and artificial intelligence concerned with giving computers the ability to understand text and spoken words in much the same way human beings can.
Sentiment Analysis
The process of computationally determining whether a piece of writing is positive, negative, or neutral.
Tokenization
The process of splitting text into smaller units called tokens, which can be words, subwords, phrases, or symbols such as punctuation.
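As a rough illustration, a word-and-punctuation tokenizer can be sketched with a single regular expression. The function name `tokenize` and the splitting rule are assumptions made for this example; real tokenizers handle contractions, abbreviations, and subwords far more carefully.

```python
import re

def tokenize(text):
    # \w+ matches a run of letters, digits, or underscores (a "word");
    # [^\w\s] matches a single non-space symbol such as punctuation.
    return re.findall(r"\w+|[^\w\s]", text)

tokens = tokenize("Hello, world!")
print(tokens)  # ['Hello', ',', 'world', '!']
```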
TF-IDF
Term Frequency-Inverse Document Frequency, a numerical statistic that is intended to reflect how important a word is to a document in a collection or corpus.
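One common way to compute the statistic is sketched below, assuming the plain unsmoothed variant: term frequency is the count divided by document length, and inverse document frequency is log(N / df). Libraries often use smoothed or normalized variants, so the exact numbers will differ. The function name `tf_idf` is hypothetical, and documents are assumed to be non-empty token lists.

```python
import math
from collections import Counter

def tf_idf(docs):
    """docs: list of non-empty token lists. Returns one {term: score} dict per document."""
    n_docs = len(docs)
    # Document frequency: the number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    scores = []
    for doc in docs:
        counts = Counter(doc)
        scores.append({
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return scores

docs = [["the", "cat", "sat"], ["the", "dog", "barked"]]
scores = tf_idf(docs)
# "the" appears in every document, so its score is 0;
# "cat" appears in only one document, so it scores higher.
print(scores[0]["the"], scores[0]["cat"])
```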
Attention Mechanism
A component of neural networks that allows the model to focus on different parts of the input sequence when generating an output, improving the performance of the network on tasks like machine translation.
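A minimal sketch of one widely used form, scaled dot-product attention, for a single query vector. This is only one variant of attention, and the function names and toy vectors are assumptions chosen for illustration. The weights show how much the model "focuses" on each input position.

```python
import math

def softmax(xs):
    # Subtract the maximum for numerical stability before exponentiating.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attention(query, keys, values):
    """Scaled dot-product attention for one query vector."""
    d = len(query)
    # Similarity of the query to each key, scaled by sqrt(dimension).
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in keys]
    weights = softmax(scores)
    # The output is the attention-weighted sum of the value vectors.
    output = [sum(w * v[i] for w, v in zip(weights, values))
              for i in range(len(values[0]))]
    return weights, output

weights, output = attention(
    query=[1.0, 0.0],
    keys=[[1.0, 0.0], [0.0, 1.0]],
    values=[[10.0, 0.0], [0.0, 10.0]],
)
# The first key matches the query better, so it receives the larger weight.
print(weights, output)
```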
Word Embeddings
A type of word representation, usually a dense numeric vector, in which words with similar meanings have similar representations.
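Similarity between word vectors is often measured with cosine similarity. The sketch below uses three-dimensional vectors invented purely for illustration; real embeddings are learned from data and typically have hundreds of dimensions.

```python
import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy vectors made up for this example, not taken from any trained model.
embeddings = {
    "cat": [0.9, 0.8, 0.1],
    "dog": [0.8, 0.9, 0.2],
    "car": [0.1, 0.2, 0.9],
}

cat_dog = cosine_similarity(embeddings["cat"], embeddings["dog"])
cat_car = cosine_similarity(embeddings["cat"], embeddings["car"])
print(cat_dog > cat_car)  # True: "cat" is closer to "dog" than to "car"
```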
Corpus
A large and structured set of texts, typically used for linguistic research and language modeling.
Transformer
A type of deep learning model that relies on self-attention rather than recurrence, which allows an input sequence to be processed in parallel, speeding up training and improving the handling of long-range dependencies in data.
Co-reference Resolution
The task of determining whether and how terms in a text, such as pronouns and nouns, refer to the same entity.
Parsing
The process of analyzing a string of symbols, either in natural language or in computer languages, according to the rules of a formal grammar.
Part-of-Speech Tagging
The process of labeling each word in a text with its part of speech, based on both the word's definition and its context.
Syntax
The arrangement of words and phrases to create well-formed sentences in a language.
Named Entity Recognition (NER)
The process of identifying named entities in text and classifying them into predefined categories such as names of persons, organizations, and locations, time expressions, quantities, monetary values, and percentages.
Language Model
A model that assigns a probability to a sequence of words, often by predicting each word from the words that precede it.
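A bigram model is one of the simplest examples: it estimates the probability of each word from the word before it and multiplies those estimates together. The sketch below assumes unsmoothed maximum-likelihood counts and hypothetical boundary markers `<s>` and `</s>`; practical models add smoothing so that unseen word pairs do not get probability zero.

```python
from collections import Counter

def train_bigram(sentences):
    unigrams, bigrams = Counter(), Counter()
    for sentence in sentences:
        tokens = ["<s>"] + sentence + ["</s>"]
        unigrams.update(tokens[:-1])             # every token that has a successor
        bigrams.update(zip(tokens, tokens[1:]))  # adjacent token pairs
    return unigrams, bigrams

def sequence_probability(sentence, unigrams, bigrams):
    tokens = ["<s>"] + sentence + ["</s>"]
    prob = 1.0
    for prev, cur in zip(tokens, tokens[1:]):
        if unigrams[prev] == 0:
            return 0.0  # the previous word was never seen in training
        prob *= bigrams[(prev, cur)] / unigrams[prev]
    return prob

corpus = [["the", "cat", "sat"], ["the", "dog", "sat"]]
unigrams, bigrams = train_bigram(corpus)
p = sequence_probability(["the", "cat", "sat"], unigrams, bigrams)
print(p)  # 0.5, because "the" is followed by "cat" in half of the training sentences
```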
Token
In NLP, a token usually corresponds to a word or a punctuation mark.
Named Entity
A word or phrase that clearly identifies one item from a set of others that have similar attributes, such as names of people, places, companies, etc.
Stemming
The process of reducing words to their root form or stem, often by removing common prefixes or suffixes.
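The idea can be sketched with a deliberately naive suffix stripper. This is not the Porter algorithm or any other standard stemmer; the suffix list and the minimum stem length of three characters are assumptions chosen for illustration.

```python
# Hypothetical suffix list, checked in order; longer suffixes come first.
SUFFIXES = ("ing", "ed", "es", "s")

def naive_stem(word):
    for suffix in SUFFIXES:
        # Strip the suffix only if at least three characters remain.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

print(naive_stem("jumping"), naive_stem("jumped"), naive_stem("cats"))
# jump jump cat
```

Crude rules like these also produce non-words (for example, "running" becomes "runn"), which is typical of stemming and one reason lemmatization is sometimes preferred.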
BiLSTM
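Bidirectional Long Short-Term Memory, a type of RNN that processes a sequence in both the forward and backward directions so that each position has context from both sides, which can improve model performance on sequence classification problems.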
Stop Words
Common words, such as "the", "is", and "of", that are filtered out before processing natural language data because they are deemed to carry little meaning for the task.
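Filtering is usually a simple set lookup. The stop-word list below is a tiny hypothetical sample; real lists are longer and depend on the language and task.

```python
# A small illustrative stop-word list, not a standard one.
STOP_WORDS = {"the", "is", "a", "an", "of", "and", "to", "in"}

def remove_stop_words(tokens):
    # Compare in lowercase so that "The" and "the" are treated alike.
    return [t for t in tokens if t.lower() not in STOP_WORDS]

filtered = remove_stop_words(["The", "cat", "is", "in", "the", "garden"])
print(filtered)  # ['cat', 'garden']
```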
BERT
Bidirectional Encoder Representations from Transformers, a method of pre-training language representations which can be fine-tuned for various NLP tasks.
Lemmatization
Like stemming, the process of reducing a word to a base form, but the result is a dictionary form called a lemma, and the process typically relies on a vocabulary and morphological analysis of the word.
Machine Translation
The application of computers to the task of translating texts from one natural language to another.
GPT
Generative Pre-trained Transformer, a type of language model that is pre-trained on unlabeled text with unsupervised learning and then used to generate natural language text.
Semantics
The study of meaning in language, as opposed to syntax, which is the study of sentence structure.
Chatbot
A software application used to conduct an online chat conversation via text or text-to-speech, in lieu of providing direct contact with a live human agent.
© Hypatia.Tech. 2024 All rights reserved.