NLP Basics
25
Flashcards
NLP
Natural Language Processing, the branch of computer science and artificial intelligence concerned with giving computers the ability to understand text and spoken words in much the same way human beings can.
Tokenization
The process of breaking down text into units of meaning, known as tokens, which can be words, phrases, or symbols.
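As a rough illustration of the definition above, here is a minimal regex-based tokenizer. The function name `tokenize` and the pattern are assumptions made for this sketch; real tokenizers handle many more cases (URLs, hyphens, subwords).

```python
import re

def tokenize(text):
    # A token is either a word (optionally with internal apostrophes,
    # as in "don't") or a single punctuation character.
    return re.findall(r"\w+(?:'\w+)*|[^\w\s]", text)

print(tokenize("Don't panic, it's fine!"))
# ["Don't", 'panic', ',', "it's", 'fine', '!']
```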
Stemming
The process of reducing words to their root form or stem, usually by stripping common prefixes or suffixes with simple rules. The resulting stem is not guaranteed to be a real word.
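The idea can be sketched with a toy suffix stripper. This is not the Porter algorithm or any library's stemmer; the suffix list, the minimum stem length of 3, and the name `naive_stem` are all assumptions chosen for illustration.

```python
# Hypothetical suffix list, checked in this order.
SUFFIXES = ("ing", "ed", "es", "s")

def naive_stem(word):
    word = word.lower()
    for suffix in SUFFIXES:
        # Only strip when at least 3 characters remain, so short
        # words such as "is" are left alone.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

print([naive_stem(w) for w in ["jumping", "jumped", "jumps", "running"]])
# ['jump', 'jump', 'jump', 'runn']
```

The last result, "runn", shows why stems are not always dictionary words.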
Lemmatization
Like stemming, the process of reducing a word to a base form, but the result is a valid dictionary form called a lemma, typically found using a vocabulary and morphological analysis of the word.
Part-of-Speech Tagging
The process of labeling each word in a text with its part of speech, based on both the word's definition and its context.
Named Entity Recognition (NER)
The process of identifying named entities in text and classifying them into predefined categories such as persons, organizations, locations, time expressions, quantities, monetary values, and percentages.
Syntax
The arrangement of words and phrases to create well-formed sentences in a language.
Parsing
The process of analyzing a string of symbols, either in natural language or in computer languages, according to the rules of a formal grammar.
Semantics
The study of meaning in language, as opposed to syntax, which is the study of sentence structure.
Sentiment Analysis
The process of computationally determining whether a piece of writing is positive, negative, or neutral.
Machine Translation
The application of computers to the task of translating texts from one natural language to another.
Corpus
A large and structured set of texts, typically used for linguistic research and language modeling.
Stop Words
Very common words (such as "the", "is", and "of") that are filtered out before processing natural language data because they carry little meaning relevant to the task.
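A minimal sketch of stop-word filtering follows. The `STOP_WORDS` set here is a tiny made-up sample, not a standard list; in practice the list depends on the language and the task.

```python
# Tiny illustrative stop-word list (an assumption, not a standard one).
STOP_WORDS = {"the", "is", "a", "of", "and", "to", "in"}

def remove_stop_words(tokens):
    # Compare case-insensitively but keep the original token text.
    return [t for t in tokens if t.lower() not in STOP_WORDS]

print(remove_stop_words(["The", "cat", "is", "in", "the", "garden"]))
# ['cat', 'garden']
```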
Word Embeddings
A type of word representation, usually a dense vector of real numbers, in which words with similar meanings have similar vectors.
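"Similar representation" is commonly measured with cosine similarity between vectors. The sketch below uses three hand-made 3-dimensional vectors that are invented purely for illustration; real embeddings are learned from data and have many more dimensions.

```python
import math

# Invented toy vectors, not taken from any trained model.
EMBEDDINGS = {
    "king": [0.9, 0.8, 0.1],
    "queen": [0.85, 0.75, 0.2],
    "banana": [0.1, 0.2, 0.9],
}

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

royal = cosine_similarity(EMBEDDINGS["king"], EMBEDDINGS["queen"])
fruit = cosine_similarity(EMBEDDINGS["king"], EMBEDDINGS["banana"])
print(royal > fruit)  # True: "king" is closer to "queen" than to "banana"
```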
Chatbot
A software application used to conduct an online chat conversation via text or text-to-speech, in lieu of providing direct contact with a live human agent.
Language Model
A model that assigns a probability to a sequence of words, often by predicting each word from the words that precede it.
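One of the simplest language models is a bigram model, which estimates each word's probability from the word before it. The sketch below is an unsmoothed count-based version; the function names and the `<s>` / `</s>` boundary markers are assumptions of this example.

```python
from collections import Counter, defaultdict

def train_bigram(sentences):
    # counts[prev][cur] = how often `cur` followed `prev` in training data.
    counts = defaultdict(Counter)
    for sentence in sentences:
        tokens = ["<s>"] + sentence + ["</s>"]
        for prev, cur in zip(tokens, tokens[1:]):
            counts[prev][cur] += 1
    return counts

def sentence_probability(counts, sentence):
    tokens = ["<s>"] + sentence + ["</s>"]
    prob = 1.0
    for prev, cur in zip(tokens, tokens[1:]):
        total = sum(counts[prev].values())
        if total == 0:
            return 0.0  # unseen context, no smoothing in this sketch
        prob *= counts[prev][cur] / total
    return prob

model = train_bigram([["the", "cat", "sat"], ["the", "dog", "sat"]])
print(sentence_probability(model, ["the", "cat", "sat"]))  # 0.5
```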
TF-IDF
Term Frequency-Inverse Document Frequency, a numerical statistic that is intended to reflect how important a word is to a document in a collection or corpus.
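A minimal sketch of one common TF-IDF variant: term frequency is the term's share of the document's tokens, and inverse document frequency is log(N / df), where N is the number of documents and df is the number of documents containing the term. Libraries often use smoothed or normalized variants, so their numbers will differ; the function name `tf_idf` is an assumption of this example.

```python
import math
from collections import Counter

def tf_idf(docs):
    n_docs = len(docs)
    # Document frequency: in how many documents each term appears.
    df = Counter()
    for doc in docs:
        df.update(set(doc))
    scores = []
    for doc in docs:
        counts = Counter(doc)
        scores.append({
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return scores

docs = [["the", "cat", "sat"], ["the", "dog", "barked"], ["the", "cat", "purred"]]
scores = tf_idf(docs)
print(scores[0]["the"])  # 0.0, because "the" appears in every document
```

In the first document, "sat" scores higher than "cat" because "sat" appears in only one document while "cat" appears in two.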
Token
In NLP, a token usually corresponds to a word or a punctuation mark.
BiLSTM
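Bidirectional Long Short-Term Memory, a type of RNN that reads a sequence both forward and backward so that each position sees context from both sides, which can improve model performance on sequence classification problems.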
Attention Mechanism
A component of neural networks that allows the model to focus on different parts of the input sequence when generating an output, improving the performance of the network on tasks like machine translation.
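One widely used form is scaled dot-product attention, sketched below in plain Python for a single query. This is only one variant of attention, and the function names and toy inputs are assumptions of this example. Each key is scored against the query, the scores are turned into weights with softmax, and the output is the weighted average of the values.

```python
import math

def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attention(query, keys, values):
    d = len(query)
    # Similarity of the query to each key, scaled by sqrt(d).
    scores = [
        sum(q * k for q, k in zip(query, key)) / math.sqrt(d)
        for key in keys
    ]
    weights = softmax(scores)
    # Weighted average of the value vectors.
    output = [
        sum(w * v[i] for w, v in zip(weights, values))
        for i in range(len(values[0]))
    ]
    return weights, output

weights, output = attention(
    query=[1.0, 0.0],
    keys=[[1.0, 0.0], [0.0, 1.0]],
    values=[[10.0, 0.0], [0.0, 10.0]],
)
print(weights[0] > weights[1])  # True: the first key matches the query better
```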
BERT
Bidirectional Encoder Representations from Transformers, a method of pre-training language representations which can be fine-tuned for various NLP tasks.
Transformer
A type of deep learning model built on self-attention rather than recurrence, which lets it process all positions of a sequence in parallel (speeding up training) and improves the handling of long-range dependencies in data.
Named Entity
A word or phrase that clearly identifies one item from a set of others that have similar attributes, such as names of people, places, companies, etc.
GPT
Generative Pre-trained Transformer, a type of language model that is pre-trained on unlabeled text to predict the next token and can then generate natural language text.
Co-reference Resolution
The task of determining whether and how terms in a text, such as pronouns and nouns, refer to the same entity.
© Hypatia.Tech. 2024 All rights reserved.