NLP Basics

NLP

Natural Language Processing, the branch of computer science and artificial intelligence concerned with giving computers the ability to understand text and spoken words in much the same way human beings can.

Tokenization

The process of breaking text into smaller units called tokens, which can be words, subwords, characters, or punctuation symbols.
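
As an illustration of the idea, here is a minimal sketch of a regex-based word tokenizer. The function name `tokenize` and the pattern are assumptions made for this example; production tokenizers handle many more cases (URLs, hyphens, subwords).

```python
import re

def tokenize(text):
    # Match either a word (optionally containing internal apostrophes,
    # as in "don't") or a single non-space, non-word symbol.
    return re.findall(r"\w+(?:'\w+)*|[^\w\s]", text)

tokens = tokenize("Don't panic, it's fine!")
print(tokens)  # ["Don't", 'panic', ',', "it's", 'fine', '!']
```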

Stemming

The process of reducing words to their root form or stem, often by removing common prefixes or suffixes.
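
A deliberately naive suffix stripper can make the idea concrete. This is only a sketch: the suffix list and the name `naive_stem` are hypothetical, and real stemmers such as the Porter stemmer apply far more careful rules.

```python
# Hypothetical, illustrative suffix list; checked in this order.
SUFFIXES = ("ing", "ed", "es", "s")

def naive_stem(word):
    word = word.lower()
    for suffix in SUFFIXES:
        # Only strip if at least three characters would remain.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

print(naive_stem("jumping"))  # jump
print(naive_stem("running"))  # runn
```

The second result shows why a stem is not always a dictionary word, which is the gap lemmatization is meant to close.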

Lemmatization

Like stemming, the process of reducing a word to its base form, but the result is always a dictionary form, called a lemma, found by using a vocabulary and morphological analysis of the word.

Part-of-Speech Tagging

The process of labeling each word in a text with its part of speech, based on both the word's definition and its context.

Named Entity Recognition (NER)

The process of identifying named entities in text and classifying them into predefined categories such as persons, organizations, locations, time expressions, quantities, monetary values, and percentages.

Syntax

The arrangement of words and phrases to create well-formed sentences in a language.

Parsing

The process of analyzing a string of symbols, either in natural language or in computer languages, according to the rules of a formal grammar.

Semantics

The study of meaning in language, as opposed to syntax, which is the study of sentence structure.

Sentiment Analysis

The process of computationally determining whether a piece of writing is positive, negative, or neutral.

Machine Translation

The application of computers to the task of translating texts from one natural language to another.

Corpus

A large and structured set of texts, typically used for linguistic research and language modeling.

Stop Words

Very common words, such as "the", "is", and "of", that are filtered out before processing natural language data because they carry little meaning for the task at hand.
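
A minimal sketch of stop-word filtering, assuming the text is already tokenized. The `STOP_WORDS` set here is a tiny hypothetical list chosen for the example; real lists are longer and depend on the task.

```python
# Hypothetical stop-word list for illustration only.
STOP_WORDS = {"the", "is", "a", "an", "of", "and", "to", "in", "on"}

def remove_stop_words(tokens):
    # Compare case-insensitively, but keep the original token text.
    return [t for t in tokens if t.lower() not in STOP_WORDS]

print(remove_stop_words(["The", "cat", "is", "on", "the", "mat"]))  # ['cat', 'mat']
```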

Word Embeddings

A type of word representation, usually a dense real-valued vector, in which words with similar meanings have similar representations.
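
Similarity between embeddings is commonly measured with cosine similarity. The sketch below uses made-up three-dimensional vectors (the `EMBEDDINGS` values are invented for illustration, not taken from any trained model) to show that related words score higher than unrelated ones.

```python
import math

def cosine_similarity(u, v):
    # Dot product divided by the product of the vector lengths.
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Toy vectors, invented for this example.
EMBEDDINGS = {
    "king": [0.9, 0.8, 0.1],
    "queen": [0.85, 0.9, 0.15],
    "banana": [0.1, 0.05, 0.95],
}

related = cosine_similarity(EMBEDDINGS["king"], EMBEDDINGS["queen"])
unrelated = cosine_similarity(EMBEDDINGS["king"], EMBEDDINGS["banana"])
print(related > unrelated)  # True
```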

Chatbot

A software application used to conduct an online chat conversation via text or text-to-speech, in lieu of providing direct contact with a live human agent.

Language Model

A model that assigns a probability to a sequence of words, often by predicting each word from the words that precede it.
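
One of the simplest language models is a bigram model, which estimates the probability of a word given only the previous word. The sketch below uses plain counts with no smoothing; `train_bigram` is a name chosen for this example, and modern language models are neural networks rather than count tables.

```python
from collections import Counter

def train_bigram(tokens):
    # Count how often each word appears as the first word of a pair,
    # and how often each adjacent pair appears.
    first_counts = Counter(tokens[:-1])
    pair_counts = Counter(zip(tokens, tokens[1:]))

    def prob(prev, word):
        # Maximum-likelihood estimate of P(word | prev); 0.0 if prev is unseen.
        if first_counts[prev] == 0:
            return 0.0
        return pair_counts[(prev, word)] / first_counts[prev]

    return prob

prob = train_bigram("the cat sat on the mat".split())
print(prob("the", "cat"))  # 0.5
print(prob("cat", "sat"))  # 1.0
```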

TF-IDF

Term Frequency-Inverse Document Frequency, a numerical statistic that is intended to reflect how important a word is to a document in a collection or corpus.
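
A minimal sketch of one common formulation, where the score of a term in a document is its relative frequency in that document multiplied by log(N / df), with N the number of documents and df the number of documents containing the term. Libraries often use smoothed variants, so the exact numbers here are an assumption of this example rather than a standard.

```python
import math
from collections import Counter

def tf_idf(docs):
    # docs is a list of token lists; returns one {term: score} dict per doc.
    n_docs = len(docs)
    doc_freq = Counter(term for doc in docs for term in set(doc))
    scores = []
    for doc in docs:
        counts = Counter(doc)
        scores.append({
            term: (count / len(doc)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return scores

scores = tf_idf([["the", "cat", "sat"], ["the", "dog", "barked"]])
print(scores[0]["the"])  # 0.0
print(scores[0]["cat"] > 0)  # True
```

A word that appears in every document, like "the" here, scores zero, while a word unique to one document scores higher.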

Token

In NLP, a token usually corresponds to a word or a punctuation mark.

BiLSTM

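Bidirectional Long Short-Term Memory, a type of RNN that reads a sequence in both the forward and backward directions, so each position has access to both preceding and following context. This often improves performance on sequence classification and tagging problems.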

Attention Mechanism

A component of neural networks that allows the model to focus on different parts of the input sequence when generating an output, improving the performance of the network on tasks like machine translation.
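
The card does not name a specific form of attention. As one widely used example, the sketch below implements scaled dot-product attention for a single query in plain Python; the function names and toy vectors are assumptions made for illustration.

```python
import math

def softmax(xs):
    # Subtract the max for numerical stability before exponentiating.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attend(query, keys, values):
    # Score each key against the query, scaled by sqrt of the dimension.
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in keys]
    weights = softmax(scores)
    # The output is the weighted average of the value vectors.
    dim = len(values[0])
    output = [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]
    return weights, output

weights, output = attend([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0]], [[10.0, 0.0], [0.0, 10.0]])
print(weights[0] > weights[1])  # True
```

The query matches the first key more closely, so the first value receives the larger weight and dominates the output.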

BERT

Bidirectional Encoder Representations from Transformers, a method of pre-training language representations which can be fine-tuned for various NLP tasks.

Transformer

A type of deep learning model built on self-attention rather than recurrence. Because all positions in a sequence can be processed in parallel, it trains faster than an RNN and handles long-range dependencies better.

Named Entity

A word or phrase that clearly identifies one item from a set of others that have similar attributes, such as names of people, places, companies, etc.

GPT

Generative Pre-trained Transformer, a type of language model that is pre-trained on large amounts of unlabeled text to predict the next token and can then generate natural language text.

Co-reference Resolution

The task of determining whether and how terms in a text, such as pronouns and nouns, refer to the same entity.
