Common NLP Tasks

Tokenization

Tokenization is the process of splitting a piece of text into individual tokens, which are usually words, subwords, or punctuation marks. Its purpose is to turn raw text into discrete units that algorithms can analyze.
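As a rough illustration, a regular expression can split text into word and punctuation tokens. This is a minimal sketch; the `tokenize` helper and its pattern are assumptions made for this card, not a standard tokenizer.

```python
import re

def tokenize(text):
    # A token is either a word (optionally with one internal apostrophe,
    # as in "don't") or a single non-space punctuation character.
    return re.findall(r"\w+(?:'\w+)?|[^\w\s]", text)

tokens = tokenize("Don't panic, it's fine.")
print(tokens)
```

Production tokenizers handle many more cases (URLs, hyphenation, subword units), so treat this only as a picture of the idea.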

Lemmatization

Lemmatization is the process of reducing words to their base or dictionary form, called the lemma (for example, "better" becomes "good"). It standardizes the inflected forms of a word, making it easier to analyze textual data.

Stemming

Stemming is the process of reducing words to their stem, often by stripping suffixes with heuristic rules, so the result is not always a valid dictionary word. Its purpose is to group related word forms together to aid in text processing.
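The heuristic flavor of stemming can be seen in a toy suffix stripper. This is a sketch under stated assumptions: the `SUFFIXES` list and `naive_stem` function are hypothetical and far cruder than established stemmers such as the Porter stemmer.

```python
# Suffixes are checked in order, so longer ones come first.
SUFFIXES = ("ing", "ed", "es", "s")

def naive_stem(word):
    word = word.lower()
    for suffix in SUFFIXES:
        # Only strip when at least three characters would remain.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

for w in ["jumping", "jumped", "jumps", "running"]:
    print(w, "->", naive_stem(w))
```

Note that "running" becomes "runn" here, which shows why a stem is not guaranteed to be a real word.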

Part-of-Speech Tagging

Part-of-Speech Tagging is the process of identifying each word's part of speech in a text, such as noun, verb, or adjective. It helps in understanding the syntactic structure of sentences.

Named Entity Recognition (NER)

Named Entity Recognition is the task of identifying and classifying named entities in text into predefined categories such as persons, organizations, and locations. It's crucial for extracting information and for tasks such as question answering.

Sentiment Analysis

Sentiment Analysis is the process of determining the emotional tone behind a piece of text. It is used to understand the attitudes, opinions, and emotions expressed in sources such as reviews and online mentions.
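One simple way to picture this is a lexicon-based scorer that counts positive and negative words. This is only a sketch; the `POSITIVE` and `NEGATIVE` word lists and the `sentiment` function are made up for illustration, and real systems typically use trained models.

```python
# Tiny hand-written lexicons (illustrative assumptions, not a real resource).
POSITIVE = {"good", "great", "love", "excellent"}
NEGATIVE = {"bad", "terrible", "hate", "poor"}

def sentiment(text):
    # Lowercase, split on whitespace, and trim common punctuation.
    words = [w.strip(".,!?") for w in text.lower().split()]
    score = sum((w in POSITIVE) - (w in NEGATIVE) for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

print(sentiment("I love this great phone"))
print(sentiment("Terrible battery, bad screen."))
```

A word-counting approach like this misses negation ("not good") and sarcasm, which is one reason learned classifiers are usually preferred.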

Topic Modeling

Topic Modeling is the use of statistical models to discover the abstract 'topics' that occur in a collection of documents. It helps in organizing and understanding large volumes of textual information.

Machine Translation

Machine Translation refers to the use of software to translate text from one language to another. It is fundamental for communication between speakers of different languages and for making information globally accessible.

Speech Recognition

Speech Recognition is the process of converting spoken words into text. This technology enables computer systems to understand and process human speech, and is used in voice user interfaces.

Text Classification

Text Classification involves categorizing text into organized groups. By using Natural Language Processing (NLP), text classifiers can automatically analyze text and then assign a set of pre-defined tags or categories based on its content.
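To make the idea concrete, here is a minimal sketch of a multinomial Naive Bayes classifier with add-one smoothing, using only the standard library. The `train` and `classify` functions and the tiny spam/ham dataset are assumptions for illustration, and Naive Bayes is just one of many possible classifiers.

```python
import math
from collections import Counter, defaultdict

def train(examples):
    # Count words per label and documents per label.
    word_counts = defaultdict(Counter)
    label_counts = Counter()
    for text, label in examples:
        label_counts[label] += 1
        word_counts[label].update(text.lower().split())
    return word_counts, label_counts

def classify(model, text):
    word_counts, label_counts = model
    vocab = {w for counts in word_counts.values() for w in counts}
    total_docs = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label in label_counts:
        total_words = sum(word_counts[label].values())
        # Log prior plus log likelihood of each word (add-one smoothing).
        score = math.log(label_counts[label] / total_docs)
        for w in text.lower().split():
            score += math.log((word_counts[label][w] + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

examples = [
    ("cheap pills buy now", "spam"),
    ("win money now", "spam"),
    ("meeting at noon", "ham"),
    ("lunch meeting tomorrow", "ham"),
]
model = train(examples)
print(classify(model, "buy cheap money"))
print(classify(model, "meeting tomorrow"))
```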

Language Modeling

Language Modeling is the task of predicting the next word or sequence of words in a sentence. It is the foundation for many NLP tasks, such as speech recognition and text generation, and is crucial for capturing language structure and context.
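Next-word prediction can be sketched with a bigram model that counts which word follows which. The `train_bigram` and `predict_next` helpers and the three-sentence corpus are hypothetical; modern language models use neural networks rather than raw counts.

```python
from collections import Counter, defaultdict

def train_bigram(sentences):
    # counts[prev][nxt] = how often `nxt` directly follows `prev`.
    counts = defaultdict(Counter)
    for sentence in sentences:
        words = sentence.lower().split()
        for prev, nxt in zip(words, words[1:]):
            counts[prev][nxt] += 1
    return counts

def predict_next(counts, word):
    # Return the most frequent follower, or None for an unseen word.
    followers = counts.get(word.lower())
    return followers.most_common(1)[0][0] if followers else None

model = train_bigram(["the cat sat", "the cat ran", "the dog sat"])
print(predict_next(model, "the"))
```

Here "cat" follows "the" twice and "dog" once, so the model predicts "cat" after "the".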

Automatic Summarization

Automatic Summarization is the process of computationally shortening a text or set of documents to produce a summary that keeps the most important or relevant information from the original content.
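A very simple extractive approach scores each sentence by how frequent its words are in the whole text and keeps the top ones. This is a sketch under that assumption; the `summarize` function is hypothetical, and the scoring favors long sentences.

```python
import re
from collections import Counter

def summarize(text, n=1):
    # Split into sentences after ., ! or ? followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    freq = Counter(re.findall(r"\w+", text.lower()))

    def score(sentence):
        # Sum of document-wide frequencies of the sentence's words.
        return sum(freq[w] for w in re.findall(r"\w+", sentence.lower()))

    top = sorted(sentences, key=score, reverse=True)[:n]
    # Keep the selected sentences in their original order.
    return [s for s in sentences if s in top]

text = "Cats sleep a lot. Cats and dogs are popular pets, and cats are quiet. Birds sing."
print(summarize(text, n=1))
```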

Coreference Resolution

Coreference Resolution is the task of finding all expressions that refer to the same entity in a text. It's a key aspect of understanding who or what is being talked about in a piece of writing.

Question Answering

Question Answering is a computer science discipline within the fields of information retrieval and natural language processing which is concerned with building systems that automatically answer questions posed by humans in a natural language.

Relation Extraction

Relation Extraction involves identifying and classifying semantic relationships between entities mentioned in a text. This task helps in automatically extracting structured information from unstructured text and plays a significant role in knowledge graph creation and information retrieval.
