Text Mining Techniques

Flashcards

Part-of-Speech Tagging

Part-of-Speech Tagging assigns a part of speech, such as noun, verb, or adjective, to each word (and other token) in a text. Text mining uses it for deeper linguistic analysis and to help expose sentence structure.
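
As a rough illustration, the simplest baseline tagger just looks each token up in a lexicon. The sketch below assumes a tiny hand-made `LEXICON` and falls back to `NOUN` for unknown words; both are hypothetical choices for this example, and real taggers also use the surrounding context.

```python
# Hypothetical mini-lexicon mapping lowercase words to coarse tags.
LEXICON = {
    "the": "DET", "a": "DET",
    "dog": "NOUN", "cat": "NOUN",
    "chases": "VERB", "sleeps": "VERB",
    "quick": "ADJ", "lazy": "ADJ",
}

def tag(tokens):
    """Pair each token with a tag; unknown words default to NOUN."""
    return [(t, LEXICON.get(t.lower(), "NOUN")) for t in tokens]

tagged = tag(["The", "quick", "dog", "chases", "a", "ball"])
print(tagged)
```

Because the lookup ignores context, a word such as "chases" always receives the same tag here, which is exactly the limitation that statistical taggers address.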

Lemmatization

Lemmatization, like stemming, normalizes word forms, but it reduces each word to its dictionary form, or lemma. Unlike stemming, it takes the word's part of speech and context into account, yielding verbs in their infinitive form and nouns in their singular form, which improves the quality of text mining.
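
A minimal sketch of the idea, assuming a small hand-written table keyed by word and part of speech (the `LEMMAS` table and the tag names are made up for this example; real lemmatizers rely on a full dictionary and morphological rules):

```python
# Hypothetical lookup table: (word, part of speech) -> lemma.
LEMMAS = {
    ("saw", "VERB"): "see",
    ("saw", "NOUN"): "saw",
    ("mice", "NOUN"): "mouse",
    ("running", "VERB"): "run",
}

def lemmatize(word, pos):
    """Return the lemma for (word, pos); unknown pairs come back lowercased."""
    return LEMMAS.get((word.lower(), pos), word.lower())

print(lemmatize("saw", "VERB"))   # the verb form maps to "see"
print(lemmatize("saw", "NOUN"))   # the noun stays "saw"
```

The two calls show why the part of speech matters: the same surface form maps to different lemmas.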

Text Classification

Text Classification involves assigning pre-defined labels to texts. By using techniques such as supervised machine learning, text mining can sort documents into categories for easier retrieval and analysis.
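
One common supervised approach is a multinomial Naive Bayes classifier. The sketch below is a minimal version with add-one smoothing; the `train` and `predict` names, the labels, and the four training messages are invented for illustration.

```python
import math
from collections import Counter, defaultdict

def train(examples):
    """examples: list of (tokens, label) pairs. Returns the counts needed to predict."""
    label_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for tokens, label in examples:
        label_counts[label] += 1
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return label_counts, word_counts, vocab

def predict(model, tokens):
    """Return the label with the highest log-probability for the tokens."""
    label_counts, word_counts, vocab = model
    total = sum(label_counts.values())
    best_label, best_score = None, float("-inf")
    for label, n in label_counts.items():
        score = math.log(n / total)  # log prior
        denom = sum(word_counts[label].values()) + len(vocab)
        for t in tokens:
            # Add-one smoothing keeps unseen words from zeroing the score.
            score += math.log((word_counts[label][t] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

training = [
    (["cheap", "pills", "buy", "now"], "spam"),
    (["win", "cash", "now"], "spam"),
    (["meeting", "agenda", "attached"], "ham"),
    (["lunch", "meeting", "tomorrow"], "ham"),
]
model = train(training)
print(predict(model, ["buy", "cash", "now"]))
print(predict(model, ["agenda", "meeting"]))
```

With this toy data the first message is labeled `spam` and the second `ham`; a real classifier would need far more training text.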

Stemming

Stemming reduces words to a root or stem form, typically by stripping affixes with heuristic rules, so the stem need not be a valid word. This normalization consolidates different forms of a word, which shrinks the vocabulary and improves term matching in text mining.
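
To make the idea concrete, here is a deliberately naive suffix stripper. It is not the Porter algorithm or any other standard stemmer; the suffix list and the three-character minimum are assumptions chosen for this example.

```python
SUFFIXES = ("ing", "ed", "es", "s")  # tried in this order

def naive_stem(word):
    """Strip the first matching suffix, keeping at least three characters."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

stems = [naive_stem(w) for w in ["connect", "connected", "connecting", "connects"]]
print(stems)               # all four forms collapse to "connect"
print(naive_stem("mining"))  # "min", which is not a real word
```

The second call shows the usual trade-off: stems are cheap to compute but are not guaranteed to be dictionary words.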

Tokenization

Tokenization is the process of breaking text into individual terms or tokens. It is applied in text mining for preliminary text analysis and parsing to facilitate further text manipulation and processing.
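
A minimal sketch using a regular expression, assuming lowercase word tokens are wanted and punctuation can be dropped (the `tokenize` name and the pattern are choices made for this example):

```python
import re

def tokenize(text):
    """Split text into lowercase word tokens; punctuation is dropped."""
    # \w+ matches runs of letters, digits, and underscores; the optional
    # group keeps contractions such as "isn't" together as one token.
    return re.findall(r"\w+(?:'\w+)?", text.lower())

tokens = tokenize("Text mining isn't hard: tokenize first!")
print(tokens)
```

Other tokenizers make different choices, for example keeping punctuation as separate tokens or splitting contractions.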

Term Frequency-Inverse Document Frequency (TF-IDF)

TF-IDF is a statistical measure of how relevant a word is to a document within a collection of documents. It increases with the word's frequency in the document and decreases with the number of documents that contain the word. Text mining uses it to weight terms by importance.
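
The weighting can be sketched in a few lines. This version assumes one common variant, relative term frequency multiplied by log(N / document frequency), with non-empty, already tokenized documents; libraries often apply different smoothing or normalization.

```python
import math
from collections import Counter

def tf_idf(docs):
    """Return one {term: weight} dict per tokenized, non-empty document."""
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights

docs = [
    ["the", "cat", "sat"],
    ["the", "dog", "barked"],
    ["the", "cat", "purred"],
]
scores = tf_idf(docs)
print(scores[0])
```

In this toy collection "the" appears in every document, so its weight is zero, while "sat", which appears in only one document, gets the highest weight in the first document.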

Clustering

Clustering groups similar text documents together without predefined categories. Applied in text mining, this technique is useful for discovering natural groupings and patterns within large sets of unlabelled text data.
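
As a small illustration, the sketch below groups tokenized documents with a greedy single-pass ("leader") scheme based on Jaccard similarity. This is one simple clustering method picked for brevity, not the only or the standard one, and the `threshold` value of 0.3 is an arbitrary assumption.

```python
def jaccard(a, b):
    """Share of distinct tokens that two documents have in common."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0

def leader_cluster(docs, threshold=0.3):
    """Assign each document to the first cluster whose leader is similar enough."""
    clusters = []  # lists of document indices; the first index is the leader
    for i, doc in enumerate(docs):
        for cluster in clusters:
            if jaccard(doc, docs[cluster[0]]) >= threshold:
                cluster.append(i)
                break
        else:
            clusters.append([i])  # no similar leader found: start a new cluster
    return clusters

docs = [
    ["stock", "market", "prices", "fall"],
    ["football", "team", "wins", "match"],
    ["stock", "prices", "rise"],
    ["team", "loses", "football", "match"],
]
clusters = leader_cluster(docs)
print(clusters)
```

No labels are supplied; the two finance documents and the two football documents end up together purely because of their shared vocabulary.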

Stop Word Removal

Stop Word Removal involves eliminating common words that may not contribute to the meaning of text for analysis, such as 'the', 'is', and 'at'. It enhances the focus on relevant text features within text mining.
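
A minimal sketch, assuming tokens are already available and using a short illustrative stop list (the `STOP_WORDS` set here is made up for the example and is far smaller than the lists real toolkits ship):

```python
# Illustrative stop list only; not a standard one.
STOP_WORDS = {"the", "is", "at", "a", "an", "of", "and", "on"}

def remove_stop_words(tokens):
    """Drop tokens whose lowercase form is in the stop list."""
    return [t for t in tokens if t.lower() not in STOP_WORDS]

filtered = remove_stop_words(["The", "cat", "is", "at", "the", "door"])
print(filtered)
```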

Named Entity Recognition (NER)

NER identifies named entities in text and classifies them into predefined categories such as people, organizations, and locations. In text mining, it is used for information extraction and to enhance the understanding of texts.
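
The simplest form of NER is a gazetteer lookup: matching text against a list of known names. The sketch below assumes a tiny hypothetical `GAZETTEER` and reports only the first occurrence of each name; it uses plain substring matching, so it would also match inside longer words. Practical systems use statistical or neural models instead.

```python
# Hypothetical gazetteer: lowercase name -> entity category.
GAZETTEER = {
    "ada lovelace": "PERSON",
    "london": "LOCATION",
    "royal society": "ORGANIZATION",
}

def find_entities(text):
    """Return (entity text, category) pairs in order of appearance."""
    lowered = text.lower()
    found = []
    for name, label in GAZETTEER.items():
        start = lowered.find(name)  # first occurrence only
        if start != -1:
            found.append((start, text[start:start + len(name)], label))
    return [(span, label) for _, span, label in sorted(found)]

entities = find_entities("Ada Lovelace lectured at the Royal Society in London.")
print(entities)
```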

Sentiment Analysis

Sentiment Analysis determines the emotional tone behind a body of text. In text mining, it is used to discern subjective information and to understand the sentiments expressed in social media, reviews, and more.
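
One basic approach scores text against lists of positive and negative words. The sketch below assumes two tiny hand-picked word lists (`POSITIVE` and `NEGATIVE` are invented for this example) and ignores negation, sarcasm, and intensity, which real systems have to handle.

```python
# Hypothetical sentiment lexicons, for illustration only.
POSITIVE = {"good", "great", "love", "excellent"}
NEGATIVE = {"bad", "poor", "hate", "terrible"}

def sentiment(tokens):
    """Label tokens by the balance of positive and negative words."""
    score = sum((t in POSITIVE) - (t in NEGATIVE) for t in tokens)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

print(sentiment(["great", "phone", "love", "the", "screen"]))
print(sentiment(["terrible", "battery", "and", "poor", "support"]))
```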
