Data Mining Concepts
Data Mining
The process of discovering patterns and knowledge from large amounts of data, which may be structured or unstructured.
Clustering
A data mining technique that involves grouping a set of objects in such a way that objects in the same group are more similar to each other than to those in other groups.
Classification
The process of finding a model that describes and distinguishes data classes or concepts for the purpose of being able to use the model to predict the class of objects whose class label is unknown.
Association Rule Learning
A rule-based machine learning method for discovering interesting relations between variables in large databases. A typical example is market basket analysis.
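Two measures commonly used to judge a rule such as "bread implies milk" in market basket analysis are support and confidence. The sketch below computes both on a small, made-up list of transactions; the data and the function names `support` and `confidence` are illustrative assumptions, not part of any particular library.

```python
# Hypothetical toy transactions, invented purely for illustration.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(antecedent, consequent, transactions):
    """Estimate of P(consequent | antecedent): support(A and C) / support(A)."""
    both = set(antecedent) | set(consequent)
    return support(both, transactions) / support(antecedent, transactions)

# Rule under test: {bread} -> {milk}
rule_support = support({"bread", "milk"}, transactions)
rule_confidence = confidence({"bread"}, {"milk"}, transactions)
```

Here bread and milk appear together in 2 of 4 transactions, and milk appears in 2 of the 3 transactions that contain bread.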
Anomaly Detection
The identification of rare items, events or observations which raise suspicions by differing significantly from the majority of the data.
Regression
A predictive modelling technique that investigates the relationship between a dependent (target) variable and one or more independent (predictor) variables.
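As a minimal sketch of the simplest case, the snippet below fits a straight line to one predictor by ordinary least squares. The function name `fit_line` and the sample points are assumptions chosen for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance-like and variance-like sums (the 1/n factors cancel).
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Made-up points lying exactly on y = 2x + 1.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]
slope, intercept = fit_line(xs, ys)
```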
Decision Tree
A decision support tool that uses a tree-like graph or model of decisions and their possible consequences, including chance event outcomes, resource costs, and utility.
Support Vector Machine (SVM)
A supervised learning model used for classification and regression. For classification, it finds the hyperplane that separates the classes with the largest margin.
Neural Networks
A set of algorithms, modeled loosely after the human brain, that are designed to recognize patterns. They interpret sensory data through a kind of machine perception, labeling or clustering raw input.
Dimensionality Reduction
The process of reducing the number of random variables under consideration by obtaining a set of principal variables.
Ensemble Methods
Techniques that create multiple models and then combine them to produce improved results. Examples include Random Forests and Boosted Trees.
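One simple way to combine models is majority voting. The sketch below assumes three hypothetical threshold classifiers written as plain functions; real ensembles such as Random Forests train their members on data rather than hard-coding them.

```python
from collections import Counter

def majority_vote(models, x):
    """Return the label predicted by the most models for input x."""
    votes = [model(x) for model in models]
    return Counter(votes).most_common(1)[0][0]

# Three hypothetical weak classifiers with different thresholds.
models = [
    lambda x: "high" if x > 5 else "low",
    lambda x: "high" if x > 7 else "low",
    lambda x: "high" if x > 3 else "low",
]

prediction_for_6 = majority_vote(models, 6)  # votes: high, low, high
prediction_for_4 = majority_vote(models, 4)  # votes: low, low, high
```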
Data Preprocessing
The process of transforming raw data into an understandable format before mining it.
K-Means Clustering
An unsupervised learning algorithm used on unlabeled data (i.e., data without defined categories or groups). It partitions the data into K groups by assigning each point to the nearest cluster centroid, where K is chosen in advance.
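A minimal sketch of the usual iterative procedure (often called Lloyd's algorithm) is shown below: assign each point to its nearest centroid, then move each centroid to the mean of its points. Starting from the first K points and running a fixed number of iterations are simplifying assumptions; the names `nearest` and `kmeans` and the sample points are illustrative only.

```python
def nearest(point, centroids):
    """Index of the centroid closest to `point` (squared Euclidean distance)."""
    distances = [sum((a - b) ** 2 for a, b in zip(point, c)) for c in centroids]
    return distances.index(min(distances))

def kmeans(points, k, iterations=10):
    # Naive initialisation (an assumption): use the first k points as centroids.
    centroids = list(points[:k])
    for _ in range(iterations):
        # Assignment step: group points by nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[nearest(p, centroids)].append(p)
        # Update step: move each centroid to the mean of its members;
        # an empty cluster keeps its previous centroid.
        centroids = [
            tuple(sum(coord) / len(members) for coord in zip(*members))
            if members else centroids[i]
            for i, members in enumerate(clusters)
        ]
    labels = [nearest(p, centroids) for p in points]
    return centroids, labels

# Made-up 2-D points forming two visibly separate groups.
points = [(1.0, 1.0), (1.0, 2.0), (8.0, 8.0), (9.0, 8.0), (2.0, 1.0), (8.0, 9.0)]
centroids, labels = kmeans(points, k=2)
```

With K = 2, the three points near the origin end up in one group and the three points near (8, 8) in the other.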
Outlier
A data point that is distant from most other observations and lies outside the overall distribution of the data set.
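One common heuristic for flagging such points is the interquartile range (IQR) rule: treat anything more than 1.5 × IQR below the first quartile or above the third quartile as an outlier. The sketch below uses Python's `statistics.quantiles`; the function name `iqr_outliers`, the 1.5 multiplier as a default, and the sample values are assumptions for illustration.

```python
import statistics

def iqr_outliers(values, multiplier=1.5):
    """Return the values lying outside [Q1 - m*IQR, Q3 + m*IQR]."""
    q1, _median, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    lower = q1 - multiplier * iqr
    upper = q3 + multiplier * iqr
    return [v for v in values if v < lower or v > upper]

# Made-up measurements with one value far from the rest.
readings = [10, 12, 11, 13, 12, 11, 95]
flagged = iqr_outliers(readings)
```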
Cross-Validation
A model validation technique for assessing how the results of a statistical analysis will generalize to an independent data set. It is mainly used in settings where the goal is prediction, and one wants to estimate how accurately a predictive model will perform in practice.
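A widely used form is k-fold cross-validation: the data is split into k folds, and each fold serves once as the test set while the remaining folds form the training set. The sketch below only generates the index splits, using contiguous, unshuffled folds as a simplifying assumption; the name `k_fold_indices` is hypothetical.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, test_indices) for each of k contiguous folds."""
    # Spread any remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        stop = start + size
        test = list(range(start, stop))
        train = [i for i in range(n_samples) if i < start or i >= stop]
        yield train, test
        start = stop

# Ten samples split into three folds of sizes 4, 3 and 3.
folds = list(k_fold_indices(10, 3))
```

In practice a model would be trained on each `train` split and scored on the matching `test` split, and the scores averaged to estimate how well it generalizes.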
© Hypatia.Tech. 2024 All rights reserved.