Data Mining Concepts
15 Flashcards
Dimensionality Reduction
The process of reducing the number of random variables under consideration by obtaining a set of principal variables.
Clustering
A data mining technique that involves grouping a set of objects in such a way that objects in the same group are more similar to each other than to those in other groups.
Support Vector Machine (SVM)
A supervised learning model used for classification and regression. For classification, it finds the hyperplane that separates the classes with the largest margin.
Ensemble Methods
Techniques that create multiple models and then combine them to produce improved results. Examples include Random Forests and Boosted Trees.
Data Mining
The process of discovering patterns and knowledge from large amounts of data. The data may be structured or unstructured.
Association Rule Learning
A rule-based machine learning method for discovering interesting relations between variables in large databases. A typical example is market basket analysis.
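As a rough illustration of market basket analysis, the sketch below computes two measures commonly used to score a rule such as "bread implies milk": support and confidence. The baskets and the function names are made up for this example and are not taken from any particular library.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= set(t) for t in transactions) / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Estimate of P(consequent | antecedent): support(A and C) / support(A)."""
    both = set(antecedent) | set(consequent)
    return support(transactions, both) / support(transactions, antecedent)


# Hypothetical baskets, invented for illustration only.
baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]

bread_milk_support = support(baskets, {"bread", "milk"})
bread_to_milk_confidence = confidence(baskets, {"bread"}, {"milk"})
```

Here bread and milk appear together in 2 of 4 baskets, and 2 of the 3 baskets containing bread also contain milk. A rule is usually considered interesting only when both measures clear thresholds chosen by the analyst.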
Anomaly Detection
The identification of rare items, events or observations which raise suspicions by differing significantly from the majority of the data.
Neural Networks
A set of algorithms, modeled loosely after the human brain, that are designed to recognize patterns. They interpret sensory data through a kind of machine perception, labeling or clustering raw input.
Data Preprocessing
A data mining technique that transforms raw data into an understandable format, typically through steps such as cleaning, normalization, and feature selection.
Cross-Validation
A model validation technique for assessing how the results of a statistical analysis will generalize to an independent data set. It is mainly used in settings where the goal is prediction, and one wants to estimate how accurately a predictive model will perform in practice.
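One common form is k-fold cross-validation, where the data is split into k folds and each fold serves once as the held-out test set. The sketch below only generates the index splits. The function name `k_fold_indices` is an assumption for this example, and it uses contiguous folds with no shuffling, which real workflows usually add.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, test_indices) for each of k contiguous folds."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    base, extra = divmod(n_samples, k)
    start = 0
    for fold in range(k):
        # The first `extra` folds get one additional sample each.
        size = base + (1 if fold < extra else 0)
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, test
        start += size


folds = list(k_fold_indices(10, 3))
```

A model would be trained on each `train` list and scored on the matching `test` list, and the k scores are then averaged to estimate performance on unseen data.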
Regression
A predictive modelling technique that investigates the relationship between a dependent variable (target) and one or more independent variables (predictors).
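The simplest case is a straight line fitted to one predictor by ordinary least squares. The sketch below is a minimal illustration with made-up data; `fit_line` is a hypothetical helper name, and it assumes the x values are not all identical.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept with one predictor."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance-like and variance-like sums (the 1/n factors cancel in the ratio).
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data lying exactly on y = 2x + 1.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]
slope, intercept = fit_line(xs, ys)
```

Because the sample points lie exactly on a line, the fit recovers a slope of 2 and an intercept of 1. With noisy data the same formulas give the line that minimizes the sum of squared vertical errors.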
Decision Tree
A decision support tool that uses a tree-like graph or model of decisions and their possible consequences, including chance event outcomes, resource costs, and utility.
Outlier
A data point that lies far from the other observations in a data set, outside the overall distribution of the data.
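How far is "distant" depends on the rule chosen. One widely used convention, shown here as an assumption rather than as the definition above, flags values more than 1.5 times the interquartile range (IQR) beyond the first or third quartile. The data and the helper name `iqr_outliers` are invented for the example.

```python
import statistics


def iqr_outliers(values, factor=1.5):
    """Return values outside [Q1 - factor*IQR, Q3 + factor*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - factor * iqr, q3 + factor * iqr
    return [v for v in values if v < low or v > high]


# Hypothetical measurements with one obviously extreme value.
readings = [10, 12, 11, 13, 12, 11, 95]
flagged = iqr_outliers(readings)
```

With these readings only the value 95 is flagged. A quartile-based rule is used here because the mean and standard deviation are themselves distorted by extreme values.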
Classification
The process of finding a model that describes and distinguishes data classes or concepts, so that the model can predict the class of objects whose class label is unknown.
K-Means Clustering
An unsupervised learning algorithm used when the data is unlabeled (i.e., has no defined categories or groups). It partitions the data into K groups, where K is chosen in advance, by assigning each point to the group with the nearest centroid.
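The usual procedure (often called Lloyd's algorithm) alternates between assigning points to the nearest centroid and moving each centroid to the mean of its points. The sketch below is a minimal version under stated assumptions: the points and starting centroids are made up, the initial centroids are passed in explicitly instead of being chosen at random, and `k_means` is a hypothetical function name.

```python
import math


def k_means(points, centroids, max_iter=100):
    """Alternate assignment and centroid update until the centroids stop moving."""
    centroids = [tuple(c) for c in centroids]
    for _ in range(max_iter):
        # Assignment step: each point joins the cluster of its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(
                range(len(centroids)),
                key=lambda i: math.dist(p, centroids[i]),
            )
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        # An empty cluster keeps its previous centroid.
        new_centroids = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster))
            if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, clusters


# Hypothetical 2-D points forming two well-separated groups, with K = 2.
points = [(1, 1), (1, 2), (2, 1), (8, 8), (8, 9), (9, 8)]
centroids, clusters = k_means(points, centroids=[(1, 1), (8, 8)])
```

On this data the three points near the origin end up in one cluster and the three points near (8, 8) in the other. The result of K-means in general depends on the starting centroids, so practical implementations often run it several times with different initializations.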
© Hypatia.Tech. 2024 All rights reserved.