Data Preprocessing Techniques

12 Flashcards

Feature Extraction

Feature Extraction involves transforming raw data into a set of features that can be effectively used by a learning algorithm. Examples include PCA for dimensionality reduction or extracting words from text.
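
As a rough illustration of extracting words from text, the sketch below builds bag-of-words count vectors using only the Python standard library. The function names `bag_of_words` and `vectorize` and the tokenization rule are assumptions made for this example, not part of the original card.

```python
import re
from collections import Counter

def bag_of_words(text):
    """Lowercase the text, split it into word tokens, and count each token."""
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    return Counter(tokens)

def vectorize(texts):
    """Turn raw texts into fixed-length count vectors over a shared vocabulary."""
    counts = [bag_of_words(t) for t in texts]
    vocabulary = sorted(set().union(*counts))
    # A Counter returns 0 for words that a text does not contain.
    vectors = [[c[word] for word in vocabulary] for c in counts]
    return vocabulary, vectors

vocabulary, vectors = vectorize(["the cat sat", "the dog sat on the mat"])
print(vocabulary)
print(vectors)
```

Each text becomes a numeric vector of the same length, which is the form most learning algorithms expect.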

Dimensionality Reduction

Dimensionality Reduction techniques reduce the number of variables under consideration and can be divided into feature selection and feature extraction. Common methods include PCA and t-SNE (the latter is used mainly for visualization). Fewer dimensions can improve model performance and computational efficiency.
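
To make the feature selection branch concrete, here is a minimal sketch that drops columns whose variance does not exceed a threshold. This is a simple selection heuristic, not PCA or t-SNE; the function name `select_by_variance` and the sample data are assumptions for illustration.

```python
from statistics import pvariance

def select_by_variance(rows, threshold=0.0):
    """Keep the column indices whose population variance exceeds the threshold."""
    columns = list(zip(*rows))
    kept = [i for i, col in enumerate(columns) if pvariance(col) > threshold]
    reduced = [[row[i] for i in kept] for row in rows]
    return kept, reduced

# The middle column is constant, so it carries no information and is dropped.
rows = [[1.0, 5.0, 0.2], [2.0, 5.0, 0.1], [3.0, 5.0, 0.3]]
kept, reduced = select_by_variance(rows, threshold=0.0)
print(kept)
print(reduced)
```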

Data Transformation

Data Transformation involves converting data into a suitable format or structure for analysis. This can include aggregating data, combining features, or transforming variable types.
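
A small sketch of two of these transformations together: converting a variable's type (string to float) and aggregating records by a key. The record layout and the helper name `total_by_key` are hypothetical.

```python
from collections import defaultdict

def total_by_key(records, key, value):
    """Sum the numeric value of each record, grouped by the given key."""
    totals = defaultdict(float)
    for record in records:
        # Type conversion: amounts arrive as strings and are cast to float.
        totals[record[key]] += float(record[value])
    return dict(totals)

sales = [
    {"region": "north", "amount": "10.5"},
    {"region": "south", "amount": "4"},
    {"region": "north", "amount": "2.5"},
]
totals = total_by_key(sales, "region", "amount")
print(totals)
```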

Noise Filtering

Noise Filtering is the process of removing irregularities and fluctuations in data that may hinder the data analysis process. Methods include smoothing techniques like moving averages or more complex signal processing filters.
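
A simple moving average can be sketched as follows. The window size of 3 and the choice to return only full windows (so the output is shorter than the input) are assumptions of this example.

```python
def moving_average(values, window=3):
    """Return the mean of each full window of consecutive values."""
    if window < 1 or window > len(values):
        raise ValueError("window must be between 1 and len(values)")
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]

# The spike at 9 is damped once it is averaged with its neighbours.
smoothed = moving_average([1, 2, 9, 2, 1], window=3)
print(smoothed)
```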

Normalization

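Normalization adjusts the scale of features in the data to a standard range, often between 0 and 1, to allow for fair comparison between features measured on different scales. It is used to prepare data for algorithms that are sensitive to feature magnitude, such as k-NN or models trained with gradient descent. It changes the range of the data, not the shape of its distribution.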
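
Min-max scaling to the range 0 to 1 uses x' = (x - min) / (max - min). Below is a minimal sketch; returning all zeros for a constant feature is a convention chosen for this example.

```python
def min_max_scale(values):
    """Rescale values linearly so the minimum maps to 0 and the maximum to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant feature has no spread; map every value to 0.0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

scaled = min_max_scale([10, 20, 40])
print(scaled)
```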

Discretization

Discretization converts continuous features into discrete values by creating a set of intervals or 'bins' to categorize or group the continuous values. It is common when preparing data for algorithms that expect categorical input.
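
One common scheme is equal-width binning, sketched below. Other schemes (for example, equal-frequency bins) exist; the function name and the sample ages are assumptions for illustration.

```python
def equal_width_bins(values, n_bins):
    """Assign each value a bin index from 0 to n_bins - 1 using equal-width intervals."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    if width == 0:
        return [0 for _ in values]
    # The maximum value would land in bin n_bins, so clamp it into the last bin.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]

ages = [18, 25, 40, 62, 78]
bins = equal_width_bins(ages, 3)
print(bins)
```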

Missing Value Imputation

Missing Value Imputation is the process of replacing missing data with substituted values. Techniques include filling with the mean, median, or mode, borrowing values from similar records with k-NN, or estimating missing values with a predictive model.
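
The simplest of these, mean or median imputation, might look like the sketch below. Representing missing entries as `None` is an assumption of this example.

```python
from statistics import mean, median

def impute(values, strategy="mean"):
    """Replace None entries with the mean or median of the observed values."""
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("cannot impute when every value is missing")
    if strategy == "mean":
        fill = mean(observed)
    elif strategy == "median":
        fill = median(observed)
    else:
        raise ValueError("strategy must be 'mean' or 'median'")
    return [fill if v is None else v for v in values]

filled_mean = impute([1.0, None, 3.0, 8.0], "mean")
filled_median = impute([1.0, None, 3.0, 8.0], "median")
print(filled_mean, filled_median)
```

The median is less affected by extreme values, which is why the two strategies fill the gap differently here.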

Standardization

Standardization, also known as z-score normalization, rescales data to have a mean of 0 and a standard deviation of 1. It is useful for methods that assume features are centered around zero and have equal variance.
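
The z-score is z = (x - mean) / standard deviation. A minimal sketch follows; using the population standard deviation (`pstdev`) rather than the sample one is an assumption of this example.

```python
from statistics import mean, pstdev

def standardize(values):
    """Rescale values to zero mean and unit (population) standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # A constant feature has no spread; map every value to 0.0.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]

z = standardize([2.0, 4.0, 6.0])
print(z)
```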

Outlier Detection

Outlier Detection identifies data points that are significantly different from the majority of the data, potentially indicating variability in measurement or experimental errors. Techniques range from IQR to clustering-based methods.
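
The IQR rule flags points below Q1 - k*IQR or above Q3 + k*IQR. The sketch below uses the conventional k = 1.5 and the "inclusive" quartile method of `statistics.quantiles` (Python 3.8+); both are choices made for this example, and other quartile definitions give slightly different fences.

```python
from statistics import quantiles

def iqr_outliers(values, k=1.5):
    """Return the values lying outside the fences Q1 - k*IQR and Q3 + k*IQR."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lower or v > upper]

outliers = iqr_outliers([10, 12, 11, 13, 12, 11, 95])
print(outliers)
```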

Binarization

Binarization is the process of turning data features into binary (0/1) values based on a threshold. It is commonly used to transform probabilistic values into a crisp, boolean decision.
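
A minimal sketch of thresholding predicted probabilities. Mapping values strictly greater than the threshold to 1 (so a value equal to the threshold becomes 0) is a convention assumed here.

```python
def binarize(values, threshold=0.5):
    """Map each value to 1 if it is strictly above the threshold, otherwise 0."""
    return [1 if v > threshold else 0 for v in values]

labels = binarize([0.1, 0.5, 0.73, 0.92], threshold=0.5)
print(labels)
```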

Data Cleaning

Data Cleaning involves removing errors and inconsistencies in data to improve its quality, often by dealing with missing values, smoothing noisy data, identifying or removing outliers, and resolving inconsistencies.

Feature Encoding

Feature Encoding transforms categorical variables into a numerical format. Common techniques include one-hot encoding and label encoding, which allow the use of mathematical models that require numerical input.
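
Both techniques can be sketched in a few lines. Assigning codes in alphabetical order of the categories is an assumption of this example; the helper names are hypothetical.

```python
def label_encode(values):
    """Map each category to an integer code, assigned in sorted category order."""
    categories = sorted(set(values))
    index = {category: i for i, category in enumerate(categories)}
    return categories, [index[v] for v in values]

def one_hot_encode(values):
    """Map each category to a 0/1 vector with a single 1 in its category's position."""
    categories, codes = label_encode(values)
    vectors = [
        [1 if code == i else 0 for i in range(len(categories))]
        for code in codes
    ]
    return categories, vectors

colors = ["red", "green", "blue", "green"]
categories, codes = label_encode(colors)
_, one_hot = one_hot_encode(colors)
print(categories, codes, one_hot)
```

Label encoding implies an ordering among categories, so one-hot encoding is usually the safer choice for unordered categories in models that treat inputs as magnitudes.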


© Hypatia.Tech. 2024 All rights reserved.