Data Preprocessing for Machine Learning
Standardization
Standardization rescales data to have a mean (\(\mu\)) of 0 and standard deviation (\(\sigma\)) of 1 (unit variance). Example: Z-score normalization where an entry \(x\) is replaced by \((x - \mu) / \sigma\).
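A minimal sketch of z-score standardization using NumPy (the sample values are made up for illustration):

```python
import numpy as np

x = np.array([10.0, 20.0, 30.0, 40.0, 50.0])
z = (x - x.mean()) / x.std()  # z-score: result has mean 0 and std 1
```

After the transform, `z.mean()` is 0 and `z.std()` is 1, regardless of the original scale.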
Handling Time Series Data
Time series data requires special preprocessing like windowing, lag variables, or detrending to prepare for machine learning algorithms.
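One way to sketch the lag/windowing idea: turn a univariate series into supervised-learning pairs, where each window of past values predicts the next value (series and window length are illustrative):

```python
import numpy as np

series = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
lag = 2  # window length: use the last 2 observations as features
# each row of X is a window; y is the value immediately after that window
X = np.array([series[i:i + lag] for i in range(len(series) - lag)])
y = series[lag:]
```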
Missing Values Imputation
Imputation fills in missing or null values within a dataset. One common technique is to replace missing values with the mean, median, or mode value of the column.
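A minimal mean-imputation sketch with NumPy (column values are made up; `np.nanmean` ignores the missing entries):

```python
import numpy as np

col = np.array([1.0, np.nan, 3.0, np.nan, 5.0])
mean = np.nanmean(col)                      # mean of the observed values only
filled = np.where(np.isnan(col), mean, col) # replace NaNs with that mean
```

Median or mode imputation works the same way, just with a different summary statistic.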
Train/Test Split
The train/test split divides the dataset into two parts: one for training the model and the other for testing its performance. A common split is 80% for training and 20% for testing.
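The 80/20 split can be sketched by shuffling indices and cutting at 80% (a fixed seed is used here so the example is reproducible):

```python
import numpy as np

rng = np.random.default_rng(0)              # fixed seed for reproducibility
X = np.arange(10)
idx = rng.permutation(len(X))               # shuffle before splitting
cut = int(0.8 * len(X))                     # 80% train, 20% test
train, test = X[idx[:cut]], X[idx[cut:]]
```

In practice a library helper (e.g., scikit-learn's `train_test_split`) does the same thing with extra options such as stratification.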
Feature Scaling
Feature scaling brings all numerical features to the same scale. Min-max scaling is an example where data is scaled to a fixed range, usually 0 to 1.
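Min-max scaling to the range [0, 1] is a one-liner; the sample values are illustrative:

```python
import numpy as np

x = np.array([5.0, 10.0, 15.0, 20.0])
scaled = (x - x.min()) / (x.max() - x.min())  # maps min -> 0, max -> 1
```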
Feature Encoding
Feature encoding transforms categorical variables into numerical values which can be used in mathematical models. For example, one-hot encoding represents categorical variables as binary vectors.
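A pure-Python sketch of one-hot encoding (the category list is sorted so the column order is deterministic; the color values are made up):

```python
colors = ["red", "green", "blue", "green"]
categories = sorted(set(colors))            # ['blue', 'green', 'red']
# each value becomes a binary vector with a single 1 in its category's column
one_hot = [[int(c == cat) for cat in categories] for c in colors]
```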
Label Discretization
Label discretization converts continuous labels into discrete classes. Techniques like binning can be used to divide the range of continuous values into intervals, turning a regression problem into a classification one.
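Equal-width binning of a continuous label with `np.digitize` (the bin edges and values are illustrative):

```python
import numpy as np

y = np.array([2.3, 7.8, 5.1, 9.9, 0.4])
edges = [3.3, 6.6]                 # interior cut points for three bins
labels = np.digitize(y, edges)     # 0 = low, 1 = mid, 2 = high
```

The continuous regression target `y` is now a three-class classification target.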
Outlier Detection
Outlier detection identifies data points that deviate markedly from the rest of the dataset so they can be removed or otherwise treated before they skew the analysis. Example: Z-score and IQR (interquartile range) are common methods for finding outliers.
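A sketch of the IQR rule: points outside \([Q_1 - 1.5\,\mathrm{IQR},\; Q_3 + 1.5\,\mathrm{IQR}]\) are flagged as outliers (the data here is made up, with 100.0 as the obvious outlier):

```python
import numpy as np

x = np.array([1.0, 2.0, 2.0, 3.0, 3.0, 4.0, 100.0])
q1, q3 = np.percentile(x, [25, 75])
iqr = q3 - q1
# keep only points inside the 1.5*IQR fences
mask = (x >= q1 - 1.5 * iqr) & (x <= q3 + 1.5 * iqr)
cleaned = x[mask]
```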
Handling Imbalanced Data
Handling imbalanced data involves adjusting the dataset so that the classes are more balanced. Techniques such as SMOTE (Synthetic Minority Over-sampling Technique) can be used.
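SMOTE itself (which synthesizes new minority samples by interpolation) lives in the imbalanced-learn library; as a simpler stand-in, here is a random-oversampling sketch that duplicates minority samples until the classes are balanced (data and labels are made up):

```python
import numpy as np

rng = np.random.default_rng(0)
X = np.array([[0.0], [0.1], [0.2], [0.9], [1.0], [1.1], [1.2], [1.3]])
y = np.array([1, 1, 1, 0, 0, 0, 0, 0])      # class 1 is the minority (3 vs 5)
minority = np.where(y == 1)[0]
need = int((y == 0).sum()) - len(minority)  # how many extra samples to add
extra = rng.choice(minority, size=need, replace=True)
X_bal = np.vstack([X, X[extra]])
y_bal = np.concatenate([y, y[extra]])
```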
Text Data Processing
Text data is preprocessed by techniques such as tokenization, stemming, and lemmatization. Methods like TF-IDF are used to convert text into a numerical format suitable for machine learning models.
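A small sketch of the TF-IDF building blocks: whitespace tokenization, document frequency, and a smoothed IDF (this is the smoothed variant scikit-learn uses by default; the corpus is made up):

```python
import math
from collections import Counter

docs = ["the cat sat", "the dog sat", "the cat ran"]
tokens = [d.lower().split() for d in docs]            # whitespace tokenization
n = len(docs)
df = Counter(t for doc in tokens for t in set(doc))   # document frequency
# smoothed idf: log((1 + n) / (1 + df)) + 1
idf = {t: math.log((1 + n) / (1 + df[t])) + 1 for t in df}
```

A term appearing in every document (like "the") gets the minimum IDF of 1.0, while rarer terms score higher.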
Dimensionality Reduction
Dimensionality reduction reduces the number of input variables in a dataset. Principal Component Analysis (PCA) is an example: it transforms the data into a smaller set of uncorrelated variables (principal components) that retain most of the variance.
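A minimal PCA sketch via SVD of the centered data matrix (random data with an arbitrary shape, keeping the top 2 components):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))               # 100 samples, 5 features
Xc = X - X.mean(axis=0)                     # center each feature at 0
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
k = 2
X_reduced = Xc @ Vt[:k].T                   # project onto top-2 components
```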
Data Transformation
Data transformation includes applying mathematical functions to data. For example, log transformation can be used to reduce the skewness of positively skewed data.
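A log-transform sketch: `np.log1p` computes \(\log(1 + x)\), which compresses large values and also handles zeros safely (the skewed sample values are made up):

```python
import numpy as np

x = np.array([0.0, 1.0, 10.0, 100.0, 1000.0])  # positively skewed
logged = np.log1p(x)                            # log(1 + x)
```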
Feature Engineering
Feature engineering creates new features from existing ones to improve model performance. Examples include creating polynomial features from existing attributes.
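A degree-2 polynomial expansion of two features, written out by hand (the input values are illustrative):

```python
import numpy as np

X = np.array([[2.0, 3.0]])
x1, x2 = X[:, 0], X[:, 1]
# degree-2 terms: x1, x2, x1^2, x1*x2, x2^2
poly = np.column_stack([x1, x2, x1**2, x1 * x2, x2**2])
```

The interaction term `x1 * x2` lets a linear model capture a simple nonlinearity.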
Normalization
Normalization scales individual samples to have unit norm (e.g., an L2 norm of 1). It is useful for algorithms that rely on dot products or distances between samples, such as cosine-similarity-based text retrieval or k-nearest neighbors.
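Unit-norm (L2) normalization of a single sample vector (values chosen so the result is exact):

```python
import numpy as np

x = np.array([3.0, 4.0])
unit = x / np.linalg.norm(x)   # divide by the L2 norm; result has norm 1
```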
Binning
Binning converts numerical variables into categorical counterparts by grouping them into bins. For example, age can be transformed into categories such as 'child', 'adult', and 'elderly'.
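The age example as a sketch, with hypothetical cut-offs at 18 and 65:

```python
import numpy as np

ages = np.array([5, 30, 70])
edges = [18, 65]                            # hypothetical category boundaries
names = ["child", "adult", "elderly"]
cats = [names[i] for i in np.digitize(ages, edges)]
```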
© Hypatia.Tech. 2024 All rights reserved.