Feature Engineering Techniques
Discretization
The process of converting continuous data into discrete buckets or intervals, similar to binning but with an emphasis on finding optimal bin boundaries. Use it to improve the performance of classification tasks.
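A minimal sketch, assuming scikit-learn's KBinsDiscretizer (the card doesn't name a library); the quantile strategy picks bin edges so each bin holds roughly equal counts:

```python
import numpy as np
from sklearn.preprocessing import KBinsDiscretizer

ages = np.array([[22], [25], [31], [35], [41], [58], [62], [70]])
# 3 quantile-based bins, returned as ordinal codes 0/1/2
disc = KBinsDiscretizer(n_bins=3, encode="ordinal", strategy="quantile")
print(disc.fit_transform(ages).ravel())  # e.g. [0. 0. 0. 1. 1. 2. 2. 2.]
```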
Standardization
Rescales data to have a mean (μ) of 0 and a standard deviation (σ) of 1 (unit variance). Commonly used in techniques that assume the data is normally distributed.
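A minimal sketch with scikit-learn's StandardScaler (an assumed choice): the output has mean ≈ 0 and standard deviation ≈ 1 per column:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

X = np.array([[1.0], [2.0], [3.0], [4.0], [5.0]])
Xs = StandardScaler().fit_transform(X)  # (x - mean) / std, per column
print(Xs.mean(), Xs.std())  # ~0.0, ~1.0
```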
One-Hot Encoding
Transforms categorical variables into a numeric form that ML algorithms can use for prediction. Each unique category value is converted into a new binary column. Use it when the categorical feature is not ordinal.
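A minimal sketch using pandas.get_dummies (one of several one-hot implementations); each color value becomes its own binary column:

```python
import pandas as pd

df = pd.DataFrame({"color": ["red", "green", "blue", "green"]})
# One binary column per unique category value
print(pd.get_dummies(df, columns=["color"]))
```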
Feature Selection
The process of reducing the number of input variables by selecting the most important features that contribute to the predictive power of the model. Use it to reduce overfitting and improve model performance.
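A minimal sketch of filter-style selection with scikit-learn's SelectKBest (the dataset and k are illustrative), keeping the 2 features most associated with the target by ANOVA F-score:

```python
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, f_classif

X, y = load_iris(return_X_y=True)
# Keep the 2 features with the highest ANOVA F-scores against y
X_best = SelectKBest(f_classif, k=2).fit_transform(X, y)
print(X.shape, "->", X_best.shape)  # (150, 4) -> (150, 2)
```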
Polynomial Features
Generates new features by raising existing features to a power or creating interaction terms between them. Use it to capture non-linear relationships when linear models are used.
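A minimal sketch with scikit-learn's PolynomialFeatures: from features [a, b] it derives [1, a, b, a², ab, b²] at degree 2:

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

X = np.array([[2.0, 3.0]])  # one sample with features a=2, b=3
print(PolynomialFeatures(degree=2).fit_transform(X))
# [[1. 2. 3. 4. 6. 9.]] -> [1, a, b, a^2, a*b, b^2]
```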
Feature Scaling
Brings features with different ranges onto a comparable scale. Use it before applying algorithms that are sensitive to the scale of the data, such as SVM, k-NN, and gradient descent.
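A minimal sketch (assumed setup) that scales inside a scikit-learn Pipeline, so the scaler is fit only on the data the model trains on and k-NN sees comparable feature ranges:

```python
from sklearn.datasets import load_wine
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler

X, y = load_wine(return_X_y=True)
# Scaling happens first, then k-NN operates on comparable ranges
model = make_pipeline(MinMaxScaler(), KNeighborsClassifier())
model.fit(X, y)
```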
Feature Extraction
Transforms the input data into a new set of features using dimensionality-reduction techniques such as PCA. Use it to reduce the number of features in high-dimensional data.
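A minimal sketch with scikit-learn's PCA, projecting 4-dimensional data onto its 2 leading principal components:

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

X, _ = load_iris(return_X_y=True)
# New features are linear combinations capturing the most variance
X2 = PCA(n_components=2).fit_transform(X)
print(X2.shape)  # (150, 2)
```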
Binning
Converts numeric variables into categorical counterparts by grouping them into bins. Use it to reduce the effects of minor observation errors and simplify the model.
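A minimal sketch with pandas.cut; the income values, bin edges, and labels are illustrative:

```python
import pandas as pd

income = pd.Series([18_000, 35_000, 52_000, 77_000, 120_000])
bins = pd.cut(income, bins=[0, 30_000, 60_000, 150_000],
              labels=["low", "mid", "high"])
print(bins.tolist())  # ['low', 'mid', 'mid', 'high', 'high']
```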
Category Encoding
General term for converting categorical variables into numeric format. Includes techniques such as one-hot encoding, label encoding, and binary encoding. Choose an encoding method based on model requirements and feature characteristics.
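A minimal sketch of ordinal (label-style) encoding with scikit-learn's OrdinalEncoder; the small < medium < large ordering is an assumption for illustration:

```python
from sklearn.preprocessing import OrdinalEncoder

sizes = [["small"], ["medium"], ["large"], ["medium"]]
# Explicit category order so the codes respect small < medium < large
enc = OrdinalEncoder(categories=[["small", "medium", "large"]])
print(enc.fit_transform(sizes).ravel())  # [0. 1. 2. 1.]
```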
Handling Outliers
Involves identifying and treating extreme values that could negatively impact the model. Use techniques like trimming, capping, or transformations based on the context and the model sensitivity to outliers.
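A minimal sketch of IQR-based capping (winsorizing) with pandas: values beyond 1.5 × IQR from the quartiles are clipped to the fence values:

```python
import pandas as pd

s = pd.Series([10, 12, 11, 13, 12, 95])  # 95 is an extreme value
q1, q3 = s.quantile(0.25), s.quantile(0.75)
iqr = q3 - q1
# Clip to the classic Tukey fences
capped = s.clip(lower=q1 - 1.5 * iqr, upper=q3 + 1.5 * iqr)
print(capped.tolist())  # 95 is capped to the upper fence
```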
Missing Values Imputation
Deals with missing data by filling it in with meaningful values such as the median, mean, or mode. Use it to avoid errors when training models that can't handle null values.
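A minimal sketch with scikit-learn's SimpleImputer, replacing NaNs with the column median:

```python
import numpy as np
from sklearn.impute import SimpleImputer

X = np.array([[1.0], [2.0], [np.nan], [4.0]])
print(SimpleImputer(strategy="median").fit_transform(X).ravel())
# [1. 2. 2. 4.] -- the NaN becomes the median of [1, 2, 4]
```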
Text Vectorization
Converts text data into numerical vectors so that ML algorithms can process it. Use techniques like Bag of Words, TF-IDF, or word embeddings depending on the task.
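A minimal sketch with scikit-learn's TfidfVectorizer, turning two short documents into a TF-IDF matrix:

```python
from sklearn.feature_extraction.text import TfidfVectorizer

docs = ["the cat sat", "the dog barked"]
vec = TfidfVectorizer()
X = vec.fit_transform(docs)  # sparse matrix: one row per document
print(vec.get_feature_names_out())  # ['barked' 'cat' 'dog' 'sat' 'the']
print(X.shape)  # (2, 5)
```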
Data Augmentation
Involves artificially expanding the size and diversity of training datasets by creating modified versions of data points. Commonly used in image and speech recognition problems to increase robustness and performance.
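A minimal sketch of image-style augmentation in plain NumPy (real pipelines usually rely on libraries such as torchvision; this only illustrates the idea): a horizontal flip and additive noise yield two new variants of one sample:

```python
import numpy as np

rng = np.random.default_rng(0)
image = rng.random((28, 28))               # stand-in for a grayscale image
flipped = np.fliplr(image)                 # horizontal flip
noisy = image + rng.normal(0, 0.05, image.shape)  # small Gaussian noise
print(flipped.shape, noisy.shape)          # (28, 28) (28, 28)
```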
Log Transformation
Applies the natural logarithm to each data point to reduce skewness. Use it on right-skewed distributions to make the data more normally distributed and stabilize variance.
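A minimal sketch using numpy.log1p, i.e. log(1 + x), which also handles zeros safely:

```python
import numpy as np

x = np.array([1, 10, 100, 1_000, 10_000], dtype=float)
# Compresses large values far more than small ones, reducing right skew
print(np.log1p(x))  # ~[0.69 2.40 4.62 6.91 9.21]
```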
Normalization
Rescales the values into a range of [0,1]. Often used when features have different scales and we want to ensure no feature dominates due to its scale.
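A minimal sketch with scikit-learn's MinMaxScaler, mapping each column onto [0, 1]:

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

X = np.array([[10.0], [20.0], [30.0], [40.0]])
# (x - min) / (max - min), per column
print(MinMaxScaler().fit_transform(X).ravel())  # [0. 0.333 0.667 1.]
```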