Machine Learning Basics
Unsupervised Learning
Unsupervised Learning involves training a model on data without labels. The model tries to infer patterns within the data. Clustering is a common example of unsupervised learning.
Neural Networks
Neural Networks are computing systems inspired by the biological neural networks of animal brains. They are used in deep learning and consist of interconnected units called neurons. An example is Convolutional Neural Networks used in image recognition.
Principal Component Analysis (PCA)
Principal Component Analysis is a statistical technique that uses orthogonal transformations to convert a set of correlated variables into a set of uncorrelated variables called principal components. This is useful for dimensionality reduction. An example is reducing the dimensions of a dataset with many variables before applying a machine learning algorithm.
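A minimal sketch of the idea for two-dimensional data, using only the Python standard library. The function name `first_principal_component` and the sample points are assumptions made for illustration; real datasets with many variables would normally use a linear algebra library instead of this closed-form 2x2 case.

```python
import math


def first_principal_component(points):
    """Return (direction, variance, scores) for the first principal component of 2-D points."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    centered = [(x - mean_x, y - mean_y) for x, y in points]

    # Sample covariance matrix [[a, b], [b, c]].
    a = sum(x * x for x, _ in centered) / (n - 1)
    b = sum(x * y for x, y in centered) / (n - 1)
    c = sum(y * y for _, y in centered) / (n - 1)

    # Largest eigenvalue of a symmetric 2x2 matrix, in closed form.
    largest = (a + c) / 2 + math.sqrt(((a - c) / 2) ** 2 + b ** 2)

    # Matching eigenvector; fall back to an axis when the variables are uncorrelated.
    if b != 0:
        vx, vy = b, largest - a
    else:
        vx, vy = (1.0, 0.0) if a >= c else (0.0, 1.0)
    norm = math.hypot(vx, vy)
    direction = (vx / norm, vy / norm)

    # Project each centered point onto the component: 2 dimensions become 1.
    scores = [x * direction[0] + y * direction[1] for x, y in centered]
    return direction, largest, scores


# Hypothetical, perfectly correlated data lying on the line y = x.
direction, variance, scores = first_principal_component(
    [(1.0, 1.0), (2.0, 2.0), (3.0, 3.0)]
)
```

Here the first component points along the line y = x and carries all of the variance, so the one-dimensional `scores` lose no information.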
Supervised Learning
Supervised Learning is a type of machine learning where the model is trained on labeled data. For example, in image classification, the training set consists of images tagged with their respective categories.
Decision Trees
Decision Trees are predictive models that map observations about an item to conclusions about its target value through a sequence of feature-based splits. They are used for both classification and regression tasks. An example is using a decision tree to predict whether a loan applicant is a high or low credit risk.
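To make the loan example concrete, here is a small sketch of how a tree turns observations into a prediction. The tree is hand-built: the feature names (`income`, `debt_ratio`) and the thresholds are hypothetical and were not learned from data, so this shows only the prediction step, not training.

```python
# A hand-built tree; features and thresholds are hypothetical, not learned.
tree = {
    "feature": "income",
    "threshold": 40000,
    "below": {"label": "high risk"},
    "at_or_above": {
        "feature": "debt_ratio",
        "threshold": 0.4,
        "below": {"label": "low risk"},
        "at_or_above": {"label": "high risk"},
    },
}


def predict(node, applicant):
    """Walk from the root to a leaf, following one branch per split."""
    while "label" not in node:
        value = applicant[node["feature"]]
        branch = "below" if value < node["threshold"] else "at_or_above"
        node = node[branch]
    return node["label"]


risk = predict(tree, {"income": 55000, "debt_ratio": 0.25})
```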
Cross-Validation
Cross-Validation is a technique for assessing how a predictive model will perform on an independent data set. It involves dividing the dataset into a number of parts, or 'folds', training the model on all but one fold, and testing on the held-out fold. In k-fold cross-validation, this is repeated k times so that each fold serves as the test set once, and the scores are averaged.
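The splitting step can be sketched as follows. The helper name `k_fold_indices` is an assumption for illustration, and the sketch does not shuffle the data, which is commonly done before splitting.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, test_indices) pairs for k-fold cross-validation."""
    indices = list(range(n_samples))
    # Spread any remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size


# Ten samples split into three folds; each sample is tested exactly once.
folds = list(k_fold_indices(n_samples=10, k=3))
```

A model would be trained on each `train` list and scored on the matching `test` list, and the k scores would then be averaged.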
Regularization
Regularization refers to techniques that reduce overfitting by adding a penalty on the size of the model parameters to the training objective, which encourages simpler models. Examples include L1 (Lasso) and L2 (Ridge) regularization.
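As a small illustration of the L2 (Ridge) case, the sketch below fits a single slope with no intercept, where the penalized least-squares solution has a simple closed form. The function name `ridge_slope` and the data are assumptions chosen to keep the arithmetic easy to follow.

```python
def ridge_slope(xs, ys, penalty):
    """Slope w that minimizes sum((y - w * x)^2) + penalty * w^2 (no intercept)."""
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    # The penalty enlarges the denominator, shrinking the slope toward zero.
    return sxy / (sxx + penalty)


# Hypothetical data lying exactly on y = 2x.
xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]
unregularized = ridge_slope(xs, ys, penalty=0.0)   # ordinary least squares: 2.0
regularized = ridge_slope(xs, ys, penalty=14.0)    # shrunk toward zero: 1.0
```

A larger penalty pulls the parameter closer to zero, trading some fit on the training data for a simpler model.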
Random Forests
Random Forests consist of many individual decision trees that operate as an ensemble. Each tree makes a prediction, and the output is the mode of the predictions for classification or their mean for regression. An example is using a random forest to improve the accuracy of a classification task over a single decision tree.
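The aggregation step alone can be sketched like this. The per-tree predictions are made-up values standing in for the outputs of trained trees, and the function name `aggregate` is an assumption; training the trees themselves is not shown.

```python
from statistics import mean, mode


def aggregate(predictions, task):
    """Combine per-tree predictions into one ensemble prediction."""
    if task == "classification":
        return mode(predictions)   # majority vote
    if task == "regression":
        return mean(predictions)   # average of the trees' outputs
    raise ValueError(f"unknown task: {task}")


# Hypothetical outputs from three trees.
label = aggregate(["low risk", "high risk", "low risk"], task="classification")
value = aggregate([1.0, 2.0, 6.0], task="regression")
```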
Support Vector Machines (SVM)
Support Vector Machines are supervised learning models used for classification and regression tasks. For classification, they try to find the hyperplane that separates the classes with the largest margin. An example is using SVM to classify hand-written letters based on pixel data.
Bias-Variance Tradeoff
The Bias-Variance Tradeoff is the tension between two sources of prediction error: bias, from a model that is too simple, and variance, from a model that is too sensitive to its particular training set. High bias can lead to underfitting, and high variance can lead to overfitting. An example is adjusting model complexity to balance the tradeoff.
Underfitting
Underfitting happens when a model is too simple to capture the underlying structure of the data, leading to poor performance on both training and unseen data. A linear model trying to fit non-linear data is an example of underfitting.
Gradient Descent
Gradient Descent is an optimization algorithm that minimizes a function by iteratively stepping in the direction of the negative gradient, the direction of steepest descent. It is used extensively in training neural networks. An example is minimizing a model's cost function with respect to its parameters.
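A minimal sketch of the update rule for a function of one variable. The cost function f(x) = (x - 3)^2, the learning rate, and the step count are all assumptions chosen for illustration.

```python
def gradient_descent(grad, x0, learning_rate=0.1, steps=100):
    """Minimize a one-variable function given its derivative `grad`."""
    x = x0
    for _ in range(steps):
        # Step against the gradient, the direction of steepest descent.
        x -= learning_rate * grad(x)
    return x


# Hypothetical cost f(x) = (x - 3)^2, whose derivative is 2 * (x - 3).
minimum = gradient_descent(lambda x: 2 * (x - 3), x0=0.0)
```

Starting from 0, the iterates approach the minimizer x = 3. A learning rate that is too large can make the steps overshoot and diverge.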
Overfitting
Overfitting occurs when a model learns the training data too well, including noise and outliers, which harms its performance on new, unseen data. An example is a decision tree that grows too deep and captures noise in the data.
Reinforcement Learning
Reinforcement Learning is a type of machine learning where an agent learns to make decisions by performing actions and receiving rewards in a given environment. An example is a chess-playing algorithm that improves by playing more games.
K-Means Clustering
K-Means Clustering is an unsupervised learning algorithm that divides a dataset into K distinct, non-overlapping clusters. It alternates between assigning each data point to the nearest cluster center and moving each center to the mean of its assigned points. An example is segmenting customers into different groups based on purchasing behavior.
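A small sketch of the algorithm on one-dimensional data, in the spirit of the customer example. The spending values, the starting centers, and the fixed iteration count are assumptions; practical implementations usually pick starting centers more carefully and stop once assignments no longer change.

```python
def k_means_1d(points, centers, iterations=10):
    """Run K-Means on 1-D data, starting from the given centers."""
    centers = list(centers)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest center.
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda i: abs(p - centers[i]))
            clusters[nearest].append(p)
        # Update step: move each center to the mean of its cluster.
        centers = [
            sum(c) / len(c) if c else centers[i]   # keep an empty cluster's center
            for i, c in enumerate(clusters)
        ]
    return centers, clusters


# Hypothetical monthly spend for six customers, split into K = 2 groups.
spend = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
centers, clusters = k_means_1d(spend, centers=[0.0, 5.0])
```

With these values the centers settle at 2.0 and 11.0, separating the low spenders from the high spenders.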
© Hypatia.Tech. 2024 All rights reserved.