Gradient Descent Variants
8 Flashcards
Nesterov Accelerated Gradient (NAG)
NAG is a variant of momentum in which the gradient is evaluated not at the current parameters but at a look-ahead point computed from the current momentum. By anticipating the future gradient, it often converges faster than standard momentum.
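A minimal NumPy sketch of one common formulation of the NAG update on a toy quadratic loss; the grad function, learning rate, and momentum coefficient are illustrative assumptions, not part of the card.

```python
import numpy as np

def grad(w):                      # gradient of a toy quadratic loss f(w) = 0.5 * ||w||^2
    return w

w = np.array([5.0, -3.0])         # parameters
v = np.zeros_like(w)              # velocity (momentum buffer)
lr, beta = 0.1, 0.9               # illustrative hyperparameters

for _ in range(100):
    lookahead = w - beta * v      # evaluate the gradient at the look-ahead point
    v = beta * v + lr * grad(lookahead)
    w = w - v                     # parameter update

print(w)                          # approaches the minimum at the origin
```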
Adam
Adam combines ideas from both Momentum and RMSprop. It keeps exponentially decaying moving averages of the gradients and of the squared gradients. The key difference is the use of bias-corrected first and second moment estimates to update the parameters.
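A minimal NumPy sketch of the Adam update with bias correction on the same kind of toy quadratic loss; the hyperparameter values are the commonly cited defaults (except the learning rate) and are used here only for illustration.

```python
import numpy as np

def grad(w):                      # gradient of a toy quadratic loss
    return w

w = np.array([5.0, -3.0])
m = np.zeros_like(w)              # first moment: moving average of gradients
v = np.zeros_like(w)              # second moment: moving average of squared gradients
lr, beta1, beta2, eps = 0.1, 0.9, 0.999, 1e-8

for t in range(1, 201):
    g = grad(w)
    m = beta1 * m + (1 - beta1) * g
    v = beta2 * v + (1 - beta2) * g**2
    m_hat = m / (1 - beta1**t)    # bias-corrected first moment
    v_hat = v / (1 - beta2**t)    # bias-corrected second moment
    w = w - lr * m_hat / (np.sqrt(v_hat) + eps)

print(w)                          # approaches the minimum at the origin
```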
RMSprop
RMSprop modifies Adagrad to prevent the learning rate from diminishing too quickly. It uses a moving average of squared gradients to normalize the gradient update, which helps with non-stationary objectives and complex optimization problems.
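A minimal NumPy sketch of the RMSprop update; the decay rate rho, learning rate, and grad function are illustrative choices.

```python
import numpy as np

def grad(w):                      # gradient of a toy quadratic loss
    return w

w = np.array([5.0, -3.0])
s = np.zeros_like(w)              # moving average of squared gradients
lr, rho, eps = 0.05, 0.9, 1e-8

for _ in range(200):
    g = grad(w)
    s = rho * s + (1 - rho) * g**2     # decaying average, unlike Adagrad's full sum
    w = w - lr * g / (np.sqrt(s) + eps)

print(w)                          # moves toward the minimum at the origin
```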
Batch Gradient Descent
Batch Gradient Descent uses every sample in the dataset to compute a single parameter update per iteration. The key difference is that each update is computed on the entire dataset, which makes it very slow for large datasets but results in stable convergence.
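A minimal NumPy sketch of batch gradient descent on a toy linear-regression problem; the data, targets, and learning rate are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))             # toy dataset
y = X @ np.array([2.0, -1.0])             # targets generated from known weights

w = np.zeros(2)
lr = 0.1

for _ in range(200):
    grad = X.T @ (X @ w - y) / len(X)     # gradient averaged over the ENTIRE dataset
    w = w - lr * grad                     # one update per full pass

print(w)                                  # close to [2., -1.]
```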
Stochastic Gradient Descent (SGD)
SGD updates the parameters for each training example, one at a time. The key difference is the increased noise, which can help escape local minima but may lead to fluctuating convergence. It is faster per iteration than Batch Gradient Descent.
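A minimal NumPy sketch of SGD on the same toy regression setup as above; the data, learning rate, and epoch count are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))             # toy dataset
y = X @ np.array([2.0, -1.0])             # noise-free targets from known weights

w = np.zeros(2)
lr = 0.05

for epoch in range(20):
    for i in rng.permutation(len(X)):     # shuffle, then update one example at a time
        xi, yi = X[i], y[i]
        grad = (xi @ w - yi) * xi         # gradient from a SINGLE example
        w = w - lr * grad

print(w)                                  # close to [2., -1.], with some noise along the way
```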
Momentum
Momentum takes into consideration past gradients to smooth out the updates. It helps accelerate SGD in the relevant direction and dampens oscillations.
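A minimal NumPy sketch of the momentum update on a toy quadratic loss; the velocity buffer, learning rate, and momentum coefficient are illustrative.

```python
import numpy as np

def grad(w):                      # gradient of a toy quadratic loss f(w) = 0.5 * ||w||^2
    return w

w = np.array([5.0, -3.0])
v = np.zeros_like(w)              # velocity: exponentially weighted sum of past gradients
lr, beta = 0.1, 0.9

for _ in range(100):
    v = beta * v + lr * grad(w)   # past gradients smooth and accelerate the update
    w = w - v

print(w)                          # approaches the minimum at the origin
```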
Adagrad
Adagrad adapts the learning rate to the parameters, performing larger updates for parameters tied to infrequent features and smaller updates for frequent ones. The key difference is its per-parameter learning rate, which can improve performance on sparse data.
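A minimal NumPy sketch of the Adagrad update on a toy quadratic loss; the learning rate and grad function are illustrative.

```python
import numpy as np

def grad(w):                      # gradient of a toy quadratic loss
    return w

w = np.array([5.0, -3.0])
G = np.zeros_like(w)              # running sum of squared gradients, per parameter
lr, eps = 0.5, 1e-8

for _ in range(500):
    g = grad(w)
    G = G + g**2                  # accumulates forever, so the effective rate only shrinks
    w = w - lr * g / (np.sqrt(G) + eps)

print(w)                          # approaches the origin with a steadily shrinking step size
```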
Mini-batch Gradient Descent
Mini-batch Gradient Descent is a compromise between Batch and Stochastic Gradient Descent: it updates parameters using a small subset (mini-batch) of the data. The key difference is better use of computational resources and a balance between convergence stability and speed.
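A minimal NumPy sketch of mini-batch gradient descent on the same toy regression setup; the batch size, learning rate, and epoch count are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))             # toy dataset
y = X @ np.array([2.0, -1.0])             # targets from known weights

w = np.zeros(2)
lr, batch_size = 0.1, 16

for epoch in range(50):
    idx = rng.permutation(len(X))         # shuffle indices each epoch
    for start in range(0, len(X), batch_size):
        batch = idx[start:start + batch_size]
        Xb, yb = X[batch], y[batch]
        grad = Xb.T @ (Xb @ w - yb) / len(Xb)   # gradient averaged over one mini-batch
        w = w - lr * grad

print(w)                                  # close to [2., -1.]
```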