Week 6 Lecture - Overview of Mini-Batch Gradient Descent
6a - Overview of mini-batch gradient descent
6a-02 - Reminder: The error surface for a linear neuron
6a-03 - Convergence speed of full batch learning when the error surface is a quadratic bowl
6a-04 - How the learning goes wrong
6a-05 - Stochastic gradient descent
6a-06 - Two types of learning algorithm
6a-07 - A basic mini-batch gradient descent algorithm
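A minimal sketch of the kind of loop slide 6a-07 describes, for a single linear neuron with squared error. The dataset names, batch size, and learning rate below are illustrative assumptions, not values prescribed by the lecture.

```python
# Minimal mini-batch gradient descent for a linear neuron (squared error).
# Assumes inputs X with shape (N, D) and targets y with shape (N,).
import numpy as np

def minibatch_gd(X, y, batch_size=32, learning_rate=0.01, epochs=10, seed=0):
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)                      # weights of the linear neuron
    b = 0.0                              # bias
    for epoch in range(epochs):
        order = rng.permutation(n)       # visit examples in a random order
        for start in range(0, n, batch_size):
            idx = order[start:start + batch_size]
            xb, yb = X[idx], y[idx]
            pred = xb @ w + b            # linear neuron output
            err = pred - yb              # residual for squared error
            grad_w = xb.T @ err / len(idx)
            grad_b = err.mean()
            w -= learning_rate * grad_w  # one small step per mini-batch
            b -= learning_rate * grad_b
    return w, b
```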
6b - A bag of tricks for mini-batch gradient descent
6b-02 - Be careful about turning down the learning rate
6b-03 - Initializing the weights
6b-04 - Shifting the inputs
6b-05 - Scaling the inputs
6b-06 - A more thorough method: Decorrelate the input components
6b-07 - Common problems that occur in multilayer networks
6b-08 - Four ways to speed up mini-batch learning
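A rough sketch of two of the 6b tricks: shifting and scaling each input component to zero mean and unit variance (6b-04, 6b-05), and initializing weights with a magnitude that shrinks with fan-in (6b-03). The exact constants and the uniform distribution are assumptions for illustration.

```python
# Input preprocessing and weight initialization tricks from lecture 6b.
import numpy as np

def preprocess_inputs(X):
    mean = X.mean(axis=0)
    std = X.std(axis=0) + 1e-8          # guard against constant components
    return (X - mean) / std, mean, std  # keep mean/std to transform test data

def init_weights(fan_in, fan_out, seed=0):
    rng = np.random.default_rng(seed)
    scale = 1.0 / np.sqrt(fan_in)       # smaller random weights for larger fan-in
    return rng.uniform(-scale, scale, size=(fan_in, fan_out))
```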
6c - The momentum method
6c-02 - The intuition behind the momentum method
6c-03 - The equations of the momentum method
6c-04 - The behavior of the momentum method
6c-05 - A better type of momentum (Nesterov 1983)
6c-06 - A picture of the Nesterov method
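A sketch of the momentum update (6c-03) and the Nesterov variant (6c-05): first jump in the direction of the accumulated gradient, then measure the gradient at the new point and correct. Here `grad_fn(w)` is a placeholder for whatever computes the gradient of the cost at weights w; the values of the momentum `alpha` and learning rate `eps` are arbitrary.

```python
# Classical momentum vs. Nesterov momentum, as single update steps.
import numpy as np

def momentum_step(w, v, grad_fn, eps=0.01, alpha=0.9):
    v = alpha * v - eps * grad_fn(w)    # velocity: decayed memory of past gradients
    w = w + v
    return w, v

def nesterov_step(w, v, grad_fn, eps=0.01, alpha=0.9):
    # Make the jump in the direction of the accumulated gradient first,
    # then evaluate the gradient there and apply the correction.
    lookahead = w + alpha * v
    v = alpha * v - eps * grad_fn(lookahead)
    w = w + v
    return w, v
```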
6d - A separate, adaptive learning rate for each connection
6d-02 - The intuition behind separate adaptive learning rates
6d-03 - One way to determine the individual learning rates
6d-04 - Tricks for making adaptive learning rates work better
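A sketch of per-connection adaptive gains along the lines of 6d-03 and 6d-04: each weight keeps a local gain that grows additively when successive gradients agree in sign and shrinks multiplicatively when they disagree, with the gains clipped to a reasonable range. The constants 0.05 / 0.95 and the gain limits are the lecture's suggestions but should be treated as tunable.

```python
# Per-connection adaptive learning-rate gains, one update step.
import numpy as np

def adaptive_gain_step(w, gains, grad, prev_grad, eps=0.01,
                       gain_min=0.1, gain_max=10.0):
    agree = grad * prev_grad > 0                      # same sign as last time?
    gains = np.where(agree, gains + 0.05, gains * 0.95)
    gains = np.clip(gains, gain_min, gain_max)        # keep gains in a sane range
    w = w - eps * gains * grad                        # gain-scaled gradient step
    return w, gains
```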
6e - rmsprop: Divide the gradient by a running average of its recent magnitude
6e-02 - rprop: Using only the sign of the gradient
6e-03 - Why rprop does not work with mini-batches
6e-04 - rmsprop: A mini-batch version of rprop
6e-05 - Further developments of rmsprop
6e-06 - Summary of learning methods for neural networks
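A sketch of the rmsprop update from 6e-04: keep a running average of the squared gradient for each weight and divide the gradient by the square root of that average. The decay of 0.9 matches the lecture's description; the learning rate and the small epsilon in the denominator are illustrative assumptions.

```python
# rmsprop: divide the gradient by a running average of its recent magnitude.
import numpy as np

def rmsprop_step(w, mean_square, grad, eps=0.001, decay=0.9, tiny=1e-8):
    mean_square = decay * mean_square + (1 - decay) * grad ** 2
    w = w - eps * grad / (np.sqrt(mean_square) + tiny)  # per-weight scaled step
    return w, mean_square
```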
Week 6 Quiz
TBD
Week 6 Vocab
TBD
Week 6 FAQ
TBD
Week 6 Other
Papers
TBD