Week 6 Lecture - Overview of Mini-Batch Gradient Descent

6a - Overview of mini-batch gradient descent

6a-02 - Reminder: The error surface for a linear neuron

6a-03 - Convergence speed of full batch learning when the error surface is a quadratic bowl

6a-04 - How the learning goes wrong

6a-05 - Stochastic gradient descent

6a-06 - Two types of learning algorithm

6a-07 - A basic mini-batch gradient descent algorithm
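
A minimal NumPy sketch of the basic loop, assuming a user-supplied `grad(w, X_batch, y_batch)` that returns the gradient averaged over the batch (the function name and signature are illustrative, not from the lecture):

```python
import numpy as np

def minibatch_sgd(w, X, y, grad, lr=0.01, batch_size=32, epochs=10, seed=0):
    """Basic mini-batch gradient descent: reshuffle the data each epoch,
    then take one small gradient step per mini-batch."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    for _ in range(epochs):
        order = rng.permutation(n)              # new random order each epoch
        for start in range(0, n, batch_size):
            idx = order[start:start + batch_size]
            w = w - lr * grad(w, X[idx], y[idx])
    return w
```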

6b - A bag of tricks for mini-batch gradient descent

6b-02 - Be careful about turning down the learning rate

6b-03 - Initializing the weights
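
One recipe consistent with the lecture's advice: small random weights scaled by the square root of the fan-in, so that units with many inputs do not start out saturated. A sketch (the function name is illustrative):

```python
import numpy as np

def init_weights(fan_in, fan_out, seed=0):
    # Scale by 1/sqrt(fan_in): units with more incoming connections
    # get proportionally smaller initial weights.
    rng = np.random.default_rng(seed)
    return rng.standard_normal((fan_in, fan_out)) / np.sqrt(fan_in)
```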

6b-04 - Shifting the inputs

6b-05 - Scaling the inputs
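
A sketch covering both this slide and the previous one: shift each input component to zero mean and scale it to unit variance, which makes the error surface rounder and easier to descend (assumes dense NumPy arrays; `standardize` is an illustrative name):

```python
import numpy as np

def standardize(X):
    # Shift each component to zero mean, then scale to unit variance.
    mean = X.mean(axis=0)
    std = X.std(axis=0)
    return (X - mean) / (std + 1e-8), mean, std
```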

6b-06 - A more thorough method: Decorrelate the input components
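
The more thorough method named here is PCA. A whitening sketch along those lines: project zero-mean inputs onto the eigenvectors of their covariance matrix, then divide each component by the square root of its eigenvalue (illustrative code, not from the slides):

```python
import numpy as np

def decorrelate(X, eps=1e-8):
    # PCA whitening: rotate onto the covariance eigenvectors, then
    # rescale each direction so the error surface becomes circular.
    Xc = X - X.mean(axis=0)
    cov = Xc.T @ Xc / Xc.shape[0]
    eigval, eigvec = np.linalg.eigh(cov)
    return (Xc @ eigvec) / np.sqrt(eigval + eps)
```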

6b-07 - Common problems that occur in multilayer networks

6b-08 - Four ways to speed up mini-batch learning

6c - The momentum method

6c-02 - The intuition behind the momentum method

6c-03 - The equations of the momentum method
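
A one-step sketch of the standard momentum update, v(t) = alpha * v(t-1) - lr * dE/dw with the weight change equal to v(t) (variable names are illustrative):

```python
def momentum_step(w, v, g, lr=0.01, alpha=0.9):
    # v(t) = alpha * v(t-1) - lr * dE/dw;  w(t) = w(t-1) + v(t)
    v = alpha * v - lr * g
    return w + v, v
```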

6c-04 - The behavior of the momentum method

6c-05 - A better type of momentum (Nesterov 1983)

6c-06 - A picture of the Nesterov method
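
A sketch of the Nesterov variant: first gamble by jumping in the direction of the accumulated velocity, then correct using the gradient measured at the lookahead point (assumes a hypothetical `grad(w)` callable):

```python
def nesterov_step(w, v, grad, lr=0.01, alpha=0.9):
    # Jump first, measure the gradient where you land, then correct.
    g = grad(w + alpha * v)         # gradient at the lookahead point
    v = alpha * v - lr * g
    return w + v, v
```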

6d - A separate, adaptive learning rate for each connection

6d-02 - The intuition behind separate adaptive learning rates

6d-03 - One way to determine the individual learning rates
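
The scheme in the lecture starts each weight with a local gain of 1 and adapts it from gradient sign agreement: increase additively when consecutive gradients agree, decrease multiplicatively when they disagree. The 0.05 and 0.95 constants follow the slide; the rest is an illustrative sketch:

```python
import numpy as np

def adaptive_gain_step(w, gain, g, g_prev, lr=0.01):
    # Additive increase / multiplicative decrease of per-weight gains:
    # same gradient sign -> gain += 0.05, sign flip -> gain *= 0.95.
    agree = (g * g_prev) > 0
    gain = np.where(agree, gain + 0.05, gain * 0.95)
    return w - lr * gain * g, gain
```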

6d-04 - Tricks for making adaptive learning rates work better

6e - rmsprop: Divide the gradient by a running average of its recent magnitude

6e-02 - rprop: Using only the sign of the gradient
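
A sketch of one rprop step: only the sign of the gradient is used, and each weight carries its own step size that grows multiplicatively on sign agreement and shrinks on a sign flip. The factors and clipping bounds below are conventional choices, not fixed by the lecture:

```python
import numpy as np

def rprop_step(w, step, g, g_prev,
               inc=1.2, dec=0.5, step_min=1e-6, step_max=50.0):
    # Grow the per-weight step when consecutive gradients agree in sign,
    # shrink it on a flip, then move downhill by sign(g) * step.
    agree = (g * g_prev) > 0
    step = np.clip(np.where(agree, step * inc, step * dec),
                   step_min, step_max)
    return w - np.sign(g) * step, step
```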

6e-03 - Why rprop does not work with mini-batches

6e-04 - rmsprop: A mini-batch version of rprop
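
A sketch of rmsprop as described in the lecture: keep a running mean of the squared gradient, MeanSquare(t) = 0.9 * MeanSquare(t-1) + 0.1 * g^2, and divide the gradient by its square root before the update. The epsilon and learning rate are illustrative defaults:

```python
import numpy as np

def rmsprop_step(w, ms, g, lr=0.001, decay=0.9, eps=1e-8):
    # Running average of the squared gradient (0.9 / 0.1 as in the slide),
    # then normalize the gradient by its recent root-mean-square magnitude.
    ms = decay * ms + (1 - decay) * g ** 2
    return w - lr * g / (np.sqrt(ms) + eps), ms
```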

6e-05 - Further developments of rmsprop

6e-06 - Summary of learning methods for neural networks

Week 6 Quiz

TBD

Week 6 Vocab

TBD

Week 6 FAQ

TBD

Week 6 Other

Papers

TBD

Week 6 People

Yurii Nesterov

Ilya Sutskever
