Lecture 23: Accelerating Gradient Descent (Use Momentum)
Description
In this lecture, Professor Strang explains both momentum-based gradient descent and Nesterov's accelerated gradient descent.
Summary
Study the zig-zag example: Minimize \(F = \frac{1}{2} (x^2 + by^2)\)
Add a momentum term: the "heavy ball" remembers its direction.
New point \(k+1\) comes from TWO old points, \(k\) and \(k-1\).
"1st order" becomes "2nd order" (or a "1st order system"), as in ODEs.
The convergence rate improves from \(1 - b\) to \(1 - \sqrt{b}\)!
Related section in textbook: VI.4
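The summary above can be sketched numerically. Below is a minimal illustration (not code from the lecture) of the zig-zag example \(F = \frac{1}{2}(x^2 + by^2)\), comparing plain gradient descent with the heavy-ball update, in which the new point \(k+1\) is built from the two old points \(k\) and \(k-1\). The step size `s`, momentum coefficient `beta`, and starting point are illustrative choices, not values fixed by the lecture.

```python
import numpy as np

# Zig-zag example: minimize F = (x^2 + b*y^2)/2 with a small b,
# so the gradient-descent contours are long, narrow ellipses.
b = 0.01
grad = lambda z: np.array([z[0], b * z[1]])  # gradient of F

def gradient_descent(z0, steps, s=1.0):
    """Plain gradient descent: z_{k+1} = z_k - s * grad F(z_k)."""
    z = np.array(z0, dtype=float)
    for _ in range(steps):
        z = z - s * grad(z)
    return z

def heavy_ball(z0, steps, s=1.0, beta=0.8):
    """Heavy-ball momentum: the new point uses TWO old points,
    z_{k+1} = z_k - s * grad F(z_k) + beta * (z_k - z_{k-1})."""
    z_prev = np.array(z0, dtype=float)
    z = z_prev - s * grad(z_prev)  # first step is a plain gradient step
    for _ in range(steps - 1):
        z, z_prev = z - s * grad(z) + beta * (z - z_prev), z
    return z

z0 = [b, 1.0]  # an illustrative starting point for the zig-zag example
print(np.linalg.norm(gradient_descent(z0, 100)))  # slow: error ~ (1 - b)^k
print(np.linalg.norm(heavy_ball(z0, 100)))        # much closer to the minimum at 0
```

After 100 steps the momentum iterate is far closer to the minimizer at the origin, reflecting the improved rate (roughly \(1-\sqrt{b}\) instead of \(1-b\) with well-chosen parameters).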
Instructor: Prof. Gilbert Strang
As Taught In: Spring 2018
Level: Undergraduate
Course Features
- Lecture videos
- Assignments: problem sets (no solutions)
- Podcast audio