Lecture 25: Stochastic Gradient Descent
Description
Professor Suvrit Sra gives this guest lecture on stochastic gradient descent (SGD), which randomly selects a minibatch of data at each step. SGD remains the primary method for training large-scale machine learning systems. A short code sketch of the minibatch idea follows the summary below.
Summary
Full gradient descent uses all data in each step.
Stochastic method uses a minibatch of data (often 1 sample!).
Each step is much faster and the descent starts well.
Later the points bounce around / time to stop!
This method is the favorite for weights in deep learning.
Related section in textbook: VI.5
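As a rough illustration of the summary above, here is a minimal Python sketch contrasting full-batch gradient descent with minibatch SGD on a hypothetical least-squares problem. The problem setup, step size, and function names (`full_gradient`, `minibatch_gradient`) are illustrative assumptions, not material from the lecture.

```python
import numpy as np

# Hypothetical least-squares example (not from the lecture): minimize
#   f(x) = (1/n) * sum_i (a_i . x - b_i)^2
rng = np.random.default_rng(0)
n, d = 1000, 20
A = rng.standard_normal((n, d))
x_true = rng.standard_normal(d)
b = A @ x_true + 0.01 * rng.standard_normal(n)

def full_gradient(x):
    # Full gradient descent: every step touches all n samples.
    return (2.0 / n) * A.T @ (A @ x - b)

def minibatch_gradient(x, batch_size=1):
    # SGD: estimate the gradient from a random minibatch (often a single sample).
    idx = rng.integers(0, n, size=batch_size)
    Ai, bi = A[idx], b[idx]
    return (2.0 / batch_size) * Ai.T @ (Ai @ x - bi)

x = np.zeros(d)
step = 0.01
for t in range(5000):
    # Fast early progress; later the iterates bounce around the minimizer,
    # which is why the step size is decreased or the iteration is stopped early.
    x -= step * minibatch_gradient(x)

print("distance to x_true:", np.linalg.norm(x - x_true))
```

Each SGD step costs a single sample's gradient rather than all n, which is why the early descent is so cheap; the bouncing near the end reflects the noise of the minibatch estimate.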
Instructor: Prof. Suvrit Sra
As Taught In: Spring 2018
Level: Undergraduate
Course Features
AV lectures - Video
Assignments - problem sets (no solutions)
AV special element audio - Podcast