
Lecture 25: Stochastic Gradient Descent

Description

Professor Suvrit Sra gives this guest lecture on stochastic gradient descent (SGD), which randomly selects a minibatch of data at each step. SGD is still the primary method for training large-scale machine learning systems.

Summary

Full gradient descent uses all data in each step.
Stochastic method uses a minibatch of data (often 1 sample!).
Each step is much faster and the descent starts well.
Later the points bounce around near the minimum: time to stop!
This method is the favorite for weights in deep learning (see the code sketch below).
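
The summary above describes the algorithm in words; the following is a minimal sketch of minibatch SGD on a synthetic least-squares problem, assuming NumPy. The data sizes, batch size, and decaying step-size schedule are illustrative choices, not values from the lecture.

import numpy as np

rng = np.random.default_rng(0)

# Synthetic least-squares data: find x so that A x is close to b
n, d = 1000, 10
A = rng.standard_normal((n, d))
x_true = rng.standard_normal(d)
b = A @ x_true + 0.01 * rng.standard_normal(n)

def full_gradient(x):
    # Full gradient of (1/2n) ||A x - b||^2 uses all n samples
    return A.T @ (A @ x - b) / n

def minibatch_gradient(x, batch_size=1):
    # Stochastic gradient estimated from a random minibatch (often just 1 sample)
    idx = rng.integers(0, n, size=batch_size)
    Ab, bb = A[idx], b[idx]
    return Ab.T @ (Ab @ x - bb) / batch_size

x = np.zeros(d)
for k in range(5000):
    step = 0.05 / (1 + 0.01 * k)   # decaying step size tames the late bouncing
    x -= step * minibatch_gradient(x, batch_size=1)

print("distance to x_true:", np.linalg.norm(x - x_true))

Swapping minibatch_gradient for full_gradient recovers ordinary full gradient descent for comparison; the shrinking step size is one common way to handle the bouncing near the solution noted in the summary.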

Related section in textbook: VI.5

Instructor: Prof. Suvrit Sra

Course Features

Lecture videos
Assignments: problem sets (no solutions)
Audio podcast