
Lecture 25: Stochastic Gradient Descent

Description

Professor Suvrit Sra gives this guest lecture on stochastic gradient descent (SGD), which randomly selects a minibatch of data at each step. SGD is still the primary method for training large-scale machine learning systems.

Summary

Full gradient descent uses all data in each step.
Stochastic method uses a minibatch of data (often 1 sample!).
Each step is much faster and the descent starts well.
Later the points bounce around near the minimum: time to stop!
This method is the favorite for weights in deep learning (see the code sketch below).
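
The summary above describes the algorithm in words; the following is a minimal sketch of minibatch SGD on a synthetic least-squares problem, assuming NumPy. The data sizes, batch size, and decaying step-size schedule are illustrative choices, not values from the lecture.

import numpy as np

rng = np.random.default_rng(0)

# Synthetic least-squares data: find x so that A x is close to b
n, d = 1000, 10
A = rng.standard_normal((n, d))
x_true = rng.standard_normal(d)
b = A @ x_true + 0.01 * rng.standard_normal(n)

def full_gradient(x):
    # Full gradient of (1/2n) ||A x - b||^2 uses all n samples
    return A.T @ (A @ x - b) / n

def minibatch_gradient(x, batch_size=1):
    # Stochastic gradient estimated from a random minibatch (often just 1 sample)
    idx = rng.integers(0, n, size=batch_size)
    Ab, bb = A[idx], b[idx]
    return Ab.T @ (Ab @ x - bb) / batch_size

x = np.zeros(d)
for k in range(5000):
    step = 0.05 / (1 + 0.01 * k)   # decaying step size tames the late bouncing
    x -= step * minibatch_gradient(x, batch_size=1)

print("distance to x_true:", np.linalg.norm(x - x_true))

Swapping minibatch_gradient for full_gradient recovers ordinary full gradient descent for comparison; the shrinking step size is one common way to handle the bouncing near the solution noted in the summary.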

Related section in textbook: VI.5

Instructor: Prof. Suvrit Sra

Course Features

Lecture videos
Assignments: problem sets (no solutions)
Audio podcast