CS 229br Foundations of Deep Learning (aka Topics in the Foundations of Machine Learning)

MIT 18.408: Topics in Theoretical Computer Science

Ankur Moitra

Wednesdays 12-3

See the home pages for Harvard CS 229br and MIT 18.408.

MIT 18.408 home page

Course description: Deep learning has sparked a revolution across machine learning. It has led to major advances in vision, speech, playing strategic games, and the sciences. And yet it remains largely a mystery: we do not understand why the algorithms we use work so well in practice.

In this class we will explore theoretical foundations for deep learning, emphasizing the following themes: (1) Approximation: What sorts of functions can be represented by deep networks, and does depth provably increase expressive power? (2) Optimization: Essentially all optimization problems we want to solve in practice are non-convex. What frameworks can be used to analyze such problems? (3) Beyond worst-case analysis: Deep networks can memorize worst-case data, so why do they generalize well on real-world data? For this and related questions, our starting point will often be natural generative models for the data. The theory of deep learning is still very much a work in progress. Our goal in this course is merely to explain some of the key questions that drive this area, and to take a critical look at where the existing theory falls short.

We will cover topics such as: Barron's theorem, depth separations, landscape analysis, implicit regularization, neural tangent kernels, generalization bounds, data poisoning attacks, and frameworks for proving lower bounds against deep learning.

Harvard “sister seminar”: This MIT seminar will be coordinated with a “sister seminar” at Harvard, taught by Boaz Barak. We recommend that students taking MIT 18.408 also take the Harvard course, but this is not required. The two courses will share some but not all lectures and assignments, so if you take MIT 18.408, please keep the Monday 12-3 slot free as well.

Prerequisites (for both CS 229br and MIT 18.408): Both courses will require mathematical maturity and proficiency with proofs, probability, and information theory, as well as the basics of machine learning. We expect that students will have both a theory background (at Harvard: CS 121 and CS 124 or similar; at MIT: 6.046 or similar) and a machine learning background (at Harvard: CS 181 or 183 or similar; at MIT: 6.036 or similar).

Apply for one or both courses: Both courses are open to Harvard and MIT graduate and undergraduate students, and both will have a limited number of slots. You can apply to one or both courses by filling out this form.