Spring 2023, Thursdays 3:45pm-6:30pm SEC 1.402 Classroom (First lecture Jan 26)
Instructor: Boaz Barak
Links (enrolled students only): Canvas | Perusall | Gradescope
See also the Spring 2021 version of this course. The field is moving rapidly, so the two courses will not be the same, but it gives some sense of the material. The Spring 2021 course was held over Zoom. The Spring 2023 course will be much more “hands-on”, so you can expect to go into greater depth, but also to do more work.
TL;DR: The goal of this course is to prepare students for research in the foundations of deep learning. By the end of the course you should be able to read most cutting-edge papers in this field. You should also be able to reproduce at least some experimental results (those that do not require an inordinate amount of computational and human resources). Ideally, you should be well on your way to working on original research in the field. To achieve this, the course will require a large amount of independence from students, including both self-study and peer study.
See also these two blog posts of Boaz:
Formal description: A graduate-level course on recent advances and open questions in the foundations of machine learning, and specifically deep learning. We will review both classical results and recent papers in areas including:
- classifiers and generalization gaps
- representation learning
- generative models
- adversarial robustness
- out-of-distribution performance
- and more.
This is a fast-moving area and it will be a fast-moving course. We will aim to cover both state-of-the-art results and their intellectual foundations. We will also aim for a substantive discussion of both the “big picture” and the technical details of the papers. In addition to the theoretical lectures, the course will involve a programming component. Its goal is to get students to the point where they can both reproduce results from papers and work on their own research. This component will be largely self-directed. We expect students to be proficient in Python and to pick up technologies and libraries such as PyTorch and NumPy on their own (aka “Stack Overflow oriented programming”).
Prerequisites: We require mathematical maturity and proficiency with proofs, probability, and information theory. You also need the basics of machine learning, at the level of an undergraduate ML course such as Harvard CS 181 or MIT 6.036. You should be familiar with topics such as empirical and population loss, gradient descent, neural networks, linear regression, and principal component analysis. On the applied side, you should be comfortable with Python programming and be able to train a basic neural network (or achieve this via self-study before the beginning of the course; see homework zero).
Apply to this course: The course will be capped and students will need to apply. Before applying, please make sure to complete homework zero, which you should submit as part of the application. Applications are due by January 17, 2023, 11:59pm. Note: If you have any questions about homework zero, feel free to email Boaz+Gustaf+Gal.
Introduction to the course, a quick review of classical ML: representation (i.e., approximation theorems), optimization (convexity, stochastic gradient descent), generalization (bias/variance tradeoff). Differences between that and modern paradigms.
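The sketch below makes the classical optimization picture concrete; it is not course material. It runs minibatch stochastic gradient descent on the empirical squared loss of one-dimensional linear regression, and compares the result with the closed-form least-squares solution. The function names, synthetic data, step size, and batch size are hypothetical, illustrative choices.

```python
import random

def sgd_linear_regression(xs, ys, lr=0.05, epochs=200, batch_size=4, seed=0):
    """Minibatch SGD on the empirical loss (1/n) * sum_i (w * x_i + b - y_i) ** 2."""
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    order = list(range(len(xs)))
    for _ in range(epochs):
        rng.shuffle(order)  # a fresh pass over the data in random order
        for start in range(0, len(order), batch_size):
            batch = order[start:start + batch_size]
            # Gradient of the minibatch mean squared error with respect to w and b.
            residuals = [w * xs[i] + b - ys[i] for i in batch]
            grad_w = sum(2 * r * xs[i] for r, i in zip(residuals, batch)) / len(batch)
            grad_b = sum(2 * r for r in residuals) / len(batch)
            w -= lr * grad_w
            b -= lr * grad_b
    return w, b

def least_squares(xs, ys):
    """Closed-form minimizer of the same empirical loss, used as a reference point."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    w = cov / var
    return w, mean_y - w * mean_x

# Synthetic data: y = 3x + 1 plus Gaussian noise (an illustrative choice).
noise = random.Random(1)
xs = [i / 32 - 1 for i in range(64)]
ys = [3 * x + 1 + noise.gauss(0, 0.1) for x in xs]
print(sgd_linear_regression(xs, ys), least_squares(xs, ys))
```

With a constant step size, SGD does not converge exactly to the least-squares solution. It hovers in a small neighborhood of it, and that neighborhood shrinks as the step size decreases.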
Transformer architecture. How it works, why it is well-suited for GPUs, auto-regressive language models. The next-token prediction task. Some questions: are transformers useful for their inductive bias, or for their highly efficient GPU implementation? Differences between fine-tuning, prompt tuning, and linear readouts.
Options (not sure how much we will cover): vision transformers, MLP-Mixer, attention in linear time
Other transformer tutorials:
Inductive bias: learning convolutions from scratch (Behnam Neyshabur)
Pretraining without attention (state-space models, SSMs)
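The core operation inside a transformer is masked self-attention. The sketch below implements scaled dot-product attention with a causal mask on plain Python lists. It assumes a single head and omits the learned query/key/value projections.

Under the causal mask, position t may only attend to positions 0..t. This is what allows an auto-regressive model to be trained on next-token prediction at every position of a sequence in parallel. Production implementations express the same computation as batched matrix multiplications, and that formulation is what maps well onto GPUs.

```python
import math

def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]

def causal_self_attention(Q, K, V):
    """Single-head scaled dot-product attention with a causal mask.

    Q, K, V are lists of T vectors (lists of floats). Output position t is a
    softmax-weighted average of V[0..t], with weights from <Q[t], K[s]> / sqrt(d).
    """
    d = len(Q[0])
    out = []
    for t in range(len(Q)):
        # Only the current and earlier positions are scored: this is the causal mask.
        scores = [sum(q * k for q, k in zip(Q[t], K[s])) / math.sqrt(d) for s in range(t + 1)]
        weights = softmax(scores)
        out.append([sum(w * V[s][j] for s, w in enumerate(weights)) for j in range(len(V[0]))])
    return out

X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
print(causal_self_attention(X, X, X))
```

Changing the value vector at a later position leaves the outputs at earlier positions unchanged. That is exactly the property the causal mask is meant to guarantee.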
Generative models: Variational principle, VAEs, normalizing flows.
Reading:
- Chapter 2 (VAE) of the Kingma and Welling survey on VAEs.
- Chapter 3 (exponential families; you can skim the concrete examples in 3.3) of Wainwright and Jordan.
- Lilian Weng's blog post on normalizing flows.
- Survey by Kobyzev, Prince, and Brubaker (see also the CVPR 2021 tutorial).
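A VAE is trained by maximizing the evidence lower bound (ELBO): E_q[log p(x|z)] - KL(q(z|x) || p(z)). When the encoder is Gaussian, q(z|x) = N(mu, diag(sigma^2)), and the prior is standard normal, the KL term has the closed form 0.5 * sum_j (mu_j^2 + sigma_j^2 - 1 - log sigma_j^2).

The plain-Python sketch below implements that KL term, the reparameterization trick z = mu + sigma * eps, and a Monte Carlo ELBO estimate. The decoder unit_gaussian_log_likelihood is a hypothetical toy choice, p(x|z) = N(x; z, I). It is picked because the exact posterior is known in this case, so the ELBO can be checked against log p(x).

```python
import math
import random

def gaussian_kl_to_standard_normal(mu, log_var):
    """KL( N(mu, diag(exp(log_var))) || N(0, I) ) in closed form."""
    return 0.5 * sum(m * m + math.exp(lv) - 1.0 - lv for m, lv in zip(mu, log_var))

def reparameterize(mu, log_var, rng):
    """z = mu + sigma * eps with eps ~ N(0, I); z is a differentiable function of (mu, log_var)."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, log_var)]

def unit_gaussian_log_likelihood(x, z):
    """Hypothetical toy decoder: log N(x; z, I)."""
    return -0.5 * sum((xi - zi) ** 2 + math.log(2 * math.pi) for xi, zi in zip(x, z))

def elbo_estimate(x, mu, log_var, log_likelihood, rng, num_samples=1):
    """Monte Carlo estimate of E_q[log p(x | z)] - KL(q(z | x) || p(z))."""
    recon = sum(log_likelihood(x, reparameterize(mu, log_var, rng))
                for _ in range(num_samples)) / num_samples
    return recon - gaussian_kl_to_standard_normal(mu, log_var)

# With the toy decoder, p(x) = N(x; 0, 2I) and the exact posterior is N(x / 2, I / 2),
# so choosing q equal to that posterior should make the ELBO match log p(x).
x = [1.0]
exact_log_px = -0.5 * math.log(2 * math.pi * 2.0) - x[0] ** 2 / 4.0
estimate = elbo_estimate(x, [0.5], [math.log(0.5)], unit_gaussian_log_likelihood,
                         random.Random(0), num_samples=20000)
print(estimate, exact_log_px)
```

If q is any other Gaussian, the expected ELBO is strictly below log p(x). The gap equals KL(q(z|x) || p(z|x)).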
Privacy in machine learning
2014 manuscript on Differential Privacy by Dwork and Roth. For issues of computational complexity, see the survey of Vadhan.
Machine unlearning: see this
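A basic tool covered in the Dwork and Roth monograph is the Laplace mechanism. To answer a numeric query f with epsilon-differential privacy, release f(D) + Lap(sensitivity / epsilon). Here the sensitivity bounds how much f can change when a single record is added or removed.

The sketch below applies this to a counting query, which has sensitivity 1. The function names are hypothetical. Python's random module is used only for illustration: it is not a cryptographically secure source of randomness, and naive floating-point noise has known pitfalls in real deployments.

```python
import random

def laplace_noise(scale, rng):
    # The difference of two i.i.d. Exponential(rate = 1/scale) draws is Laplace(0, scale).
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def laplace_mechanism(true_answer, sensitivity, epsilon, rng):
    """Release true_answer + Lap(sensitivity / epsilon).

    This is epsilon-differentially private when `sensitivity` bounds how much one
    individual's record can change `true_answer`.
    """
    return true_answer + laplace_noise(sensitivity / epsilon, rng)

def private_count(records, predicate, epsilon, rng):
    # Adding or removing one record changes a count by at most 1, so sensitivity = 1.
    return laplace_mechanism(sum(1 for r in records if predicate(r)), 1.0, epsilon, rng)

rng = random.Random(0)
print(private_count(range(100), lambda r: r % 2 == 0, epsilon=0.5, rng=rng))
```

The noise scale 1/epsilon does not depend on the number of records. Counts over large populations therefore stay accurate in relative terms, while any single individual's contribution is masked.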
Protein Folding: AlphaFold - guest lecture by Gustaf Ahdritz.
Training Dynamics: differences between back-propagation and perturbative methods, natural gradient, edge of stability, deep bootstrap, and the effect of design choices such as batch norm, residual connections, and SGD vs. Adam.
Reading:
- Lecture notes of Roger Grosse.
- Deep Bootstrap paper.
- Edge of stability paper.
- SGD complexity paper.
- Francis Bach’s blog post on the dynamics of depth-2 networks (guest post by Lénaïc Chizat).
- Chinchilla paper on scaling laws.
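The edge-of-stability phenomenon is stated relative to a classical fact. On a quadratic, gradient descent with step size lr is stable only if the curvature is below 2/lr. For neural nets, the relevant curvature is the sharpness, i.e., the top Hessian eigenvalue. The edge-of-stability paper reports that during full-batch training of neural networks, sharpness rises until it reaches roughly 2/lr and then hovers there, while training still makes (non-monotone) progress.

The sketch below only illustrates the quadratic threshold, not the neural-network phenomenon itself.

```python
def gd_on_quadratic(curvature, lr, x0=1.0, steps=50):
    """Gradient descent on f(x) = 0.5 * curvature * x**2; returns all iterates.

    The update x <- x - lr * f'(x) = (1 - lr * curvature) * x contracts iff
    |1 - lr * curvature| < 1, i.e. iff 0 < curvature < 2 / lr.
    """
    xs = [x0]
    for _ in range(steps):
        xs.append(xs[-1] - lr * curvature * xs[-1])
    return xs

lr = 0.1  # stability threshold: 2 / lr = 20
for curvature in (10.0, 19.0, 21.0):
    print(f"curvature={curvature:4.1f}  |x_50|={abs(gd_on_quadratic(curvature, lr)[-1]):.3g}")
```

In this one-dimensional case the second derivative plays the role of sharpness. Curvature 19 converges, flipping sign at every step, while curvature 21 diverges. In higher dimensions, the same condition applies to the largest Hessian eigenvalue.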
Training dynamics continued.
We will look at Deep Bootstrap, Edge of Stability, and scaling laws (particularly Chinchilla, and to what extent it is challenged by LLaMA). Some other reading covers mathematical models that demonstrate the above phenomena:
- deep bootstrap in kernels
- understanding edge of stability via a minimalist example
- edge of stability in 2-layer nets
- explaining neural scaling laws
- power laws in kernels (see also this, this, and nearest-neighbor rates).
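There is a common back-of-the-envelope reading of the Chinchilla paper. Training compute is roughly C ≈ 6ND FLOPs for N parameters and D tokens. Compute-optimal training uses on the order of 20 tokens per parameter; Chinchilla itself is a 70B-parameter model trained on 1.4T tokens.

Both constants are approximations. The sketch below simply solves these two relations for N and D; it is not the paper's fitting procedure.

```python
import math

def training_flops(params, tokens):
    """Rough training cost: about 6 FLOPs per parameter per token (forward plus backward pass)."""
    return 6.0 * params * tokens

def compute_optimal_allocation(flops, tokens_per_param=20.0):
    """Solve C = 6 * N * D and D = r * N for (N, D): N = sqrt(C / (6 r)), D = r * N.

    tokens_per_param = 20 is an approximate rule of thumb, not an exact constant.
    """
    params = math.sqrt(flops / (6.0 * tokens_per_param))
    return params, tokens_per_param * params

budget = training_flops(70e9, 1.4e12)  # roughly Chinchilla's own training budget
params, tokens = compute_optimal_allocation(budget)
print(f"budget={budget:.3g} FLOPs -> N={params:.3g} params, D={tokens:.3g} tokens")
```

For comparison, LLaMA trained its 7B model on 1T tokens, far more than 20 tokens per parameter. This deliberately spends extra training compute to get a smaller model that is cheaper at inference time. It is one sense in which LLaMA challenges reading Chinchilla as a prescription.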
(No lecture on Thursday, March 16, 2023)
Reinforcement learning - guest lecture by Sham Kakade
Test-time computation: test-time augmentation, beam search, retrieval-based models, differentiable vs. non-differentiable memory, and tools.
Survey on augmented language models.
In-context learning, and whether it is really “learning” or “context conditioning”:
- Min et al.: in-context examples are more useful for conveying the data distribution than for their labels.
- Wei et al.: LLMs can also adapt to the label distribution.
Non-differentiable memory and natural language as a universal API: Toolformer (Schick et al.). See also:
- the “Bing inner monologue” (e.g., here and here; it is unclear to what extent these are confirmed)
- LangChain
- TaskMatrix.AI
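Beam search is one of the simplest ways to spend extra computation at test time. Instead of greedily taking the most likely next token, it keeps the beam_width highest-scoring partial sequences at each step.

The sketch below assumes a hypothetical next_token_logprobs(prefix) callable standing in for a language model, and scores hypotheses by summed log-probability. It omits refinements that practical decoders often add, such as length normalization. The toy model is constructed so that greedy decoding (beam width 1) misses the most probable sequence.

```python
import heapq
import math

def beam_search(next_token_logprobs, beam_width, max_len, eos):
    """Return (log-prob, tokens) of the best hypothesis found with the given beam width."""
    beams = [(0.0, ())]  # (summed log-probability, token tuple)
    finished = []
    for _ in range(max_len):
        candidates = [
            (score + logprob, prefix + (token,))
            for score, prefix in beams
            for token, logprob in next_token_logprobs(prefix).items()
        ]
        # Keep only the beam_width highest-scoring extensions.
        beams = []
        for score, seq in heapq.nlargest(beam_width, candidates, key=lambda c: c[0]):
            if seq[-1] == eos:
                finished.append((score, seq))  # completed hypotheses leave the beam
            else:
                beams.append((score, seq))
        if not beams:
            break
    finished.extend(beams)  # hypotheses still open at max_len are also eligible
    return max(finished, key=lambda c: c[0])

def toy_model(prefix):
    """Hypothetical next-token distributions chosen so that greedy decoding is suboptimal."""
    table = {
        (): {"a": 0.6, "b": 0.4},
        ("a",): {"x": 0.55, "y": 0.45},
        ("b",): {"z": 0.95, "w": 0.05},
    }
    return {tok: math.log(p) for tok, p in table.get(prefix, {"<eos>": 1.0}).items()}

print(beam_search(toy_model, beam_width=1, max_len=5, eos="<eos>"))
print(beam_search(toy_model, beam_width=2, max_len=5, eos="<eos>"))
```

With beam width 1 the search returns a, x, <eos>, with probability 0.6 × 0.55 = 0.33. Beam width 2 instead finds b, z, <eos>, with probability 0.4 × 0.95 = 0.38.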
AI Safety, Fairness, Accountability, Transparency, Alignment.
- Algorithmic auditing (Vecchione et al.)
- Against predictive optimization (Wang et al.)
- Meta-study on bias papers in NLP
- Feature-highlighting explanations in model interpretability (Barocas et al.)
- The mythos of model interpretability (Lipton)
- Gender Shades (Buolamwini and Gebru)
- Unsolved problems in ML safety (Hendrycks et al.; see also X-risk analysis, Hendrycks and Mazeika)
- Reward misspecification (Pan et al.)
- Christiano blog post
- The alignment problem from a deep learning perspective (Ngo et al.)
Beyond normal accident theory (Marais et al.)
AI will change the world, but won’t take it over by playing “3-dimensional chess” (Barak and Edelman)
We might not talk a lot about adversarial robustness, but some sources include:
RobustBench and the links there
Guest lecture on efficient training of deep nets, by Horace He from the PyTorch team.
Course summary: looking back at the early days of computers in general and AI in particular, and trying to make predictions about the future.
Reading: some historical notes about the development of AI:
John von Neumann, The Computer and the Brain
A New Yorker profile on Marvin Minsky from 1981. This is not just for reading about Minsky’s achievements, but also to get a sense of the people involved, and how AI research was perceived in the early 1980s. (Even if the author is too reverential towards Minsk
Original 1943 paper of McCulloch and Pitts
Alan Turing’s 1950 Computing Machinery and Intelligence where he presented his famous “Turing test”.
Rosenblatt’s 1961 book on perceptrons
A sociological history of the neural network controversy - Olazaran, 1993.
Talking Nets: Oral history - Anderson and Rosenfeld 1998. A sequence of interviews taken in the 1990s with Michael Arbib, Gail Carpenter, Leon Cooper, Jack Cowan, Walter Freeman, Stephen Grossberg, Robert Hecht-Nielsen, Geoffrey Hinton, Teuvo Kohonen, Bart Kosko, Jerome Lettvin, Carver Mead, David Rumelhart, Terry Sejnowski, Paul Werbos, and Bernard Widrow.