Spring 2023, Thursdays 3:45pm-6:30pm SEC 1.402 Classroom (First lecture Jan 26)

Instructor: Boaz Barak

Teaching Fellows: Gustaf Ahdritz, Gal Kaplun

Links (enrolled students only): Canvas

See also Spring 2021 version (the field is moving rapidly, so the courses will not be the same, but it gives some sense of the material; also, the Spring 2021 course was held over Zoom, whereas the Spring 2023 course will be much more “hands on”, so you can expect to go into greater depth but also to do more work.)

TL;DR: The goal of this course is to prepare students for research in the foundations of deep learning. By the end of the course you should be able to read most cutting-edge papers in this field, as well as be capable of reproducing at least some experimental results (those that do not require an inordinate amount of computational and human resources). Ideally, you should be on your way to working on original research in the field. To achieve this, the course will require a large amount of independence from students, including both self-study and peer study.

Schedule

Lecture 1: Thursday, January 26, 2023

slides (powerpoint) slides (pdf)

Introduction to the course, a quick review of classical ML: representation (i.e., approximation theorems), optimization (convexity, stochastic gradient descent), generalization (bias/variance tradeoff). Differences between that and modern paradigms.

Transformer architecture. How it works, why it is well-suited for GPUs, auto-regressive language models. The next-token prediction task. Some questions: are transformers useful for their inductive bias, or for their highly efficient GPU implementation? Differences between fine tuning, prompt tuning, linear readouts.
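As a rough illustration of the auto-regressive setup mentioned above, here is a minimal sketch of single-head scaled dot-product attention with a causal mask, written in plain Python for readability. The function names (`softmax`, `causal_self_attention`) are made up for this example and it is not taken from the course materials; real implementations batch these operations as matrix multiplications, which is a large part of why they map well onto GPUs.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def causal_self_attention(queries, keys, values):
    """Single-head scaled dot-product attention with a causal mask.

    Each argument is a list of T vectors (lists of floats).
    Position t may only attend to positions 0..t, which is what makes
    next-token prediction well defined during training.
    Returns (outputs, attention_weights).
    """
    d = len(keys[0])
    outputs, all_weights = [], []
    for t, q in enumerate(queries):
        # Scores against the current and all earlier positions only.
        scores = [
            sum(qi * ki for qi, ki in zip(q, keys[s])) / math.sqrt(d)
            for s in range(t + 1)
        ]
        weights = softmax(scores)
        # Weighted average of the visible value vectors.
        out = [
            sum(weights[s] * values[s][j] for s in range(t + 1))
            for j in range(len(values[0]))
        ]
        outputs.append(out)
        all_weights.append(weights)
    return outputs, all_weights


# Toy usage: three token embeddings used as queries, keys, and values.
x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
outs, attn = causal_self_attention(x, x, x)
```

The first position can only see itself, so its output equals its own value vector; later positions mix information from everything before them.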

Options (not sure how much we will cover): Vision transformers, MLP mixer, attention in linear time

Reading:

Model: original paper and annotated version (colab version)

Lecture 2: Thursday, February 2, 2023

slides (powerpoint) slides (pdf) Handwritten notes for board (pdf)

Generative models: Variational principle, VAEs, normalizing flows.

Reading: Chapter 2 (VAE) Kingma and Welling survey on VAEs. Chapter 3 (exponential distributions, can skim concrete examples in 3.3) Wainwright and Jordan. Lilian Weng blog on normalizing flows. Survey by Kobyzev, Prince, and Brubaker (see also CVPR 21 tutorial)

Lecture 3: Thursday, February 9, 2023

slides (powerpoint) slides (pdf)

Diffusion models

Reading: On Perusall - Weng blog, Karras et al unifying design space, McAllester math of diffusion.

Additional resources: Latent diffusion (Rombach et al), classifier-free guidance (Ho and Salimans). Blog posts of Song and Das. Vahdat tutorial (video, 2 hours).

Lecture 4: Thursday, February 16, 2023

slides (powerpoint) slides (pdf) Handwritten notes for board (pdf)

Privacy in machine learning

2014 manuscript on Differential Privacy by Dwork and Roth. For issues of computational complexity, see the survey of Vadhan.

DP-SGD paper; see also lecture notes by Smith and Ullman, notes by Kamath, and slides by Bellet. This video by Kamath can also be useful.
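For orientation before reading the DP-SGD material, the following is a minimal sketch of one update in the commonly described clip-then-add-noise style: clip each per-example gradient to a fixed L2 norm, sum, add Gaussian noise scaled to that norm, and average. The function name `dp_sgd_step` and its parameters are hypothetical choices for this example, gradients are plain lists of floats, and no privacy accounting is performed, so this should not be read as a private training procedure on its own.

```python
import math
import random


def dp_sgd_step(params, per_example_grads, lr, clip_norm, noise_multiplier, rng):
    """One sketch update: clip per-example gradients, add noise, average.

    params: list of floats.
    per_example_grads: list of gradients, one list of floats per example.
    rng: a random.Random instance (passed in so runs are reproducible).
    """
    clipped_sum = [0.0] * len(params)
    for grad in per_example_grads:
        norm = math.sqrt(sum(g * g for g in grad))
        # Scale the gradient down only if its norm exceeds clip_norm.
        scale = min(1.0, clip_norm / norm) if norm > 0 else 1.0
        for i, g in enumerate(grad):
            clipped_sum[i] += g * scale

    # Gaussian noise with standard deviation noise_multiplier * clip_norm,
    # added once per coordinate to the summed clipped gradients.
    sigma = noise_multiplier * clip_norm
    batch_size = len(per_example_grads)
    noisy_mean = [(s + rng.gauss(0.0, sigma)) / batch_size for s in clipped_sum]

    return [p - lr * g for p, g in zip(params, noisy_mean)]


# Toy usage with two examples and two parameters.
rng = random.Random(0)
new_params = dp_sgd_step(
    params=[0.0, 0.0],
    per_example_grads=[[3.0, 4.0], [0.0, 0.0]],
    lr=1.0,
    clip_norm=1.0,
    noise_multiplier=1.0,
    rng=rng,
)
```

Clipping bounds how much any single example can move the parameters, which is what lets the added noise mask that example's contribution.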

https://differentialprivacy.org/

Attacks on non-private models: Membership inference. Extracting training data from GPT2 and Diffusion models

Failure of heuristics, e.g. Attack on InstaHide.

Exposed! A survey of attacks on private data.

Issues with DP for deep learning: Tramer-Boneh: DP needs better features. Bagdasaryan-Shmatikov: DP impacts subgroups differently.

Machine unlearning: see this

Relaxations of DP: label DP, privacy-preserving predictions. DP fine tuning of large models (see also this).

Separate issue: Protecting model weights from inference server via homomorphic encryption or other cryptographic tools, see cryptonets (2016), this recent paper and references within.

Lecture 5: Thursday, February 23, 2023

slides (powerpoint) slides (pdf)

Protein Folding: AlphaFold - guest lecture by Gustaf Ahdritz.

Reading: AlphaFold1 paper, AlphaFold2 paper. Blog: Mohammed AlQuraishi blog1, blog2

Lecture 6: Thursday, March 2, 2023

slides (powerpoint) slides (pdf) Handwritten notes for board (pdf)

Training Dynamics: Differences between back-propagation and perturbative methods, natural gradient, edge of stability, deep bootstrap, the effect of issues such as batch norm, residual connections, SGD vs Adam.

Reading: lecture notes of Roger Grosse, Deep Bootstrap paper, Edge of stability paper, SGD complexity paper. Francis Bach’s blog on depth-2 networks dynamics (guest post by Lénaïc Chizat). Chinchilla paper on scaling laws.

Lecture 7: Thursday, March 9, 2023

slides (powerpoint) slides (pdf)

Training dynamics continued.

We will look at Deep Bootstrap, Edge of Stability, and scaling laws (particularly Chinchilla and to what extent they are challenged by LLaMA). Some other reading: mathematical models that demonstrate the above phenomena: deep bootstrap in kernels, understanding edge-of-stability via minimalist example, edge-of-stability in 2-layer nets, explaining neural scaling laws, power laws in kernels (see also this, this, and nearest-neighbor rates).

(No lecture on Thursday, March 16, 2023)

Lecture 8: Thursday, March 23, 2023

slides (pdf)

Reinforcement learning - guest lecture by Sham Kakade

Readings:

Lecture 9: Thursday, March 30, 2023

slides (powerpoint) slides (pdf)

Test-time computation: test-time augmentation, beam search, retrieval-based models, differentiable vs non-differentiable memory and tools.

Reading:

Survey on augmented language models.
Best of n outputs WebGPT paper, plurality voting Wang et al, Minerva paper
In-context learning, and is it really “learning” or “context conditioning”: Min et al - in-context examples more useful for the data distributions than labels, Wei et al - LLMs can adapt to label dist also
Chain of thought: Wei et al, zero shot CoT Kojima et al (“step by step”)
Differentiable memory: RETRO (DeepMind), Memorizing transformers, Recurrent memory (Bulatov et al)
Non-differentiable memory, Natural language as universal API Toolformer (Schick et al), see also “Bing inner monologue” (e.g. here, here, unsure the extent these are confirmed), langchain, Taskmatrix.ai
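Two of the test-time strategies in the reading list above, best-of-n selection and plurality voting over sampled answers, are simple enough to sketch in a few lines. The helper names `best_of_n` and `plurality_vote` are hypothetical, the candidate strings stand in for model samples, and the scoring function is a toy placeholder for whatever reward model or verifier one might actually use.

```python
from collections import Counter


def best_of_n(candidates, score):
    """Return the candidate with the highest score.

    `score` is any callable mapping a candidate to a number, for example
    a reward model or a verifier (a toy stand-in is used below).
    """
    return max(candidates, key=score)


def plurality_vote(answers):
    """Return the most common final answer and its vote share.

    Ties are broken in favor of the answer that appeared first.
    """
    counts = Counter(answers)
    winner, votes = counts.most_common(1)[0]
    return winner, votes / len(answers)


# Toy usage: final answers extracted from five sampled reasoning chains.
sampled_answers = ["42", "41", "42", "7", "42"]
winner, share = plurality_vote(sampled_answers)

# Toy usage: choose the longest candidate as a placeholder scoring rule.
picked = best_of_n(["short", "a longer answer", "mid one"], score=len)
```

Best-of-n needs an external scorer, while plurality voting only needs the sampled answers to be comparable, which is why it suits tasks with a short final answer.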

Lecture 10: Thursday, April 6, 2023

slides (powerpoint) slides (pdf)

Boaz’s post-lecture blog post on safety

AI Safety, Fairness, Accountability, Transparency, Alignment.

Fair ML textbook. Hendrycks safety course.

Algorithmic Auditing Vecchione et al. Against predictive optimization Wang et al. Meta study on bias papers in NLP. Feature highlighting explanations in model interpretability (Barocas et al). The mythos of model interpretability - Lipton. Gender Shades - Buolamwini and Gebru.

Impact of Russian disinformation campaign - Eady et al

Natural selection favors AIs over humans - Hendrycks. (see also Carlsmith)

Unsolved problems in AI safety, Hendrycks et al (see also X risk analysis Hendrycks and Mazeika). Reward misspecification - Pan et al. Christiano blog post. Alignment problem from DL perspective (Ngo et al)

Verification/Critique: Red teaming LMs with LMs (DeepMind), Self-critiquing models (OpenAI)

Beyond normal accident theory Marais et al

AI will change world but not take over via 3d chess / Barak and Edelman

We might not talk a lot about adversarial robustness but some sources include

RobustBench and the links there

Uncertainty under distribution shift - Ovadia et al

Lecture 11: Thursday, April 13, 2023

Guest lecture on efficient training of deep nets, by Horace He from the PyTorch team.

Horace’s slides

Reading:

The Bitter Lesson / Sutton

Is Moore’s law ending or not? / Herz

Stephen Jones video - GPU programming - especially 15m30 to 22m20

Horace He: Making Deep Learning Go Brrrr From First Principles

Overview of Parallelism Strategies / Lilian Weng

Matt Pharr: the story of ispc

Lecture 12: Thursday, April 20, 2023

slides (powerpoint) slides (pdf)

Course summary, looking back into the early days of computers in general and AI in particular, as well as trying to make predictions about the future.

Reading: Some historical notes about the development of AI:

John von Neumann The Computer and the Brain
A New Yorker profile on Marvin Minsky from 1981. This is not just for reading about Minsky’s achievements, but also to get a sense of the people involved, and how AI research was perceived in the early 1980s. (Even if the author is too reverential towards Minsk
Original 1943 paper of McCulloch and Pitts
Alan Turing’s 1950 Computing Machinery and Intelligence where he presented his famous “Turing test”.
Rosenblatt’s 1961 book on perceptrons
Rumelhart-Hinton-Williams backpropagation paper
The proposal for the 1956 Dartmouth workshop (see also Wikipedia article)
Sir James Lighthill’s 1972 report on the state of AI. This depressing report summarizes the perception of an “AI winter” and apparently also caused the UK AI winter.
A sociological history of the neural network controversy - Olazaran, 1993.
Talking Nets: Oral history - Anderson and Rosenfeld 1998. A sequence of interviews taken in the 1990s with Michael Arbib, Gail Carpenter, Leon Cooper, Jack Cowan, Walter Freeman, Stephen Grossberg, Robert Hecht-Nielsen, Geoffrey Hinton, Teuvo Kohonen, Bart Kosko, Jerome Lettvin, Carver Mead, David Rumelhart, Terry Sejnowski, Paul Werbos, and Bernard Widrow.
Early-ish discussions on “singularity”: I.J. Good 1966, Vinge 1993

Future predictions: