CS 2881 AI Safety

Fall 2025, Thursdays 3:45pm-6:30pm (First lecture September 4)

Course: CS 2881R - AI Safety

YouTube Lecture Playlist Course Lecture Notes and Experiments

Time and Place: Thursdays 3:45pm-6:30pm Eastern Time, SEC LL2.229 (SEC is in 150 Western Ave, Allston, MA)

Instructor: Boaz Barak

Teaching Fellows: Natalie Abreu (natalieabreu@g.harvard.edu), Roy Rinberg (royrinberg@g.harvard.edu), Hanlin Zhang (hanlinzhang@g.harvard.edu), Sunny Qin (Harvard)

Course Description: This will be a graduate level course on challenges in alignment and safety of artificial intelligence. We will consider both technical aspects as well as questions on societal and other impacts of the field.

Prerequisites: We require mathematical maturity, and proficiency with proofs, probability, and information theory, as well as the basics of machine learning, at the level of an undergraduate ML course such as Harvard CS 181 or MIT 6.036. You should be familiar with topics such as empirical and population loss, gradient descent, neural networks, linear regression, principal component analysis, etc. On the applied side, you should be comfortable with Python programming, and be able to train a basic neural network.

Important: Read the Course Introduction!

Questions? If you have any questions about the course, please email harvardcs2881@gmail.com

Related reading by Boaz:

Previous versions: Spring 2023 ML Theory Seminar Spring 2021 ML Theory Seminar

Mini Syllabus

Schedule

Classes begin September 2, 2025. Reading period December 4-9, 2025.

Thursday, September 11, 2025
Modern LLM Training
Thursday, September 18, 2025
Adversarial Robustness, Jailbreaks, Prompt Injection, Security
Thursday, October 2, 2025
Content Policies, Potentially Catastrophic Capabilities & Responsible Scaling
  • Guest Lecturer: Ziad Reslan (Product Policy, OpenAI)
  • Content policies
  • Responsible scaling policies
  • Scalable evaluations
  • Safety through capability vs. weakness
Experiment:
Evaluate open and closed source models, potentially using jailbreaking techniques
Thursday, October 9, 2025
Recursive Self-Improvement
  • Is AI R&D an "AI-complete" task?
Experiment:
To be determined: some thoughts - an experiment to determine the extent which success in a narrow task such as coding or AI requires broad general skills.
Resources:
  • TBD
Thursday, October 16, 2025
Capabilites vs. Safety
  • TBD
Experiment:
TBD
Resources:
  • TBD
Thursday, October 23, 2025
Scheming, Reward Hacking & Deception
  • Guest Lecturers: Buck Shlegeris (Redwood Research), Marius Hobbhahn (Apollo Research)
  • Exploring "bad behavior" tied to training objectives
  • Investigating potential deception in monitoring models
Experiment:
Demonstrate how impossible tasks or conflicting objectives lead to lying/scheming
Thursday, October 30, 2025
Economic Impacts of Foundation Models
Thursday, November 13, 2025
Emotional Reliance and Persuasion
  • Domestic & international regulatory approaches
  • Standards-setting & audits
Experiment:
To be determined
Resources:
  • Resources to be determined
Thursday, November 20, 2025
Military & Surveillance Applications of AI
  • Lethal autonomous weapon systems (LAWS)
  • Strategic stability & escalation risks
  • Mass-scale surveillance infrastructure
Experiment:
To be determined
No lecture on Thursday, November 27 – Thanksgiving Break
Thursday, December 4, 2025
AI 2035 - Possible Futures of AI
  • Student project presentations and discussion of future directions in AI safety research
Resources:
  • Resources to be determined

</p>