Program in Statistics and Machine Learning


  • Director

    • Ryan P. Adams
  • Executive Committee

    • Ryan Adams
    • Prateek Mittal
    • John Mulvey
    • Peter Ramadge
    • Marc Ratkovic
    • Mengdi Wang
  • Associated Faculty

    • Emmanuel Abbe
    • Amir Ali Ahmadi
    • Yacine Ait-Sahalia
    • Sanjeev Arora
    • Matias Cattaneo
    • Danqi Chen
    • Yuxin Chen
    • Jonathan D. Cohen
    • Jia Deng
    • Abigail Doyle
    • Barbara Engelhardt
    • Kirill Evdokimov
    • Thomas Griffiths
    • Elad Hazan
    • Bo E. Honore
    • Chi Jin
    • Michal Kolesar
    • SY Kung
    • Jason Lee
    • Naomi Leonard
    • Mariangela Lisanti
    • John Londregan
    • Anirudha Majumdar
    • Meredith Martin
    • William Massey
    • Peter Melchior
    • Ulrich K. Mueller
    • Karthik Narasimhan
    • Arvind Narayanan
    • Kenneth A. Norman
    • Jonathan W. Pillow
    • Mikkel Plagborg-Moller
    • H. Vincent Poor
    • Warren Powell
    • Ben Raphael
    • Olga Russakovsky
    • Matthew J. Salganik
    • H. Sebastian Seung
    • Christopher A. Sims
    • Amit Singer
    • Yoram Singer
    • Mona Singh
    • Brandon M. Stewart
    • John D. Storey
    • Michael A. Strauss
    • Rocio Titiunik
    • Jeroen Tromp
    • Olga G. Troyanskaya
    • Ramon van Handel
    • Robert J. Vanderbei
    • Mark W. Watson
    • Yu Xie
  • Sits with Committee

    • Michael Guerzhoy
    • Daisy Yan Huang


Program Information

Information and Departmental Plan of Study

The Program in Statistics and Machine Learning is offered by the Center for Statistics and Machine Learning. The program is designed for students, majoring in any department, who have a strong interest in data analysis and its application across disciplines. Statistics and machine learning, the academic disciplines centered around developing and understanding data analysis tools, play an essential role in various scientific fields including biology, engineering, and the social sciences. This new field of “data science” is interdisciplinary, merging contributions from a variety of disciplines to address numerous applied problems. Examples of data analysis problems include analyzing massive quantities of text and images, modeling cell-biological processes, pricing financial assets, evaluating the efficacy of public policy programs, and forecasting election outcomes. In addition to its importance in scientific research and policy making, the study of data analysis comes with its own theoretical challenges, such as the development of methods and algorithms for making reliable inferences from high-dimensional and heterogeneous data. The program provides students with a set of tools required for addressing these emerging challenges. Through the program, students will learn basic theoretical frameworks and apply statistics and machine learning methods to many problems of interest.

Admission to the Program

Students are admitted to the program after they have chosen a concentration, generally by the beginning of their junior year. At that time, students must have prepared a tentative plan and timeline for completing all of the requirements of the program, including required courses and independent work (as outlined below), as well as any prerequisites for the selected courses.

For enrollment, please use this form: Certificate Enrollment Application

For questions, contact us at

Program of Study

Students are required to take a total of five courses and earn at least a B- in each, complete the independent work requirement, and attend the annual poster session. Students may count at most two courses from their departmental concentration toward the certificate. With permission, advanced students may take approved graduate-level courses.

Course Work:

  • One statistics course from the following list. Students must receive at least a B- (P/D/F is not permitted; credit or exemption for AP exams is not permitted).

ECO 202 Statistics & Data Analysis for Economics

ORF 245 Fundamentals of Statistics

POL 345/SOC 305 Intro to Quantitative Social Science

PSY 251 Quantitative Methods

WWS 200 Statistics for Social Science

  • One machine learning course from the following list. Students must receive at least a B- (P/D/F is not permitted).

COS 324 Introduction to Machine Learning

COS 424/SML 302 Fundamentals of Machine Learning

ELE 364 Machine Learning for Predictive Data Analytics

ELE 435 Machine Learning and Pattern Recognition

MAT 490 Mathematical Introduction to Machine Learning

ORF 350 Analysis of Big Data

  • Three electives (please consult the program website for a complete list of approved courses). Students must receive at least a B- in each (P/D/F is not permitted).

Independent Work and SML Poster Session:

Students are required to complete a thesis or at least one semester of independent work in their junior or senior year on a topic that substantially applies or studies machine learning or statistics. This work may be used to satisfy the requirements of both the SML certificate program and the student's department of concentration. All work will be reviewed by the Statistics and Machine Learning Certificate committee. In May, there will be a public poster session at which students are required to present their work to each other, to other students, and to the faculty. Students must adhere to submission due dates for independent work papers and poster requirements. Attendance at the poster session is mandatory.

Finally, students are encouraged to attend one of the Statistics and Machine Learning colloquia on campus, including the seminars sponsored or co-sponsored by CSML.

For a list of required courses that will count towards the certificate, please visit our website.

Certificate of Proficiency

Students who fulfill all the program requirements receive a certificate upon graduation.



SML 101 Reasoning with Data QR

Data-driven decision-making, research discovery, and technology development are everywhere. It is now more important than ever for individuals to understand how data are used for these purposes. This course will introduce the student to how statistical reasoning and methods are used to learn from and leverage modern data. The emphasis will be on concepts and strategies for learning from data, rather than on sophisticated mathematics. Students will be exposed to the basics of statistics, machine learning, and data science through real world problems and applications. Students will also analyze data sets using the computer. Instructed by: Staff

SML 201 Introduction to Data Science QR

This course provides an introduction to the burgeoning field of data science, which is primarily concerned with data-driven discovery and utilizing data as a research and technology development tool. We cover approaches and techniques for obtaining, organizing, exploring, and analyzing data, as well as creating tools based on data. Elements of statistics, machine learning, and statistical computing form the basis of the course content. We consider applications in the natural sciences, social sciences, and engineering. Instructed by: Staff

SML 302 Fundamentals of Machine Learning (See COS 424)

SML 310 Research Projects in Data Science Fall/Spring

Project-based course in which students work individually or in small teams to tackle data science and ML problems based on real datasets. We will emphasize critical thinking about experiments and large dataset analysis along with the ability to clearly communicate one's research. This course is intended to support students in developing the analytical skills necessary for quantitative independent work; students should consult with their home department about how this course could appropriately complement, but not replace, their independent work requirements. Instructed by: Staff