Program in Statistics and Machine Learning

Faculty

Director

  • Ryan P. Adams

Executive Committee

  • Christine Allen-Blanchette, Mechanical & Aerospace Eng
  • Peter M. Melchior, Astrophysical Sciences
  • Prateek Mittal, Electrical & Comp Engineering
  • Peter J. Ramadge, Electrical & Comp Engineering
  • Marc Ratkovic, Politics
  • Brandon M. Stewart, Sociology
  • Mengdi Wang, Electrical & Comp Engineering

Associated Faculty

  • Sigrid M. Adriaenssens, Civil and Environmental Eng
  • Amir Ali Ahmadi, Oper Res and Financial Eng
  • Sanjeev Arora, Computer Science
  • Yacine Aït-Sahalia, Economics
  • Matias D. Cattaneo, Oper Res and Financial Eng
  • Danqi Chen, Computer Science
  • Jonathan D. Cohen, Psychology
  • Jia Deng, Computer Science
  • Jianqing Fan, Oper Res and Financial Eng
  • Jaime Fernandez Fisac, Electrical & Comp Engineering
  • Filiz Garip, Sociology
  • Tom Griffiths, Psychology
  • Boris Hanin, Oper Res and Financial Eng
  • Elad Hazan, Computer Science
  • Bo E. Honoré, Economics
  • Niraj K. Jha, Electrical & Comp Engineering
  • Chi Jin, Electrical & Comp Engineering
  • Jason Matthew Klusowski, Oper Res and Financial Eng
  • Michal Kolesár, Economics
  • Ching-Yao Lai, Geosciences
  • Jason D. Lee, Electrical & Comp Engineering
  • Naomi E. Leonard, Mechanical & Aerospace Eng
  • Sarah-Jane Leslie, Philosophy
  • John B. Londregan, Schl of Public & Int'l Affairs
  • Anirudha Majumdar, Mechanical & Aerospace Eng
  • William A. Massey, Oper Res and Financial Eng
  • Reed M. Maxwell, Civil and Environmental Eng
  • Peter M. Melchior, Astrophysical Sciences
  • Ulrich K. Mueller, Economics
  • Karthik Narasimhan, Computer Science
  • Jonathan W. Pillow, Psychology
  • H. Vincent Poor, Electrical & Comp Engineering
  • Yuri Pritykin, Computer Science
  • Miklos Z. Racz, Oper Res and Financial Eng
  • Olga Russakovsky, Computer Science
  • Matthew J. Salganik, Sociology
  • Amit Singer, Mathematics
  • Mona Singh, Computer Science
  • Bartolomeo Stellato, Oper Res and Financial Eng
  • Brandon M. Stewart, Sociology
  • John D. Storey, Integrative Genomics
  • Michael A. Strauss, Astrophysical Sciences
  • Rocío Titiunik, Politics
  • Jeroen Tromp, Geosciences
  • Olga G. Troyanskaya, Computer Science
  • Robert J. Vanderbei, Oper Res and Financial Eng
  • Mark W. Watson, Schl of Public & Int'l Affairs
  • Michael A. Webb, Chemical and Biological Eng

Sits with Committee

  • Daisy Yan Huang
  • Ricardo Pereira Masini

For a full list of faculty members and fellows, please visit the department or program website.

Program Information

Information and Departmental Plan of Study

The Undergraduate Certificate Program in Statistics and Machine Learning is designed for students majoring in any department who have a strong interest in data analysis and its application across disciplines. Statistics and machine learning, the academic disciplines centered on developing and understanding data analysis tools, play an essential role in various scientific fields, including biology, engineering, and the social sciences. This new field of “data science” is interdisciplinary, merging contributions from a variety of disciplines to address numerous applied problems. Examples of data analysis problems include analyzing massive quantities of text and images, modeling cellular-biological processes, pricing financial assets, evaluating the efficacy of public policy programs, and forecasting election outcomes. In addition to its importance in scientific research and policymaking, the study of data analysis comes with its own theoretical challenges, such as the development of methods and algorithms for making reliable inferences from high-dimensional and heterogeneous data. The program provides students with the tools required for addressing these emerging challenges. Through the program, students will learn basic theoretical frameworks and will be equipped to apply statistics and machine learning methods to many problems of interest.

Admission to the Program

Students are admitted to the program after they have chosen a concentration, generally by the beginning of their junior year. At that time, students must have prepared a tentative plan and timeline for completing all of the requirements of the program, including required courses and independent work (as outlined below), as well as any prerequisites for the selected courses.

For enrollment, please use this form: Certificate Enrollment Application

For questions, contact us at smlcert@princeton.edu.

Program of Study

Students are required to take a total of five courses and earn at least a B- in each, complete the certificate’s independent work requirement, and attend CSML's annual poster session.

Course Work:

  • One statistics course from the approved list. Students must receive at least a B- (Pass/D/Fail is not permitted; credit or exemptions for AP exams are not permitted).
  • One machine learning course from the approved list. Students must receive at least a B- (Pass/D/Fail is not permitted).
  • Three electives from the approved list. Students must receive at least a B- (Pass/D/Fail is not permitted).

Students may count at most two courses from their departmental concentration toward the certificate. With permission, advanced students may be permitted to take approved graduate-level courses.

Independent Work 

Students are required to complete a thesis or at least one semester of independent work in their junior or senior year on a topic that substantially applies or studies machine learning or statistics. This work may be used to satisfy the requirements of both the SML certificate program and the student's department of concentration. All work will be reviewed by the Statistics and Machine Learning Certificate committee. In May, there will be a poster session at which students are required to present their work to other students, researchers, and the faculty. Students must adhere to submission due dates for independent work papers and poster requirements.

Finally, students are encouraged to attend one of the Statistics and Machine Learning colloquia on campus, including the CSML-sponsored or co-sponsored seminars.

For a list of required courses that will count toward the certificate, please visit our website.

Certificate of Proficiency

Students who fulfill all the program requirements will receive a certificate upon graduation.

Courses

SML 201 Introduction to Data Science Fall/Spring QCR

Introduction to Data Science provides a practical introduction to the burgeoning field of data science. The course introduces students to the essential tools for conducting data-driven research, including the fundamentals of programming techniques and the essentials of statistics. Students will work with real-world datasets from various domains; write computer code to manipulate, explore, and analyze data; use basic techniques from statistics and machine learning to analyze data; learn to draw conclusions using sound statistical reasoning; and produce scientific reports. No prior knowledge of programming or statistics is required. Instructed by: D. Huang

SML 302 Fundamentals of Machine Learning (See COS 424)

SML 305 Mathematics for Numerical Computing and Machine Learning (See COS 302)

SML 306 Machine Learning with Social Data: Opportunities and Challenges (See SOC 306)

SML 310 Research Projects in Data Science (A) Fall/Spring

Project-based course in which students work individually or in small teams to tackle data science and ML problems based on real datasets. We will emphasize critical thinking about experiments and large dataset analysis along with the ability to clearly communicate one's research. This course is intended to support students in developing the analytical skills necessary for quantitative independent work; students should consult with their home department about how this course could appropriately complement, but not replace, their independent work requirements. Instructed by: R. Pereira Masini

SML 312 Research Projects in Data Science (B) Fall/Spring

Project-based course in which students work individually or in small teams to tackle data science and ML problems, working with real-world datasets. The course emphasizes critical thinking about experiments and dataset analysis and the ability to clearly communicate one's research. Programming components are taught in Python. Experience in only one of the two programming languages (R and Python) is required. This course is intended to support students in developing the analytical skills for quantitative independent work; students should consult with their home department about how this course could complement, not replace, their independent work requirements. Instructed by: J. Hanke