Program in Statistics and Machine Learning



  • Ryan P. Adams
  • Peter M. Melchior (acting)

Executive Committee

  • Prateek Mittal, Electrical & Comp Engineering
  • Peter J. Ramadge, Electrical & Comp Engineering
  • Marc Ratkovic, Politics
  • Mengdi Wang, Electrical & Comp Engineering

Associated Faculty

  • Amir Ali Ahmadi, Oper Res and Financial Eng
  • Sanjeev Arora, Computer Science
  • Yacine Aït-Sahalia, Economics
  • Matias D. Cattaneo, Oper Res and Financial Eng
  • Danqi Chen, Computer Science
  • Jonathan D. Cohen, Psychology
  • Jia Deng, Computer Science
  • Jianqing Fan, Oper Res and Financial Eng
  • Jaime Fernandez Fisac, Electrical & Comp Engineering
  • Filiz Garip, Sociology
  • Tom Griffiths, Psychology
  • Boris Hanin, Oper Res and Financial Eng
  • Elad Hazan, Computer Science
  • Bo E. Honoré, Economics
  • Niraj K. Jha, Electrical & Comp Engineering
  • Chi Jin, Electrical & Comp Engineering
  • Jason Matthew Klusowski, Oper Res and Financial Eng
  • Michal Kolesár, Economics
  • Ching-Yao Lai, Geosciences
  • Jason D. Lee, Electrical & Comp Engineering
  • Naomi E. Leonard, Mechanical & Aerospace Eng
  • John B. Londregan, Schl of Public & Int'l Affairs
  • Anirudha Majumdar, Mechanical & Aerospace Eng
  • William A. Massey, Oper Res and Financial Eng
  • Reed M. Maxwell, Civil and Environmental Eng
  • Peter M. Melchior, Astrophysical Sciences
  • Ulrich K. Mueller, Economics
  • Karthik Narasimhan, Computer Science
  • Jonathan W. Pillow, Psychology
  • H. Vincent Poor, Electrical & Comp Engineering
  • Yuri Pritykin, Computer Science
  • Miklos Z. Racz, Oper Res and Financial Eng
  • Olga Russakovsky, Computer Science
  • Matthew J. Salganik, Sociology
  • Amit Singer, Mathematics
  • Mona Singh, Computer Science
  • Bartolomeo Stellato, Oper Res and Financial Eng
  • Brandon M. Stewart, Sociology
  • John D. Storey, Integrative Genomics
  • Michael A. Strauss, Astrophysical Sciences
  • Rocío Titiunik, Politics
  • Jeroen Tromp, Geosciences
  • Olga G. Troyanskaya, Computer Science
  • Robert J. Vanderbei, Oper Res and Financial Eng
  • Mark W. Watson, Schl of Public & Int'l Affairs

Sits with Committee

  • Daisy Yan Huang
  • Ricardo Pereira Masini

For a full list of faculty members and fellows, please visit the department or program website.

Program Information

Information and Departmental Plan of Study

The Undergraduate Certificate Program in Statistics and Machine Learning is designed for students, majoring in any department, who have a strong interest in data analysis and its application across disciplines. Statistics and machine learning, the academic disciplines centered around developing and understanding data analysis tools, play an essential role in various scientific fields, including biology, engineering and the social sciences. This new field of “data science” is interdisciplinary, merging contributions from a variety of disciplines to address numerous applied problems. Examples of data analysis problems include analyzing massive quantities of text and images, modeling cellular-biological processes, pricing financial assets, evaluating the efficacy of public policy programs, and forecasting election outcomes. In addition to its importance in scientific research and policy making, the study of data analysis comes with its own theoretical challenges, such as the development of methods and algorithms for making reliable inferences from high-dimensional and heterogeneous data. The program provides students with a set of tools required for addressing these emerging challenges. Through the program, students will learn basic theoretical frameworks and leave equipped to apply statistics and machine learning methods to many problems of interest.

Admission to the Program

Students are admitted to the program after they have chosen a concentration, generally by the beginning of their junior year. At that time, students must have prepared a tentative plan and timeline for completing all of the requirements of the program, including required courses and independent work (as outlined below), as well as any prerequisites for the selected courses.

For enrollment, please use this form: Certificate Enrollment Application

For questions, contact us at

Program of Study

Students are required to take a total of five courses and earn at least a B- in each, complete the certificate’s independent work requirement, and attend CSML's annual poster session.

Course Work:

  • One statistics course from the approved list. Students must receive at least a B- (pass/D/fail (PDF) grading is not permitted; credit or exemptions for AP exams are not permitted).
  • One machine learning course from the approved list. Students must receive at least a B- (PDF not permitted).
  • Three electives from the approved list. Students must receive at least a B- in each (PDF not permitted).

Students may count at most two courses from their departmental concentration toward the certificate. With permission, advanced students may take approved graduate-level courses.

Independent Work 

Students are required to complete a thesis or at least one semester of independent work in their junior or senior year on a topic that substantially applies or studies machine learning or statistics. This work may be used to satisfy the requirements of both the SML certificate program and the student's department of concentration. All work will be reviewed by the Statistics and Machine Learning Certificate committee. In May, there will be a public poster session at which students are required to present their work to other students, researchers, and faculty. Students must adhere to submission due dates for independent work papers and poster requirements. Attendance at the poster session is mandatory.

Finally, students are encouraged to attend the Statistics and Machine Learning colloquia on campus, including the CSML-sponsored or co-sponsored seminars.

For a list of required courses that will count towards the certificate, please visit our website.

Certificate of Proficiency

Students who fulfill all the program requirements will receive a certificate upon graduation.


SML 101 Reasoning with Data QCR

Data-driven decision-making, research discovery, and technology development are everywhere. It is now more important than ever for individuals to understand how data are used for these purposes. This course will introduce the student to how statistical reasoning and methods are used to learn from and leverage modern data. The emphasis will be on concepts and strategies for learning from data, rather than on sophisticated mathematics. Students will be exposed to the basics of statistics, machine learning, and data science through real world problems and applications. Students will also analyze data sets using the computer. Instructed by: Staff

SML 201 Introduction to Data Science QCR

Introduction to Data Science provides a practical introduction to the burgeoning field of data science. The course introduces students to the essential tools for conducting data-driven research, including the fundamentals of programming techniques and the essentials of statistics. Students will work with real-world datasets from various domains; write computer code to manipulate, explore, and analyze data; use basic techniques from statistics and machine learning to analyze data; learn to draw conclusions using sound statistical reasoning; and produce scientific reports. No prior knowledge of programming or statistics is required. Instructed by: Staff

SML 302 Fundamentals of Machine Learning (See COS 424)

SML 305 Mathematics for Numerical Computing and Machine Learning (See COS 302)

SML 306 Machine Learning with Social Data: Opportunities and Challenges (See SOC 306)

SML 310 Research Projects in Data Science Fall/Spring

Project-based course in which students work individually or in small teams to tackle data science and ML problems based on real datasets. We will emphasize critical thinking about experiments and large dataset analysis along with the ability to clearly communicate one's research. This course is intended to support students in developing the analytical skills necessary for quantitative independent work; students should consult with their home department about how this course could appropriately complement, but not replace, their independent work requirements. Instructed by: Staff

SML 480 Pedagogy of Data Science Spring

In this seminar, we will explore the pedagogy of introductory data science. Students in the seminar will be required to work as undergraduate course assistants in SML 201 (Introduction to Data Science). SML 201 topics will be discussed in more depth in the seminar, with a view to teaching the basic material. We will discuss literature in the pedagogy of computer science and statistics. Discussion topics will include teaching programming using the functional programming paradigm, the design of the dplyr package, simulation-based inference, teaching statistics using simulation-based inference, the grammar of graphics, and causal inference. Instructed by: Staff