Statistics and Machine Learning

Overview

The Undergraduate Minor Program in Statistics and Machine Learning is designed for students, majoring in any department, who have a strong interest in data analysis and its application across disciplines. Statistics and machine learning, the academic disciplines centered on developing and understanding data analysis tools, play an essential role in various scientific fields including biology, engineering and the social sciences. This new field of “data science” is interdisciplinary, merging contributions from a variety of disciplines to address numerous applied problems. Examples of data analysis problems include analyzing massive quantities of text and images, modeling cellular-biological processes, pricing financial assets, evaluating the efficacy of public policy programs and forecasting election outcomes. In addition to its importance in scientific research and policymaking, the study of data analysis comes with its own theoretical challenges, such as the development of methods and algorithms for making reliable inferences from high-dimensional and heterogeneous data. The program provides students with the tools required to address these emerging challenges. Through the program, students will learn basic theoretical frameworks and leave equipped to apply statistics and machine learning methods to many problems of interest.

Program Offerings

Offering type
Minor

Enrolled students will learn the basic principles of statistics and machine learning and how to apply these methods to data-driven problems. This requires students to master core conceptual and theoretical frameworks, a selection of core methods and best practices for sound data analysis.

A minor in statistics and machine learning has the potential to complement a wide variety of majors. Statistics and machine learning methods play an essential role across all fields where data are critical for principled knowledge discovery. The training provided by the minor will enhance students' ability to contribute new approaches and knowledge to their major field.

Goals for Student Learning

Students will learn basic conceptual and theoretical frameworks, best practices and a set of tools, which together equip them to correctly apply statistics and machine learning methods in various domains. This knowledge will enhance students' academic experience by opening pathways to apply well-grounded data analysis methods in their junior and senior independent work and theses, and beyond Princeton as they navigate an increasingly data-driven world.

Prerequisites

These are not formal prerequisites for the minor, but they may be required for some of the courses in the minor and are therefore recommended for students who intend to apply. We recommend that students plan which courses they intend to take in the minor and aim to fulfill the prerequisites of those courses.

Coding

(Normally completed by the end of the sophomore year)

COS 126/POL 345/SML 201

COS 126 provides comprehensive coverage of coding principles. Students can also learn to code in R(1) within the narrower contexts of statistics (POL 345) or data science (SML 201). All students are encouraged to continue honing their coding skills by learning Python(1) through co-curricular coding courses.

Mathematics

(Normally completed before the spring semester of the junior year)

Linear Algebra: MAT 202/EGR 154/SML 305(2)(3)

Calculus: MAT 201 or (MAT 103, then SML 305)

Probability

Taking one of the approved core statistics courses provides a basic understanding of probability. An additional course from ORF 309/SML 305 is strongly recommended for students interested in advanced machine learning courses.

Notes:

(1) Python and R are the most commonly used programming languages in statistics and machine learning.

(2) SML 305 covers the aspects of linear algebra, differential calculus, and probability most relevant to statistics and machine learning courses.

(3) SML 305 will not count toward the minor program.

Admission to the Program

Students are encouraged to enroll in the spring of their sophomore year, but no later than the start of their senior year.

Students must declare their major before enrolling in the minor program.

Please use this form to enroll: Minor Enrollment Application

For questions, contact us at [email protected].

Program of Study

Students must take five courses from approved lists and earn a grade of B- or better in each course (pass/D/fail grading and advanced placement credit are not allowed). With permission, advanced students can take approved graduate-level courses.

Required Coursework

  • One statistics course from an approved list.
  • One machine learning course from an approved list.
  • Three additional courses from the approved list of elective courses or (with approval) additional non-cognate courses from the statistics and machine learning approved lists.

Students may count a maximum of two courses from their major toward the minor.

Independent Work

Students are required to complete at least one semester of independent work in their junior or senior year on a topic that applies SML methods or investigates these methods. This work may be used to satisfy the IW requirement of the SML minor and the student's major. All work will be reviewed by the Statistics and Machine Learning Minor committee. In May, there will be an (online) poster session in which students must present their independent work to other students, researchers and the faculty. Students must adhere to submission due dates for independent work papers and poster requirements.

Students are encouraged to attend the CSML-sponsored or co-sponsored colloquia and seminars.

Independent Work

Students are required to complete a thesis or at least one semester of independent work in their junior or senior year on a topic that makes substantial application or study of machine learning or statistics. Typically, this is achieved in one of two ways: via applied work or via a core methodological contribution.

Applied projects should tackle some domain of intellectual interest in science, engineering, the humanities, etc. The project should use machine learning or statistical methods in a nontrivial way to analyze the data or in support of an engineering goal. The project report should go into detail about what these methods were and how they were used.

This work may be used to satisfy the requirements of both the SML minor program and the student's major. All work will be reviewed by the Statistics and Machine Learning Minor committee. In May, students are required to submit their independent work paper, poster and a brief video of their work. Students must adhere to submission due dates for independent work papers and poster requirements. 


Study Abroad

Students in the SML undergraduate minor may count up to two non-Princeton courses, including study abroad courses, toward the SML minor, with proper approval.

An equivalent course must be offered at Princeton University, and that course must be on the SML approved course list. It is the student’s responsibility to obtain approval and sign-off from the faculty member teaching that Princeton course. SML can sign off only on its own courses, such as SML 201, 310 and 312.

Once you receive approval and the signature from the appropriate faculty member, please forward the signed form to [email protected] for the CSML program director to sign off.

Please plan accordingly and allow enough time to obtain these signatures and approvals.
 
Example: You would like to take a course at another university that is comparable to ORF 245 Fundamentals of Statistics. The faculty member teaching ORF 245 must sign off, since they are the one who can verify that the course is similar to the one they teach. Once that faculty member signs off, email the signed forms to [email protected] for the CSML program director’s signature.
 
Example: You would like to take a course at another university, but there is no equivalent course offered at Princeton University. SML cannot approve the course, and it cannot be used for the SML minor.

Faculty

  • Director

    • Sarah-Jane Leslie
  • Executive Committee

    • Ryan P. Adams, Computer Science, ex officio
    • Sarah-Jane Leslie, Philosophy
    • Peter M. Melchior, Astrophysical Sciences
    • Brandon M. Stewart, Sociology
    • Ellen Zhong, Computer Science
  • Associated Faculty

    • Sigrid M. Adriaenssens, Civil and Environmental Eng
    • Amir Ali Ahmadi, Oper Res and Financial Eng
    • Sanjeev Arora, Computer Science
    • Yacine Aït-Sahalia, Economics
    • Matias D. Cattaneo, Oper Res and Financial Eng
    • Danqi Chen, Computer Science
    • Jonathan D. Cohen, Psychology
    • Jia Deng, Computer Science
    • Jianqing Fan, Oper Res and Financial Eng
    • Jaime Fernandez Fisac, Electrical & Comp Engineering
    • Filiz Garip, Sociology
    • Tom Griffiths, Psychology
    • Boris Hanin, Oper Res and Financial Eng
    • Elad Hazan, Computer Science
    • Bo E. Honoré, Economics
    • Niraj K. Jha, Electrical & Comp Engineering
    • Chi Jin, Electrical & Comp Engineering
    • Jason Matthew Klusowski, Oper Res and Financial Eng
    • Michal Kolesár, Economics
    • Sanjeev R. Kulkarni, Electrical & Comp Engineering
    • Jason D. Lee, Electrical & Comp Engineering
    • Naomi E. Leonard, Mechanical & Aerospace Eng
    • Sarah-Jane Leslie, Philosophy
    • John B. Londregan, Schl of Public & Int'l Affairs
    • Anirudha Majumdar, Mechanical & Aerospace Eng
    • William A. Massey, Oper Res and Financial Eng
    • Reed M. Maxwell, Civil and Environmental Eng
    • Peter M. Melchior, Astrophysical Sciences
    • Ulrich K. Mueller, Economics
    • Karthik Narasimhan, Computer Science
    • Jonathan W. Pillow, Psychology
    • H. Vincent Poor, Electrical & Comp Engineering
    • Yuri Pritykin, Computer Science
    • Olga Russakovsky, Computer Science
    • Matthew J. Salganik, Sociology
    • Amit Singer, Mathematics
    • Mona Singh, Computer Science
    • Bartolomeo Stellato, Oper Res and Financial Eng
    • Brandon M. Stewart, Sociology
    • John D. Storey, Integrative Genomics
    • Michael A. Strauss, Astrophysical Sciences
    • Rocío Titiunik, Politics
    • Jeroen Tromp, Geosciences
    • Olga G. Troyanskaya, Computer Science
    • Mark W. Watson, Schl of Public & Int'l Affairs
    • Michael A. Webb, Chemical and Biological Eng

For a full list of faculty members and fellows please visit the department or program website.

Courses

COS 302 - Mathematics for Numerical Computing and Machine Learning (also ECE 305/SML 305) Fall

This course provides a comprehensive and practical background for students interested in continuous mathematics for computer science. The goal is to prepare students for higher-level subjects in artificial intelligence, machine learning, computer vision, natural language processing, graphics, and other topics that require numerical computation. This course is intended for students who wish to pursue these more advanced topics but who have not taken (or do not feel comfortable with) university-level multivariable calculus (e.g., MAT 201/203) and probability (e.g., ORF 245 or ORF 309). E. Zhong

COS 424 - Fundamentals of Machine Learning (also SML 302) Not offered this year

Computers have made it possible to collect vast amounts of data from a wide variety of sources. It is not always clear, however, how to use the data, and how to extract useful information from them. This problem is faced in a tremendous range of social, economic and scientific applications. The focus will be on some of the most useful approaches to the problem of analyzing large complex data sets, exploring both theoretical foundations and practical applications. Students will gain experience analyzing several types of data, including text, images, and biological data. Two 90-minute lectures. Prereq: MAT 202 and COS 126 or equivalent. Staff