Introduction

This course gives an overview of the mathematical foundations for commonly used techniques in data science and machine learning. Topics include principal component analysis (PCA), k-means clustering, support vector machines, k-nearest neighbor classification, graph-based learning, deep neural networks, and optimization. The course will cover these topics from a rigorous mathematical perspective, using mathematical tools from linear algebra and multivariable calculus.

The figure below shows a t-SNE embedding of the MNIST dataset of handwritten digits; embeddings like this give a way to visualize high-dimensional data. We will learn about several different techniques for data embedding and visualization, including t-SNE, in the course.

The course will cover both mathematical theory and practical applications. We will use Python for all computational work in this course. Students will get hands-on experience working with real data through a series of small projects that will be completed throughout the term as part of homework assignments. We will start the course with a gentle introduction to Python; no prior knowledge is required. See the Homework page for details about Python.

Note: It is important to bring a laptop to class every day to complete programming exercises during class.

Prerequisites are linear algebra (for example MATH 2142, 2243 or 2373) and multivariable calculus (for example MATH 2263 or 2374). Although the course is aimed at undergraduate mathematics majors, other undergraduate majors and graduate students from mathematics and other departments are welcome. Students will be evaluated through bi-weekly homework assignments and a final Python project.

Course Information (.pdf)


Instructor Jeff Calder (Office: 538 Vincent Hall, Email: jwcalder@umn.edu)
Lectures Mon and Wed, 1pm-2:40pm in Vincent 6
Office Hours Monday 2:45pm-4:00pm, and Wednesday 11:00am-12:30pm in Vincent 538
Piazza This term we will be using Piazza for class discussion. The system is designed to get you help quickly and efficiently from classmates, the TA, and me. Rather than emailing questions to the instructor, I encourage you to post your questions on Piazza. You can access Piazza through Canvas.
Canvas We have a Canvas website, which will be used only for posting grades (note that Piazza is now integrated into Canvas). This public website will be used to post all course material.
Lecture Notes The course will be taught from the lecture notes below, which will be updated throughout the term.

Calder, J. & Olver, P.J. Linear Algebra, Machine Learning, and Data Science [PDF].
Homework There will be 7 homework assignments. Homework problems will include basic computations, mathematical proofs, and some small Python exercises and projects. Collaboration on homework is strongly encouraged, but you are expected to write up solutions, including Python code, on your own. Please indicate the names of any other students you collaborated with on your homework.
Final Project Students will work on a group project throughout the semester. The project will involve applying machine learning to a real-world dataset and writing a short paper summarizing the problem and results. The due date for the project is the last day of exams, May 10.
Grades Your final grade will be based on homework assignments (70%) and the final project (30%). The lowest two homework scores will be dropped at the end of the semester for each student. Important Note: There is no grading curve in the class. All grades are assigned on a 12-point scale with 12=A, 11=A-, 10=B+, etc.
Readings Readings will be assigned on a weekly basis and posted on the schedule page on this website. It is very important to do the readings before attending the associated lecture. Unless otherwise noted, readings are from the class lecture notes.
Policy for late work Each student has 3 tokens to use throughout the semester. Each token allows for a one-day extension on an assignment. If more time is needed due to extenuating circumstances, please contact me immediately to make arrangements.
Academic Honesty The School of Mathematics at the University of Minnesota expects that students in mathematics courses will not engage in cheating or plagiarism. Cheating, plagiarism, and other forms of academic dishonesty will result in a grade of zero on the homework assignment or exam in question, and, in severe cases, a failing grade in the course and a referral to the Office for Student Conduct and Academic Integrity (OSCAI). Students should be familiar with the Student Code of Conduct.
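
As an illustration of the grading scheme described under Grades, here is a minimal Python sketch of how a final score might be computed. It assumes, beyond what is stated above, that each homework and the final project are scored on the same 12-point scale and that the homework scores kept after dropping the lowest two are weighted equally. The function name final_score and the sample scores are hypothetical; the actual computation used in the course may differ.

```python
def final_score(homework, project, drop=2, hw_weight=0.70, project_weight=0.30):
    """Weighted final score on the 12-point scale (12 = A, 11 = A-, 10 = B+).

    Assumes equal weighting of the homework scores that are kept, and that
    the project is scored on the same 12-point scale (both are assumptions).
    """
    if len(homework) <= drop:
        raise ValueError("need more homework scores than the number dropped")
    # Drop the lowest `drop` homework scores, then average the rest.
    kept = sorted(homework)[drop:]
    hw_average = sum(kept) / len(kept)
    # Combine homework (70%) and final project (30%).
    return hw_weight * hw_average + project_weight * project


# Hypothetical scores for the 7 homework assignments and the final project.
homework_scores = [12, 10, 7, 11, 12, 9, 11]
project_score = 11
print(round(final_score(homework_scores, project_score), 2))
```

With these sample scores, the two lowest homework scores (7 and 9) are dropped, the remaining five average 11.2, and the weighted result is 0.7 × 11.2 + 0.3 × 11 = 11.14, which falls between an A- and an A on the 12-point scale.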