Fundamentals of inference and learning
EE-411
Media
14a, Crash course on ensemble methods: Bagging and Boosting
21.12.2021, 19:27
14b, Reinforcement learning
21.12.2021, 12:05
13b, Crash course on generative models
14.12.2021, 14:46
13a, Gaussian Mixture Clustering: Expectation-Maximization algorithms
14.12.2021, 14:40
12b, Denoising, Auto-encoder, Gaussian mixture model
07.12.2021, 18:23
12a, PCA and Kernel PCA
07.12.2021, 18:22
11b, Unsupervised learning: Eckart-Young-Mirsky theorem and intro to PCA
30.11.2021, 16:39
11a, Conclusions on statistical learning theory
30.11.2021, 16:35
10b, Crash course on statistical learning theory
23.11.2021, 14:28
10a, Conclusions on deep learning and intro to statistical learning theory
23.11.2021, 14:21
9b, Crash course on Deep Learning
16.11.2021, 13:43
9a, Crash course on Deep Learning
16.11.2021, 13:42
8b, Two-layer Neural Networks and Backpropagation
09.11.2021, 17:28
8a, Mercer Theorem, Kernels and Speed-up
09.11.2021, 17:14
7b, Feature maps, Representer theorem, Kernels, RKHS
02.11.2021, 15:39
7a, SVMs and Introduction to Feature Maps
02.11.2021, 14:54
6b, Classification problems: general overview, remarks on hinge and logistic loss
26.10.2021, 14:08
6a, SVD, remarks on Ridge and LASSO
26.10.2021, 14:06
5b, Linear models: Ridge, OLS and LASSO
21.10.2021, 16:01
5a, Proximal gradient descent and intro linear models
21.10.2021, 15:52
4b, Gradient descent
12.10.2021, 12:44
4a, A simple Mathematical model for supervised learning with kNN (Regression case)
12.10.2021, 12:38
3b, Validation and k-Nearest Neighbors method
05.10.2021, 14:06
3a, Maximum likelihood efficiency, Supervised learning intro
05.10.2021, 14:03
2b, All of Statistics: Maximum Likelihood, MSE, Fisher Information, Cramér-Rao bound
28.09.2021, 13:02
2a, All of Probability: LLN, CLT, Chernoff and PAC bound
28.09.2021, 12:59
1c, All of probability: Basic bounds, LLN & CLT
21.09.2021, 23:19
1b, All of probability: generating functionals and cumulants
21.09.2021, 22:24
1a, All of probability: notations, and pdf
21.09.2021, 22:23
0, Course information
21.09.2021, 21:16
This short video presents the course, the website, and gives general information about the lectures.
Summary
This is an introductory course on the theory of statistics, inference, and machine learning, with an emphasis on both theoretical understanding and practical exercises. The course will alternate between mathematical foundations and practical computational aspects in Python.
Content
The topics will be chosen from the following basic outline:
- Statistical inference: Estimators, Bias-Variance, Consistency, Efficiency, Maximum likelihood, Fisher Information.
- Bayesian inference: Priors, A posteriori estimation, Expectation-Maximization.
- Supervised learning: Linear Regression, Ridge, Lasso, Sparse problems, High-dimensional data, Kernel methods, Boosting, Bagging, k-NN, Support Vector Machines, Logistic regression, Optimal Margin Classifier
- Statistical learning theory: VC Bounds and Uniform convergence, Implicit regularisation, Double-descent
- Unsupervised learning: Mixture Models, PCA & Kernel PCA, k-means
- Deep learning: multi-layer nets, convnets, auto-encoders, Gradient-descent algorithms
- Generative models: Sampling and Diffusion models
Videos of the course (from previous years) are on SwitchTube
Codes and python notebooks are on this github link: https://github.com/IdePHICS/FundamentalLearningEPFL
Learning Outcomes
By the end of the course, the student must be able to:
- Formulate statistical models and apply them to statistical learning
- Apply machine learning techniques to data science problems
- Use gradient descent for Empirical Risk minimization
- Solve concrete data science problems
- Use neural networks for supervised learning problems
- Explain and understand the fundamental principles of learning theory, and their current limitations
References
- A good book for probability and statistics, accessible to students, is Larry Wasserman's All of Statistics.
- An accessible introduction to statistical learning is given in The Elements of Statistical Learning by Trevor Hastie, Robert Tibshirani, and Jerome H. Friedman.
- Another great reference is Machine Learning: A Probabilistic Perspective by Kevin P. Murphy. MacKay's Information Theory, Inference and Learning Algorithms is also a very useful resource.
- Modern Deep learning is well covered in this recent book: Dive into Deep Learning by A. Zhang, Z. Lipton, M. Li, A.J. Smola.
- A recent, and excellent, reference book in French: Introduction au Machine Learning by Chloé-Agathe Azencott.
Course Policies
- Homeworks: There will be three homework assignments, each worth 20% of the final grade.
- Bonus weekly (short) homeworks: Each week, the TAs will give a short problem that should be returned within two weeks of the TA session in which it was given. These are not compulsory; however, upon completion of all the questions, a bonus of +0.25 points on the final grade will be granted.
- Projects: Projects will be done at the end of the semester and will account for 40% of the final grade. You may work in teams of 1-5 people. There will be a limited number of projects to choose from, and you will not be able to choose projects outside this list. Each team member's contribution should be highlighted. You should use the project as an opportunity to "learn by doing".
- There will be no written exam.
- Videos: if you miss a lecture, videos of the lectures from the year 2021 are posted on the SwitchTube channel of the course.
- Academic Integrity: Collaboration among students is allowed, and encouraged, but is intended to help you learn. In other words, you may work on solving assignments together, but you should always write up your solutions separately. You should always implement code alone as well. Whenever collaboration happens, it should be reported by all parties involved in the relevant homework problem.
FAQ
- How can I use python on my computer?
Two good options to run Python online are EPFL Noto & Google Colab. Noto is EPFL's JupyterLab centralized platform. It allows teachers and students to use notebooks without having to install Python on their computers. Google Colab provides a similar solution, with the added advantage of giving access to GPUs. For instance, you can open the Jupyter notebook corresponding to the first exercise by a) opening Google Colab in your browser, b) selecting GitHub, and c) writing the path
- I do not know any python! What should I do?
TP0 provides a short introduction. If you need more and really want to study Python, here is a good Python and NumPy tutorial.
- What is overleaf?
If you cannot compile LaTeX on your own computer (and even if you can, this is often a good strategy anyway), EPFL provides Overleaf Professional accounts for all students: Overleaf EPFL. With Overleaf you can write and compile LaTeX directly from your web browser.
Week 1 (9/9)
Welcome to class! This will be the first week, and the topic is a recap of basic probability notions, followed by a recap on statistics. We have to start at the very beginning!
- 19/9 (i) All of probability video part-a, video part-b, video part-c; lecture notes
This first class is a very brief recap of probability theory that will serve us well in this class. A good reference, and an absolutely recommended reading, for this lecture is Chap. 1-5 in All of Statistics by Wasserman. This material should be standard for you at this point of your master's studies. However, if you are not up to date with all these basic probability concepts, you are invited to watch all videos and go through the lecture notes. It will help you a lot!
- 19/9 (ii) All of statistics video part-a, video part-b; lecture notes
- 20/9 Lab class: Introduction to statistics with python
If you are not familiar with python, it will be important to prepare yourself before the class (and to come to the lab class with your computer!). Make sure you are familiar with the following materials: A short intro to python and to visualization and making plots with Matplotlib.
The material for the first exercise session is in the TP1 folder of the Github repository. Please feel free to write on the forum if you have any questions!
- 17/9 (ii) All of statistics video part-a, video part-b; lecture notes
- 18/9 Lab class: Maximum likelihood numerical estimation
- 24/09 (iii): Supervised learning and kNN video part-a, video part-b, video part-c; lecture notes
The mathematical model I discussed in the second part on kNN is detailed a bit more in the lecture notes! You are encouraged to redo these computations, which explain the infamous "curse of dimensionality". This post here is also very clear!
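If you want to see the "curse of dimensionality" numerically, here is a minimal sketch (a hypothetical example, not part of the course notebooks) showing that, with a fixed number of points, the distance to the nearest neighbour grows quickly with the dimension:

```python
# Minimal illustration of the curse of dimensionality for kNN (hypothetical
# example): with n fixed, the nearest-neighbour distance grows with d.
import numpy as np

rng = np.random.default_rng(0)
n = 1000  # number of training points

for d in [1, 2, 5, 10, 50, 100]:
    X = rng.uniform(size=(n, d))        # n points uniform in [0, 1]^d
    x0 = rng.uniform(size=d)            # a test point
    dists = np.linalg.norm(X - x0, axis=1)
    print(f"d = {d:3d}: nearest-neighbour distance ~ {dists.min():.3f}")
```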
- 25/09 Lab class:
- 01/10 (iv): We shall first continue this week on Supervised learning: video part-a, video part-b, video part-c; lecture notes, and briefly discuss Random Forests (video) and bagging/boosting. You can find here the slides I used for random forests and boosting. Finally, we will be able to move to the real crux of supervised learning and discuss gradient descent. Gradient descent is the workhorse of all modern machine learning methods. There are many resources on gradient descent, from pedagogical ones to technical ones. Proximal operators are very powerful and are well described in this set of lectures: Tibshirani1, Tibshirani2, Tibshirani3. The videos can be found here: Gradient descent video part-a, video part-b; lecture notes
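As a taste of what the lecture covers, here is a hedged, minimal sketch of plain gradient descent and of proximal gradient descent (ISTA) on a least-squares problem with an l1 penalty; the step size and penalty strength below are arbitrary illustrative choices, not the values used in class:

```python
# Minimal sketch (illustrative values only): gradient descent on least squares,
# and proximal gradient descent (ISTA) for the LASSO
# f(w) = ||y - Xw||^2 / 2 + lam * ||w||_1.
import numpy as np

rng = np.random.default_rng(0)
n, d = 100, 20
X = rng.normal(size=(n, d))
w_true = np.zeros(d); w_true[:3] = [2.0, -1.0, 0.5]   # sparse ground truth
y = X @ w_true + 0.1 * rng.normal(size=n)

eta = 1.0 / np.linalg.norm(X, 2) ** 2   # step size <= 1/L, L = Lipschitz constant
lam = 0.5

def soft_threshold(v, t):
    """Proximal operator of t * ||.||_1 (soft-thresholding)."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

w_gd = np.zeros(d)     # plain gradient descent (no penalty)
w_ista = np.zeros(d)   # proximal gradient descent (LASSO)
for _ in range(500):
    w_gd -= eta * X.T @ (X @ w_gd - y)
    w_ista = soft_threshold(w_ista - eta * X.T @ (X @ w_ista - y), eta * lam)

print("GD   estimate:", np.round(w_gd[:5], 2))
print("ISTA estimate:", np.round(w_ista[:5], 2))   # notice the exact zeros
```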
- 02/10 Lab class: We shall move on to the amazing scikit-learn package, which will allow you to quickly solve many practical machine learning problems! You'll find that all of our favorite methods are already implemented in there. Moreover, we will also discuss other methods, such as regression trees, on a fresh new dataset.
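To give an idea of how little code scikit-learn requires, here is a hypothetical quick-start (the actual lab notebook is the reference):

```python
# Hypothetical scikit-learn quick-start: fit ridge regression and a
# decision-tree regressor on a toy dataset and compare their test scores.
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.linear_model import Ridge
from sklearn.tree import DecisionTreeRegressor

X, y = make_regression(n_samples=500, n_features=10, noise=5.0, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

for model in [Ridge(alpha=1.0), DecisionTreeRegressor(max_depth=5)]:
    model.fit(X_tr, y_tr)
    print(type(model).__name__, "test R^2:", round(model.score(X_te, y_te), 3))
```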
- 8/10 (v): Lecture notes. We shall now start our journey into parametric models, a journey that will eventually take us to neural networks! But let's proceed in order and start with the easiest and simplest ones: linear models. Intro to linear models and the Ordinary Least Squares problem: part1, part2. Singular Value Decomposition (SVD): video
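To connect the two parts, here is a minimal sketch (illustrative only) of solving the OLS problem through the SVD of the design matrix:

```python
# Minimal sketch: ordinary least squares solved via the SVD X = U S V^T,
# so that w_hat = V S^{-1} U^T y (the pseudo-inverse solution).
import numpy as np

rng = np.random.default_rng(0)
n, d = 50, 5
X = rng.normal(size=(n, d))
y = X @ rng.normal(size=d) + 0.1 * rng.normal(size=n)

U, S, Vt = np.linalg.svd(X, full_matrices=False)
w_svd = Vt.T @ ((U.T @ y) / S)          # pseudo-inverse applied to y
w_lstsq = np.linalg.lstsq(X, y, rcond=None)[0]
print(np.allclose(w_svd, w_lstsq))      # True: both give the OLS solution
```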
- 9/10 Lab class: Notebook on Gradient descent methods to optimise functions and to solve least-squares problems.
- 24/10 (vi): Lecture notes
- 16/10 Lab class: Regression and classification on real datasets using Scikit-Learn
- Fall break
- Feature maps and Kernels video part-a, video part-b, video part-c; lecture notes
Feature maps richer than linear ones, and kernel methods, are one of the most important aspects of supervised machine learning. Michael Jordan's notes on kernels are a good reference. The review by Hofmann, Schölkopf, and Smola is also very complete. Scikit-learn has a detailed and very efficient implementation.
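For intuition, here is a hedged, minimal kernel ridge regression sketch with an RBF kernel; the bandwidth and regularization values are arbitrary illustrative choices:

```python
# Minimal kernel ridge regression sketch (illustrative parameters): fit
# alpha = (K + lam * I)^{-1} y with an RBF kernel, predict via the kernel expansion.
import numpy as np

def rbf_kernel(A, B, gamma=10.0):
    """K[i, j] = exp(-gamma * ||A_i - B_j||^2)."""
    sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * sq)

rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(40, 1))
y = np.sin(3 * X[:, 0]) + 0.1 * rng.normal(size=40)

lam = 1e-2
K = rbf_kernel(X, X)
alpha = np.linalg.solve(K + lam * np.eye(len(X)), y)   # dual coefficients

X_test = np.linspace(-1, 1, 5)[:, None]
y_pred = rbf_kernel(X_test, X) @ alpha                 # prediction at test points
print(np.round(y_pred, 2))
```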
- Lab class (30/10): Classification on MNIST using Logistic regression and Support vector machines
- (vii) Neural networks & deep learning video part-a; video part-b; lecture notes; slides
Over the last decades, neural networks have made quite an impact; one might even say that they are at the origin of a revolution in machine learning and artificial intelligence. This simple website allows you to get intuition on how they actually work on simple datasets: Tensorflow playground. The universal approximation theorem is discussed in many references (see for instance here). Despite backpropagation being a rather straightforward application of the chain rule of derivatives known since Newton and Leibniz, it is the cornerstone of learning in neural networks.
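Here is a minimal numpy sketch (not the course implementation) of backpropagation for a two-layer network trained with the squared loss; the architecture and learning rate are illustrative assumptions:

```python
# Minimal sketch of a two-layer network y_hat = tanh(x W1^T) w2 trained by
# gradient descent on the squared loss; the gradients follow the chain rule.
import numpy as np

rng = np.random.default_rng(0)
n, d, h = 200, 2, 20
X = rng.normal(size=(n, d))
y = np.sin(X[:, 0]) * np.cos(X[:, 1])                 # toy regression target

W1 = rng.normal(size=(h, d)) / np.sqrt(d)             # first-layer weights
w2 = rng.normal(size=h) / np.sqrt(h)                  # output weights
lr = 0.1

for step in range(2000):
    Z = X @ W1.T                                      # pre-activations, shape (n, h)
    A = np.tanh(Z)                                    # hidden activations
    err = A @ w2 - y                                  # dLoss/dy_hat (up to a constant)

    grad_w2 = A.T @ err / n                           # backprop to the output weights
    grad_Z = np.outer(err, w2) * (1 - A ** 2)         # through output layer and tanh
    grad_W1 = grad_Z.T @ X / n                        # backprop to the first layer

    W1 -= lr * grad_W1
    w2 -= lr * grad_w2

mse = np.mean((np.tanh(X @ W1.T) @ w2 - y) ** 2)
print("final mean squared error:", round(float(mse), 4))
```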
- Lab class (15/11): Kernels and backpropagation
- (vii-bis) Deep learning video part-b; video part-c; video part-d; slides; lecture notes
A good summary of gradient descent algorithms is here. Convnets have made quite an impact and have revolutionized computer vision; see the nice introduction by Yann LeCun.
- Lab class: PyTorch and CNNs
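Related to this lab, here is a hedged, minimal PyTorch convnet sketch for 28x28 grayscale images such as MNIST; the layer sizes are arbitrary illustrative choices and the lab notebook remains the reference:

```python
# Minimal PyTorch CNN sketch (illustrative architecture only) for 28x28
# grayscale inputs, trained with SGD on the cross-entropy loss.
import torch
import torch.nn as nn

class SmallCNN(nn.Module):
    def __init__(self, n_classes=10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.classifier = nn.Linear(32 * 7 * 7, n_classes)  # 28 -> 14 -> 7 after pooling

    def forward(self, x):
        return self.classifier(self.features(x).flatten(1))

model = SmallCNN()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.CrossEntropyLoss()

# One training step on a random batch, just to show the training-loop pattern.
x = torch.randn(8, 1, 28, 28)
y = torch.randint(0, 10, (8,))
loss = loss_fn(model(x), y)
optimizer.zero_grad()
loss.backward()
optimizer.step()
print("loss:", loss.item())
```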
- (viii) A short course on Statistical Learning Theory video part-a; video part-b; video part-c; lecture notes
There are many resources on this topic online, and many books, and it would deserve an entire course in its own right. Nevertheless, it is good and useful to have a basic understanding of where we stand theoretically and to have a grasp of the notion of VC dimension.
(We also started the next week early: here are the lecture notes for session 9.)
- Lab class: Deep Learning Tips and Tricks: link
- (ix) Unsupervised learning and dimensionality reduction video part-a; video part-b; video part-c; lecture notes
Principal Component Analysis is (still) one of the most fundamental tools of machine learning. This post has great visual examples that you can play with to get an intuition.
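Here is a minimal sketch (illustrative only) of PCA computed from the SVD of the centered data matrix:

```python
# Minimal PCA sketch: center the data, take the SVD, and project on the
# top-k right singular vectors (the principal components).
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2)) @ np.array([[3.0, 0.0], [1.0, 0.3]])  # correlated toy data

Xc = X - X.mean(axis=0)                       # center each feature
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)

k = 1
components = Vt[:k]                           # top-k principal directions
scores = Xc @ components.T                    # low-dimensional representation
explained = S[:k] ** 2 / (S ** 2).sum()       # fraction of variance explained
print("explained variance ratio:", np.round(explained, 3))
```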
- Lab class: PCA here
This week we shall finish the unsupervised learning part (autoencoders and denoising), and then move on to:
- (x) Generative models and clustering video part-a, lecture notes
Scikit-learn has a good implementation of k-means. Generative models are a fundamental part of machine learning. The connection between Mixtures of Gaussians and k-means clustering is well explained in David MacKay's book, page 300. The book is a very useful reference on this topic and on probability in general (for instance, Monte Carlo methods are discussed on page 357). Boltzmann machines are discussed in many places, for instance here and there. Generative Adversarial Networks are very fashionable these days (check out This Person Does Not Exist!). An introduction in PyTorch is available here.
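To see the k-means/Gaussian-mixture connection in code, here is a hedged, minimal sketch fitting both with scikit-learn on the same toy data (the data and parameters are illustrative assumptions):

```python
# Minimal sketch: k-means and a Gaussian mixture fitted by EM on the same toy
# data; GaussianMixture gives soft (probabilistic) assignments, KMeans hard ones.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(loc=[0, 0], size=(100, 2)),
               rng.normal(loc=[4, 4], size=(100, 2))])

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
gmm = GaussianMixture(n_components=2, random_state=0).fit(X)

print("k-means centers:\n", np.round(kmeans.cluster_centers_, 2))
print("GMM means:\n", np.round(gmm.means_, 2))
print("soft assignment of the first point:", np.round(gmm.predict_proba(X[:1]), 3))
```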
- Lab class: Denoising Autoencoders link
- Everything else: slides, video part-a, video part-b, lecture notes
We talk briefly about generative models; here are the slides.
Lab class: In the lab class, we will go back to supervised learning to learn a few tricks we did not have time to see: Data Augmentation and Transfer Learning. link
Today we briefly discuss everything we have not covered in the lectures: Time series (RNNs, LSTMs, and Transformers).
RNNs are still very useful (even though these days it is all about Transformers). We used extensively the following introduction. A simple RNN implementation for learning to add numbers in Keras is given here.
Reinforcement learning is certainly one of the most interesting directions. You can find a simple implementation of Q-learning here for FrozenLake, and of policy gradient for CartPole. The Nature paper on AlphaGo is a fascinating read on the new era of reinforcement learning.
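To make the idea of Q-learning concrete without any environment dependency, here is a hedged, minimal sketch of tabular Q-learning on a tiny hand-coded chain MDP; everything in it is an illustrative assumption, independent of the FrozenLake notebook linked above:

```python
# Minimal tabular Q-learning sketch on a toy 5-state chain: the agent starts in
# state 0, action 1 moves right, action 0 moves left, and only reaching the
# last state gives reward 1. Hypothetical example, not the gym-based notebook.
import numpy as np

rng = np.random.default_rng(0)
n_states, n_actions = 5, 2
Q = np.zeros((n_states, n_actions))
alpha, gamma, eps = 0.1, 0.9, 0.1     # learning rate, discount, exploration rate

for episode in range(2000):
    s = 0
    while True:
        # epsilon-greedy action selection
        a = rng.integers(n_actions) if rng.random() < eps else int(Q[s].argmax())
        s_next = min(s + 1, n_states - 1) if a == 1 else max(s - 1, 0)
        r = 1.0 if s_next == n_states - 1 else 0.0
        # Q-learning update: bootstrap with the greedy value of the next state
        Q[s, a] += alpha * (r + gamma * Q[s_next].max() - Q[s, a])
        s = s_next
        if r == 1.0:
            break

print("greedy policy (0 = left, 1 = right):", Q.argmax(axis=1))
```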
Lab class: We present the projects and clarify all your questions about them.