Fundamentals of inference and learning
EE-411
Media
14a, Crash course on ensemble methods: Bagging and Boosting
21.12.2021, 19:27
14b, Reinforcement learning
21.12.2021, 12:05
13b, Crash course on generative models
14.12.2021, 14:46
13a, Gaussian Mixture Clustering: Expectation-Maximization algorithms
14.12.2021, 14:40
12b, Denoising, Auto-encoder, Gaussian mixture model
07.12.2021, 18:23
12a, PCA and Kernel PCA
07.12.2021, 18:22
11b, Unsupervised learning: Eckart-Young-Mirsky theorem and intro to PCA
30.11.2021, 16:39
11a, Conclusions on statistical learning theory
30.11.2021, 16:35
10b, Crash course on statistical learning theory
23.11.2021, 14:28
10a, Conclusions on deep learning and intro to statistical learning theory
23.11.2021, 14:21
9b, Crash course on Deep Learning
16.11.2021, 13:43
9a, Crash course on Deep Learning
16.11.2021, 13:42
8b, Two-layer Neural Networks and Backpropagation
09.11.2021, 17:28
8a, Mercer Theorem, Kernels and Speed-up
09.11.2021, 17:14
7b, Feature maps, Representer theorem, Kernels, RKHS
02.11.2021, 15:39
7a, SVMs and Introduction to Feature Maps
02.11.2021, 14:54
6b, Classification problems: general overview, remarks on hinge and logistic loss
26.10.2021, 14:08
6a, SVD, remarks on Ridge and LASSO
26.10.2021, 14:06
5b, Linear models: Ridge, OLS and LASSO
21.10.2021, 16:01
5a, Proximal gradient descent and intro linear models
21.10.2021, 15:52
4b, Gradient descent
12.10.2021, 12:44
4a, A simple Mathematical model for supervised learning with kNN (Regression case)
12.10.2021, 12:38
3b, Validation and k-Nearest Neighbors method
05.10.2021, 14:06
3a, Maximum likelihood efficiency, Supervised learning intro
05.10.2021, 14:03
2b, All of Statistics: Maximum Likelihood, MSE, Fisher Information, Cramér-Rao bound
28.09.2021, 13:02
2a, All of Probability: LLN, CLT, Chernoff and PAC bound
28.09.2021, 12:59
1c, All of probability: Basic bounds, LLN & CLT
21.09.2021, 23:19
1b, All of probability: generating functionals and cumulants
21.09.2021, 22:24
1a, All of probability: notations, and pdf
21.09.2021, 22:23
0, Course information
21.09.2021, 21:16
This short video presents the course, the website, and gives general information about the lectures.
Summary
This is an introductory course on the theory of statistics, inference, and machine learning, with an emphasis on both theoretical understanding and practical exercises. The course will alternate between mathematical foundations and practical computational aspects in Python.
Content
The topics will be chosen from the following basic outline:
- Statistical inference: Estimators, Bias-Variance, Consistency, Efficiency, Maximum likelihood, Fisher Information.
- Bayesian inference: Priors, A posteriori estimation, Expectation-Maximization.
- Supervised learning: Linear Regression, Ridge, Lasso, Sparse problems, High-dimensional data, Kernel methods, Boosting, Bagging, k-NN, Support Vector Machines, Logistic regression, Optimal Margin Classifier
- Statistical learning theory: VC Bounds and Uniform convergence, Implicit regularisation, Double-descent
- Unsupervised learning: Mixture Models, PCA & Kernel PCA, k-means
- Deep learning: multi-layer nets, convnets, auto-encoders, Gradient-descent algorithms
- Generative models: Sampling and Diffusion models
Videos of the course (from previous years) are on SwitchTube
Codes and python notebooks are on this github link: https://github.com/IdePHICS/FundamentalLearningEPFL
Learning Outcomes
By the end of the course, the student must be able to:
- Formulate statistical models and apply them to statistical learning
- Apply machine learning techniques to data science problems
- Use gradient descent for Empirical Risk minimization
- Solve concrete data science problems
- Use neural networks for supervised learning problems
- Explain and understand the fundamental principles of learning theory, and their current limitations
References
- A good book for probability and statistics, accessible to students, is Larry Wasserman's All of Statistics.
- An accessible introduction to statistical learning is given in The Elements of Statistical Learning by Trevor Hastie, Robert Tibshirani, and Jerome H. Friedman.
- Another great reference is Machine Learning: A Probabilistic Perspective by Kevin P. Murphy. MacKay's Information Theory, Inference and Learning Algorithms is also a very useful resource.
- Modern Deep learning is well covered in this recent book: Dive into Deep Learning by A. Zhang, Z. Lipton, M. Li, A.J. Smola.
- A recent, and excellent, reference book in French: Introduction au Machine Learning by Chloé-Agathe Azencott.
Course Policies
- Homeworks: There will be three homework assignments, each worth 20% of the final grade.
- Bonus weekly (short) homeworks: Each week, the TAs will give a short problem that should be returned within two weeks of the TA session in which it was given. These are not compulsory; however, upon completion of all the questions, a bonus of +0.25 points on the final grade will be granted.
- Projects: Projects will be done at the end of the semester and will account for 40% of the final grade. You may work in teams of 1-5 people. There will be a limited number of projects to choose from, and you will not be able to choose projects outside this list. Each team member's contribution should be highlighted. You should use the project as an opportunity to "learn by doing".
- There will be no written exam.
- Videos: if you miss a lecture, videos of the lectures from the year 2021 are posted on the SwitchTube channel of the course.
- Academic Integrity: Collaboration among students is allowed, and encouraged, but is intended to help you learn. In other words, you may work on solving assignments together, but you should always write up your solutions separately. You should always implement code alone as well. Whenever collaboration happens, it should be reported by all parties involved in the relevant homework problem.
FAQ
- How can I use python on my computer?
Two good options to run Python online are EPFL Noto & Google Colab. Noto is EPFL's JupyterLab centralized platform. It allows teachers and students to use notebooks without having to install Python on their computers. Google Colab provides a similar solution, with the added advantage of giving access to GPUs. For instance, you can open the Jupyter notebook corresponding to the first exercise by a) opening Google Colab in your browser, b) selecting GitHub, and c) writing the path
- I do not know any python! What should I do?
TP0 provides a short introduction. If you need more and really want to study Python, here is a good Python and NumPy tutorial.
- What is overleaf?
If you cannot compile LaTeX on your own computer (and even if you can, this is often a good strategy anyway), EPFL provides Overleaf Professional accounts for all students: Overleaf EPFL. With Overleaf you can write and compile LaTeX directly from your web browser.
Week 1 (9/9)
Welcome to class! This will be the first week, and the topic is a recap of basic probability notions, followed by a recap on statistics. We have to start at the very beginning!
- 19/9 (i) All of probability video part-a, video part-b, video part-c; lecture notes
This first class is a very brief recap of probability theory that will serve us well in this class. A good reference, and an absolutely recommended reading, for this lecture is Chap. 1-5 in All of Statistics by Wasserman. This material should be standard for you at this point of your master's studies. However, if you are not up to date with all these basic probability concepts, you are invited to watch all videos and go through the lecture notes. It will help you a lot!
- 19/9 (ii) All of statistics video part-a, video part-b; lecture notes
- 20/9 Lab class: Introduction to statistics with python
If you are not familiar with python, it will be important to prepare yourself before the class (and to come to the lab class with your computer!). Make sure you are familiar with the following materials: A short intro to python and to visualization and making plots with Matplotlib.
The material for the first exercise session is in the TP1 folder of the Github repository. Please feel free to write on the forum if you have any questions!
- 17/9 (ii) All of statistics video part-a, video part-b; lecture notes
- 18/9 Lab class: Maximum likelihood numerical estimation
- 24/09 (iii): Supervised learning and kNN video part-a, video part-b, video part-c; lecture notes
The mathematical model I discussed in the second part on kNN is detailed a bit more in the lecture notes! You are encouraged to redo these computations, which explain the infamous "curse of dimensionality". This post here is also very clear!
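If you want to see the "curse of dimensionality" numerically, here is a minimal sketch (a hypothetical example, not part of the course notebooks) showing that, with a fixed number of points, the distance to the nearest neighbour grows quickly with the dimension:

```python
# Minimal illustration of the curse of dimensionality for kNN (hypothetical
# example): with n fixed, the nearest-neighbour distance grows with d.
import numpy as np

rng = np.random.default_rng(0)
n = 1000  # number of training points

for d in [1, 2, 5, 10, 50, 100]:
    X = rng.uniform(size=(n, d))        # n points uniform in [0, 1]^d
    x0 = rng.uniform(size=d)            # a test point
    dists = np.linalg.norm(X - x0, axis=1)
    print(f"d = {d:3d}: nearest-neighbour distance ~ {dists.min():.3f}")
```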
- 25/09 Lab class:
- 01/10 (iv): We shall first continue this week on Supervised learning: video part-a, video part-b, video part-c; lecture notes, and briefly discuss Random Forests (video) and bagging/boosting. You can find here the slides I used for random forests and boosting. Finally, we will be able to move to the real crux of supervised learning and discuss gradient descent. Gradient descent is the workhorse of all modern machine learning methods. There are many resources on gradient descent, from pedagogical ones to technical ones. Proximal operators are very powerful and are well described in this set of lectures: Tibshirani1, Tibshirani2, Tibshirani3. The videos can be found here: Gradient descent video part-a, video part-b; lecture notes
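As a taste of what the lecture covers, here is a hedged, minimal sketch of plain gradient descent and of proximal gradient descent (ISTA) on a least-squares problem with an l1 penalty; the step size and penalty strength below are arbitrary illustrative choices, not the values used in class:

```python
# Minimal sketch (illustrative values only): gradient descent on least squares,
# and proximal gradient descent (ISTA) for the LASSO
# f(w) = ||y - Xw||^2 / 2 + lam * ||w||_1.
import numpy as np

rng = np.random.default_rng(0)
n, d = 100, 20
X = rng.normal(size=(n, d))
w_true = np.zeros(d); w_true[:3] = [2.0, -1.0, 0.5]   # sparse ground truth
y = X @ w_true + 0.1 * rng.normal(size=n)

eta = 1.0 / np.linalg.norm(X, 2) ** 2   # step size <= 1/L, L = Lipschitz constant
lam = 0.5

def soft_threshold(v, t):
    """Proximal operator of t * ||.||_1 (soft-thresholding)."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

w_gd = np.zeros(d)     # plain gradient descent (no penalty)
w_ista = np.zeros(d)   # proximal gradient descent (LASSO)
for _ in range(500):
    w_gd -= eta * X.T @ (X @ w_gd - y)
    w_ista = soft_threshold(w_ista - eta * X.T @ (X @ w_ista - y), eta * lam)

print("GD   estimate:", np.round(w_gd[:5], 2))
print("ISTA estimate:", np.round(w_ista[:5], 2))   # notice the exact zeros
```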
- 02/10 Lab class: We shall move on to the amazing scikit-learn package, which will allow you to quickly solve many practical machine learning problems! You'll find that all of our favorite methods are already implemented in there. Moreover, we will also discuss other methods, such as regression trees, on a fresh new dataset.
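To give an idea of how little code scikit-learn requires, here is a hypothetical quick-start (the actual lab notebook is the reference):

```python
# Hypothetical scikit-learn quick-start: fit ridge regression and a
# decision-tree regressor on a toy dataset and compare their test scores.
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.linear_model import Ridge
from sklearn.tree import DecisionTreeRegressor

X, y = make_regression(n_samples=500, n_features=10, noise=5.0, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

for model in [Ridge(alpha=1.0), DecisionTreeRegressor(max_depth=5)]:
    model.fit(X_tr, y_tr)
    print(type(model).__name__, "test R^2:", round(model.score(X_te, y_te), 3))
```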
- 8/10 (v): Lecture notes. We shall now start our journey into parametric models, a journey that will eventually take us to neural networks! But let's proceed in order and start with the easiest and simplest ones: linear models. Intro to linear models and the Ordinary Least Squares problem: part1, part2. Singular Value Decomposition (SVD): video
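To connect the two parts, here is a minimal sketch (illustrative only) of solving the OLS problem through the SVD of the design matrix:

```python
# Minimal sketch: ordinary least squares solved via the SVD X = U S V^T,
# so that w_hat = V S^{-1} U^T y (the pseudo-inverse solution).
import numpy as np

rng = np.random.default_rng(0)
n, d = 50, 5
X = rng.normal(size=(n, d))
y = X @ rng.normal(size=d) + 0.1 * rng.normal(size=n)

U, S, Vt = np.linalg.svd(X, full_matrices=False)
w_svd = Vt.T @ ((U.T @ y) / S)          # pseudo-inverse applied to y
w_lstsq = np.linalg.lstsq(X, y, rcond=None)[0]
print(np.allclose(w_svd, w_lstsq))      # True: both give the OLS solution
```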
- 9/10 Lab class: Notebook on Gradient descent methods to optimise functions and to solve least-squares problems.
- 24/10 (vi): Lecture notes
- 16/10 Lab class: Regression and classification on real datasets using Scikit-Learn
- Fall break
- Feature maps and Kernels video part-a, video part-b, video part-c; lecture notes
Feature maps richer than linear ones, and kernel methods, are one of the most important aspects of supervised machine learning. Michael Jordan's notes on kernels are a good reference. The review by Hofmann, Schölkopf, and Smola is also very complete. Scikit-learn has a detailed and very efficient implementation.
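For intuition, here is a hedged, minimal kernel ridge regression sketch with an RBF kernel; the bandwidth and regularization values are arbitrary illustrative choices:

```python
# Minimal kernel ridge regression sketch (illustrative parameters): fit
# alpha = (K + lam * I)^{-1} y with an RBF kernel, predict via the kernel expansion.
import numpy as np

def rbf_kernel(A, B, gamma=10.0):
    """K[i, j] = exp(-gamma * ||A_i - B_j||^2)."""
    sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * sq)

rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(40, 1))
y = np.sin(3 * X[:, 0]) + 0.1 * rng.normal(size=40)

lam = 1e-2
K = rbf_kernel(X, X)
alpha = np.linalg.solve(K + lam * np.eye(len(X)), y)   # dual coefficients

X_test = np.linspace(-1, 1, 5)[:, None]
y_pred = rbf_kernel(X_test, X) @ alpha                 # prediction at test points
print(np.round(y_pred, 2))
```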
- Lab class (30/10): Classification on MNIST using Logistic regression and Support vector machines
- (vii) Neural networks & deep learning video part-a; video part-b; lecture notes; slides
Over the last decades, neural networks have made quite an impact; one might even say that they are at the origin of a revolution in machine learning and artificial intelligence. This simple website allows you to get intuition on how they actually work on simple datasets: Tensorflow playground. The universal approximation theorem is discussed in many references (see for instance here). Despite backpropagation being a rather straightforward application of the chain rule of derivatives known since Newton and Leibniz, it is the cornerstone of learning in neural networks.
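Here is a minimal numpy sketch (not the course implementation) of backpropagation for a two-layer network trained with the squared loss; the architecture and learning rate are illustrative assumptions:

```python
# Minimal sketch of a two-layer network y_hat = tanh(x W1^T) w2 trained by
# gradient descent on the squared loss; the gradients follow the chain rule.
import numpy as np

rng = np.random.default_rng(0)
n, d, h = 200, 2, 20
X = rng.normal(size=(n, d))
y = np.sin(X[:, 0]) * np.cos(X[:, 1])                 # toy regression target

W1 = rng.normal(size=(h, d)) / np.sqrt(d)             # first-layer weights
w2 = rng.normal(size=h) / np.sqrt(h)                  # output weights
lr = 0.1

for step in range(2000):
    Z = X @ W1.T                                      # pre-activations, shape (n, h)
    A = np.tanh(Z)                                    # hidden activations
    err = A @ w2 - y                                  # dLoss/dy_hat (up to a constant)

    grad_w2 = A.T @ err / n                           # backprop to the output weights
    grad_Z = np.outer(err, w2) * (1 - A ** 2)         # through output layer and tanh
    grad_W1 = grad_Z.T @ X / n                        # backprop to the first layer

    W1 -= lr * grad_W1
    w2 -= lr * grad_w2

mse = np.mean((np.tanh(X @ W1.T) @ w2 - y) ** 2)
print("final mean squared error:", round(float(mse), 4))
```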
- Lab class (15/11): Kernels and backpropagation
- (vii-bis) Deep learning video part-b; video part-c; video part-d; slides; lecture notes
A good summary of gradient descent algorithms is here. Convnets have made quite an impact and have revolutionized computer vision; see the nice introduction by Yann LeCun.
- Lab class: PyTorch and CNNs
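Related to this lab, here is a hedged, minimal PyTorch convnet sketch for 28x28 grayscale images such as MNIST; the layer sizes are arbitrary illustrative choices and the lab notebook remains the reference:

```python
# Minimal PyTorch CNN sketch (illustrative architecture only) for 28x28
# grayscale inputs, trained with SGD on the cross-entropy loss.
import torch
import torch.nn as nn

class SmallCNN(nn.Module):
    def __init__(self, n_classes=10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.classifier = nn.Linear(32 * 7 * 7, n_classes)  # 28 -> 14 -> 7 after pooling

    def forward(self, x):
        return self.classifier(self.features(x).flatten(1))

model = SmallCNN()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.CrossEntropyLoss()

# One training step on a random batch, just to show the training-loop pattern.
x = torch.randn(8, 1, 28, 28)
y = torch.randint(0, 10, (8,))
loss = loss_fn(model(x), y)
optimizer.zero_grad()
loss.backward()
optimizer.step()
print("loss:", loss.item())
```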
- (viii) A short course on Statistical Learning Theory video part-a; video part-b; video part-c; lecture notes
There are many resources on this topic online, and many books, and it would deserve an entire course in its own right. Nevertheless, it is good and useful to have a basic understanding of where we stand theoretically and to have a grasp of the notion of VC dimension.
(We also started the next week early: here are the lecture notes for session 9.)
- Lab class: Deep Learning Tips and Tricks: link
- (ix) Unsupervised learning and dimensionality reduction video part-a; video part-b; video part-c; lecture notes
Principal Component Analysis is (still) one of the most fundamental tools of machine learning. This post has great visual examples that you can play with to get an intuition.
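Here is a minimal sketch (illustrative only) of PCA computed from the SVD of the centered data matrix:

```python
# Minimal PCA sketch: center the data, take the SVD, and project on the
# top-k right singular vectors (the principal components).
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2)) @ np.array([[3.0, 0.0], [1.0, 0.3]])  # correlated toy data

Xc = X - X.mean(axis=0)                       # center each feature
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)

k = 1
components = Vt[:k]                           # top-k principal directions
scores = Xc @ components.T                    # low-dimensional representation
explained = S[:k] ** 2 / (S ** 2).sum()       # fraction of variance explained
print("explained variance ratio:", np.round(explained, 3))
```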
- Lab class: PCA here
This week we shall finish the unsupervised learning part (autoencoders and denoising), and then move on to:
- (x) Generative models and clustering video part-a, lecture notes
Scikit-learn has a good implementation of k-means. Generative models are a fundamental part of machine learning. The connection between Mixtures of Gaussians and k-means clustering is well explained in David MacKay's book, page 300. The book is a very useful reference on this topic and on probability in general (for instance, Monte Carlo methods are discussed on page 357). Boltzmann machines are discussed in many places, for instance here and there. Generative Adversarial Networks are very fashionable these days (check out This Person Does Not Exist!). An introduction in PyTorch is available here.
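To see the k-means/Gaussian-mixture connection in code, here is a hedged, minimal sketch fitting both with scikit-learn on the same toy data (the data and parameters are illustrative assumptions):

```python
# Minimal sketch: k-means and a Gaussian mixture fitted by EM on the same toy
# data; GaussianMixture gives soft (probabilistic) assignments, KMeans hard ones.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(loc=[0, 0], size=(100, 2)),
               rng.normal(loc=[4, 4], size=(100, 2))])

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
gmm = GaussianMixture(n_components=2, random_state=0).fit(X)

print("k-means centers:\n", np.round(kmeans.cluster_centers_, 2))
print("GMM means:\n", np.round(gmm.means_, 2))
print("soft assignment of the first point:", np.round(gmm.predict_proba(X[:1]), 3))
```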
- Lab class: Denoising Autoencoders link
- Everything else: slides, video part-a, video part-b, lecture notes
We talk briefly about generative models; here are the slides.
Lab class: In the lab class, we will go back to supervised learning to learn a few tricks we did not have time to see: Data Augmentation and Transfer Learning. link
Today we briefly discuss everything we have not covered in the lectures: Time series (RNNs, LSTMs, and Transformers).
RNNs are still very useful (even though these days it is all about Transformers). We used extensively the following introduction. A simple RNN implementation for learning to add numbers in Keras is given here.
Reinforcement learning is certainly one of the most interesting directions. You can find a simple implementation of Q-learning here for FrozenLake, and of policy gradient for CartPole. The Nature paper on AlphaGo is a fascinating read on the new era of reinforcement learning.
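To make the idea of Q-learning concrete without any environment dependency, here is a hedged, minimal sketch of tabular Q-learning on a tiny hand-coded chain MDP; everything in it is an illustrative assumption, independent of the FrozenLake notebook linked above:

```python
# Minimal tabular Q-learning sketch on a toy 5-state chain: the agent starts in
# state 0, action 1 moves right, action 0 moves left, and only reaching the
# last state gives reward 1. Hypothetical example, not the gym-based notebook.
import numpy as np

rng = np.random.default_rng(0)
n_states, n_actions = 5, 2
Q = np.zeros((n_states, n_actions))
alpha, gamma, eps = 0.1, 0.9, 0.1     # learning rate, discount, exploration rate

for episode in range(2000):
    s = 0
    while True:
        # epsilon-greedy action selection
        a = rng.integers(n_actions) if rng.random() < eps else int(Q[s].argmax())
        s_next = min(s + 1, n_states - 1) if a == 1 else max(s - 1, 0)
        r = 1.0 if s_next == n_states - 1 else 0.0
        # Q-learning update: bootstrap with the greedy value of the next state
        Q[s, a] += alpha * (r + gamma * Q[s_next].max() - Q[s, a])
        s = s_next
        if r == 1.0:
            break

print("greedy policy (0 = left, 1 = right):", Q.argmax(axis=1))
```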
Lab class: We present the projects and clarify all your questions about them.