EE-411 Fundamentals of inference and learning

Media

14a, Crash course on ensemble methods: Bagging and Boosting

21.12.2021, 19:27

14b, Reinforcement learning

21.12.2021, 12:05

13b, Crash course on generative models

14.12.2021, 14:46

13a, Gaussian Mixture Clustering: Expectation-Maximization algorithms

14.12.2021, 14:40

12b, Denoising, Auto-encoder, Gaussian mixture model

07.12.2021, 18:23

12a, PCA and Kernel PCA

07.12.2021, 18:22

11b, Unsupervised learning: Young-Eckart-Mirsky theorem and intro to PCA

30.11.2021, 16:39

11a, Conclusions on statistical learning theory

30.11.2021, 16:35

10b, Crash course on statistical learning theory

23.11.2021, 14:28

10a, Conclusions on deep learning and intro to statistical learning theory

23.11.2021, 14:21

9b, Crash course on Deep Learning

16.11.2021, 13:43

9a, Crash course on Deep Learning

16.11.2021, 13:42

8b, Two-layer Neural Networks and Backpropagation

09.11.2021, 17:28

8a, Mercer Theorem, Kernels and Speed-up

09.11.2021, 17:14

7b, Feature maps, Representer theorem, Kernels, RKHS

02.11.2021, 15:39

7a, SVMs and Introduction to Feature Maps

02.11.2021, 14:54

6b, Classification problems: general overview, remarks on hinge and logistic loss

26.10.2021, 14:08

6a, SVD, remarks on Ridge and LASSO

26.10.2021, 14:06

5b, Linear models: Ridge, OLS and LASSO

21.10.2021, 16:01

5a, Proximal gradient descent and intro linear models

21.10.2021, 15:52

4b, Gradient descent

12.10.2021, 12:44

4a, A simple Mathematical model for supervised learning with kNN (Regression case)

12.10.2021, 12:38

3b, Validation and the k-Nearest Neighbors method

05.10.2021, 14:06

3a, Maximum likelihood efficiency, supervised learning intro

05.10.2021, 14:03

2b, All of Statistics: Maximum Likelihood, MSE, Fisher Information, Cramér-Rao bound

28.09.2021, 13:02

2a, All of Probability: LLN, CLT, Chernoff and PAC bound

28.09.2021, 12:59

1c, All of probability: Basic bounds, LLN & CLT

21.09.2021, 23:19

1b, All of probability: generating functionals and cumulants

21.09.2021, 22:24

1a, All of probability: notations and pdfs

21.09.2021, 22:23

0, Course information

21.09.2021, 21:16

This short video presents the course and the website, and gives general information on the lectures.

Course summary

Summary


This is an introductory course in the theory of statistics, inference, and machine learning, with an emphasis on both theoretical understanding and practical exercises. The course combines and alternates between mathematical theoretical foundations and practical computational aspects in Python.

Content

The topics will be chosen from the following basic outline:

  • Statistical inference: Estimators, Bias-Variance, Consistency, Efficiency, Maximum Likelihood, Fisher Information.
  • Bayesian inference: Priors, A posteriori estimation, Expectation-Maximization.
  • Supervised learning: Linear Regression, Ridge, Lasso, Sparse problems, High-dimensional data, Kernel methods, Boosting, Bagging, k-NN, Support Vector Machines, Logistic regression, Optimal Margin Classifiers.
  • Statistical learning theory: VC bounds and Uniform convergence, Implicit regularisation, Double descent.
  • Unsupervised learning: Mixture Models, PCA & Kernel PCA, k-means.
  • Deep learning: Multi-layer nets, Convnets, Auto-encoders, Gradient-descent algorithms.
  • Generative models: Sampling and Diffusion models.

Videos of the course (from previous years) are on SwitchTube.

Codes and python notebooks are on this github link: https://github.com/IdePHICS/FundamentalLearningEPFL

Learning Outcomes

By the end of the course, the student must be able to:

  • Formulate statistical models and apply them to statistical learning
  • Apply machine learning techniques to data science problems
  • Use gradient descent for Empirical Risk minimization
  • Solve concrete data science problems
  • Use neural networks for supervised learning problems
  • Explain and understand the fundamental principles of learning theory, and their current limitations

Course Policies

  • Homeworks: There will be three homework assignments, each worth 20% of the final grade. 
  • Bonus weekly (short) homeworks: Each week the TAs will give a short problem, to be returned within two weeks of the TA session in which it was given. These are not compulsory; however, upon completion of all the questions, a bonus of +0.25 points on the final grade will be granted.
  • Projects: Projects will be done at the end of the semester and will account for 40% of the final grade. You may work in teams of 1-5 people. There will be a limited number of projects to choose from, and you will not be able to choose projects outside this list. Each team member’s contribution should be highlighted. You should use the project as an opportunity to “learn by doing”.
  • There will be no written exam.
  • Videos: If you miss a lecture, videos of the lectures from 2021 are posted on the SwitchTube channel of the course.
  • Academic Integrity: Collaboration among students is allowed, and encouraged, but is intended to help you learn. In other words, you may work on solving assignments together, but you should always write up your solutions separately. You should always implement code alone as well. Whenever collaboration happens, it should be reported by all parties involved in the relevant homework problem.

FAQ

  • How can I use python on my computer?

Two good options for running Python online are EPFL Noto and Google Colab. Noto is EPFL’s centralized JupyterLab platform. It allows teachers and students to use notebooks without having to install Python on their computers. Google Colab provides a similar solution, with the added advantage of giving access to GPUs. For instance, you can open the Jupyter notebook corresponding to the first exercise by a) opening Google Colab in your browser, b) selecting GitHub, and c) entering the path

https://github.com/IdePHICS/FundamentalLearningEPFL/blob/main/TP1/FoIL_ex1.ipynb

  • I do not know any python! What should I do?

TP0 provides a short introduction. If you need more and really want to study Python, here is a good Python and NumPy tutorial.

  • What is overleaf?

If you cannot compile LaTeX on your own computer (and even if you can, this is often a good strategy anyway), EPFL provides Overleaf Professional accounts for all students: Overleaf EPFL. With Overleaf you can write and compile LaTeX directly from your web browser.




Week 1 (9/9)

Welcome to class! This is the first week, and the topic is a recap of basic probability notions, followed by a recap of statistics. We have to start at the very beginning!

This first class is a very brief recap of probability theory that will serve us well in this course. A good reference, and an absolutely recommended reading, for this lecture is Chap. 1-5 of All of Statistics by Wasserman. This material should be standard for you at this point of your master studies. However, if you are not up to date with all these basic probability concepts, you are invited to watch all the videos and go through the lecture notes. It will help you a lot!

In this second class, we shall focus on the theory of maximum likelihood estimation. There are many good references on the topic, including for instance Chap. 9 of All of Statistics or, for the Bayesian point of view, MacKay Chap. 2 and 3.
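
As a small numerical illustration (not part of the course material; the Gaussian data below are synthetic and purely illustrative), the maximum likelihood estimators of a Gaussian mean and variance are simply the sample mean and the (biased) sample variance:

```python
import numpy as np

# A minimal sketch: maximum likelihood for a Gaussian N(mu, sigma^2).
# The data are synthetic; only the form of the estimators matters here.
rng = np.random.default_rng(0)
x = rng.normal(loc=2.0, scale=1.5, size=1000)

mu_hat = x.mean()                         # ML estimate of the mean
sigma2_hat = ((x - mu_hat) ** 2).mean()   # ML estimate of the variance (divides by n, hence biased)
print(mu_hat, sigma2_hat)
```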

If you are not familiar with Python, it will be important to prepare yourself before the class (and to come to the lab class with your computer!). Make sure you are familiar with the following materials: a short intro to Python, and to visualization and making plots with Matplotlib.
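
If you want to check that your setup works before the lab, here is a minimal plotting sketch (the function plotted is arbitrary):

```python
import numpy as np
import matplotlib.pyplot as plt

# Quick sanity check that NumPy and Matplotlib are installed and working.
x = np.linspace(0, 2 * np.pi, 200)
plt.plot(x, np.sin(x), label="sin(x)")
plt.xlabel("x")
plt.ylabel("sin(x)")
plt.legend()
plt.show()
```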

The material for the first exercise session is in the TP1 folder of the Github repository. Please feel free to write on the forum if you have any questions!



In this second class, we shall wrap up the theory of maximum likelihood estimation. There are many good references on the topic, including for instance Chap. 9 of All of Statistics or, for the Bayesian point of view, MacKay Chap. 2 and 3.



In this third class, we shall move on to an important topic, the cornerstone of modern machine learning: supervised learning. A good read on supervised statistical learning is Chapter 2 of An Introduction to Statistical Learning by James, Witten, Hastie and Tibshirani. They also discuss in detail k-nearest neighbors (kNN), which we shall use a lot as well.

The mathematical model I discussed in the second part, on kNN, is explained in a bit more detail in the lecture notes! You are encouraged to redo these computations, which explain the infamous "curse of dimensionality". This post is also very clear!
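
A rough numerical sketch of that computation (the fraction f = 0.01 is just an example): to capture a fraction f of uniformly distributed points in the unit hypercube [0, 1]^d, a sub-cube needs edge length f^(1/d), which quickly approaches 1 as d grows.

```python
# Curse of dimensionality: the edge length of a sub-cube of [0, 1]^d that
# contains a fraction f of uniformly distributed points is f ** (1 / d).
f = 0.01  # we only want 1% of the points as "neighbours"
for d in [1, 2, 10, 100]:
    print(d, f ** (1 / d))
# For d = 100 the edge length is about 0.95: the "local" neighbourhood spans
# almost the whole cube, so nearest neighbours are not local at all.
```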

  • 25/09 Lab class: In the lab class, we shall start to discuss supervised machine learning with classification using k-NN (a minimal sketch of the idea follows below).
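
A minimal scikit-learn sketch of the kind of pipeline the lab covers (the iris dataset here is only a stand-in for the lab's data):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# k-NN classification with a train/validation split.
X, y = load_iris(return_X_y=True)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.3, random_state=0)

knn = KNeighborsClassifier(n_neighbors=5)
knn.fit(X_tr, y_tr)
print("validation accuracy:", knn.score(X_val, y_val))
```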


  • 02/10 Lab class: We shall move on to the amazing scikit-learn package, which will allow you to quickly solve many practical machine learning problems! You'll find that all of our favorite methods are already implemented there. Moreover, we will also discuss other methods, such as regression trees, on a fresh new dataset (a small illustrative sketch follows below).
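
A small illustrative sketch of a regression tree in scikit-learn (the diabetes dataset below is a stand-in, not the dataset used in the lab):

```python
from sklearn.datasets import load_diabetes
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# A depth-limited regression tree evaluated on held-out data.
X, y = load_diabetes(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

tree = DecisionTreeRegressor(max_depth=4, random_state=0)
tree.fit(X_tr, y_tr)
print("R^2 on held-out data:", tree.score(X_te, y_te))
```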


  • 8/10 (v): Lecture notes. We shall now start our journey into parametric models, a journey that will eventually take us to neural networks! But let's proceed in order and start with the easiest and simplest ones: linear models. Intro to linear models and the Ordinary Least Squares problem: part 1, part 2. Singular Value Decomposition (SVD): video. (A small SVD-based OLS sketch follows below.)
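
A bare-bones sketch of solving ordinary least squares through the SVD, on synthetic data (sizes and noise level are arbitrary):

```python
import numpy as np

# OLS via the SVD / pseudo-inverse: w_hat = V diag(1/s) U^T y.
rng = np.random.default_rng(0)
n, d = 50, 3
X = rng.normal(size=(n, d))
w_true = np.array([1.0, -2.0, 0.5])
y = X @ w_true + 0.1 * rng.normal(size=n)

U, s, Vt = np.linalg.svd(X, full_matrices=False)
w_hat = Vt.T @ ((U.T @ y) / s)   # same as np.linalg.pinv(X) @ y
print(w_hat)
```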


  • 9/10 Lab class: Notebook on gradient descent methods to optimise functions and to solve least-squares problems (a bare-bones sketch follows below).
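
A bare-bones sketch of plain gradient descent on the least-squares objective (synthetic data and a hand-picked step size; the lab notebook is the reference):

```python
import numpy as np

# Gradient descent on L(w) = ||X w - y||^2 / (2 n).
rng = np.random.default_rng(1)
n, d = 100, 5
X = rng.normal(size=(n, d))
y = X @ rng.normal(size=d) + 0.05 * rng.normal(size=n)

w = np.zeros(d)
lr = 0.1
for _ in range(500):
    grad = X.T @ (X @ w - y) / n   # gradient of the quadratic loss
    w -= lr * grad

print("residual norm:", np.linalg.norm(X @ w - y))
```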


Regression with Ridge and Lasso: part 1, part 2. Classification problems: part 1, part 2.
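
A small Ridge vs Lasso sketch on synthetic sparse data (regularisation strengths are illustrative): Lasso sets many coefficients exactly to zero, while Ridge only shrinks them.

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge

rng = np.random.default_rng(0)
n, d = 80, 20
X = rng.normal(size=(n, d))
w_true = np.zeros(d)
w_true[:3] = [2.0, -1.5, 1.0]             # only 3 informative features
y = X @ w_true + 0.1 * rng.normal(size=n)

ridge = Ridge(alpha=1.0).fit(X, y)
lasso = Lasso(alpha=0.1).fit(X, y)
print("non-zero Ridge coefficients:", int(np.sum(np.abs(ridge.coef_) > 1e-6)))
print("non-zero Lasso coefficients:", int(np.sum(np.abs(lasso.coef_) > 1e-6)))
```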


  • Fall break



Richer feature maps than linear ones, and kernel methods, are among the most important aspects of supervised machine learning. Michael Jordan’s notes on kernels are a good reference. The review by Hofmann, Schölkopf and Smola is also very complete. Scikit-learn has a detailed and very efficient implementation.
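
A minimal kernel ridge regression sketch with an RBF kernel in scikit-learn (toy one-dimensional data; the kernel parameters are illustrative, not tuned):

```python
import numpy as np
from sklearn.kernel_ridge import KernelRidge

# A linear method in a rich (RBF) feature space, fit on a noisy sine curve.
rng = np.random.default_rng(0)
X = rng.uniform(0, 6, size=(200, 1))
y = np.sin(X).ravel() + 0.1 * rng.normal(size=200)

model = KernelRidge(kernel="rbf", alpha=0.1, gamma=1.0)
model.fit(X, y)
print("training R^2:", model.score(X, y))
```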

  • Lab class (30/10): Classification on MNIST using logistic regression and Support Vector Machines





Over the last decades, neural networks have made quite an impact; one might even say that they are at the origin of a revolution in machine learning and artificial intelligence. This simple website allows you to get intuition on how they actually work on simple datasets: Tensorflow playground. The universal approximation theorem is discussed in many references (see for instance here). Although backpropagation is a rather simple application of the chain rule of derivatives, going back to Newton and Leibniz, it is the cornerstone of training neural networks.
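
To make the chain-rule point concrete, here is a hedged sketch of a two-layer network trained by hand-written backpropagation on a toy regression task (layer sizes, step size and target function are arbitrary choices, not the course's reference code):

```python
import numpy as np

# A two-layer (one hidden layer) network trained by backpropagation,
# written out by hand for the squared loss on a toy 1-d regression task.
rng = np.random.default_rng(0)
X = rng.uniform(-2, 2, size=(200, 1))
y = np.sin(2 * X)

W1 = rng.normal(scale=0.5, size=(1, 32)); b1 = np.zeros(32)
W2 = rng.normal(scale=0.5, size=(32, 1)); b2 = np.zeros(1)
lr = 0.05

for _ in range(2000):
    # forward pass
    h = np.tanh(X @ W1 + b1)            # hidden layer
    pred = h @ W2 + b2                  # output layer
    err = pred - y                      # d(0.5 * (pred - y)^2) / d(pred)
    # backward pass: plain chain rule
    gW2 = h.T @ err / len(X); gb2 = err.mean(axis=0)
    dh = (err @ W2.T) * (1 - h ** 2)    # tanh'(z) = 1 - tanh(z)^2
    gW1 = X.T @ dh / len(X); gb1 = dh.mean(axis=0)
    # gradient step
    W1 -= lr * gW1; b1 -= lr * gb1; W2 -= lr * gW2; b2 -= lr * gb2

pred = np.tanh(X @ W1 + b1) @ W2 + b2
print("final mean squared error:", np.mean((pred - y) ** 2))
```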


A good summary of gradient descent algorithms is here. Convnets have made quite an impact and have revolutionized computer vision; see the nice introduction by Yann LeCun.


There are many resources on the topic online, and many books as well; the subject would deserve an entire course of its own. Nevertheless, it is good and useful to have a basic understanding of where we stand theoretically and to have a grasp of the notion of VC dimension.
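
For reference, one common form of the resulting uniform-convergence statement (written only up to constants; see the lecture notes for a precise version): with probability at least $1-\delta$, for every $h$ in a class of VC dimension $d_{\mathrm{VC}}$,

$$ R(h) \;\le\; \widehat{R}_n(h) \;+\; O\!\left(\sqrt{\frac{d_{\mathrm{VC}}\,\log(n/d_{\mathrm{VC}}) + \log(1/\delta)}{n}}\right). $$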

(We also started the next week early: here are the lecture notes for session 9.)


  • Lab class: Deep Learning Tips and Tricks: link


Principal Component Analysis is (still) one of the most fundamental tools of machine learning. This post has great visual examples that you can play with to get an intuition.
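
A minimal PCA sketch in scikit-learn on synthetic correlated data (dimensions and noise level are arbitrary), showing how the explained variance concentrates on the first components:

```python
import numpy as np
from sklearn.decomposition import PCA

# 10-dimensional data generated from 2 latent factors plus a little noise.
rng = np.random.default_rng(0)
latent = rng.normal(size=(300, 2))
mixing = rng.normal(size=(2, 10))
X = latent @ mixing + 0.05 * rng.normal(size=(300, 10))

pca = PCA(n_components=5)
pca.fit(X)
print("explained variance ratio:", pca.explained_variance_ratio_)
```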

  • Lab class: PCA here



This week we shall finish the unsupervised learning part (autoencoders and denoising), and then move on to clustering and generative models.

Scikit-learn has a good implementation of k-means. Generative models are a fundamental part of machine learning. The connection between mixtures of Gaussians and k-means clustering is well explained in David MacKay’s book, page 300. The book is a very useful reference on this topic and on probability in general (for instance, Monte-Carlo methods are discussed on page 357). Boltzmann machines are discussed in many places, for instance here and there. Generative Adversarial Networks are very fashionable these days (check out This Person Does Not Exist!). An introduction in PyTorch is available here.
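
A minimal sketch comparing k-means and a Gaussian mixture fitted by EM on the same toy data (three synthetic blobs; all numbers are illustrative):

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.mixture import GaussianMixture

# Three Gaussian blobs; k-means and a GMM (fitted by EM) recover similar centres.
rng = np.random.default_rng(0)
centers = np.array([[0, 0], [4, 4], [0, 5]])
X = np.vstack([c + rng.normal(size=(100, 2)) for c in centers])

km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
gmm = GaussianMixture(n_components=3, random_state=0).fit(X)
print("k-means centres:\n", km.cluster_centers_)
print("GMM means:\n", gmm.means_)
```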

  • Lab class: Denoising Autoencoders link





We talk briefly about generative models; here are the slides.



  • Lab Class: In the lab class, we will go back to supervised learning to learn a few tricks we did not have time to see: Data Augmentation and Transfer Learning. link


Today we briefly discuss everything we have not covered in the lectures: time series (RNNs, LSTMs and Transformers).

RNNs are still very useful (even though these days it is all about Transformers). We used extensively the following introduction. A simple RNN implementation for learning to add numbers in Keras is given here.

Reinforcement learning is certainly one of the most interesting directions. You can find a simple implementation of Q-learning here for FrozenLake, and of policy gradient for CartPole. The Nature paper on AlphaGo is a fascinating read on the new era of reinforcement learning.
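
A hedged sketch of the tabular Q-learning update, run on a hand-made chain MDP rather than the FrozenLake environment linked above (this avoids any gym dependency; all hyper-parameters are illustrative):

```python
import numpy as np

# Tabular Q-learning on a 5-state chain: action 1 moves right, action 0 moves
# left, and the only reward (+1) is obtained on reaching the last state.
n_states, n_actions = 5, 2
Q = np.zeros((n_states, n_actions))
alpha, gamma, eps = 0.1, 0.9, 0.1
rng = np.random.default_rng(0)

for _ in range(2000):                         # episodes
    s = 0
    for _ in range(20):                       # steps per episode
        a = rng.integers(n_actions) if rng.random() < eps else int(Q[s].argmax())
        s_next = min(s + 1, n_states - 1) if a == 1 else max(s - 1, 0)
        r = 1.0 if s_next == n_states - 1 else 0.0
        # Q-learning update: move Q(s, a) towards r + gamma * max_a' Q(s', a')
        Q[s, a] += alpha * (r + gamma * Q[s_next].max() - Q[s, a])
        s = s_next
        if r == 1.0:
            break

print("greedy policy (0 = left, 1 = right):", Q.argmax(axis=1))
```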

Lab class: We present the projects and clarify all your questions about them.