Machine learning II
MICRO-570
========================
Content: This course will present some of the core advanced methods in the field for structure discovery, classification and non-linear regression.
Prerequisites: This course is intended as an advanced class in ML for MSc and PhD students. It therefore focuses on advanced topics of ML and on critical reading of the state of the art in ML. Students are expected to have taken an introductory ML course, such as the MSc-level Machine Learning I course given during the winter semester.
The course assumes prior knowledge in:
- Linear Algebra, Probability and Statistics.
- Standard machine learning and statistical analysis techniques, such as PCA, K-means, SVM, linear and support vector regression, the perceptron, and feedforward neural networks.
Students should use their own PC with the related software packages installed.
========================
The course takes place on Wednesdays from 13:15 to 16:00 in room INF 119.
========================
Format of the course:
The course follows a flipped-classroom format:
- A 45-minute video presenting the theory of the course must be watched before coming to class.
- 2 x 45 minutes (1h30) of interactive lecture and interactive exercise session.
- 45 minutes of programming session, alternating with a Q&A session on the projects (see schedule).
The interactive lecture will be given in class. If the infrastructure allows, it will also be streamed on Zoom for those who prefer to attend remotely: https://epfl.zoom.us/j/62262147163
The interactive exercise session and programming session will be given on site.
========================
A repository of all lecture videos is available at: https://mediaspace.epfl.ch/channel/LASA+-+Machine+Learning+Courses/30562
========================
Grading: 40% of the grade will be based on personal work done during the semester. This is either a computer-based mini-project (code competition) or presenting papers on an advanced topic in the class debates; see the Practicals section below. The remaining 60% of the grade will be based on a 30-minute oral exam (15 minutes of preparation, 15 minutes of oral defense). The oral exam covers the material seen during the course.
To earn the 40% of the grade based on personal work, you must choose between participating in the coding competition and participating in a debate.
Debates: A set of topics for the panels/debates will be announced by the instructor. You can choose which camp you will defend. Debates last 30 minutes and take place in class. To prepare for the panel/debate, you must read the associated literature and prepare a deck of slides and a 2-page summary of the main arguments you plan to bring to the table. The panels/debates unfold as follows: each debater first presents their position in 1-2 slides, then the debaters debate with one another and with the audience for 15-20 minutes. During the live debate, debaters must back their arguments using some of their additional slides, which contain examples or more detailed facts. The audience votes on which side wins! The grade is based on the quality of the arguments, slides and summary. The grade is not influenced by the outcome of the audience's vote!
Code competitions: Several datasets are offered. This is an unsupervised learning problem: the goal is to extract insightful information from the data, such as inferring patterns, identifying features that are instrumental and features that are irrelevant, extracting outliers, etc. To extract this information, you need to use one or more algorithms of your choice. You should tune each algorithm to obtain the best performance, keeping in mind the algorithm's sensitivity to the choice of its parameters. You report the results you have found in a 2-page summary and a set of slides, which you present in class. The most insightful analysis wins! The grade is based on the quality of the summary, slides and presentation. The grade is not influenced by the outcome of the competition!
Instructions and topics are available HERE.
You can sign up for debates and code competitions using the two polls at the end of this section.
Deadline to register for a project: March 5, 2025
Deadline to submit your files:
- Debates and coding competition: the 2-4 page summary and the slides must be submitted by 13h00 (1pm) on Monday, May 26.
Late submissions incur a penalty of 1 point per day of delay! The summary and slides must be uploaded on Moodle; see the link at the bottom of this section.
The schedule for the oral presentations of the competition and debates will be posted in due time.
========================
Online Resources in Machine Learning:
- Online repository of papers from the NIPS (now NeurIPS) conference
- Pascal: Network of Excellence on Pattern Recognition, Statistical and Computational Learning (summer schools and workshops)
- ML List: "Archives of the Machine Learning List"
- ML Repository: "This is a repository of databases, domain theories and data generators that are used by the machine learning community for the empirical analysis of machine learning algorithms."
- Clever Methods of Overfitting
Recommended Textbooks:
General Introduction to Machine Learning:
- "Machine Learning: A Probabilistic Perspective" by Kevin P. Murphy, MIT Press
- "Introduction to Machine Learning" by A. Smola and S.V.N. Vishwanathan, online version
- "Bayesian Reasoning and Machine Learning" by D. Barber, Cambridge University Press
Kernel Methods: PCA, SVM:
- "Kernel Methods for Pattern Analysis" by John Shawe-Taylor and Nello Cristianini, Cambridge University Press (June 28, 2004)
- "Pattern Recognition and Machine Learning" by Christopher M. Bishop, Springer, 1st edition (October 1, 2007)
- "Learning with Kernels" by B. Schölkopf and A. Smola, MIT Press, 2002
Statistical Learning Methods:
- "Pattern Classification" by Richard O. Duda, Peter E. Hart and David G. Stork, Wiley-Interscience, 2nd edition (October 2000)
- "Introduction to Statistical Learning Theory" by Olivier Bousquet, Stéphane Boucheron and Gábor Lugosi
- "Information Theory, Inference and Learning Algorithms" by David J. C. MacKay, Cambridge University Press, 2003
Neural Networks:
- "Spiking Neuron Models" by W. Gerstner and W. M. Kistler, Cambridge University Press, August 2002
- "Hebbian Learning and Negative Feedback Networks (Advanced Information and Knowledge Processing)" by C. Fyfe, Springer
- "Independent Component Analysis" by A. Hyvärinen, J. Karhunen and E. Oja, Wiley-Interscience, 2001
- "Self-Organizing Maps" by Teuvo Kohonen, Springer Series in Information Sciences, Vol. 30, Springer, 2001
- "Neural Networks: A Comprehensive Foundation" (2nd Edition) by S. Haykin
Reinforcement Learning:
- "Reinforcement Learning: An Introduction" by R. Sutton and A. Barto, A Bradford Book, MIT Press, 1998
- "Reinforcement Learning: A Survey" by Leslie Pack Kaelbling, Michael L. Littman and Andrew W. Moore, Journal of Artificial Intelligence Research, Volume 4, 1996
Course materials: Course materials are divided into several categories, defined as follows:
- L : Lecture slides and Solutions to exercises
- LX : Supplementary lectures (extra topics not evaluated)
- TP : Practicals-related materials (Description, Matlab code, and Solutions)
- MP : Mini-projects related information
- RD : Related Documentation (Textbook excerpt, etc.).
- AM : Additional materials (Matlab code, etc.)
========================
Lecture Notes:
The Lecture Notes of this course can be downloaded by clicking on the link below:
- ML Lecture Notes (File)
- Announcements (Forum)
- Poll for the coding projects (Choice)
- Poll for the debates (Choice)
19 February - Introduction - Spectral Methods [PART-1] - Kernels
Lecture
- Introduction to class format
- Brief overview of topics we will see in class
- Kernels: definitions and types
- Geometric deformation of the space induced by kernels
- Introduction to coding projects and literature reading for debates
- Registration poll for the coding project: link
- Registration poll for the debates project: link
- Video | Live lecture (valid only for 30 days after the lecture) (URL)
- L1 | Slides Lecture 1 - practicalities & infos for the course (File)
- L1 | Slides Coding and Debate Competition Instructions (File)
- L1 | Slides Kernels (File)
- RD | Kernels (File)
- L1 | Solutions Exercises kernel (File)
- L1 | Matlab Code to Plot Isolines (Folder)
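The kernel definitions and the geometric deformation they induce can be illustrated with a minimal sketch. It is written in Python rather than the MATLAB used in the course, and the function names (rbf_kernel, poly_kernel, gram_matrix, feature_space_distance) are hypothetical. The key point is that distances in feature space follow from kernel evaluations alone, ||phi(x) - phi(y)||^2 = k(x,x) + k(y,y) - 2 k(x,y), without ever computing phi explicitly.

```python
import math


def rbf_kernel(x, y, sigma=1.0):
    """Gaussian (RBF) kernel: k(x, y) = exp(-||x - y||^2 / (2 sigma^2))."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def poly_kernel(x, y, degree=2, c=1.0):
    """Inhomogeneous polynomial kernel: k(x, y) = (<x, y> + c)^degree."""
    return (sum(a * b for a, b in zip(x, y)) + c) ** degree


def gram_matrix(points, kernel):
    """N x N Gram matrix with entries K[i][j] = kernel(x_i, x_j)."""
    return [[kernel(p, q) for q in points] for p in points]


def feature_space_distance(x, y, kernel):
    """Distance between phi(x) and phi(y), computed from kernel values only:
    ||phi(x) - phi(y)||^2 = k(x, x) + k(y, y) - 2 k(x, y)."""
    sq = kernel(x, x) + kernel(y, y) - 2.0 * kernel(x, y)
    return math.sqrt(max(sq, 0.0))  # clip tiny negative rounding errors
```

With the Gaussian kernel, k(x,x) = 1, so feature-space distances can never exceed sqrt(2): points that are far apart in input space all end up at roughly the same distance, which is one way to see how this kernel deforms the space.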
26 February - Spectral Methods [Part 2] - Kernel PCA
- Watch video on kernel PCA (25 minutes)
- Watch the video on PCA to refresh your memory if needed; see the videos on the principle of PCA and the PCA derivation.
- Kernels continued
- Kernel-PCA (kPCA)
- L2 | Slides kernel PCA (File)
- RD | kPCA (File)
- Slides | Interactive lecture and exercises (File)
- Exercises kPCA (File)
- Matlab script demonstrating Gram matrix centering (and eigenvector calculation) (File)
- L2 | Matlab code to plot KPCA eigenfunctions (File)
- Solutions Exercises kPCA (File)
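The Gram matrix centering step of kernel PCA can be sketched as follows, in Python rather than MATLAB. The helper names are hypothetical, and power iteration is used for the leading eigenvector only to keep the sketch dependency-free; a full eigendecomposition routine would normally be used instead. Centering implements K_c = K - 1_N K - K 1_N + 1_N K 1_N, where 1_N is the N x N matrix with all entries equal to 1/N. This corresponds to subtracting the mean of the data in feature space.

```python
import math


def center_gram(K):
    """Center a Gram matrix in feature space:
    K_c = K - 1_N K - K 1_N + 1_N K 1_N, with (1_N)_ij = 1/N."""
    n = len(K)
    row_mean = [sum(row) / n for row in K]
    col_mean = [sum(K[i][j] for i in range(n)) / n for j in range(n)]
    grand_mean = sum(row_mean) / n
    return [[K[i][j] - row_mean[i] - col_mean[j] + grand_mean
             for j in range(n)] for i in range(n)]


def leading_eigenpair(A, iters=1000):
    """Power iteration for the dominant eigenpair of a symmetric PSD matrix.
    The start vector is deliberately not the all-ones vector, which lies in
    the null space of a centered Gram matrix (assumes A is not all zeros)."""
    n = len(A)
    v = [float(i + 1) for i in range(n)]
    for _ in range(iters):
        w = [sum(A[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    Av = [sum(A[i][j] * v[j] for j in range(n)) for i in range(n)]
    eigval = sum(a * b for a, b in zip(v, Av))  # Rayleigh quotient (||v|| = 1)
    return eigval, v


def kpca_first_component(K):
    """Projections of the training points onto the first kernel principal
    component; alpha is scaled so that eigval * ||alpha||^2 = 1."""
    Kc = center_gram(K)
    eigval, v = leading_eigenpair(Kc)
    alpha = [x / math.sqrt(eigval) for x in v]
    n = len(K)
    # Projection of x_j: sum_i alpha_i * Kc(x_i, x_j), i.e. (Kc @ alpha)_j
    return [sum(Kc[j][i] * alpha[i] for i in range(n)) for j in range(n)]
```

After centering, every row and column of K_c sums to zero, so the all-ones vector is an eigenvector with eigenvalue 0. This is why the power iteration does not start from it.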
5 March - Spectral Methods [Part 3] - Linear & kernel CCA
Watch video on Applications of kernel CCA (Optional)
- kernel CCA
- L3 | Slides linear and kernel CCA Lecture (File)
- Slides | kCCA - Applications (File)
- Slides | Interactive lecture kCCA (File)
- L3 - RD | KCCA (File)
12 March - Spectral Methods [Part 4] - kernel K-means
- Watch video on kernel K-means
- Watch video on how to handle missing data and unbalanced datasets
- kernel K-means
Q&A session on Coding Competition & Debates
- How to choose and analyse an algorithm
- How to search for literature.
- L4| Slides kernel KMeans (File)
- RD | Kernel K-means (URL)
- L4 | Supplementary exercises kKmeans (File)
- Kernel K means interactive (File)
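Kernel K-means only needs the Gram matrix. The squared distance between phi(x_i) and the mean of cluster C expands into kernel terms: K_ii - (2/|C|) sum_{j in C} K_ij + (1/|C|^2) sum_{j,l in C} K_jl. Below is a minimal Python sketch that assumes a precomputed Gram matrix and a user-supplied initial assignment. The function name kernel_kmeans is hypothetical.

```python
import math


def kernel_kmeans(K, labels, n_clusters, max_iter=100):
    """Kernel K-means on a precomputed Gram matrix K, starting from the given
    initial labels. Each point is reassigned to the cluster whose feature-space
    mean is closest, using
        K_ii - (2/|C|) sum_{j in C} K_ij + (1/|C|^2) sum_{j,l in C} K_jl.
    """
    n = len(K)
    labels = list(labels)
    for _ in range(max_iter):
        members = [[j for j in range(n) if labels[j] == c] for c in range(n_clusters)]
        # Within-cluster term (1/|C|^2) sum_{j,l in C} K_jl, once per cluster
        within = [sum(K[j][l] for j in m for l in m) / len(m) ** 2 if m else 0.0
                  for m in members]
        new_labels = []
        for i in range(n):
            best_c, best_d = None, math.inf
            for c, m in enumerate(members):
                if not m:
                    continue  # skip clusters that became empty
                d = K[i][i] - 2.0 * sum(K[i][j] for j in m) / len(m) + within[c]
                if d < best_d:
                    best_c, best_d = c, d
            new_labels.append(best_c)
        if new_labels == labels:
            break  # assignments are stable: converged
        labels = new_labels
    return labels
```

Like standard K-means, the result depends on the initial assignment and may be a local optimum. Restarting from several initializations is a common remedy.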
19 March - Spectral Methods - Practice Session 1 (kPCA, CCA and kMeans)
TA in charge of solution: Yongtao
This class is only a practice session. Bring your own laptop and test your understanding of kernel PCA, CCA and kernel K-means.
26 March - Spectral Methods [Part 5] - Spectral Clustering & Nonlinear Embedding
- Spectral Clustering & non-linear embedding
- L3 - L4 | Ex ML kernels supplements (File)
- L3 - L4 | Ex ML kernels supplements Solution (File)
- L3 - L4 | Ex ML kernels supplements Solution Presentation (File)
- L5 | Slides Spectral Clustering & Laplacian Eigenmaps (File)
- L5 | Slides Interactive Session - Spectral Clustering & Nonlinear Embeddings (File)
- RD | Spectral Methods for dimensionality reduction (File)
- RD | Discover clusters in feature space (File)
- L5| Exercises Spectral Clustering (File)
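The core of spectral clustering can be sketched for the two-cluster case. Build a Gaussian affinity graph, form the Laplacian L = D - W, and split the points by the sign of the eigenvector of the second-smallest eigenvalue (the Fiedler vector). This sketch assumes the unnormalized Laplacian, whereas the lecture may also use normalized variants. It finds the eigenvector by power iteration only to stay dependency-free, and all function names are hypothetical.

```python
import math


def affinity_matrix(points, sigma=1.0):
    """Gaussian affinities W_ij = exp(-||x_i - x_j||^2 / (2 sigma^2)), W_ii = 0."""
    n = len(points)
    W = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                sq = sum((a - b) ** 2 for a, b in zip(points[i], points[j]))
                W[i][j] = math.exp(-sq / (2.0 * sigma ** 2))
    return W


def laplacian(W):
    """Unnormalized graph Laplacian L = D - W, with degrees D_ii = sum_j W_ij."""
    n = len(W)
    return [[(sum(W[i]) if i == j else 0.0) - W[i][j] for j in range(n)]
            for i in range(n)]


def fiedler_vector(L, iters=500):
    """Eigenvector of the second-smallest eigenvalue of L, by power iteration on
    c*I - L while projecting out the constant vector (eigenvalue 0 of L).
    c = 2 * max degree bounds the spectrum of L (Gershgorin)."""
    n = len(L)
    c = 2.0 * max(L[i][i] for i in range(n))
    v = [float(i + 1) for i in range(n)]
    for _ in range(iters):
        mean = sum(v) / n
        v = [x - mean for x in v]  # stay orthogonal to the constant vector
        w = [c * v[i] - sum(L[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v


def spectral_bipartition(points, sigma=1.0):
    """Split the points into two clusters by the sign of the Fiedler vector."""
    v = fiedler_vector(laplacian(affinity_matrix(points, sigma)))
    return [0 if x < 0 else 1 for x in v]
```

For more than two clusters, a common approach is to use the eigenvectors of the k smallest eigenvalues as coordinates of a new embedding and to run K-means in that embedding.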
2 April - Spectral Methods - Practice Session 2 (Manifold Learning)
TA in class: Yongtao & Sthit
Objective:
Practice session: This is a 3-hour practice session on manifold learning methods. Come to class and do Practice Session 2 in MATLAB on your laptop.
Coding project: Also use this time to ask the TAs for help with your coding project.
- TP2 | Manifold Learning & Spectral Clustering Description (File)
- TP2 | Manifold Learning & Spectral Clustering Code (File)
9 April - SVM for Clustering, Semi-Supervised Clustering and Classification
- (Optional) Video recap of SVM and its weaknesses
- SVM Limitations
- Sparse SVM - nu-SVM and Relevance Vector Machine (RVM)
- SVM Semi-Supervised Clustering - Transductive SVM
- SVM clustering - SVC
- SVM Clustering, Semi-Supervised Clustering, Sparse SVM + Polynomial kernel for SVM
- 1st milestone: one or two algorithms already implemented on the dataset
- Show the TA the early results you have obtained
- L6| Slides SVM Recap (Optional Video) (File)
- L6 | Sparse SVM - nu-SVM (File)
- L6 | Slides Transductive SVM (File)
- L6 | Slides Support Vector Clustering - SVC (File)
- L6 | Slides Interactive Lecture (File)
- Video | SVM extension interactive lecture + exercises (Valid only 30 days) (URL)
- L6 | Exercises SVM and extensions (File)
- RD | SVM Derivation (File)
- RD | SV Clustering (File)
- RD | Transductive SVM - convex approach to its optimization (File)
- RD | Sparse Bayesian Learning and the RVM (File)
- RD | RVM (File)
16 April - Practice Session 3 - Sparse SVM versus other classification methods
TA: Baiyu will be present and will present the solution
Objectives:
Practice session: This is a practice session on sparse SVM and other classification methods seen in class
Coding project: Take the time to ask the TA for help with your coding project
- TP3 | Classification (SVM, RVM & AdaBoost) Description (File)
- TP3 | Classification (SVM, RVM & AdaBoost) Code (File)
23 April - EASTER BREAK
Support Vector Machine (SVM) and extensions
Wednesday April 1 Lecture
- Lecture
Friday April 3:
- Exercise session
30 April - From Linear to Nonlinear Regression - Gaussian Process Regression AND Q&A Debate and Coding Competition for GROUP 1
- 1:15-2pm: Interactive lecture on nonlinear regression
- 3:15-4pm: Q&A Debate and Coding Competition for GROUP 1 ONLY
- L7| Slides Ridge Regression (File)
- AM | Ridge Regression - Code (File)
- L7 | Slides Sparse Support Vector Regression and kernel Variants (File)
- RD | Feature Selection in Probabilistic SVM (File)
- Slides | Interactive Lecture on SVR and Ridge Regression (File)
- L7+L8 | Exercises Non Linear Regression (File)
- L7+L8| Solutions Exercises Non Linear Regression (File)
- CX | Feature Selection code (ML_toolbox required) (File)
7 May - From Probabilistic PCA to Gaussian Process Latent Variable Models (GPLVM) and Variational Inference AND Q&A for Debate and Coding Competition for GROUP 1
- Interactive lecture on Gaussian Process Regression, and extensions: GPLVM and variational inference
Support for the Debate and Coding Competition for GROUP 1 ONLY
- L8 | Slides Gaussian Process Regression (File)
- L8 | Slides on GPs (File)
- L7+L8 | Exercises Non Linear Regression (File)
- L8 | Slides Interactive Session GPR (File)
- RD | GPR Model Selection (File)
- RD | GPR (File)
- RD | Gaussian Process Classification (File)
- RD | Gaussian Process Latent Variable Models (File)
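The Gaussian Process Regression predictive equations can be sketched for scalar inputs as below. This is a minimal Python illustration. It assumes a squared-exponential covariance with fixed, hand-picked hyperparameters (no model selection) and Gaussian observation noise, and the function names are hypothetical. It computes the posterior mean k_*^T (K + sigma_n^2 I)^{-1} y and the variance k(x_*, x_*) - k_*^T (K + sigma_n^2 I)^{-1} k_* through a Cholesky factorization rather than an explicit matrix inverse.

```python
import math


def rbf(a, b, length_scale=1.0, signal_var=1.0):
    """Squared-exponential covariance for scalar inputs."""
    return signal_var * math.exp(-((a - b) ** 2) / (2.0 * length_scale ** 2))


def cholesky(A):
    """Lower-triangular L with A = L L^T (A symmetric positive definite)."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(A[i][i] - s)
            else:
                L[i][j] = (A[i][j] - s) / L[j][j]
    return L


def forward_sub(L, b):
    """Solve L x = b for lower-triangular L."""
    x = []
    for i in range(len(b)):
        x.append((b[i] - sum(L[i][k] * x[k] for k in range(i))) / L[i][i])
    return x


def backward_sub_transpose(L, b):
    """Solve L^T x = b for lower-triangular L (L^T is upper-triangular)."""
    n = len(b)
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (b[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x


def gp_predict(x_train, y_train, x_test, noise_var=1e-2, **kernel_args):
    """GP regression posterior mean and variance at each test input:
        mean = k_*^T (K + s^2 I)^{-1} y
        var  = k(x_*, x_*) - k_*^T (K + s^2 I)^{-1} k_*
    """
    K = [[rbf(a, b, **kernel_args) + (noise_var if i == j else 0.0)
          for j, b in enumerate(x_train)] for i, a in enumerate(x_train)]
    L = cholesky(K)
    alpha = backward_sub_transpose(L, forward_sub(L, y_train))  # (K + s^2 I)^{-1} y
    means, variances = [], []
    for xs in x_test:
        k_star = [rbf(xs, b, **kernel_args) for b in x_train]
        means.append(sum(k * a for k, a in zip(k_star, alpha)))
        v = forward_sub(L, k_star)  # v^T v = k_*^T (K + s^2 I)^{-1} k_*
        variances.append(rbf(xs, xs, **kernel_args) - sum(t * t for t in v))
    return means, variances
```

The posterior mean has the same form as the kernel ridge regression solution with the regularization parameter equal to the noise variance. The GP additionally provides the predictive variance, which returns to the prior variance far from the training data.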
14 May - Oral presentation - Debates and Coding Competition - Group 1
This class is devoted to students presenting the results of their coding project or debate. The schedule is described below.
Note that the 4-page summaries for the debates and coding competitions are due on May 26 at 13h00 (1pm).
Schedule for presentations:
13:15-13:30: Debate - Success of ML - Sofiane Walid against TAs/Teacher + classroom
13:30-13:40: Coding competition - Average Monthly Surface Temperature - Benjamin Beretz
13:40-13:50: Coding competition - Average Monthly Surface Temperature - Xinyi Han
13:50-14:00: Coding competition - Average Monthly Surface Temperature - Q&A
14:00-14:10: Coding competition - Netflix - Gregoire Gimenez
14:10-14:20: Coding competition - Netflix - Irvin Dalaud
14:20-14:30: Coding competition - Netflix - Q&A
14:30-14:40: Coding competition - Traffic Accident - Valentin Perret
14:40-14:50: Coding competition - Traffic Accident - Damien Vincent
14:50-15:00: Coding competition - Traffic Accident - Q&A
15:00-15:10: Coding competition - Food Nutrition - Quentin Rossier
15:10-15:20: Coding competition - Food Nutrition - Osman Ornek
15:20-15:30: Coding competition - Food Nutrition - Q&A
15:30-16:00: Grade deliberation - TA + Teacher (no students)
21 May - Q&A Coding Competition - Group 2
28 May - Oral presentation - Debates and Coding Competition - Group 2
This class is devoted to students presenting the results of their coding projects. The schedule is described below.
Students present in this order:
13:15-13:25: Coding competition - Calories Burned - Maksymiliann Wojciech Schoeffel
13:25-13:35: Coding competition - Calories Burned - Julien Ferdinand Gouraud
13:35-13:45: Coding competition - Calories Burned - Q&A
13:45-13:55: Coding competition - AI/ML Salaries - Théo Pierre Luc Basseras
13:55-14:05: Coding competition - AI/ML Salaries - Thomas Jerver Asmussen
14:05-14:15: Coding competition - AI/ML Salaries - Q&A
14:15-14:25: Coding competition - Education & Career Success - Javier De Ramón Murillo
14:25-14:35: Coding competition - Education & Career Success - Rayan Bouchallouf
14:35-14:45: Coding competition - Education & Career Success - Q&A
14:45-14:55: Coding competition - French employment, salaries, population - Mathieu Stawarz
14:55-15:05: Coding competition - French employment, salaries, population - Marko Mitric
15:05-15:15: Coding competition - French employment, salaries, population - Q&A
15:15-15:45: Grade deliberation - TA + Teacher (no students)
14 May - 1st Debate and Coding Competition Reinforcement Learning (RL Part-1)
- 13:15-2pm Interactive lecture on RL
- 2h15-4pm Practice session (on computer) on regression
10 May - Continuous & Inverse RL (RL Part - 2)
- 13:15-2pm Interactive lecture on inverse RL
- 2h15-4pm Exercises on continuous RL and IRL
17 May - HMM
- 1h15-2pm Interactive lecture on HMM (given by the professor on zoom!)
- 2h15-4pm Exercise session on RL and HMM (on site)
24 May - Oral paper presentations + mini-project deadline (May 23)
Deadline for handing in the mini-project report and the slides for the oral presentation of papers: May 23, 12:00 (noon). These must be submitted through Moodle; see link below.
31 May - Overview class, Q&A session
Boosting-Bagging
Oral Exam - 19 June 2025
The oral exam will take place on June 19. Each exam slot lasts 30 minutes: 10 minutes of preparation, 15 minutes of oral examination and 5 minutes for the transition between students. The exam is closed-book, but you are allowed to bring one A4 sheet of handwritten notes (recto-verso). Notes can be written on paper or on a tablet.
You can register for a slot for the exam at: https://doodle.com/meeting/participate/id/bYqYLEnb
Registration is on a first-come, first-served basis.