Foundations of artificial intelligence

ME-390


Course information

Course format

In-person lectures, in-person exercise hours.

Assessment

Three in-class quizzes (up to 30%) and one end-of-semester written final exam (70%). A quiz grade is counted only if it improves your final grade, so your final grade is calculated as follows:

final grade = max(70% final + 10% q1 + 10% q2 + 10% q3, 80% final + 10% q1 + 10% q2, ..., 100% final)

Above, q1, q2, q3 refer to quizzes 1, 2, and 3, respectively. Your final grade is therefore the maximum over the 8 possible combinations of final exam and quiz grades (one combination for each subset of quizzes).
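
For concreteness, here is a small Python sketch (not official grading code; the grades in the example are made up) that computes this maximum over all subsets of quizzes:

    from itertools import combinations

    def course_grade(final, quizzes):
        """Final grade = best weighted combination of final exam and quizzes.
        Each included quiz contributes 10%; the final exam covers the rest."""
        best = final  # the 100%-final option, with no quizzes counted
        for k in range(1, len(quizzes) + 1):
            for subset in combinations(quizzes, k):
                final_weight = 1.0 - 0.1 * len(subset)
                best = max(best, final_weight * final + 0.1 * sum(subset))
        return best

    # Hypothetical grades: final exam 4.5, quizzes 5.5, 4.0, 6.0
    print(course_grade(4.5, [5.5, 4.0, 6.0]))

For three quizzes, this evaluates exactly the 8 combinations described above.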

Quizzes are on 16.10, 20.11, and 11.12, during the exercise hour. Each quiz lasts 20 minutes, and no aids are allowed (no books or notes, no electronics).

The final exam is closed-book. You are allowed one cheat sheet: a single double-sided page on which you may write any material from the course.

Teaching assistants

  • Anna Maddux (anna.maddux@epfl.ch)
  • Tingting Ni (tingting.ni@epfl.ch)
  • Andreas Schlaginhaufen (andreas.schlaginhaufen@epfl.ch)
  • Kai Ren (kai.ren@epfl.ch)
  • Giulio Salizzoni (giulio.salizzoni@epfl.ch)
  • Gabriel Vallat (gabriel.vallat@epfl.ch)
  • Saurabh Dilip Vaishampayan (saurabh.vaishampayan@epfl.ch)

Office hour: You may contact the TA team directly if you have questions. For questions that could be relevant to other classmates, please use EdDiscussion, or ask them during the lecture or exercise hours.

Recommended references:

There are many online resources on artificial intelligence and machine learning. While these sources might provide good intuition, not all have the same depth and rigour. I recommend the following.

1. Book on machine learning with engineering applications: Machine Learning for Engineers: Using Data to Solve Problems for Physical Systems by Ryan G. McClarren. We refer to it as ML4Engineers in the course.

2. Book with the linear algebra background relevant for the course, referred to here as the LinAlgebra book: Introduction to Applied Linear Algebra – Vectors, Matrices, and Least Squares by Stephen Boyd and Lieven Vandenberghe.

3. Online book with a good overview of probability.

4. Book with a deep mathematical treatment of the subject: Understanding Machine Learning by Shai Shalev-Shwartz and Shai Ben-David.

5. Book on neural networks: Deep Learning by Ian Goodfellow, Yoshua Bengio, and Aaron Courville.

6. Online course: the machine learning course at Stanford, referred to here as MLStanford: EE 104, Stanford.

7. Online course: the machine learning and data mining course at UBC: CPSC 340, UBC.

8. Online videos: StatQuest: this website has many easy-to-follow YouTube videos on most of the concepts we cover in the course.

Note: While some of these books, lectures, and videos cover the same topics we have discussed, their notation and terminology can differ. This is sometimes because the same concepts are discussed in statistics, mathematics, engineering, and machine learning, and each of these communities has chosen its own terms and notation.

Acknowledgement

The notes and Python exercises are mainly based on the EPFL course CIVIL-226, created by the VITA lab.



We introduced the course and its administrative matters. We introduced artificial intelligence (AI) and the machine learning (ML) approach to AI. We defined supervised learning and unsupervised learning, and introduced linear regression as a supervised learning approach.

Optional: For a thorough introduction to learning you can read Sections 1.1-1.3 of the UnderstandingML book. 


We had no lecture due to the holiday. In the exercise hour, you were asked to go through the exercises in the file "Background and notations", posted last week, and to review some Python coding tips in the 02-numpy folder of the Python exercises.

To look up Python commands and compare them with their MATLAB counterparts, you may use the following online cheat sheet.



We formulated the linear regression problem, defined the mean squared-error empirical loss function, and derived the optimal linear regression parameters minimizing this loss function. We discussed nonlinear feature mapping as well as overfitting and underfitting.

Additional resources: Sections 2.1 and 2.2 of the ML4Engineers book, and Appendices C.1 and C.2 of the LinAlgebra book. For a brief review of the linear algebra concepts you need in this course, see Chapter 2 of the Deep Learning book; for a brief overview of gradient-based optimization, see Section 4.3 of the Deep Learning book.

Note: on slide 16, there should be a factor of 2 in front of the term w^T X^T y in the formula for J(w).
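
As a minimal NumPy sketch of this lecture's derivation (my own illustrative code, with made-up data): writing the loss as J(w) = w^T X^T X w - 2 w^T X^T y + y^T y and setting its gradient to zero gives the normal equations, which the snippet below solves directly.

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 3))                 # 100 samples, 3 features
    w_true = np.array([1.0, -2.0, 0.5])
    y = X @ w_true + 0.1 * rng.normal(size=100)   # noisy linear targets

    # J(w) = ||Xw - y||^2 = w^T X^T X w - 2 w^T X^T y + y^T y
    # Setting the gradient 2 X^T X w - 2 X^T y to zero gives X^T X w = X^T y.
    w_hat = np.linalg.solve(X.T @ X, X.T @ y)

    print(w_hat)                                  # close to w_true
    print(np.mean((X @ w_hat - y) ** 2))          # mean squared training error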


We discussed training and test error, overfitting and underfitting, and regularization as a way to reduce model sensitivity and hence overfitting. We discussed logistic regression for classification and the logistic loss function, and provided an interpretation in terms of cross-entropy. Lastly, we discussed performance metrics based on the confusion matrix.

Additional resources: Sections 2.4 up to 2.4.2 of the ML4Engineers book, and pages 45-46 (norms) and 48-50 of the LinAlgebra book.
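
The following is a minimal NumPy sketch (illustrative only; the data and weights below are made up) of the logistic (cross-entropy) loss and a binary confusion matrix:

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def logistic_loss(w, X, y):
        """Average cross-entropy loss for labels y in {0, 1}."""
        p = sigmoid(X @ w)
        return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

    def confusion_matrix(y_true, y_pred):
        """Rows: true class, columns: predicted class (binary case)."""
        m = np.zeros((2, 2), dtype=int)
        for t, p in zip(y_true, y_pred):
            m[t, p] += 1
        return m

    # Tiny illustrative dataset (first column is a constant bias feature)
    X = np.array([[1.0, 0.5], [1.0, -1.2], [1.0, 2.0], [1.0, -0.3]])
    y = np.array([1, 0, 1, 0])
    w = np.array([0.1, 1.5])
    print(logistic_loss(w, X, y))
    print(confusion_matrix(y, (sigmoid(X @ w) >= 0.5).astype(int)))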



We continued our discussion of performance metrics for logistic regression. We also introduced multinomial logistic regression, its associated loss function, and the confusion matrix. Furthermore, we discussed data preprocessing prior to applying a machine learning approach, including understanding the data features using summary statistics, as well as normalization, scaling, and nonlinear feature engineering.

Additional resources: Section 2.3.4 of the ML4Engineers book on multinomial logistic regression.
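
As an illustrative sketch (my own code, with made-up data) of feature standardization and the softmax cross-entropy loss used in multinomial logistic regression:

    import numpy as np

    def standardize(X):
        """Zero-mean, unit-variance scaling of each feature (column)."""
        return (X - X.mean(axis=0)) / X.std(axis=0)

    def softmax(Z):
        """Row-wise softmax: class probabilities for multinomial logistic regression."""
        Z = Z - Z.max(axis=1, keepdims=True)   # shift for numerical stability
        expZ = np.exp(Z)
        return expZ / expZ.sum(axis=1, keepdims=True)

    def cross_entropy(P, y):
        """Average multinomial cross-entropy; y holds integer class labels."""
        return -np.mean(np.log(P[np.arange(len(y)), y]))

    rng = np.random.default_rng(1)
    X = standardize(rng.normal(size=(6, 2)))
    W = rng.normal(size=(2, 3))                # 2 features, 3 classes
    y = np.array([0, 1, 2, 0, 1, 2])
    print(cross_entropy(softmax(X @ W), y))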


After reviewing the concepts of conditional probability distributions and Bayes' rule, we defined the Naive Bayes classifier as an approach to classification. We first discussed the approach for the case in which the features are finite-valued, and then discussed the Gaussian Naive Bayes classifier for the case in which the features are continuous-valued.

For background on probability, you may review these notes.
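
A minimal Gaussian Naive Bayes sketch in NumPy (illustrative only; the two-class data is made up):

    import numpy as np

    def fit_gaussian_nb(X, y):
        """Per-class priors, feature means, and variances (features are
        assumed conditionally independent given the class)."""
        classes = np.unique(y)
        priors = np.array([np.mean(y == c) for c in classes])
        means = np.array([X[y == c].mean(axis=0) for c in classes])
        variances = np.array([X[y == c].var(axis=0) for c in classes])
        return classes, priors, means, variances

    def predict_gaussian_nb(X, classes, priors, means, variances):
        """Pick the class maximizing log prior + sum of log Gaussian likelihoods."""
        log_post = np.log(priors) + np.array([
            np.sum(-0.5 * np.log(2 * np.pi * variances[k])
                   - (X - means[k]) ** 2 / (2 * variances[k]), axis=1)
            for k in range(len(classes))
        ]).T
        return classes[np.argmax(log_post, axis=1)]

    rng = np.random.default_rng(2)
    X = np.vstack([rng.normal(0, 1, (20, 2)), rng.normal(3, 1, (20, 2))])
    y = np.array([0] * 20 + [1] * 20)
    params = fit_gaussian_nb(X, y)
    print(np.mean(predict_gaussian_nb(X, *params) == y))  # training accuracy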





We reviewed concepts from data statistics and probability distributions needed for this course: probability distributions, empirical distributions, independence, conditional distributions, and conditional independence. We discussed the Naive Bayes classifier and saw examples of it for spam email detection.

Note: For an example of conditional independence, you can see the Wikipedia article on conditional independence, in particular the example with coloured boxes. For a review of probability and some of the concepts we covered, you may see Chapter 3, Sections 3.1-3.9, of the Deep Learning book.
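
As a toy numerical illustration of Bayes' rule with conditionally independent word features for spam detection (all probabilities below are made up):

    # P(spam | words) is proportional to P(spam) * product of P(word | spam),
    # assuming the words are conditionally independent given the class
    # (the Naive Bayes assumption). All numbers are illustrative.
    p_spam, p_ham = 0.4, 0.6
    p_word_given_spam = {"free": 0.6, "meeting": 0.1}
    p_word_given_ham = {"free": 0.05, "meeting": 0.5}

    email = ["free", "meeting"]
    score_spam, score_ham = p_spam, p_ham
    for w in email:
        score_spam *= p_word_given_spam[w]
        score_ham *= p_word_given_ham[w]

    p_spam_given_email = score_spam / (score_spam + score_ham)
    print(p_spam_given_email)   # about 0.62 for these made-up numbers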


In the first hour, we discussed the students' feedback, our measures to address it, and clarifications of the course objectives. In the second hour, we presented the kNN approach for classification and regression.
In the exercise hours, you will apply kNN to two datasets and can compare its performance in each case.
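
A minimal kNN classification sketch in NumPy (illustrative only; the two exercise datasets are not included here, so the data below is made up):

    import numpy as np

    def knn_predict(X_train, y_train, X_test, k=3):
        """k-nearest-neighbour classification with Euclidean distance and
        a majority vote among the k closest training points."""
        preds = []
        for x in X_test:
            dists = np.linalg.norm(X_train - x, axis=1)
            nearest = y_train[np.argsort(dists)[:k]]
            values, counts = np.unique(nearest, return_counts=True)
            preds.append(values[np.argmax(counts)])
        return np.array(preds)

    rng = np.random.default_rng(3)
    X_train = np.vstack([rng.normal(0, 1, (20, 2)), rng.normal(4, 1, (20, 2))])
    y_train = np.array([0] * 20 + [1] * 20)
    X_test = np.array([[0.2, -0.1], [3.8, 4.2]])
    print(knn_predict(X_train, y_train, X_test, k=5))   # expect [0, 1]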


We discussed a neural network as a nonlinear predictor with a specific structure, for classification and regression. We discussed training neural networks using variants of gradient descent.


Additional resources: Chapter 5 (Sections 5.1, 5.2, and 5.3) of the ML4Engineers book.
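
The following is a small NumPy sketch (my own illustrative code, not the course notebooks) of a one-hidden-layer network for regression, trained with full-batch gradient descent:

    import numpy as np

    rng = np.random.default_rng(4)
    X = rng.uniform(-1, 1, size=(200, 1))
    y = np.sin(3 * X) + 0.05 * rng.normal(size=(200, 1))   # noisy nonlinear target

    # One hidden layer with tanh activation, trained by full-batch gradient descent.
    W1, b1 = rng.normal(size=(1, 16)), np.zeros((1, 16))
    W2, b2 = rng.normal(size=(16, 1)), np.zeros((1, 1))
    lr = 0.05

    for step in range(2000):
        H = np.tanh(X @ W1 + b1)          # hidden activations
        y_hat = H @ W2 + b2               # network output
        err = y_hat - y                   # dJ/dy_hat (up to a constant factor)

        # Backpropagation for the mean squared error loss
        grad_W2 = H.T @ err / len(X)
        grad_b2 = err.mean(axis=0, keepdims=True)
        dH = (err @ W2.T) * (1 - H ** 2)  # derivative of tanh is 1 - tanh^2
        grad_W1 = X.T @ dH / len(X)
        grad_b1 = dH.mean(axis=0, keepdims=True)

        W1 -= lr * grad_W1; b1 -= lr * grad_b1
        W2 -= lr * grad_W2; b2 -= lr * grad_b2

    print(np.mean((y_hat - y) ** 2))      # final training MSE

Replacing the full-batch gradients with gradients computed on random mini-batches turns this into stochastic gradient descent, one of the variants discussed in the lecture.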

We discussed training neural networks using gradient descent and its variants.

Next, we motivated convolutional neural networks, defined the convolution of a signal with a filter, and saw examples of convolution. We then described convolutional neural networks.

Additional resources: Chapter 6 (Sections 6.1, 6.2, and 6.3) of the ML4Engineers book. For transfer learning, see the case study in Section 6.5 of the ML4Engineers book.
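
A minimal sketch of 1-D convolution of a signal with a filter (illustrative only), checked against NumPy's built-in np.convolve:

    import numpy as np

    def conv1d(signal, kernel):
        """'Valid' 1-D convolution of a signal with a filter (no padding):
        each output is the dot product of the flipped kernel with a window."""
        k = len(kernel)
        flipped = kernel[::-1]
        return np.array([signal[i:i + k] @ flipped
                         for i in range(len(signal) - k + 1)])

    signal = np.array([0.0, 1.0, 2.0, 3.0, 2.0, 1.0, 0.0])
    kernel = np.array([1.0, -1.0])                     # finite-difference filter
    print(conv1d(signal, kernel))
    print(np.convolve(signal, kernel, mode="valid"))   # same result

In a convolutional neural network, the filter values play the role of learnable weights shared across all positions of the input.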


PCA: We described two unsupervised learning techniques: dimensionality reduction and clustering. For dimensionality reduction, we discussed an approach based on principal component analysis (PCA). This approach defines a reduced number of features as linear combinations of the original features.

Additional resources: For PCA, see the PCA lecture from the MLStanford course and the StatQuest posts for a review of data standardization and covariance.
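
A minimal PCA sketch in NumPy (illustrative only; the data is random): standardize the features, eigendecompose the covariance matrix, and project onto the top principal components.

    import numpy as np

    rng = np.random.default_rng(5)
    X = rng.normal(size=(100, 5)) @ rng.normal(size=(5, 5))   # correlated features

    X_std = (X - X.mean(axis=0)) / X.std(axis=0)   # standardize each feature
    cov = np.cov(X_std, rowvar=False)              # sample covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)         # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1]              # sort descending
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]

    k = 2
    Z = X_std @ eigvecs[:, :k]                     # reduced 2-D representation
    explained = eigvals[:k].sum() / eigvals.sum()  # fraction of variance kept
    print(Z.shape, explained)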

In this lecture, we discussed clustering, and specifically the k-means approach. This approach determines k clusters to group the data and uses the mean of the data points in each cluster as a representative of the points in that cluster. Next, we considered decision trees for regression and classification. We observed that decision trees can be interpretable (if the depth is not too large). Finding an optimal decision tree is a challenging optimization problem; hence, we considered greedy algorithms that add nodes sequentially based on the best-performing feature and threshold values at each tree depth.

For additional resources on clustering, read Chapter 4 of the LinAlgebra book.
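
A plain k-means sketch in NumPy (illustrative only; the data is made up and empty clusters are not handled):

    import numpy as np

    def kmeans(X, k, n_iter=100, seed=0):
        """Plain k-means: alternate between assigning points to the nearest
        centroid and recomputing each centroid as the mean of its cluster."""
        rng = np.random.default_rng(seed)
        centroids = X[rng.choice(len(X), size=k, replace=False)]
        for _ in range(n_iter):
            dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
            labels = np.argmin(dists, axis=1)
            new_centroids = np.array([X[labels == j].mean(axis=0) for j in range(k)])
            if np.allclose(new_centroids, centroids):
                break
            centroids = new_centroids
        return labels, centroids

    rng = np.random.default_rng(6)
    X = np.vstack([rng.normal(0, 0.5, (30, 2)), rng.normal(3, 0.5, (30, 2))])
    labels, centroids = kmeans(X, k=2)
    print(centroids)   # roughly the two cluster centres (0, 0) and (3, 3)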


We continued discussing how to create classification trees using the Gini impurity index. For further examples on decision trees, you may see the StatQuest video on Classification Trees.
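
As a small numerical sketch (with made-up labels) of the Gini impurity and the weighted impurity of a candidate split, which is the quantity the greedy tree-building step minimizes over features and thresholds:

    import numpy as np

    def gini(labels):
        """Gini impurity of a set of class labels: 1 - sum_c p_c^2."""
        _, counts = np.unique(labels, return_counts=True)
        p = counts / counts.sum()
        return 1.0 - np.sum(p ** 2)

    def split_gini(x, y, threshold):
        """Weighted Gini impurity of splitting on feature x at a threshold."""
        left, right = y[x <= threshold], y[x > threshold]
        n = len(y)
        return (len(left) / n) * gini(left) + (len(right) / n) * gini(right)

    x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
    y = np.array([0, 0, 0, 1, 1, 1])
    print(gini(y))                 # 0.5 for a balanced binary node
    print(split_gini(x, y, 3.5))   # 0.0: a perfect split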

We next had a discussion, moderated by Prof. Sascha Nick, on conditions for AI to benefit societies. 

Note: I have added resources on AI ethics that you can use to prepare yourself when reflecting on this topic in your work.


We reviewed the AI ethics lecture and moved on to discrete-time dynamical systems and control, as a first step towards our reinforcement learning lecture.
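
A minimal sketch of simulating a discrete-time linear dynamical system with a hand-picked (hypothetical) linear state-feedback controller:

    import numpy as np

    # Discrete-time linear system with feedback control:
    # x_{t+1} = A x_t + B u_t, with u_t = -K x_t (gains chosen by hand).
    A = np.array([[1.0, 0.1],
                  [0.0, 1.0]])
    B = np.array([[0.0],
                  [0.1]])
    K = np.array([[2.0, 3.0]])      # hypothetical feedback gains

    x = np.array([[1.0], [0.0]])    # initial state
    for t in range(50):
        u = -K @ x                  # control input
        x = A @ x + B @ u           # state update

    print(x.ravel())                # state driven close to the origin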

Furthermore, we had a guest lecturer, Dr. Roberto Castello from the Swiss Data Science Center, on AI in Industry.


16 December - 22 December


Practice final exams

Last year's exam and another sample final, both with solutions, are posted. Recommendation: first try to solve the problems without looking at the solutions; then check your work in detail against the solutions.