Visual intelligence: machines and minds
CS-503
Welcome to Visual Intelligence: Machines and Minds!
Attendance: The class location is CM15 (same as CM5) on Tuesdays (17:00 to 19:00) and CE11 on Thursdays (16:00 to 18:00). Attendance in person is recommended. Please coordinate with the TAs beforehand if you have to attend remotely. The Zoom link will only be provided upon approval of genuine requests; please do not be vague in your requests.
Teaching Team:
Instructor: Amir Zamir (Prof.) (amir.zamir@epfl.ch).
Amir's office hours: Tuesdays 16:00-17:00. Location: INJ 213. Sending an email before coming to office hours is recommended.
Doctoral TAs: Rishubh Singh (head TA), Zhitong Gao, Roman Bachmann. (TA mailing list: vimm-ta@groupes.epfl.ch)
Course Structure: All scheduled classes (both Tuesdays and Thursdays) in the first half of the course will be devoted to lectures. The second half will be for project work, and all classes will be exercise sessions dedicated to Q&A and mentoring. Please see the course calendar below for full details.
-----------------------------------------------------------------------------------------------
Course Summary
The course will discuss classic material as well as recent advances in computer vision and machine learning relevant to processing visual data -- with a primary focus on embodied intelligence and vision for active agents.
Course Content
Visual perception is the capability of inferring the properties of the external world merely from the light reflected off the objects therein. This is done beautifully well by simple (e.g., mosquitoes) or complex (e.g., humans) biological organisms. They can see and understand the complex environment around them and act accordingly -- all done in an efficient and astonishingly robust way. Computer vision is the discipline of replicating this capability for machines. The progress in computer vision has brought about successful applications, such as face detection/recognition or handwriting recognition. However, a large gap to sophisticated perceptual capabilities, such as those exhibited by animals, remains.
The goal of this course is to discuss what is possible in computer vision today and what is not. We will overview the basic concepts in computer vision and recent advances in machine learning relevant to processing visual data and active perception. For inspiration around the missing capabilities and how to approach them, we will turn to visual perception in biological organisms.
The course includes lectures and projects, with a heavy emphasis on the projects and hands-on experience. The course project consists of designing, implementing, and testing a solution to a (preferably open) problem pertinent to visual perception. Students are encouraged to work in groups, propose a project that excites them, and aim for ambitious yet feasible projects. The course staff will provide support with the projects throughout the semester. In the lectures, students will learn about the principles of computer vision, its current limits, and visual perception in humans and animals, which will help them formulate their course projects. In particular, the lectures will discuss the following:
- A recap of basic computer vision concepts: classification, detection, grouping, image transformations, optical flow, 3D from X, etc., and recent successful neural network architectures, such as Transformers.
- Psychology/physiology of the visual system.
- Perception-action loop: active perception and embodied vision.
- Multimodal perception.
The course is of relevance to masters/PhD students interested in research in computer vision, machine learning, and perceptual robotics, as well as senior undergraduate students interested in understanding state-of-the-art computer vision.
Required Prerequisites
- Machine Learning (CS-433) or Introduction to Machine Learning (CS-233) or equivalent course on the basics of machine learning.
- Deep Learning (EE-559) or Artificial Neural Networks (CS-456) or equivalent course on the basics of deep learning.
Recommended courses
- Computer vision (CS-442) or equivalent undergraduate/masters course on the basics of computer vision.
Important prerequisite concepts to start the course
- Deep learning and machine learning.
- Python programming.
- Basics of probability and statistics.
Expected student activities
- Regarding the lecture material, students are expected to study the provided reading material, actively participate in class, engage in the discussions, and answer homework questions.
- Regarding the course project, students are expected to formulate and implement an in-depth project, demonstrate continuous progress throughout the semester, and provide a final written report and presentation.
Assessment methods
- Project (60%) [distributed over the project proposal, milestone reports, final report and presentation]
- Homeworks (40%)
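As an illustration of the weighting above, here is a minimal Python sketch that combines a project score and a homework score. The 0-100 scale and the function name `course_score` are assumptions made for this example; the page does not specify the grading scale or how the project share is split across its deliverables.

```python
# Weights taken from the assessment breakdown above (in percent).
PROJECT_WEIGHT = 60
HOMEWORK_WEIGHT = 40


def course_score(project, homework):
    """Weighted combination of two scores given on the same scale.

    Assumption: both inputs are on a 0-100 scale (hypothetical; the
    actual grading scale is not stated on this page).
    """
    for value in (project, homework):
        if not 0 <= value <= 100:
            raise ValueError("scores are assumed to lie in [0, 100]")
    return (PROJECT_WEIGHT * project + HOMEWORK_WEIGHT * homework) / 100


example = course_score(project=80, homework=90)
print(example)  # 84.0
```

Because the project carries the larger weight, a given change in the project score moves the total more than the same change in the homework score.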
Late Days
Late days apply to all deliverables, including homeworks and project reports. They are counted in hours, and there are 72 hours in total. Late days cannot be pooled across group members for project deadlines; e.g., you cannot submit your proposal more than 72 hours late. Similarly, late hours used for a project deadline are charged to every student in the group; e.g., submitting the project progress report 10 hours late deducts 10 hours from each student's late days.
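The accounting described above can be sketched in Python as follows. This is only an illustration: the class `LateBudget`, its method names, and the student names are hypothetical, and the rule that a late group submission requires every member to have enough hours left is an assumption, since the policy does not state what happens otherwise.

```python
TOTAL_LATE_HOURS = 72  # total late-hour budget stated in the policy


class LateBudget:
    """Hypothetical tracker of remaining late hours per student."""

    def __init__(self, students):
        self.remaining = {name: TOTAL_LATE_HOURS for name in students}

    def can_submit(self, members, hours_late):
        # Assumption: every member must individually cover the delay.
        # Hours are never summed across members (no pooling).
        return all(self.remaining[m] >= hours_late for m in members)

    def charge(self, members, hours_late):
        """Deduct `hours_late` from each listed member's budget."""
        if hours_late < 0:
            raise ValueError("hours_late must be non-negative")
        if not self.can_submit(members, hours_late):
            raise ValueError("not enough late hours left")
        for m in members:
            self.remaining[m] -= hours_late


budget = LateBudget(["alice", "bob"])
budget.charge(["alice"], 20)         # individual homework, 20 hours late
budget.charge(["alice", "bob"], 10)  # group report, 10 hours late
print(budget.remaining)              # {'alice': 42, 'bob': 62}
print(budget.can_submit(["alice", "bob"], 60))  # False
```

In the last call the two students have 104 hours left between them, but the check fails because one of them has only 42, which reflects the no-pooling rule.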
Swapping final project presentation time
Bibliography
- Vision Science: Photons to Phenomenology, Stephen Palmer, 1999.
- An Immense World, Ed Yong, 2022.
- The Ecological Approach to Visual Perception, James Gibson, 1979.
- Computer Vision: Algorithms and Applications, Richard Szeliski, 2020.
- Animal Eyes, Michael Land and Dan-Eric Nilsson, 2012.
The reference reading of different lectures will be from different books (the main ones listed above) and occasionally from papers. Resources will be provided in class. Full-text books are not mandatory.
Project Guidelines:
See this PDF.
Resources for the course project:
Please see the instructions for compute resources in this PDF and details on how to use SCITAS here. Also see this page for video guides on prototyping (e.g. 3D printing, soldering, etc.).
Grade rebuttal mechanism:
Course calendar:
Note that we will update the full course calendar in the coming week. Meanwhile, you can find the schedule for the first 5 lectures here.

- Slides Lecture 2 (File)
- Lecture 2 - History and Computer Vision Recap (URL)
- Slides Lecture 3 (File)
- Lecture 3 - Transformers (URL)
- Slides Lecture 10 (File)
- Slides Lecture 11 (File)
- Lecture 10 - Introduction to RL (URL)
- Lecture 11 - RL Applications (URL)