Introduction to ENG-270

Satoshi Takahama

EPFL

2024-09-11

General points

Diversity statement

Team

Instructor

  • Satoshi Takahama

Assistants

  • Julie Varupenne
  • Quentin Poindextre
  • Paul Girard

Your instructor

Research areas

  • atmospheric chemistry
  • aerosol physics
  • environmental sensing
  • numerical simulations for air quality modeling
  • molecular simulations (high performance computing)

Teaching activities (since 2012 at EPFL)

  • air pollution and climate change
  • atmospheric chemistry
  • computational methods and tools
  • Semaine ENAC (life cycle analysis, pollution near roadways, computing)
  • Design Projects, semester projects, Masters theses

Tools

  • Programming languages: Python, MATLAB, R, Bash, C, Fortran, Lisp, Julia, Igor Pro
  • Editor: GNU Emacs, VSCode
  • Operating systems: GNU/Linux, macOS, Windows (with WSL - Windows Subsystem for Linux)

Course objectives

Objectives

  • prepare you for future courses in SIE
  • provide broad base of knowledge to build upon in future internships and Masters projects
  • prepare you to solve problems in engineering computationally
    • identify the type of model and data that you need
    • break down the problem into parts
    • implement solution that is traceable and reproducible
    • communicate the meaning of your results effectively to nontechnical clients

Learning outcomes

  • Describe differences among programming paradigms and data models.
  • Model a physical or chemical process.
  • Develop programs to solve quantitative problems.
  • Integrate simpler modules into a larger program.
  • Interpret program output.
  • Choose appropriate computational methods and tools to solve a problem.
  • Defend selection and implementation of computational methods and tools.

Courses with computational elements

Course format (1)

  • 1st 1/2 of course (approx.): exercise + midterm
  • 2nd 1/2 of course (approx.): independent projects

Nominally:

Day        Hours (room)               Activity
Wednesday  13h-14h (GCC330)           Lecture
           14h-15h (GRB001 + GRC002)  Exercise / Project (independent)
           15h-16h (GRB001 + GRC002)  Exercise / Project (with assistants)
Friday     10h-11h (GCA331)           Lecture
           11h-13h (GRB001 + GRC002)  Exercise / Project (with assistants)

We will reduce the lectures and increase the number of independent hours toward the end of the project period.

Exercise

  • solving problems with code: how to structure the solution into something that can be computed
  • identify which programming concepts you need to know to code the solution so you can go learn them
  • some tutorial resources are provided or suggested, but otherwise you can find one that connects to your background - the aim is to become self-sufficient in learning on your own

Project

  • introduction to new tools
  • write project proposal and advance progress independently, with feedback from assistants

Course format (2)

Note difference from previous course.

CS-119:

  • introduced to programming elements and complete exercises applying building blocks
  • self-contained problems

ENG-270:

  • presented with problem(s), identify which programming elements to use and how to build a solution from them
  • interact with “real world” (input/output)
  • learn what you need to know to solve the problem

Expectations are set by the course; students should become capable of formulating questions and searching for resources to learn concepts largely on their own.

How to get help (1)

If assistants cannot answer your question in 5 minutes, we recommend that you post your question to Ed Discussion.

Stack Overflow provides a good guide on how to ask questions, particularly related to programming. You are also welcome to post general questions on Stack Overflow, but for course-specific questions, you can use our Ed Discussion Forum.

Main points to note before posting a question asking for help:

  1. Try to debug using print statements and debugger.
  2. Search for error first on Google and be prepared to explain how yours might be different from previous instances you’ve found.
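
As an illustration of the first point, here is a minimal sketch of print-statement debugging in Python. The function name and the numbers are hypothetical and chosen only for this example; the comment shows where the built-in debugger could be used instead.

```python
def mean_concentration(values):
    """Return the arithmetic mean of a list of numbers."""
    total = 0.0
    for i, value in enumerate(values):
        total += value
        # Print statement: inspect the intermediate state on every iteration.
        print(f"i={i} value={value!r} running total={total}")
    # To pause here and inspect variables interactively instead,
    # insert a call to the built-in breakpoint() (it starts the pdb debugger).
    return total / len(values)


result = mean_concentration([1.0, 2.0, 6.0])
print("mean:", result)
```

Once the bug is found, remove the print statements (or replace them with logging) so they do not clutter the program output.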

How to get help (2)

When you are ready to post your question, follow this checklist:

  1. Write helpful title.
  2. Introduce what you are trying to do.
  3. Describe the error and what you have tried already. It sometimes helps to explain why solutions to related problems that are easily found on the internet do not apply to your problem.
  4. Provide code+data that reproduces this error.

Providing a minimal working example that reproduces this error:

  • Keep in mind the main principle - teachers have to be able to step through the program on their machine to help you debug.
  • We are less likely to help you if you screenshot your code as we cannot copy and paste the code to debug it.

Do not copy/paste your entire program into the forum - include just enough code + data to allow others to reproduce the problem.

  • Create an example problem with the minimum number of operations that gives you the same error. This will reduce the number of lines of code that we have to debug - and will help you already narrow down where the bug lies.
  • If your problem also involves data, reduce your data (e.g., number of entries or lines in your data file) so that we are really working with a minimal data set that still gives the same problem. Sometimes the problem lies in the data and this step can also help you identify whether this is the case (and which lines of data are problematic).
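
The two points above can be sketched as follows, assuming a hypothetical case where a data file fails to convert to numbers. The data are reduced to a handful of lines that still trigger the same error, and a small helper (the name `find_bad_lines` is made up for this sketch) reports which lines are problematic.

```python
def find_bad_lines(lines):
    """Return (line_number, text) pairs for lines that cannot be parsed as float."""
    bad = []
    for number, text in enumerate(lines, start=1):
        try:
            float(text)
        except ValueError:
            bad.append((number, text))
    return bad


# Reduced data set: a few lines are enough to trigger the same ValueError
# that the full (hypothetical) data file produced.
sample = ["12.5", "13.1", "n/a", "12,9", "14.0"]
print(find_bad_lines(sample))
```

A snippet of this size, together with the five data lines, can be pasted into a forum post as text and run by anyone.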

It will take time to ask your question properly, but this is an important pedagogical exercise that we emphasize in the course. It is often the case that you will diagnose your own problem while following the steps above to prepare the minimal example problem.

How to get help (3)

Main points:

  • Be specific in your question about what you are trying to do and what is stumping you (beyond “it doesn’t work” / “it gives me an error”).
  • Describe your current understanding and what you’ve tried so we can give feedback on your approach (and also to show that you’ve put in the effort to try to solve it yourself).

Assessment

Mid-term exam (50%)

  • based on lectures, lessons, and exercises (exercises are not for turning in)
  • on pencil and paper only - no electronic devices
    • structuring proposed solution
    • interpreting behavior of code examples

Projects (50%)

  • (10%) project proposal - due 30.10.2024
  • (55%) code + (35%) final report - due 20.12.2024
  • maximum two students per group
  • modeled after SIE Design Project (MA2)
  • more detailed introduction on 18.09.2024

Timeline

Course content

Overview

Computational methods

A nonexhaustive, nonexclusive classification:

  • forward modeling - making predictions or forecasts
  • inverse modeling (inference, estimation) - understanding underlying processes
  • data analysis - extracting meaning from data
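
To make the first two categories concrete, here is a minimal sketch that assumes a first-order decay model, C(t) = C0 exp(-k t). The model, the function names, and the numbers are illustrative assumptions, not part of the course material. The forward model predicts concentrations from a known rate constant; the inverse step estimates the rate constant from (here synthetic) observations.

```python
import math


def forward_model(c0, k, times):
    """Forward modeling: predict concentrations for first-order decay."""
    return [c0 * math.exp(-k * t) for t in times]


def estimate_rate(times, concentrations):
    """Inverse modeling: estimate k from a least-squares line through ln(C) vs t."""
    logs = [math.log(c) for c in concentrations]
    n = len(times)
    t_mean = sum(times) / n
    y_mean = sum(logs) / n
    numerator = sum((t - t_mean) * (y - y_mean) for t, y in zip(times, logs))
    denominator = sum((t - t_mean) ** 2 for t in times)
    return -numerator / denominator  # slope of ln(C) vs t is -k


times = [0.0, 1.0, 2.0, 3.0, 4.0]
observed = forward_model(c0=10.0, k=0.5, times=times)  # synthetic "data"
k_hat = estimate_rate(times, observed)
print(f"estimated k = {k_hat:.3f}")
```

With real, noisy measurements the estimate would not match the true value exactly, and quantifying that uncertainty is part of the inverse problem.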

Computational tools

  • editor
  • terminal
  • debugger
  • build system
  • version control

Contents

In exercises

  • Common data exchange formats, I/O (input/output)
  • Data models and programming paradigms
  • Data analysis
  • Control structures (loops, conditional statements)
  • Debugging
  • Memory allocation and management
  • Interpreted and compiled languages
  • Programming syntax and debugging

In projects

  • Version control systems
  • Automation
  • Shell scripting and text processing
  • Writing clear code and documentation
  • Properly attributing credit
  • Numerical methods and scientific computing
  • Visualization

Main resources - where to find course information

Launching point, slides, links

Discussions and announcements

Exercises. https://sieprog.ch is created (and hosted) by Dr. Thomas Lochmatter.

Additional material (by me) can be found on my GitLab pages.

Use of AI tools

Generative AI (Large Language Models)

Popular LLMs

  • Microsoft Copilot - use through Edge browser.
    EPFL subscription (OpenAI / Microsoft)
  • ChatGPT (OpenAI)
  • Claude (Anthropic)
  • Gemini (Google)

Can also integrate into editor (VSCode)

  • GitHub Copilot
  • Blackbox
  • ChatGPT

Anticipated performance

  • Code may contain bugs (i.e., doesn’t work)
  • More likely to fail when complex
  • Solution can be convoluted (difficult to read or modify)
  • Not deterministic
  • Useful when you already know or can verify the output

How you can use LLMs in this course

During exercises

  • motivate your solution when stuck
  • as personal tutor - ask it to explain things for you

During project

  • can use to generate code
  • describe how you generated the code in your “Methods section” for the report
  • use it to generate first draft of report

You will not be able to rely on AI tools during the midterm exam.

You are ultimately responsible for the accuracy and quality of the submitted code + report.

Prompt engineering

General recommendations

  • Provide context
  • Be specific
  • Refine your request

Anthropic provides a guide on prompt engineering [1] [2] for its Claude AI models.

Future of LLMs

Optimistic scenario

  • AI advances to solve engineering problems and eliminates the need for coding

Pessimistic scenario

  • Increasing proportion of AI training data comes from generated data, leading to “Model collapse”

Is ENG-270 still relevant?

  • Need a strong foundation in programming to take advantage of current LLMs
  • Learn a different way to think - how to solve problems computationally

Getting started

Open source, free software, etc.

  • Source code availability: open / closed
  • Distribution model: free / restrictive

          Free    Restrictive
Open      free    “source available”
Closed            proprietary


  • Open source - source code can be downloaded and inspected
  • Free software - source code can be modified and deployed
  • Licenses can be further classified as permissive / protective - some licenses impose licensing requirements on derivative works; others do not define such restrictions


  • CPython is distributed under the Python Software Foundation License (open, free, and permissive)
  • MATLAB is proprietary, though some (but not all) source code is available
  • C is a language standard; the GCC implementation is distributed under the GNU General Public License (open, free, and protective). The LLVM-Clang implementation of C is distributed under the Apache License (open, free, and permissive)

Installation

Definitions:

  • operating system - manages applications on your computer; examples: GNU/Linux, Windows, macOS
  • executable (or binary) file - file that, when invoked, launches an application that carries out instructions on your machine; examples: python, matlab, code
  • system path - list of directories that the OS or a program searches to find the executable file that was invoked; examples: “/usr/bin”, “%USERPROFILE%/Apps”

Example - installing VSCode places a set of files (including an executable file) in a directory, and this directory is often added to the system path so that other programs can find it. (Instructions for uninstalling may also be copied to your machine.)

OS       Executable location                                               Extensions
Linux    /usr/bin                                                          ~/.vscode/extensions
Windows  %USERPROFILE%\AppData\Local\Programs\Microsoft VS Code\bin        %USERPROFILE%\.vscode\extensions
macOS    /Applications/Visual Studio Code.app/Contents/Resources/app/bin   ~/.vscode/extensions

  • %USERPROFILE% in Windows is nominally C:\Users\{Username}.
  • ~/ is the user directory; nominally in Linux it is /home/{Username} and /Users/{Username} on macOS.
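
The system path can also be inspected from within a program. The sketch below uses only the Python standard library (`os.environ`, `shutil.which`, `os.path.expanduser`); the helper function names are made up for this example, and the output depends on your machine.

```python
import os
import shutil


def path_directories():
    """Return the directories listed in the PATH environment variable."""
    return os.environ.get("PATH", "").split(os.pathsep)


def find_executable(name):
    """Return the full path of an executable found on PATH, or None."""
    return shutil.which(name)


for directory in path_directories():
    print(directory)

print("home directory:", os.path.expanduser("~"))
print("python:", find_executable("python3") or find_executable("python"))
```

If `find_executable("code")` returns `None` after installing VSCode, the directory containing its executable has probably not been added to the system path.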

ENAC Virtual Desktop Infrastructure (VDI)

Remotely log onto ENAC machines that have software installed.

  • GNU/Linux: ENAC-SSIE-Ubuntu-20-04 [recommended]
  • Windows: ENAC-SSIE-WIN

Languages

Non-exhaustive classifications

  • high-level / low-level
  • imperative / declarative
  • procedural / functional / object-oriented / logic
  • typed / untyped
  • (in implementation): interpreted / compiled

Selection criteria:

  • Speed and memory requirements, “expressiveness”, libraries

Why

  • Python? high-level / multiparadigm / dynamically typed / interpreted; general purpose, one of the most popular languages (“2nd best language for everything”)
  • MATLAB? high-level / multiparadigm / dynamically typed / interpreted; concise coding of mathematical (linear algebra) algorithms (vectorized operations)
  • C? low-level / imperative / procedural / statically typed / compiled; high performance or memory-limited applications

Implementations

Some languages (C, Fortran, Lisp) are defined by published standards. A language standard (or specification) defines what features of the language should be supported by a standard-compliant implementation.

  • Multiple developers/vendors can build implementations that follow these standards.
  • C follows ANSI/ISO standards and is implemented by GCC, Intel, Clang, etc.

Some languages are defined by their implementation.

  • E.g., for Python, CPython is the reference implementation, whose behavior is reproduced (fully or in part) by Jython, Numba, IronPython, PyPy, etc.
  • MATLAB has only one implementation, though there exist “clones” that are largely compatible (GNU Octave, SciLab) or languages that are inspired by it (Julia).

In principle, the distinction between compiled and interpreted languages lies in the implementation rather than in the language itself. (However, in the most common implementations, Python remains an interpreted language and C remains a compiled language.) An interpreter executes source code statement by statement as the program runs, whereas a compiler translates the whole program into machine code before it is run.
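
The boundary is blurrier in practice: CPython first compiles source code to an intermediate bytecode, and its interpreter then executes that bytecode. The sketch below makes the two steps visible using the built-in `compile` and `exec` functions and the standard `dis` module; the one-line program is an arbitrary example.

```python
import dis

source = "x = 1 + 2"

# Step 1: CPython compiles the source text to a bytecode object.
code = compile(source, "<example>", "exec")

# Step 2: the interpreter (a virtual machine) executes that bytecode.
namespace = {}
exec(code, namespace)
print(namespace["x"])

# Show the bytecode instructions that the interpreter steps through.
dis.dis(code)
```

The exact bytecode printed by `dis.dis` differs between Python versions, which is one reason it is treated as an implementation detail rather than part of the language.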

Editor / IDE

There are hundreds of editors, and the selection is a personal choice. They are tools to help you write and repair code.

A proper editor provides

  • syntax highlighting
  • autocompletion
  • bracket matching
  • linting and corrections
  • code navigation
  • debugging integration
  • refactoring

DO NOT use word processors that work with RTF, DOC, DOCX, etc. to edit code - they can insert invisible characters that will give you headaches later. For a quick view, open the file in a simple editor like Notepad, TextEdit, or gedit instead.
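
If you suspect that such characters have slipped into a file, a short script can locate them. This is a minimal sketch that flags every non-ASCII character; the function name and the sample text are hypothetical, and legitimate non-ASCII text (for example, accented letters in comments) would be flagged as well.

```python
import unicodedata


def find_suspicious_characters(text):
    """Return (line, column, character name) for each non-ASCII character."""
    found = []
    for line_number, line in enumerate(text.splitlines(), start=1):
        for column, char in enumerate(line, start=1):
            if ord(char) > 127:
                name = unicodedata.name(char, "UNKNOWN")
                found.append((line_number, column, name))
    return found


# Hypothetical code pasted from a word processor: it contains a no-break
# space (U+00A0) and "smart" quotes (U+201C, U+201D) instead of plain ones.
pasted = "x\u00a0= 1\nprint(\u201chello\u201d)\n"
for hit in find_suspicious_characters(pasted):
    print(hit)
```

For a real file, read its contents with `open(filename, encoding="utf-8").read()` and pass the result to the function.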

VSCode

Recommend VSCode for Python and C for the course - unless you prefer to use something else.

  • used in CS-119
  • currently one of the most popular and rapidly developed editors

VSCode provides a framework for adding plugins or extensions that make it aware of each language (Python, C/C++). These extensions do not install Python or C so you must install them separately.

Linter - warns you of errors or bad style in your code. Note: sometimes your code does not contain an error but is still flagged because the linter has not yet updated or because the style is not consistent.
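
A linter analyzes code without running it. As a rough sketch of the idea, the function below (its name is made up for this example) only checks whether source text parses, using the standard `ast` module; real linters such as Pylint or Flake8 perform many more checks, including style.

```python
import ast


def check_syntax(source):
    """Return None if source parses, else a short message with the line number."""
    try:
        ast.parse(source)
    except SyntaxError as error:
        return f"line {error.lineno}: {error.msg}"
    return None


print(check_syntax("total = 1 + 2"))   # valid code: nothing to report
print(check_syntax("total = (1 + 2"))  # missing closing parenthesis
```

The wording of the reported message depends on the Python version.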

Example of linter (providing correct syntax error warning)

https://code.visualstudio.com