Required Matlab knowledge for the course Hydrology for Engineers

Last update: 2022-09-07

Foreword

Basic programming knowledge is today of fundamental importance for anyone, and for scientists and engineers in particular. EPFL is strengthening its "computational thinking" character, and this course is adapting by proposing basic but rigorous exercises that require machine computations.
This course is traditionally taught in Matlab, because the language is easy to use and students can obtain free student licenses. Matlab is not open source like R or Python, but the Octave project is open source and is almost fully compatible with Matlab syntax.
This short guide lists the basics of Matlab programming that are required for the course exercises. We expect students to already know most of the content or to learn it quickly (before week 3). During exercise sessions we will not spend time on learning or revising these Matlab basics.
This document is generated using a notebook (Matlab live script) that mixes simple text (prose) and code. You will find frequent links to the Matlab guide, as Matlab generally has excellent documentation. An additional online Matlab/Octave course in French is provided by the EPFL ENAC-IT team here: https://enacit.epfl.ch/cours/matlab-octave/

Introduction

Before we start with coding, remember two things:
1) never work inside a compressed (zipped) folder,
2) always check that your Current Folder in Matlab is the same folder where the script is saved.
Matlab provides excellent documentation and we will often refer to it. To get started, you can go through the basic tutorials such as Desktop Basics, Array Indexing, etc.
You can open the documentation from your Command Window by typing these commands:
% type doc or doc <<topic>> to open the documentation
doc
doc MATLAB %to open the documentation on 'MATLAB'

Elementary operations

Execute code chunks

One practical utility is to run just pieces (chunks) of code. You can do this by highlighting a piece of your script and then either: right-click and select "Evaluate Selection in Command Window", or press a shortcut key (F9 in Windows, function key + F9 on Mac).

Create and use vectors and matrices

Check the tutorial Matrices and Arrays, or open the documentation directly in Matlab as a live script with:
openExample('matlab/MatricesAndArraysGSExample') % to open examples directly into Matlab
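As a minimal sketch of the kind of operations covered in that tutorial, the lines below create a few vectors and a matrix and apply some basic operations to them. The variable names and values are illustrative only and are not taken from the course material.

```matlab
% row vector, column vector and matrix
r = [1, 2, 3];            % row vector (1x3)
c = [4; 5; 6];            % column vector (3x1)
M = [1, 2, 3; 4, 5, 6];   % matrix with 2 rows and 3 columns

% regularly spaced vectors
v1 = 0:2:10;              % from 0 to 10 in steps of 2
v2 = linspace(0, 1, 5);   % 5 equally spaced values between 0 and 1

% size information
n = numel(v1);            % number of elements
[nrows, ncols] = size(M); % number of rows and columns

% element-wise operations use the dot operators (.*, ./, .^)
sq = r.^2;                % square each element
p = r .* [10, 20, 30];    % multiply element by element

% matrix operations
Mt = M';                  % transpose (3x2)
s = r * c;                % row times column gives a scalar
colsum = sum(M, 1);       % sum along each column
```

Forgetting the dot in an element-wise operation (for example r^2 instead of r.^2) is a common source of errors, because Matlab then attempts a matrix operation.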
For the course, you need to be able to:

Indexing

Indexing is extremely important because it is what lets us access the data stored in vectors and matrices. You can check this example:
openExample('matlab/MatrixIndexingExample')
Later, when talking about tables, we will see how to access a variable by its name. Here, let's focus on the two main ways to access data: position indexing and logical indexing.

Position indexing

This is simply accessing an element by its position. Remember that Matlab, unlike many other languages, starts counting from position 1 and raises an error if you try to access position 0. Some examples:
% create a vector
a = [5, 3, 14.5, 13.4]
a = 1×4
5.0000 3.0000 14.5000 13.4000
% now access the elements at position 2 or at position 2 and 3
a(2)
ans = 3
a([2,3])
ans = 1×2
3.0000 14.5000
% you can store the positions into a vector, which can access the same
% elements multiple times and can be as long as you wish
b = [2,2,2,4,1,1,4,4,3,3,3,3];
a(b)
ans = 1×12
3.0000 3.0000 3.0000 13.4000 5.0000 5.0000 13.4000 13.4000 14.5000 14.5000 14.5000 14.5000
% to access the last element of a vector, the 2 statements below are
% equivalent
c = a(length(a))
c = 13.4000
d = a(end)
d = 13.4000
Remember that indexes must always be integer values between 1 and the length of the vector, otherwise you will generate an error.
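The short sketch below tries three invalid indexes (zero, a non-integer value, and a position beyond the end of the vector). The try/catch construct is not part of the examples above: it is used here only as an assumption-free way to display the error messages without stopping the script.

```matlab
a = [5, 3, 14.5, 13.4];
bad_indexes = {0, 2.5, 5}; % zero, non-integer, and beyond the vector length
n_errors = 0;
for k = 1:numel(bad_indexes)
    try
        value = a(bad_indexes{k}); % this line raises an error
    catch err
        n_errors = n_errors + 1;   % count the failed attempts
        disp(err.message)          % show the error message and continue
    end
end
```

All three attempts fail, so n_errors ends up equal to 3.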

Logical indexing

Logical indexing is based on logical tests:
% create a vector
a = [5, 3, 14.5, 13.4]
a = 1×4
5.0000 3.0000 14.5000 13.4000
% to access the elements, you need to give a true or false to each element
% in the vector
b = [true, true, false, false];
a(b)
ans = 1×2
5 3
% and a vector with true and false is exactly what you get when you do
% logical testing
b = a<10
b = 1×4 logical array
1 1 0 0
Note that b seems to be made of integer values 0 and 1, but they are actually symbols that indicate the true/false state. The variable b is indeed a logical array and not a numerical array.
a(b) % this will give the same result as before
ans = 1×2
5 3
% and you can do any other type of more complex testing
b = a < 12 & a ~= 3 % the test says that values must be smaller than 12 AND also be different from 3
b = 1×4 logical array
1 0 0 0
a(b)
ans = 5
Remember that, with logical indexing, you access the elements of a vector by providing a logical vector of the same size, which is true for the elements to retain and false otherwise. This is an extremely powerful way to index elements and we will use it extensively.
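A few other common uses of a logical vector are sketched below: counting the selected elements, finding their positions, computing a statistic on them, and assigning new values. The vector is the same as in the examples above; the test itself is only an illustration.

```matlab
a = [5, 3, 14.5, 13.4];
b = a > 4 & a < 14;   % logical test, true for the 1st and 4th elements
n_selected = sum(b);  % true counts as 1, so this counts the selected elements
pos = find(b);        % positions of the selected elements
m = mean(a(b));       % mean of the selected elements only
a(b) = 0;             % logical indexing also works for assigning values
```

The function find converts a logical vector into positions, which links the two indexing approaches described in this section.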

Constructs

We expect from your previous programming courses that you already know what a for loop and a while loop are, and how an if... else statement works. You can check the specific Matlab syntax for these constructs at the Programming and Scripts tutorial.
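As a quick reminder of the Matlab syntax, here is a minimal sketch that combines the three constructs. The rainfall values and the 10 mm threshold are made up for illustration.

```matlab
rain = [0, 2.5, 0, 12, 7.1]; % hypothetical daily rainfall values (mm)

% for loop with if/else: count the wet and the dry days
n_wet = 0;
n_dry = 0;
for i = 1:length(rain)
    if rain(i) > 0
        n_wet = n_wet + 1;
    else
        n_dry = n_dry + 1;
    end
end

% while loop: find the first day on which the cumulated rainfall exceeds 10 mm
cumulated = 0;
day_index = 0;
while cumulated <= 10 && day_index < length(rain)
    day_index = day_index + 1;
    cumulated = cumulated + rain(day_index);
end
```

The second condition in the while statement prevents the loop from reading past the end of the vector if the threshold is never exceeded.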
For the course, you need to be able to:

Plotting

Go through the tutorial 2-D and 3-D Plots. We will only use 2-D plots in the course.

Introductory examples

Let's see some examples:
% create some variables and plot them
x = 0:0.5:5;
y1 = 2*x.^2-1;
y2 = 3*x.^2;
% make a simple plot
plot(x,y1)
Note that, by default, data points are connected by a straight line, but there is actually no data between them! You can see this better by changing the plot style. You can find all properties on the Line Properties documentation page.
% modify the default marker and line styles
plot(x,y1,...
'Marker', 'o',...
'MarkerSize', 2, ...
'LineStyle', ':')
% also add some labels to make the graph clearer
title('Title')
xlabel('Variable x')
ylabel('Variable y')
If you make a new plot, it will by default replace the current one. To avoid this and draw the new graphic on top of the previous plot, you can use the hold on command. Note that, unless you type hold off at the end, all subsequent plots will be shown on the same axes.
% add some other figure on top
plot(x,y1,...
'Marker', 'o',...
'MarkerSize', 2, ...
'LineStyle', ':')
title('Title')
xlabel('Variable x')
ylabel('Variable y')
legend
hold on % this keeps the plot in the axes and adds all new ones on top
plot(x,y2,...
'Marker', '>',...
'MarkerSize', 4, ...
'LineStyle', 'none')
hold off % otherwise next plot will be again on top

Interactive plot editing

Matlab has a very useful feature to help you generate plots. From a plot window, go to File>Generate Code... and Matlab will create a script that generates the plot exactly as you see it. Thus, you can modify the plot manually (via Tools>Edit Plot or View>Open Plot Browser) and then generate its script so that you can reproduce it in the future.

Save high-quality graphics

Saving high-quality figures is fundamental for presentations. As a general rule, avoid taking screenshots, as they look unprofessional, and use the ready-made tools to export figures instead. To save a plot using the default resolution, from a plot window you can go to File>Save As... and choose your preferred format. You can also automate this in your code and include additional parameters. See the documentation at Save at Specific Size and Resolution and the example below.
% some examples
plot([0, 1, 3, 10],[0, 4, 2, 5]) % just an example plot
% save to file
print('my_figure','-dpng','-r600') % this saves 'my_figure.png' at a resolution of 600 DPI
print('my_figure','-painters','-depsc') % this saves 'my_figure.eps' (useful for LaTeX)
print('my_figure','-painters','-dpdf') % this saves 'my_figure.pdf' (useful for LaTeX)
% Note that in some older Matlab versions, the size of the figure saved with print may
% not correspond to the one you see on the screen. If this occurs, you need
% to change a couple of default figure properties.
set(gcf,'Units','centimeters') % gcf is a function that returns the handle of the current figure
siz = get(gcf,'Position'); siz=siz(end-1:end); %get the size of the figure as you see it on the screen
set(gcf,'PaperPositionMode','auto','PaperUnits',get(gcf,'Units'),...
'PaperSize',[1, 1.1].*siz); % use the size that you see on screen also when printing to file
For the course, you need to be able to:

Dealing with data

Nowadays you cannot afford to remain an amateur about data. You need to handle the whole data workflow professionally, from data import to data processing and analysis.
We will only work with data that comes in tabular structure. This means that it is organized into rows and columns, where each row corresponds to an observation and each column is a variable. The data typically include both numerical variables and text. It can either be imported as a set of vectors or as a table.

Matlab tables

A data table is a data container that has a tabular structure (rows and columns), where different columns can have different data types (numeric, boolean, character, ...). Importantly, all the elements of the same column must be of the same type. Tables offer several indexing advantages and are ideal for dealing with data. They may be slow for some data processing, but in that case you can simply extract variables into a vector or matrix. Matlab has its own table data type.
Let's run some examples:
% Open a matlab preinstalled dataset (patients.dat) and load it into a
% matlab table with the function 'readtable'
T = readtable('patients.dat');
% You can display the top 8 lines of the table using the function 'head'
head(T)
ans = 8×10 table
     LastName    Gender    Age  Location                     Height  Weight  Smoker  Systolic  Diastolic  SelfAssessedHealthStatus
1    'Smith'     'Male'    38   'County General Hospital'    71      176     1       124       93         'Excellent'
2    'Johnson'   'Male'    43   'VA Hospital'                69      163     0       109       77         'Fair'
3    'Williams'  'Female'  38   'St. Mary's Medical Center'  64      131     0       125       83         'Good'
4    'Jones'     'Female'  40   'VA Hospital'                67      133     0       117       75         'Fair'
5    'Brown'     'Female'  49   'County General Hospital'    64      119     0       122       80         'Good'
6    'Davis'     'Female'  46   'St. Mary's Medical Center'  68      142     0       121       70         'Good'
7    'Miller'    'Female'  33   'VA Hospital'                64      142     1       130       88         'Good'
8    'Wilson'    'Male'    40   'VA Hospital'                68      180     0       115       82         'Good'
A great advantage about tables is that you can access elements by column name (name indexing). For example, only select data corresponding to patients that have the variable 'Age' lower than a certain value. Using a variable name is more intuitive than using the variable position and it avoids selecting the wrong column by mistake.
% create a new variable (Weight is in pounds and Height in inches in this
% dataset, so the values differ from the usual metric BMI)
T.BMI = T.Weight ./ T.Height.^2;
% select a part of the data
q = T.Age < 40; % this will be my selection (or query) across the rows (logical indexing!)
T2 = T(q,:); % select all columns and a selection of rows according to vector q
head(T2)
ans = 8×11 table
     LastName    Gender    Age  Location                     Height  Weight  Smoker  Systolic  Diastolic  SelfAssessedHealthStatus  BMI
1    'Smith'     'Male'    38   'County General Hospital'    71      176     1       124       93         'Excellent'               0.0349
2    'Williams'  'Female'  38   'St. Mary's Medical Center'  64      131     0       125       83         'Good'                    0.0320
3    'Miller'    'Female'  33   'VA Hospital'                64      142     1       130       88         'Good'                    0.0347
4    'Moore'     'Male'    28   'St. Mary's Medical Center'  68      183     0       115       78         'Excellent'               0.0396
5    'Taylor'    'Female'  31   'County General Hospital'    66      132     0       118       86         'Excellent'               0.0303
6    'Jackson'   'Male'    25   'VA Hospital'                71      174     0       127       74         'Poor'                    0.0345
7    'White'     'Male'    39   'VA Hospital'                72      202     1       130       95         'Excellent'               0.0390
8    'Harris'    'Female'  36   'St. Mary's Medical Center'  65      129     0       114       79         'Good'                    0.0305
% now only select Gender, Height and Weight data from female patients younger than 40 (logical indexing!)
q = T.Age < 40 & strcmp(T.Gender, 'Female'); % query to select female patients under 40
T2 = T(q,{'Gender','Height', 'Weight'}); % select columns by their name and rows according to the query q
head(T2,3) % just show the first three rows now
ans = 3×3 table
     Gender    Height  Weight
1    'Female'  64      131
2    'Female'  64      142
3    'Female'  66      132
% Use the dot notation to extract data into a vector or matrix.
% Extracting data is good for faster computing
H = T2.Height;
W = T2.Weight;
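Tables do not have to come from a file: as a minimal sketch, the lines below build a small table directly from vectors with the table function and then reuse the name indexing and logical indexing shown above. The station names and the measured values are hypothetical.

```matlab
% hypothetical measurements stored as separate column vectors
Station = {'A'; 'B'; 'C'; 'D'};
Rain = [12.0; 0; 3.5; 20.1];      % mm
Temp = [15.2; 18.4; 17.0; 11.3];  % degrees Celsius

% build the table; the column names are taken from the variable names
S = table(Station, Rain, Temp);

% name indexing combined with logical indexing
q = S.Rain > 1 & S.Temp > 12;     % rainy and mild stations
S2 = S(q, {'Station', 'Rain'});   % keep two columns for the selected rows

% extract a column into a plain vector
R = S2.Rain;
```

Note that all vectors passed to table must have the same number of rows.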
For the course, you need to be able to:

Import a textfile into a table

All course exercises will start by importing data, which usually comes in the form of a comma-separated values (csv) file. A raw csv file typically looks like the figure below: all elements are separated by a comma. Often there is a header at the top of the file with useful metadata. Then there is a row with the variable names, and finally all the data.
If you have the MS Office package installed, your computer will try to open the file with MS Excel. Instead, try to open the raw data using a (free) text editor like Notepad++.
To import the above file into a Matlab table, we need to use the readtable function with an option specifying that there are 5 header lines. We would need a command like: T = readtable('fitness_members.csv', 'HeaderLines',5);
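To try this without the course data, the sketch below first writes a small csv file with 5 header lines and then imports it. The file name example_data.csv, the column names and all values are made up for illustration. The option 'ReadVariableNames' is added here only to make the sketch explicit about where the column names come from.

```matlab
% write a small hypothetical csv file with 5 header lines
fname = 'example_data.csv'; % illustrative file name
fid = fopen(fname, 'w');
fprintf(fid, '# Station: hypothetical gauge\n');
fprintf(fid, '# Variable 1: day of measurement\n');
fprintf(fid, '# Variable 2: daily rainfall in mm\n');
fprintf(fid, '# Variable 3: daily mean discharge in m3/s\n');
fprintf(fid, '# Source: made-up values for illustration\n');
fprintf(fid, 'Day,Rain,Discharge\n');
fprintf(fid, '1,0.0,1.2\n');
fprintf(fid, '2,5.4,1.9\n');
fprintf(fid, '3,1.1,1.5\n');
fclose(fid);

% skip the 5 header lines; the next row provides the variable names
T = readtable(fname, 'HeaderLines', 5, 'ReadVariableNames', true);
disp(T.Properties.VariableNames) % check the imported column names
disp(T.Discharge)                % access a column by its name
```

After importing, it is good practice to check the first rows with head(T) and compare them with the raw file.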
For the course, you need to be able to:

Dealing with dates

Often, the data that we want to import has one or more columns that represent dates. Dates are sometimes difficult to deal with because they do not follow the decimal system and they are full of irregularities (e.g. leap years). Thus, programming languages typically offer dedicated tools. See how to Represent Dates and Times in MATLAB using datetime arrays.
It is good practice to store dates as yyyy-mm-dd (for example 2020-03-15). The readtable function often converts dates into datetime variables automatically. But dates can be stored in several different ways (15/3/2020, 3.15.2020, etc.) and sometimes you need to help Matlab identify them.
% examples with datetime variables
ts = datetime('30/07/2018','InputFormat','dd/MM/yyyy'); %specify that the date is in the format day month year and the separator is the forward slash /
% you can quickly access parts of the date
day(ts)
ans = 30
month(ts)
ans = 7
year(ts)
ans = 2018
% or you can convert the date to a vector where each element corresponds to
% a component of the date
date_v = datevec(ts)
date_v = 1×6
2018 7 30 0 0 0
% you can convert multiple timestamps to datetime
date_text = {'2018-01-22', '2018-07-28', '2018-09-05', '2019-12-31'};
ts2 = datetime(date_text,'InputFormat','yyyy-MM-dd'); % in this case the date format was year-month-day
What happens if you subtract two or more dates? You will get a 'duration' array, which tells you how long the interval between the dates is. The default format is hours:minutes:seconds.
% use duration arrays
d = ts2 - ts
d = 1×4 duration
-4536:00:00 -48:00:00 888:00:00 12456:00:00
% you can convert the duration array to a number by calling the 'days' function
d_days = days(d)
d_days = 1×4
-189 -2 37 519
% check what happens if you type in day(d) instead of days(d)!
% and you can do the opposite and add a duration to a date
days_to_add = days([1,2,5000])
days_to_add = 1×3 duration
1 day 2 days 5000 days
ts3 = ts + days_to_add
ts3 = 1×3 datetime
31-Jul-2018 01-Aug-2018 07-Apr-2032
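Two further uses are sketched below: parsing the alternative date formats mentioned earlier (15/3/2020 and 3.15.2020), and selecting a period of a time series with logical indexing. The daily series and its values are hypothetical.

```matlab
% the same date written in two different formats
t1 = datetime('15/3/2020', 'InputFormat', 'd/M/yyyy'); % day/month/year
t2 = datetime('3.15.2020', 'InputFormat', 'M.d.yyyy'); % month.day.year
same_date = (t1 == t2); % both strings represent 15 March 2020

% hypothetical daily series: 10 consecutive days and made-up values
t = datetime(2020, 3, 10) + days(0:9);
x = [1.0, 1.2, 1.1, 3.4, 2.8, 2.0, 1.7, 1.5, 1.4, 1.3];

% logical indexing with dates: keep the values from 15 to 17 March 2020
q = t >= t1 & t <= t1 + days(2);
x_sel = x(q);
t_sel = t(q);
```

Comparison operators work on datetime arrays just as they do on numbers, so a date range can be used as a query in the same way as any other logical test.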
For the course, you need to be able to: