Required Matlab knowledge for the course Hydrology for Engineers
Last update: 2022-09-07
Foreword
Knowledge of basic programming is today of fundamental importance for everyone, in particular for scientists and engineers. EPFL is strengthening its "computational thinking" character, and this course is adapting by proposing basic but rigorous exercises that require machine computations.
This course is traditionally taught in Matlab, because it is an easy-to-use language and students can get free student licenses. Matlab is not open source like R or Python, but the Octave project is, and it is almost fully compatible with Matlab syntax. This short guide lists the basics of Matlab programming that are required for the course exercises. We expect students to already know most of the content or to learn it quickly (before week 3). During exercise sessions we will not spend time learning or revising these Matlab basics.
This document is generated from a notebook (Matlab live script) that mixes plain text (prose) and code. You will find frequent links to the Matlab documentation, which is generally excellent. An additional online Matlab/Octave course in French is provided by the EPFL ENAC-IT team here: https://enacit.epfl.ch/cours/matlab-octave/
Introduction
Before we start with coding, remember two things:
1) never work inside a compressed (zipped) folder,
2) always check that your Current Folder in Matlab is the same folder where the script is saved.
You can open the documentation from your Command Window by typing these commands:
% type doc or doc <<topic>> to open the documentation
doc MATLAB %to open the documentation on 'MATLAB'
Elementary operations
Execute code chunks
A practical feature is the ability to run only pieces (chunks) of code. Highlight a piece of your script and then either right-click and select "Evaluate Selection in Command Window", or press the shortcut key (F9 on Windows, function key + F9 on Mac).
Create and use vectors and matrices
Check the tutorial Matrices and Arrays, or open it directly in Matlab as a live script with:
openExample('matlab/MatricesAndArraysGSExample') % opens the example directly in Matlab
For the course, you need to be able to:
- create an empty vector of any size, both as a row or as a column
- generate arbitrary sequences of numbers (for example, a vector going from 1 to 100 by steps of 2)
- sum vectors
- make the dot product and element-wise product among two vectors
- evaluate functions (for example, the sine or the square root) over a vector
- know what a NaN is and how it propagates through the operations listed in the previous points
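The skills in the list above can be sketched in a few lines. This is only an illustration: we assume that an "empty" vector of a given size means a vector pre-filled with zeros or NaN, ready to be filled in later, and the numbers in u, v and w are arbitrary.

```matlab
n = 5;
r = zeros(1, n);    % row vector with n elements (all zeros)
c = NaN(n, 1);      % column vector with n elements (all NaN)
s = 1:2:100;        % sequence 1, 3, 5, ..., 99 (50 elements)
u = [1, 2, 3];
v = [4, 5, 6];
u + v               % element-wise sum: 5 7 9
dot(u, v)           % dot product: 1*4 + 2*5 + 3*6 = 32
u * v'              % same dot product, as a row-times-column matrix product
u .* v              % element-wise product: 4 10 18
sqrt(v)             % functions are applied to each element of the vector
w = [1, NaN, 3];
w * 2               % NaN propagates: 2 NaN 6
sum(w)              % a single NaN in the input makes the sum NaN
isnan(w)            % logical test to locate the NaNs: 0 1 0
```

Note that u * v (without the transpose) gives an error, because two 1-by-3 row vectors cannot be matrix-multiplied.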
Indexing
Indexing is extremely important because it is what lets us access the data stored in vectors and matrices. You can check this example:
openExample('matlab/MatrixIndexingExample')
Later, when talking about tables, we will see how to access a variable by its name. Here, let's focus on the 2 main ways to access the data: position indexing and logical indexing.
Position indexing
This simply means accessing an element by its position. Remember that Matlab, unlike many other languages (e.g. Python or C), starts indexing at position 1 and gives an error if you try to access position 0. Some examples:
a = [5, 3, 14.5, 13.4]
5.0000 3.0000 14.5000 13.4000
% now access the element at position 2, or the elements at positions 2 and 3
a(2) % returns 3
a([2, 3]) % returns 3.0000 14.5000
% you can store the positions into a vector, which can access the same
% elements multiple times and can be as long as you wish
b = [2,2,2,4,1,1,4,4,3,3,3,3];
a(b)
3.0000 3.0000 3.0000 13.4000 5.0000 5.0000 13.4000 13.4000 14.5000 14.5000 14.5000 14.5000
% to access the last element of a vector, the 2 statements below are equivalent
a(end) % returns 13.4000
a(length(a)) % also returns 13.4000
Remember that indices must always be positive integers between 1 and the length of the vector; otherwise you will get an error.
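Position indexing works the same way on matrices, with one index for the row and one for the column. The sketch below uses magic(4) only as a convenient 4-by-4 example matrix, and it catches the error raised by position 0 instead of letting the script stop.

```matlab
A = magic(4)        % a 4-by-4 example matrix
A(2, 3)             % row 2, column 3: returns 10
A(:, 1)             % all rows of column 1: 16 5 9 4 (as a column)
A(end, :)           % all columns of the last row: 4 14 15 1
A(2:3, [1, 4])      % rows 2-3, columns 1 and 4: [5 8; 9 12]
a = [5, 3, 14.5, 13.4];
caught = false;
try
    a(0)            % invalid: positions start at 1
catch err
    caught = true;
    disp(err.message)   % the exact wording depends on the Matlab version
end
```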
Logical indexing
Logical indexing is based on logical tests:
a = [5, 3, 14.5, 13.4]
5.0000 3.0000 14.5000 13.4000
% to access the elements, you need to give a true or false to each element
b = [true, true, false, false];
a(b) % returns 5 3
% and a vector with true and false is exactly what you get when you do
% a logical test, for example:
b = a < 10 % returns 1 1 0 0
Note that b is displayed as 0s and 1s, but these are not numbers: they are logical values indicating the false/true state. The variable b is indeed a logical array, not a numeric array.
a(b) % this will give the same result as before
% and you can do any other type of more complex testing
b = a < 12 & a ~= 3 % the test says that values must be smaller than 12 AND also be different from 3
Remember that, with logical indexing, you access the elements of a vector by providing a logical vector of the same size, which has true for the elements to retain and false otherwise. This is an extremely powerful way to index elements and we will use it extensively.
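Logical indexing is also handy for cleaning data. The sketch below assumes a hypothetical series of measurements where missing values are flagged with -999. It shows how to count and locate them, how to replace them with NaN (logical indexing also works on the left-hand side of an assignment), and how to compute a mean of the valid values only.

```matlab
q = [12.3, -999, 15.1, 0.4, -999];  % hypothetical measurements, -999 flags missing values
missing = (q == -999);              % logical vector: true where data is missing
n_missing = sum(missing)            % true counts as 1: returns 2
idx = find(missing)                 % positions of the true elements: 2 5
q(missing) = NaN;                   % replace the flagged values with NaN
q_mean = mean(q(~isnan(q)))         % mean of the valid values only (12.3, 15.1, 0.4)
```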
Constructs
We expect that, from your previous programming courses, you already know what a for loop and a while loop are, and how an if... else statement works. You can check the specific Matlab syntax for these constructs in the Programming and Scripts tutorial. For the course, you need to be able to:
- implement a for loop and a while loop
- implement an if... else statement
- pre-allocate variables, to make computations efficient
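The three constructs above can be combined in a small sketch. The rainfall values and the 10 mm threshold are hypothetical. The output vector is pre-allocated with zeros before the for loop, so that Matlab does not have to grow it at every iteration.

```matlab
P = [0, 2.5, 10.2, 0, 4.1];     % hypothetical daily rainfall [mm]
n = length(P);
Pcum = zeros(1, n);             % pre-allocate the output vector
Pcum(1) = P(1);
for t = 2:n
    Pcum(t) = Pcum(t-1) + P(t); % running total of rainfall
end
% while loop: find the first day on which the running total exceeds 10 mm
k = 1;
while k <= n && Pcum(k) <= 10
    k = k + 1;
end
if k <= n
    fprintf('Cumulative rainfall exceeds 10 mm on day %d\n', k);
else
    disp('Cumulative rainfall never exceeds 10 mm');
end
```

The same running total could be obtained with the built-in function cumsum(P); the loop is written out here only to show the construct.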
Plotting
Introductory examples
Let's see some examples here
% create some variables and plot them
Note that, by default, data points are connected by straight lines, but there is actually no data between them! You can see this better by changing the plot style. You can find all properties on the Line Properties documentation page.
% modify the default marker and line styles
% also add some labels to make the graph clearer
If you make a new plot, by default it replaces the content of the current axes. To show your new graphic on top of the previous plot instead, use the hold on command. Note that, unless you type hold off afterwards, all subsequent plots will be drawn on the same axes.
% add some other figure on top
hold on % this keeps the plot in the axes and adds all new ones on top
hold off % otherwise the next plot would also be drawn on top
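A complete example of the steps described above might look like the sketch below. The data (a sine and a cosine) are arbitrary: the point is the sequence plot, marker/line style, hold on, hold off, labels, title and legend.

```matlab
x = 0:0.5:10;                 % arbitrary example data
y1 = sin(x);
y2 = cos(x);
fig = figure;
h1 = plot(x, y1, 'o-', 'Color', 'b', 'LineWidth', 1.5); % circles joined by a solid line
hold on                       % keep the first line and draw the next one on top
h2 = plot(x, y2, 's--', 'Color', 'r');                  % squares joined by a dashed line
hold off                      % the next plot command will replace the axes content
xlabel('x [-]')
ylabel('y [-]')
title('Two example series')
legend({'sin(x)', 'cos(x)'}, 'Location', 'best')
```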
Interactive plot editing
Matlab has a very useful feature to help you generate plots. From a plot window, go to File>Generate Code... and Matlab will create a script that generates the plot exactly as you see it. Thus, you can modify the plot manually (via Tools>Edit Plot, or View>Open Plot Browser) and then generate its script, so that you can reproduce it in the future.
Save high-quality graphics
Saving high-quality figures is fundamental for presentations. As a general rule, avoid taking screenshots, as they look unprofessional, and use the ready-made tools to export figures instead. To save a plot at the default resolution, from a plot window go to File>Save As... and choose your preferred format. You can also automate this in your code and include additional parameters. See the documentation page Save at Specific Size and Resolution and the example below.
plot([0, 1, 3, 10],[0, 4, 2, 5]) % just an example plot
print('my_figure','-dpng','-r600') % this saves 'my_figure.png' at a resolution of 600 DPI
print('my_figure','-painters','-depsc') % this saves 'my_figure.eps' (useful for LaTeX)
print('my_figure','-painters','-dpdf') % this saves 'my_figure.pdf' (useful for LaTeX)
% Note that in some older Matlab versions, the size of the figure saved with print may
% not correspond to the one you see on screen. If this happens, you need
% to change a couple of default figure properties.
set(gcf,'Units','centimeters') % gcf returns the handle of the current figure
siz = get(gcf,'Position'); siz=siz(end-1:end); %get the size of the figure as you see it on the screen
set(gcf,'PaperPositionMode','auto','PaperUnits',get(gcf,'Units'),...
'PaperSize',[1, 1.1].*siz); % use the size that you see on screen also when printing to file
For the course, you need to be able to:
- make basic 2-D plots to display your data
- know how to complete your plot with meaningful title, legend and labels
- save a plot in a high-resolution vector (pdf) or raster (jpg or png) format
Dealing with data
Nowadays you cannot afford to be an amateur when it comes to data: you need to handle the whole data workflow professionally, from data import to data processing and analysis.
We will only work with data that comes in a tabular structure. This means that it is organized into rows and columns, where each row corresponds to an observation and each column is a variable. The data typically include both numerical and text variables, and can be imported either as a set of vectors or as a table.
Matlab tables
A data table is a container with a tabular structure (rows and columns), where different columns can have different data types (numeric, logical, text, ...). Importantly, all the elements of the same column must be of the same type. Tables offer several indexing advantages and are ideal for dealing with data. They may be slow for some data processing, but in that case you can simply extract the variables into a vector or matrix. Matlab has its own tables; let's run some examples:
% Open a dataset that comes preinstalled with Matlab (patients.dat) and load it
% into a Matlab table with the function 'readtable'
T = readtable('patients.dat');
% You can display the top 8 lines of the table using the function 'head'
head(T)
ans = 8×10 table
| | LastName | Gender | Age | Location | Height | Weight | Smoker | Systolic | Diastolic | SelfAssessedHealthStatus |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 'Smith' | 'Male' | 38 | 'County General Hospital' | 71 | 176 | 1 | 124 | 93 | 'Excellent' |
| 2 | 'Johnson' | 'Male' | 43 | 'VA Hospital' | 69 | 163 | 0 | 109 | 77 | 'Fair' |
| 3 | 'Williams' | 'Female' | 38 | 'St. Mary's Medical Center' | 64 | 131 | 0 | 125 | 83 | 'Good' |
| 4 | 'Jones' | 'Female' | 40 | 'VA Hospital' | 67 | 133 | 0 | 117 | 75 | 'Fair' |
| 5 | 'Brown' | 'Female' | 49 | 'County General Hospital' | 64 | 119 | 0 | 122 | 80 | 'Good' |
| 6 | 'Davis' | 'Female' | 46 | 'St. Mary's Medical Center' | 68 | 142 | 0 | 121 | 70 | 'Good' |
| 7 | 'Miller' | 'Female' | 33 | 'VA Hospital' | 64 | 142 | 1 | 130 | 88 | 'Good' |
| 8 | 'Wilson' | 'Male' | 40 | 'VA Hospital' | 68 | 180 | 0 | 115 | 82 | 'Good' |
A great advantage of tables is that you can access elements by column name (name indexing). For example, you can select only the patients whose 'Age' is lower than a given value. Using a variable name is more intuitive than using the variable position, and it avoids selecting the wrong column by mistake.
T.BMI = T.Weight ./ T.Height.^2; % new variable computed from existing ones (Weight is in lb and Height in inches here, so this ratio is not the usual BMI in kg/m^2)
% select a part of the data
q = T.Age < 40; % this will be my selection (or query) across the rows (logical indexing!)
T2 = T(q,:); % select all columns and a selection of rows according to the query q
head(T2)
ans = 8×11 table
| | LastName | Gender | Age | Location | Height | Weight | Smoker | Systolic | Diastolic | SelfAssessedHealthStatus | BMI |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 'Smith' | 'Male' | 38 | 'County General Hospital' | 71 | 176 | 1 | 124 | 93 | 'Excellent' | 0.0349 |
| 2 | 'Williams' | 'Female' | 38 | 'St. Mary's Medical Center' | 64 | 131 | 0 | 125 | 83 | 'Good' | 0.0320 |
| 3 | 'Miller' | 'Female' | 33 | 'VA Hospital' | 64 | 142 | 1 | 130 | 88 | 'Good' | 0.0347 |
| 4 | 'Moore' | 'Male' | 28 | 'St. Mary's Medical Center' | 68 | 183 | 0 | 115 | 78 | 'Excellent' | 0.0396 |
| 5 | 'Taylor' | 'Female' | 31 | 'County General Hospital' | 66 | 132 | 0 | 118 | 86 | 'Excellent' | 0.0303 |
| 6 | 'Jackson' | 'Male' | 25 | 'VA Hospital' | 71 | 174 | 0 | 127 | 74 | 'Poor' | 0.0345 |
| 7 | 'White' | 'Male' | 39 | 'VA Hospital' | 72 | 202 | 1 | 130 | 95 | 'Excellent' | 0.0390 |
| 8 | 'Harris' | 'Female' | 36 | 'St. Mary's Medical Center' | 65 | 129 | 0 | 114 | 79 | 'Good' | 0.0305 |
% now only select Gender, Height and Weight data from female patients younger than 40 (logical indexing!)
q = T.Age < 40 & strcmp(T.Gender, 'Female'); % query to select female patients under 40
T2 = T(q,{'Gender','Height', 'Weight'}); % select columns by their name and rows according to the query q
head(T2,3) % just show the first three rows now
ans = 3×3 table
| | Gender | Height | Weight |
|---|---|---|---|
| 1 | 'Female' | 64 | 131 |
| 2 | 'Female' | 64 | 142 |
| 3 | 'Female' | 66 | 132 |
% Use the dot notation to extract data into a vector or matrix.
% Extracting data is good for faster computing
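Extracting data from a table can be sketched as follows. To keep the example self-contained, it builds a small table by hand with the table function, using the first three rows of patients.dat shown above, instead of reading the file. Dot notation returns one variable as a column vector, curly braces return several numeric variables as a matrix, and round brackets return a smaller table.

```matlab
T = table({'Smith'; 'Johnson'; 'Williams'}, [38; 43; 38], [71; 69; 64], [176; 163; 131], ...
    'VariableNames', {'LastName', 'Age', 'Height', 'Weight'});
w = T.Weight;                      % dot notation: one variable -> column vector
HW = T{:, {'Height', 'Weight'}};   % curly braces: several numeric variables -> matrix
sub = T(:, {'Height', 'Weight'});  % round brackets: the result is still a table
mean_w = mean(w)                   % ordinary vector math on the extracted data
```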
For the course, you need to be able to:
- access table data by name indexing
- extract data into a vector or matrix
Import a textfile into a table
All course exercises will start by importing data, which usually comes in the form of a comma-separated values (csv) file. A raw csv file typically looks like the figure below: all elements are separated by a comma. Often there is a header at the top of the file with useful metadata. Then there is a row with the variable names, and finally all the data.
If you have MS Office installed, your computer will probably try to open the data with MS Excel. Instead, try to open the raw data with a (free) text editor such as Notepad++.
To import the above file into a Matlab table, we need to use the readtable function with an option specifying that there are 5 header lines. We would need a command like:
T = readtable('fitness_members.csv', 'HeaderLines',5);
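To try the 'HeaderLines' option without the course file, the sketch below writes a tiny hypothetical csv file (2 metadata lines, a row of variable names, 3 rows of data) to the temporary folder and reads it back. The file name and its content are made up for illustration. The file is deleted at the end.

```matlab
fname = fullfile(tempdir, 'example_data.csv');   % hypothetical file name
fid = fopen(fname, 'w');
fprintf(fid, 'Station: example station\n');      % header line 1 (metadata)
fprintf(fid, 'Units: rainfall in mm\n');          % header line 2 (metadata)
fprintf(fid, 'Date,Rain\n');                      % row with the variable names
fprintf(fid, '2018-01-01,0.0\n2018-01-02,4.5\n2018-01-03,12.1\n');
fclose(fid);
T = readtable(fname, 'HeaderLines', 2);          % skip the 2 metadata lines
disp(T)
delete(fname);                                    % clean up the temporary file
```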
For the course, you need to be able to:
- successfully use the readtable function to import tabular data
Dealing with dates
Often, the data that we want to import has one or more columns that represent dates. Dates can be difficult to deal with because they do not follow a decimal system and are full of irregularities (e.g. months of different lengths, leap years). Thus, programming languages typically offer dedicated tools. See how to Represent Dates and Times in MATLAB using datetime arrays. It is good practice to store dates as yyyy-mm-dd (for example 2020-03-15). The readtable function often converts dates into datetime variables automatically, but dates can be stored in many different ways (15/3/2020, 3.15.2020, etc.) and sometimes you need to help Matlab identify them.
% examples with datetime variables
ts = datetime('30/07/2018','InputFormat','dd/MM/yyyy'); %specify that the date is in the format day month year and the separator is the forward slash /
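% you can quickly access parts of the date
[year(ts), month(ts), day(ts)] % returns 2018 7 30
% or you can convert the date to a vector where each element corresponds to
% a component of the date (year, month, day, hour, minute, second)
datevec(ts) % returns 2018 7 30 0 0 0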
% you can convert multiple timestamps to datetime
date_text = {'2018-01-22', '2018-07-28', '2018-09-05', '2019-12-31'};
ts2 = datetime(date_text,'InputFormat','yyyy-MM-dd'); % in this case the date format was year-month-day
What happens if you subtract two or more dates? You get a 'duration' array, which tells how long the interval between the dates is. The default display format is hours:minutes:seconds.
d = ts2 - ts
-4536:00:00 -48:00:00 888:00:00 12456:00:00
% you can convert the duration array to a number by calling the 'days' function
days(d) % returns -189 -2 37 519
% check what happens if you type in day(d) instead of days(d)!
% and you can do the opposite and add a duration to a date
days_to_add = days([1,2,5000])
ts3 = ts + days_to_add
31-Jul-2018 01-Aug-2018 07-Apr-2032
For the course, you need to be able to:
- convert dates into datetime format
- extract elements of a date (year, month, day, day of the week, Julian day, etc)
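Extracting date elements works on whole datetime arrays at once, so it combines well with logical indexing. The sketch below reuses the four dates from the earlier example. We assume that "Julian day" means the day of the year (1 to 366), as is common in hydrology; for the astronomical Julian date Matlab also provides the juliandate function.

```matlab
date_text = {'2018-01-22', '2018-07-28', '2018-09-05', '2019-12-31'};
ts2 = datetime(date_text, 'InputFormat', 'yyyy-MM-dd');
wd = day(ts2, 'dayofweek')      % day of the week, 1 = Sunday: 2 7 4 3
doy = day(ts2, 'dayofyear')     % day of the year: 22 209 248 365
names = day(ts2, 'name')        % day names, e.g. 'Monday'
% logical indexing on a date component: keep only June-August dates
summer = ts2(month(ts2) >= 6 & month(ts2) <= 8)
```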