Throughout this course you will be asked to implement the Machine Learning algorithms seen in the Applied Machine Learning course. In this article we will cover some useful Matlab functions that will make your programming easier.
Random numbers are not truly random in computer science. They are generated by deterministic algorithms, often referred to as pseudo-random number generators. In Matlab, a random number is drawn using the rand function.
disp(rand())
Start Matlab and call this function. If Matlab was already running, close it and restart it first. Did you end up with exactly the same number as above? Okay, maybe it was a coincidence. Let's try something else.
disp(rand(3))
The same numbers again? This is simply the effect of the seed in random number generation: Matlab sets the generator to the same seed at every startup, so it always produces the same sequence. Not so random after all.
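You can check this startup behaviour without restarting Matlab. rng('default') restores the generator settings that Matlab uses at startup: the Mersenne Twister generator with seed 0. Here is a minimal sketch.

```matlab
% Restore the generator settings Matlab uses at startup
rng('default');
a = rand(1, 3);      % in Matlab, the first value is 0.8147, as after a fresh start

% Restoring the defaults again replays exactly the same numbers
rng('default');
b = rand(1, 3);
disp(isequal(a, b))  % displays 1 (true)

% The default settings use seed 0
s = rng;
disp(s.Seed)         % displays 0
```

If you really want different numbers at every run, rng('shuffle') seeds the generator from the current time instead.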
But for some applications it can be very useful to replicate the sequence of numbers drawn by the generator. Take the comparison of two ML models as an example. For your comparison to be totally fair, you want your data to be shuffled randomly, but in exactly the same way for both models. Your best option is to set a random seed. In Matlab, a seed is set as follows.
n = 42; % I can take any non-negative integer here, so let's take the most random one...
rng(n)
After setting a seed, the sequence of random numbers you draw is fixed: set the same seed again and you get exactly the same numbers again. This is great! Let's create an array of 10 random numbers and shuffle it.
tmp = rand(1,10);
tmp = tmp(randperm(length(tmp))); % I know it looks weird but what this does is simply shuffling your array
disp(tmp)
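Back to the model comparison example. The sketch below shows how one seed gives both models exactly the same shuffle. The data X and labels y are made-up placeholders. Applying one permutation to both keeps each sample aligned with its label.

```matlab
X = reshape(1:12, 6, 2);   % hypothetical data: 6 samples, 2 features
y = (1:6)';                % hypothetical labels, one per sample (label i = first feature of row i)

% First model: shuffle the samples with seed 42
rng(42);
perm1 = randperm(size(X, 1));
X1 = X(perm1, :);          % shuffle the rows of the data...
y1 = y(perm1);             % ...and the labels with the SAME permutation

% Second model: same seed, hence the same shuffle
rng(42);
perm2 = randperm(size(X, 1));
X2 = X(perm2, :);

disp(isequal(perm1, perm2))   % displays 1: both models see the same order
disp(isequal(X1(:, 1), y1))   % displays 1: samples and labels are still aligned
```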
Did we get the same results? Even better, you can save the generator settings returned by rng (generator type, seed and current state) and reuse them later, for example inside a function.
rng(42)
seed = rng; % don't write seed = rng(42): it looks more natural, but it returns the settings from BEFORE the seed was changed
Later in your script, you can restore these settings to restart your random number generation from that point.
rng(seed)
You can even pass these settings as an argument to your functions and call the previous line inside them to reset the sequence.
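The struct returned by rng stores the generator's current state as well as the seed. Restoring it therefore restarts the sequence from the point where it was saved, even in the middle of a sequence. The sketch below shows this. It also shows why seed = rng(42) is a trap.

```matlab
rng(42);
firstDraws = rand(1, 3);  % consume a few numbers first
s = rng;                  % snapshot: generator type, seed and CURRENT state
a = rand(1, 5);
rng(s);                   % go back to the snapshot
b = rand(1, 5);           % replays the same 5 numbers as a
disp(isequal(a, b))       % displays 1

% rng(n) with an output argument returns the settings from BEFORE the call
rng(1);
prev = rng(7);            % the generator now uses seed 7...
disp(prev.Seed)           % ...but this displays 1
```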
Now that you know everything about random number generation, you can finally fully understand the plot of episode 6 of season 10 of Doctor Who.
Testing matters! It is probably (or at least it should be) the most important step of your programming session. Every time you can make an assumption about the output of your program, just call the assert function.
A = zeros(3);
assert(size(A, 1) == 3)
When the condition in your assertion is false, assert throws an error and stops your program.
assert(size(A, 2) == 1) % Fails: how weird is it that zeros(3) returns a 3x3 matrix instead of an array...
Using the assert function can really help you to program faster and more efficiently. So be proactive and use it a lot!
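A bare assertion error does not tell you much. assert also accepts an error message, which can contain printf-style arguments. The sketch below catches the failing check from above to display its message. It then applies the same idea to a typical ML shape check. The data X and weights w are made-up placeholders.

```matlab
A = zeros(3);

% Check a shape assumption with a descriptive, printf-style message
msg = '';
try
    assert(size(A, 2) == 1, 'A should have 1 column but has %d', size(A, 2));
catch err
    msg = err.message;    % keep the message to inspect it
end
disp(msg)                 % displays: A should have 1 column but has 3

% Typical ML sanity check: exactly one prediction per sample
X = rand(100, 3);         % hypothetical data: 100 samples, 3 features
w = zeros(3, 1);          % hypothetical weight vector
yHat = X * w;
assert(isequal(size(yHat), [100, 1]), 'yHat has the wrong size')
```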
You probably know about indexing already. In Matlab it is done with parentheses (), and indices start at 1.
rng(12)
tmp = rand(3); disp(tmp)
tmp(2,1)
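But did you know that you can use an array of integers to index another array? With a single index array, a matrix is indexed linearly, counting its elements column by column, so tmp([1,3]) returns tmp(1,1) and tmp(3,1).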
tmp([1,3])
It also works with matrices, using one index array per dimension: the next line selects rows 1 and 3, then columns 3 and 2, in that order.
tmp([1,3], [3,2])
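With rand it is hard to tell which element went where. The sketch below uses the matrix reshape(1:9, 3, 3) instead, where each element equals its own linear index, so the results are easy to read. It also shows sub2ind, which converts row and column subscripts into linear indices.

```matlab
M = reshape(1:9, 3, 3);   % M = [1 4 7; 2 5 8; 3 6 9], so M(k) == k

% A single index array uses linear indexing, counting down the columns
lin = M([1, 3])           % [1 3], i.e. M(1,1) and M(3,1)

% One index array per dimension: rows 1 and 3, then columns 3 and 2
sub = M([1, 3], [3, 2])   % [7 4; 9 6]

% sub2ind converts (row, column) subscripts into a linear index
k = sub2ind(size(M), 2, 3)   % 8, and indeed M(8) == M(2,3)

% end refers to the last index of a dimension, : to all of them
lastRow = M(end, :)       % [3 6 9]
```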
Be careful with those, though, as it quickly gets tricky to understand what your code is actually doing. Another cool thing about indexing is that it can be performed using logical arrays. If you don't know about those, keep reading.
Logical arrays are another interesting functionality. Basically, you can apply a logical condition to every element of an array. The output is a logical array of the same size containing boolean values. In Matlab, a boolean is 0 for false and 1 for true.
Let's create a random array and assume we want to know all the values that are greater than 0.5.
rng(33) % finding a different integer every time is cumbersome. On the next call I will try something else.
tmp = rand(1,10); disp(tmp)
Let's apply the logic expression on the array.
idx = tmp > 0.5
As said previously, you can then use this logical array as an index to select only the numbers that satisfy the condition, i.e. the numbers greater than 0.5.
tmp(idx)
What if you want the other elements, i.e. those less than or equal to 0.5? Since this is a logical array, you can also apply logical operators to it, e.g. the not operator ~.
tmp(~idx)
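Here is a sketch of a few more things you can do with logical arrays. It uses a small fixed array v, made up for illustration, instead of rand so that the results are predictable. Note that ~idx keeps the value 0.5 itself.

```matlab
v = [0.2 0.7 0.5 0.9 0.1];

idx = v > 0.5;            % logical array [0 1 0 1 0]
above = v(idx)            % [0.7 0.9]
rest = v(~idx)            % [0.2 0.5 0.1]: ~idx means v <= 0.5
nAbove = nnz(idx)         % 2 elements satisfy the condition
pos = find(idx)           % [2 4]: positions of the true elements

% Conditions combine element-wise with & (and) and | (or)
middle = v(v > 0.15 & v < 0.8)   % [0.2 0.7 0.5]

% Logical indexing also works on the left-hand side: clip values above 0.5
clipped = v;
clipped(idx) = 0.5        % [0.2 0.5 0.5 0.5 0.1]
```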
Reshaping is a very important functionality. It allows you to change the dimensions of an array. Again, let's create a random array.
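rng(randi(1000)) % Oh yeah, that gets nerdy! Ok, I am messing with you here: randi draws from the already seeded generator, so this is just as predictable. There are far better options for that, e.g. rng('shuffle').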
tmp = rand(1,10); disp(tmp)
Now, what if you want to transform this array into a 5x2 matrix? Here comes the reshape function, which fills the new matrix column by column.
tmp = reshape(tmp, 5, 2)
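But be careful: reshape cannot change the number of elements of your array, so the product of the new dimensions must equal the original number of elements (here 5 x 2 = 10).
tmp = reshape(tmp, 6, 2) % Error: 6 x 2 = 12 elements, but tmp only has 10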
As long as you keep the same number of elements, you can reshape to as many dimensions as you want.
tmp = reshape(tmp, 5, 1, 1, 2)
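The sketch below uses 1:10 instead of random numbers, which makes the column-by-column filling order visible. It also shows the [] placeholder, which lets Matlab compute one dimension for you. Finally, it shows a simple way to check the element count before reshaping.

```matlab
v = 1:10;

% reshape fills the new array column by column (column-major order)
B = reshape(v, 5, 2)      % [1 6; 2 7; 3 8; 4 9; 5 10]

% [] lets Matlab compute one dimension from the number of elements
C = reshape(v, [], 2);    % also 5x2, identical to B

% (:) stacks all the columns back into a single column vector
col = B(:);               % [1; 2; ...; 10]

% The number of elements must match, so it can be checked beforehand
newSize = [6 2];
canReshape = prod(newSize) == numel(v)   % false: 12 ~= 10

D = reshape(v, 5, 1, 1, 2);
size(D)                   % [5 1 1 2]
```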
Had enough already? Care for some more? Have you ever heard of anonymous functions? I am sure they are good for your problems.
This functionality is probably one of the most important for a mathematical framework. Let's say you want to define a function that takes some inputs, e.g. f(x,y), but you don't know in advance the values that x and y will take. The following command won't work, because Matlab tries to evaluate x + y immediately and x and y are not defined.
f = x + y - 2 % Error: x and y are undefined
One way to achieve that is to define an anonymous function, which gives you a function handle.
f = @(x,y) x + y - 2
After defining a function handle, you can call it later with the desired values as input.
f(5,3)
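Here are a few more things worth knowing about function handles. With element-wise operators (.^, .*), a handle can work on whole arrays. Workspace variables are captured when the handle is created. Handles can also be passed to other functions, such as the optimizer fminsearch. The mean squared error handle mse is a made-up example of a loss function.

```matlab
f = @(x, y) x + y - 2;
r1 = f(5, 3)              % 6

% With element-wise operators, the same handle works on whole arrays
g = @(x) x.^2 + 1;
r2 = g([1 2 3])           % [2 5 10]

% Workspace variables are captured when the handle is created
a = 2;
h = @(x) a * x;
a = 10;                   % changing a afterwards does not affect h
r3 = h(3)                 % still 6

% Hypothetical ML use: a loss function stored in a handle
mse = @(y, yHat) mean((y - yHat).^2);
r4 = mse([1 2 3], [1 2 5])   % (0 + 0 + 4) / 3 = 1.3333

% Handles can be given to other functions, e.g. the optimizer fminsearch
xMin = fminsearch(@(x) (x - 3).^2, 0)   % close to 3
```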
Note that function handles work for any type of function, not only mathematical expressions. For mathematics, however, it is often preferable to use symbolic calculation (this requires the Symbolic Math Toolbox). You can define symbolic variables using the syms command.
syms x y;
f = x^3 + y^2 - 2
The advantage of symbolic calculation is that you can apply many mathematical tools to your expressions, such as differentiation, composition...
diff(f) % This differentiates f, here with respect to x
You have probably noticed that by default Matlab differentiates with respect to the first variable, more precisely the one returned by symvar(f,1), which is x here. To differentiate with respect to another variable, simply specify it.
diff(f,y)
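Here is a short sketch of some other symbolic tools, assuming the Symbolic Math Toolbox is installed. It computes higher-order derivatives and the gradient. It uses subs to substitute values or expressions (the latter gives you composition). Finally, matlabFunction converts the expression back into an ordinary function handle, which you can then use in numerical code.

```matlab
syms x y
f = x^3 + y^2 - 2;

d2 = diff(f, x, 2)                  % second derivative in x: 6*x
grad = gradient(f, [x, y])          % [3*x^2; 2*y]

% subs replaces variables with values...
val = subs(f, [x, y], [1, 2])       % 1 + 4 - 2 = 3
% ...or with other expressions, which composes functions
comp = subs(f, x, 2*y)              % 8*y^3 + y^2 - 2

% matlabFunction turns the symbolic expression into a function handle
fh = matlabFunction(f);             % inputs ordered (x, y)
num = fh(1, 2)                      % 3
```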
There is a lot more you can do with symbolic calculation. I will let you take a deeper look at the Symbolic Math Toolbox documentation.