Useful functions for Machine Learning Programming

Throughout this course you will be asked to program Machine Learning algorithms seen in the Applied Machine Learning course. In this article we will cover some useful functions that will ease your programming.

Random Number Generator

Random numbers are not truly random in computer science. They are generated by algorithms often referred to as pseudo-random number generators. In Matlab, a random number is drawn using the rand function.

In [1]:
disp(rand())
    0.8147

Start Matlab and call this function. If Matlab was already running, close it and restart it first. Did you end up with exactly the same number as above? Okay, maybe it was a coincidence, so let's try something else.

In [2]:
disp(rand(3))
    0.9058    0.6324    0.5469
    0.1270    0.0975    0.9575
    0.9134    0.2785    0.9649

This is simply the effect of the seed in random number generation. Matlab sets the same seed at startup, so every fresh session produces the same sequence. Not so random after all.
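You do not have to restart Matlab to see this. As a small sketch, rng('default') puts the generator back into its startup settings, so the same draw comes out twice. The variable names a, b and s are only illustrative.

```matlab
% Reset the generator to the settings used at Matlab startup.
rng('default');
a = rand();

% Reset again: the next draw repeats the previous one.
rng('default');
b = rand();

assert(a == b);

% rng with no input returns the current generator settings as a struct.
s = rng;
disp(s.Type)   % name of the generator algorithm
disp(s.Seed)   % seed currently in use
```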

But for some applications it can be very useful to be able to replicate the series of numbers drawn by the generator. Take the case of comparing two ML models as an example. For your comparison to be totally fair, you want to ensure that your data are shuffled randomly but in exactly the same way for both models. Your best option is to set a random seed. In Matlab, a seed is defined as follows.

In [3]:
n = 42; % I can take any integer here so let's take the most random one...
rng(n)

After setting a seed, the generator always produces the same succession of numbers. This is great! Let's create an array of 10 random numbers and shuffle it.

In [4]:
tmp = rand(1,10);
tmp = tmp(randperm(length(tmp))); % I know it looks weird, but this simply shuffles your array
disp(tmp)
  Columns 1 through 7

    0.3745    0.1560    0.1560    0.5987    0.7081    0.0581    0.6011

  Columns 8 through 10

    0.8662    0.7320    0.9507

Did we get the same results? The great thing is that you can actually save a seed to reuse it in a function for example.

In [5]:
rng(42)
seed = rng; % don't write seed = rng(42). It looks more natural, but it returns the settings from before the call

Later in your script you can pass this saved seed back to rng to restart your random number generation from the same point.

In [6]:
rng(seed)

You can even pass the seed object as an argument to your functions and call the previous line to reset the sequence.
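The model comparison scenario mentioned earlier can be sketched as follows. This is a minimal illustration that assumes a hypothetical 10-sample dataset; the names data, state, orderA and orderB are made up for the example.

```matlab
data = (1:10)';                     % hypothetical dataset, one sample per row

rng(42);                            % seed the generator
state = rng;                        % save the generator settings

orderA = randperm(numel(data));     % shuffling order used for model A
shuffledA = data(orderA);

rng(state);                         % rewind the generator to the saved point
orderB = randperm(numel(data));     % shuffling order used for model B
shuffledB = data(orderB);

% Both models see the data in exactly the same order.
assert(isequal(shuffledA, shuffledB));
```

Without the rng(state) call, the second randperm would continue the sequence and produce a different order.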

Now that you know everything about random number generation, you can finally fully understand the plot of episode 6 of season 10 of Doctor Who.

Assert function

Testing matters! It is probably (or at least it should be) the most important step of your programming session. Every time you can make an assumption about the output of your program, just call the assert function.

In [7]:
A = zeros(3);
assert(size(A, 1) == 3)

When the condition in your assertion is false, it throws an error and stops your program.

In [8]:
assert(size(A, 2) == 1) % How weird is it that zeros(3) returns a 3x3 matrix instead of an array...
Assertion failed.

Using the assert function can really help you to program faster and more efficiently. So be proactive and use it a lot!
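The bare "Assertion failed." message does not say much. As a sketch, assert also accepts an error message with format specifiers, which makes failures easier to diagnose. The variables X, y and p below are hypothetical.

```matlab
X = rand(20, 3);   % hypothetical feature matrix, one sample per row
y = rand(20, 1);   % hypothetical targets

% Passes silently: both have 20 rows.
assert(size(X, 1) == size(y, 1), ...
    'X has %d rows but y has %d.', size(X, 1), size(y, 1));

p = [0.2 0.5 0.3]; % hypothetical class probabilities
assert(abs(sum(p) - 1) < 1e-12, 'Probabilities must sum to 1.');

% A failing assertion, caught here only to show the resulting message.
try
    assert(numel(p) == 2, 'Expected 2 values but got %d.', numel(p));
catch err
    disp(err.message)   % Expected 2 values but got 3.
end
```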

Indexing

You probably know indexing. In Matlab it is done with parentheses ().

In [9]:
rng(12)
tmp = rand(3); disp(tmp)
tmp(2,1)
    0.1542    0.5337    0.9007
    0.7400    0.0146    0.0334
    0.2633    0.9187    0.9569


ans =

    0.7400

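But did you know that you can use an array of integers as indices into another array? With a single index array, Matlab uses linear indexing, which counts the elements column by column.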

In [10]:
tmp([1,3])
ans =

    0.1542    0.2633
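The result above comes from linear indexing: with a single index, Matlab numbers the elements of a matrix down the columns. Here is a short sketch on a hypothetical 3x3 matrix A with easy-to-read values.

```matlab
A = [1 2 3; 4 5 6; 7 8 9];   % hypothetical matrix for illustration

% Linear indices run down the columns: 1, 4, 7, then 2, 5, 8, then 3, 6, 9.
first_and_third = A([1, 3]);     % elements A(1,1) and A(3,1), i.e. [1 7]

% Convert between (row, column) subscripts and linear indices.
k = sub2ind(size(A), 1, 2);      % linear index of row 1, column 2, i.e. 4
[r, c] = ind2sub(size(A), 8);    % linear index 8 is row 2, column 3

disp(first_and_third)
disp([k, r, c])
```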

It also works with matrices, where a separate index array can be used for each dimension.

In [11]:
tmp([1,3], [3,2])
ans =

    0.9007    0.5337
    0.9569    0.9187

Be careful with those though, as it quickly gets tricky to understand what your code is actually doing. Another cool thing with indexing is that it can be performed using logical arrays. If you don't know about those, keep reading.

Logical arrays

Logical arrays are also an interesting functionality. Basically, you can apply a logical condition to every element of an array. The output will be a logical array containing boolean values. In Matlab, a boolean is 0 for False and 1 for True.

Let's create a random array and assume we want to know all the values that are greater than 0.5.

In [12]:
rng(33) % finding a different integer every time is cumbersome. On the next call I will try something else.
tmp = rand(1,10); disp(tmp)
  Columns 1 through 7

    0.2485    0.4500    0.4109    0.2603    0.8704    0.1850    0.0197

  Columns 8 through 10

    0.9533    0.6805    0.4866

Let's apply the logical expression to the array.

In [13]:
idx = tmp > 0.5
idx =

  1x10 logical array

   0   0   0   0   1   0   0   1   1   0

As said previously, you can then use this logical array as an index to select only the numbers that satisfy the condition, i.e. the numbers greater than 0.5.

In [14]:
tmp(idx)
ans =

    0.8704    0.9533    0.6805

What if you want the elements that are not greater than 0.5? Well, this is a logical array, so you can also apply logical operators to it, e.g. the not operator ~.

In [15]:
tmp(~idx)
ans =

    0.2485    0.4500    0.4109    0.2603    0.1850    0.0197    0.4866
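Logical arrays can also be combined and used for more than selection. The following sketch, on a small hypothetical vector v, shows a combined condition with &, counting with nnz, locating matches with find, and assigning through a mask.

```matlab
v = [0.2 0.7 0.5 0.9 0.1];     % hypothetical values

% Combine two conditions element by element with & (and).
mask = v > 0.3 & v < 0.8;      % logical array [0 1 1 0 0]

selected  = v(mask);           % values satisfying both conditions: [0.7 0.5]
count     = nnz(mask);         % number of true entries: 2
positions = find(mask);        % their positions: [2 3]

% A logical index can also be used on the left-hand side of an assignment.
clipped = v;
clipped(clipped > 0.5) = 0.5;  % cap every value at 0.5

disp(selected)
disp(clipped)
```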

Reshaping

Reshaping is a very important functionality. It allows you to change the dimensions of an array without changing its contents. Again, let's create a random array.

In [16]:
rng(randi(1000)) % Oh yeah that gets nerdy! Ok I am messing with you here. There are far better options for that.
tmp = rand(1,10); disp(tmp)
  Columns 1 through 7

    0.0733    0.8428    0.5236    0.1550    0.9964    0.8289    0.9595

  Columns 8 through 10

    0.3882    0.0145    0.8232

Now what if you want to transform this array into a matrix of size 5x2? Here comes the reshape function. Note that it fills the new matrix column by column.

In [17]:
tmp = reshape(tmp, 5, 2)
tmp =

    0.0733    0.8289
    0.8428    0.9595
    0.5236    0.3882
    0.1550    0.0145
    0.9964    0.8232

But be careful, for this command to work correctly your reshaping cannot change the number of elements of your array.

In [18]:
tmp = reshape(tmp, 6, 2)
Error using reshape
To RESHAPE the number of elements must not change.

Be aware that as long as you keep the same number of elements, you can reshape to as many dimensions as you want.

In [20]:
tmp = reshape(tmp, 5, 1, 1, 2)
tmp(:,:,1,1) =

    0.0733
    0.8428
    0.5236
    0.1550
    0.9964


tmp(:,:,1,2) =

    0.8289
    0.9595
    0.3882
    0.0145
    0.8232
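Two details of reshape are worth a quick sketch: it fills the result column by column, and an empty placeholder [] lets Matlab infer one dimension from the number of elements. The vector v is a hypothetical example with easy-to-follow values.

```matlab
v = 1:6;                        % hypothetical vector

% Elements are placed down the columns of the result.
M = reshape(v, 2, 3);           % [1 3 5; 2 4 6]

% Use [] for one dimension and let Matlab compute it.
N = reshape(v, [], 2);          % 3x2 matrix [1 4; 2 5; 3 6]

% To fill row by row instead, reshape to the transposed size and transpose.
rowwise = reshape(v, 3, 2)';    % [1 2 3; 4 5 6]

disp(M)
disp(N)
disp(rowwise)
```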

You had enough already? Care for some more? Have you ever heard about anonymous functions? I am sure it is good for your problems.

Anonymous Functions and Symbolic calculation

This functionality is probably one of the most important for a mathematical framework. Let's say you want to define a function that takes some input, e.g. f(x,y), but you don't know in advance the values that x and y will take. The following command won't work.

In [21]:
f = x + y - 2
Error using eval
Undefined function or variable 'x'.

One way to achieve that is to use function handles.

In [22]:
f = @(x,y) x + y - 2
f =

  function_handle with value:

    @(x,y)x+y-2

After defining a function handle, you can call it later with the desired values as input.

In [23]:
f(5,3)
ans =

     6
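Function handles are handy in ML code because they can be applied to whole arrays and passed to other functions. The sketch below uses hypothetical names (sigmoid, scale, squares) and also shows that a handle captures the value of outside variables at the moment it is created.

```matlab
% Element-wise operators (./) let the handle work on arrays as well as scalars.
sigmoid = @(z) 1 ./ (1 + exp(-z));
s0 = sigmoid(0);            % 0.5
sv = sigmoid([-1 0 1]);     % one output per input element

% Variables used inside the handle are captured when the handle is created.
a = 2;
scale = @(x) a * x;
a = 10;                     % changing a afterwards does not affect scale
r = scale(3);               % 6, not 30

% Handles can be passed to other functions such as arrayfun.
squares = arrayfun(@(k) k^2, 1:4);   % [1 4 9 16]

disp(sv)
disp(squares)
```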

Note that function handles work for any type of function and not only for mathematical expressions. For mathematics, it is actually preferable to use symbolic calculation, which requires the Symbolic Math Toolbox. You can define symbolic variables using the keyword syms.

In [24]:
syms x y;
f = x^3 + y^2 - 2
 
f =
 
x^3 + y^2 - 2
 

The advantage of symbolic calculation is that you can apply many mathematical tools such as differentiation, composition...

In [25]:
diff(f) % This differentiates the function
 
ans =
 
3*x^2
 

You have probably noticed that Matlab picked x on its own. By default, diff differentiates with respect to the variable returned by symvar, which here is x. To differentiate with respect to y instead, simply specify it.

In [26]:
diff(f,y)
 
ans =
 
2*y
 

There is a lot more to do with symbolic calculation. I will let you take a deeper look at the documentation.
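As one possible starting point, the sketch below computes a gradient, evaluates a symbolic expression at a point, and converts it into a function handle for numeric use. It assumes the Symbolic Math Toolbox is installed; the names g, val and fh are only illustrative.

```matlab
syms x y
f = x^3 + y^2 - 2;

% Gradient with respect to both variables: [3*x^2; 2*y].
g = gradient(f, [x, y]);

% Substitute x = 1 and y = 2, then convert the result to a double.
val = double(subs(f, [x, y], [1, 2]));   % 1 + 4 - 2 = 3

% Turn the symbolic expression into a function handle with inputs (x, y).
fh = matlabFunction(f, 'Vars', [x, y]);
num = fh(1, 2);                          % also 3

disp(g)
disp([val, num])
```

The function handle is usually the better choice inside loops, since evaluating it is purely numeric.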