def fn(x):
    y = 1
    return x + y + 2*z
EPFL
2024-09-20
What
Common data models
Why
“Knowledge representation”
Structuring data in a particular way facilitates data access and specific operations appropriate for that structure.
Data tables (implemented as pandas DataFrames) are useful for a variety of data processing operations
Joins
Hierarchical (tree-like) data can be thought of as a subset of graph data models consisting of edges and nodes. (Graph models can additionally represent more complex networks.)
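As a minimal sketch of this idea, a small hierarchy can be stored as a graph in a plain dictionary that maps each node to its children. The node names and the helper `descendants` are made up for illustration and are not part of the original material.

```python
# Hypothetical hierarchy: each key is a node, each value lists its child nodes.
tree = {
    "root": ["a", "b"],
    "a": ["a1", "a2"],
    "b": [],
    "a1": [],
    "a2": [],
}

def descendants(graph, node):
    """Return all nodes reachable from `node` by following edges (depth-first)."""
    found = []
    for child in graph[node]:
        found.append(child)
        found.extend(descendants(graph, child))
    return found

# The same structure expressed as a list of (parent, child) edges.
edges = [(parent, child) for parent, children in tree.items() for child in children]

print(descendants(tree, "root"))  # ['a', 'a1', 'a2', 'b']
print(len(edges))                 # 4
```

A tree with five nodes has exactly four edges; a general graph has no such restriction and may also contain cycles.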
What
Programming paradigms
Various aspects of these paradigms are not necessarily orthogonal to each other and can be used together in the same language.
Why
Alternative ways to express computation.
Modularity and code reuse.
What
Computation primarily by evaluation of functions.
Features:
Why
Advantages
Some languages enforce these features strictly (Haskell).
Some languages (Python and MATLAB) borrow constructs from FP to enable a “functional style” and receive some of its benefits (e.g., reducing common programming errors), but do not strictly enforce immutability.
Problems with mutation of global variables
How to reduce instances of mutation:
Definitions
x: bound variable
y: local variable
z: free variable
Python uses lexical scoping: the value of z is taken from the namespace in which the function was defined. (A namespace can be thought of as the collection of symbols (objects/variables of the program) and their associated values.)
Namespaces
If your code refers to the name x, then Python searches for x in the following namespaces in the order shown (ref):
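The lookup order is commonly summarized as LEGB: local, enclosing, global, built-in. The sketch below illustrates each level; the names `outer`, `inner`, `inner_no_local`, `uses_global`, `uses_builtin` and `name` are illustrative and not from the original.

```python
name = "global"              # global namespace

def outer():
    name = "enclosing"       # namespace of the enclosing function

    def inner():
        name = "local"       # local namespace of inner
        return name

    def inner_no_local():
        return name          # no local binding: found in the enclosing namespace

    return inner(), inner_no_local()

def uses_global():
    return name              # no local or enclosing binding: found in the global namespace

def uses_builtin():
    return len("abc")        # len is found last, in the built-in namespace

print(outer())               # ('local', 'enclosing')
print(uses_global())         # global
print(uses_builtin())        # 3
```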
fn is defined in the global environment.
fn is still defined in the global environment, as is z=3.
In this last example, fn is an example of a closure: a function to which data is attached (here, the value of z).
fn is still defined in the global environment, as is z=3.
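The code for these examples is not reproduced here. As a sketch of what a closure looks like, the function below builds on the fn used in the definitions; the factory name `make_fn` is hypothetical.

```python
def make_fn(z):
    # z is a free variable of fn; its value is captured when make_fn is called.
    def fn(x):
        y = 1
        return x + y + 2 * z
    return fn

fn = make_fn(3)   # fn carries z=3 with it
z = 100           # a global z does not affect the closure
print(fn(1))      # 8
print(fn.__closure__[0].cell_contents)  # 3
```

Because of lexical scoping, fn looks up z in the namespace of make_fn, where it was defined, not in the namespace from which it is called.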
Motivating example
Intuitive behavior:
Surprising behavior:
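In Python, arguments are passed as references to objects. Immutable arguments (numbers, strings, tuples) behave as if passed by value because they cannot be changed in place; mutable compound data types (dictionaries and lists) are effectively passed by reference, so in-place changes made inside a function are visible to the caller.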
To prevent modification of the global variable, use copy or deepcopy.
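A minimal sketch of both behaviors and of the deepcopy remedy follows. The function and variable names are illustrative assumptions, not the original example.

```python
from copy import deepcopy

def increment(n):
    n += 1                    # rebinds the local name only
    return n

def append_item(items):
    items.append(99)          # mutates the caller's list in place
    return items

def append_item_safe(items):
    items = deepcopy(items)   # work on an independent copy
    items.append(99)
    return items

count = 1
increment(count)
print(count)                  # 1: the global integer is unchanged

data = [[1, 2], [3]]
append_item(data)
print(data)                   # [[1, 2], [3], 99]: the global list was modified

data = [[1, 2], [3]]
result = append_item_safe(data)
print(data)                   # [[1, 2], [3]]: the global list is unchanged
print(result)                 # [[1, 2], [3], 99]
```

copy.copy makes a shallow copy (nested lists are still shared), whereas copy.deepcopy also copies nested objects.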
Constructs to reduce global mutations
List comprehensions mirror set builder notation in mathematics. Example from here.
Mathematical expression: let \(\mathcal{I}\) be the set of all integers.
\[\begin{equation*} \{x^2 : x \in \mathcal{I} \wedge 0 \leq x < 10 \wedge \textrm{prime}(x)\} \end{equation*}\]
Python notation.
Note that x does not exist outside of the scope of the list comprehension, in contrast to performing this calculation with a loop.
List comprehensions are “safer” in that the variable containing individual elements is not introduced or modified outside of the expression.
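A sketch of the set-builder expression as a list comprehension. The helper `prime` is an assumed implementation (simple trial division), since the original definition is not shown.

```python
def prime(n):
    """Return True if n is a prime number (simple trial division)."""
    return n > 1 and all(n % d != 0 for d in range(2, int(n ** 0.5) + 1))

# x is local to the comprehension and is not defined afterwards.
squares = [x**2 for x in range(10) if prime(x)]
print(squares)   # [4, 9, 25, 49]

# In contrast, a loop variable survives the loop.
for i in range(3):
    pass
print(i)         # 2
```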
Example using map, filter, and partial function application:
[4, 9, 25, 49]
or using anonymous functions
Note that map returns an iterator (effectively, an unevaluated sequence) so we wrap the output in list() to obtain the results.
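The two variants described above might look as follows. This is a sketch: the helpers `prime` and `power` are assumptions introduced for illustration, not the original code.

```python
from functools import partial

def prime(n):
    """Return True if n is a prime number (simple trial division)."""
    return n > 1 and all(n % d != 0 for d in range(2, int(n ** 0.5) + 1))

def power(x, n):
    return x ** n

# Partial function application: fix n=2, leaving a one-argument function.
square = partial(power, n=2)
result_partial = list(map(square, filter(prime, range(10))))
print(result_partial)   # [4, 9, 25, 49]

# Same computation with an anonymous function in place of partial.
result_lambda = list(map(lambda x: x**2, filter(prime, range(10))))
print(result_lambda)    # [4, 9, 25, 49]
```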
Use a conditional (ternary) expression in place of an if/else statement. Instead of
we can rewrite this as an expression:
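The original snippets are not reproduced here. As a stand-in, the sketch below computes an absolute value both ways; the variable names are illustrative.

```python
x = -3

# Statement form: y is assigned inside the branches.
if x >= 0:
    y = x
else:
    y = -x

# Expression form: a single assignment whose value depends on the condition.
y_expr = x if x >= 0 else -x

print(y, y_expr)   # 3 3
```

The expression form assigns the name exactly once, which makes it harder to forget a branch or to reassign the variable by accident.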
Function that returns a function
Example from here.
Compute the root of the function
\[ f(x) = x^3 - 100 x^2 - x + 100 \]
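The original implementation is not shown. The sketch below assumes Newton's method with a numerical derivative, which is one natural use of a function that returns a function; the names `derivative` and `newton`, the step size h and the starting points are all assumptions.

```python
def derivative(f, h=1e-6):
    """Return a function that approximates f' by central differences."""
    def df(x):
        return (f(x + h) - f(x - h)) / (2 * h)
    return df

def newton(f, x0, tol=1e-10, max_iter=100):
    """Approximate a root of f, starting from x0, with Newton's method."""
    df = derivative(f)          # df is itself a function
    x = x0
    for _ in range(max_iter):
        step = f(x) / df(x)
        x -= step
        if abs(step) < tol:     # stop once the update is negligible
            break
    return x

def f(x):
    return x**3 - 100 * x**2 - x + 100

root = newton(f, x0=2.0)
print(round(root, 6))   # 1.0
```

Since \(f(x) = (x - 100)(x - 1)(x + 1)\), the roots are 100, 1 and -1; which one is found depends on the starting point.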
Define functions
from math import sin, cos, pi, sqrt
import wave
import array
import matplotlib.pyplot as plt
from functools import reduce, partial
def readWavFile(filename):
    wav = wave.open(filename)
    framerate = wav.getframerate()
    bytes = wav.readframes(wav.getnframes())
    signal = array.array('h', bytes).tolist()
    return signal, framerate

def fourier(signal, r, f):
    seq = range(len(signal))
    x = reduce(lambda acc, k: acc + signal[k] * cos(k * 2 * pi * f / r), seq, 0)
    y = reduce(lambda acc, k: acc + signal[k] * sin(k * 2 * pi * f / r), seq, 0)
    e = reduce(lambda acc, k: acc + signal[k]**2, seq, 0)
    return sqrt(2 * (x * x + y * y) / (e * len(signal)))

def movingwindow(fn, seq, n, d=None):
    """
    General moving window function
    Parameters
    -----
    fn: function
    seq: sequence
    n: window
    d: increment
    Return value
    -----
    processed signal as list
    """
    if not d:
        d = n
    return [fn(seq[i:i+n]) for i in range(0, len(seq), d)]
def findLocalMaxima(time, power, threshold):
    def fn(acc, i):
        if acc['state'] == 0:
            if power[i] > threshold:
                acc['state'] = 1
                acc['localMaximum'] = i
        elif acc['state'] == 1:
            if power[i] > power[acc['localMaximum']]:
                acc['localMaximum'] = i
            elif power[i] < threshold:
                acc['state'] = 0
                acc['maxima'] += [acc['localMaximum']]
        return acc
    out = reduce(fn, range(len(power)), {'state': 0, 'localMaximum': 0, 'maxima': []})
    return out['maxima']
Import data and apply functions
nframes = 1000
frequency = 2260 # Hz
window_size = 100
window_incr = 20
threshold = 0.16
signal, framerate = readWavFile('recording.wav')
freqTime = movingwindow(lambda x: x[0]/framerate, range(len(signal)), nframes)
freqPower = movingwindow(partial(fourier, r=framerate, f=frequency), signal, nframes)
birdTime = movingwindow(lambda x: x[0], freqTime, window_size, window_incr)
birdPower = movingwindow(lambda x: sum(x)/len(x), freqPower, window_size, window_incr)
maxima = findLocalMaxima(birdTime, birdPower, threshold)
Plot
plt.plot(freqTime, freqPower, label = f'{frequency:d} Hz (puissance)')
plt.plot(birdTime, birdPower, label = 'Signal "oiseau"')
for i in maxima:
    time = birdTime[i]
    minute = int(time / 60)
    seconde = int(time % 60)
    timeText = f'{minute:02d}:{seconde:02d}'
    plt.annotate(timeText, xy = (birdTime[i], birdPower[i]),
                 horizontalalignment = 'center',
                 xytext = (birdTime[i], birdPower[i] + 0.2),
                 arrowprops = dict(facecolor = 'black', width = 2))
    print('Oiseau à ' + timeText)
plt.legend()
plt.show()
What
A unit of data with associated functions.
Why
Alternatives to method dispatch:
Define classes
from math import sin, cos, pi, sqrt
import wave
import array
import matplotlib.pyplot as plt
class RawSignal:
    def __init__(self, wavefile):
        self.readWavFile(wavefile)

    def readWavFile(self, filename):
        wav = wave.open(filename)
        bytes = wav.readframes(wav.getnframes())
        self.framerate = wav.getframerate()
        self.signal = array.array('h', bytes).tolist()

    def __fourier(self, signal, r, f): # private function
        x = 0.0
        y = 0.0
        e = 0.0
        for i, s in enumerate(signal):
            x += s * cos(i * 2 * pi * f / r)
            y += s * sin(i * 2 * pi * f / r)
            e += s * s
        return sqrt(2 * (x * x + y * y) / (e * len(signal)))

    def extractfreq(self, f, nframes):
        signal = self.signal
        r = self.framerate
        time = []
        power = []
        for k in range(0, len(signal), nframes):
            time.append(k / r)
            power.append(self.__fourier(signal[k:k + nframes], r, f))
        return FreqPower(time, power)

class FreqPower:
    def __init__(self, time, power):
        self.time = time
        self.power = power

    def movingaverage(self, n, d):
        time = self.time
        power = self.power
        aveTime = []
        avePower = []
        for k in range(0, len(power) - n, d):
            aveTime.append(time[k])
            avePower.append(sum(power[k:k + n]) / n)
        self.time = aveTime
        self.power = avePower

    def localMaxima(self, threshold):
        time = self.time
        power = self.power
        state = 0
        localMaximum = 0
        maxima = []
        for i, value in enumerate(power):
            if state == 0:
                if power[i] > threshold:
                    state = 1
                    localMaximum = i
            elif state == 1:
                if power[i] > power[localMaximum]:
                    localMaximum = i
                elif power[i] < threshold:
                    state = 0
                    maxima.append(localMaximum)
        return maxima
Instantiate objects and apply methods
To reuse the same plotting code, I create references to the lists held by the objects (the assignment operator = does not copy lists in Python).
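A minimal sketch of what "assignment does not copy" means in practice. The variable names are illustrative and the values are made up.

```python
power = [0.1, 0.2, 0.3]
alias = power              # a second name for the same list object
snapshot = list(power)     # a new list with the same elements

alias.append(0.4)

print(alias is power)      # True
print(power)               # [0.1, 0.2, 0.3, 0.4]: changed through the alias
print(snapshot)            # [0.1, 0.2, 0.3]: the copy is unaffected
```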
plt.plot(freqTime, freqPower, label = f'{frequency:d} Hz (puissance)')
plt.plot(birdTime, birdPower, label = 'Signal "oiseau"')
for i in maxima:
    time = birdTime[i]
    minute = int(time / 60)
    seconde = int(time % 60)
    timeText = f'{minute:02d}:{seconde:02d}'
    plt.annotate(timeText,
                 xy = (birdTime[i], birdPower[i]),
                 horizontalalignment = 'center',
                 xytext = (birdTime[i], birdPower[i] + 0.2),
                 arrowprops = dict(facecolor = 'black', width = 2))
    print('Oiseau à ' + timeText)
plt.legend()
plt.show()
Here we define a parent class Topo and two child classes, TopoXYZ and TopoMNT, associated with two different data representations (and file formats) for storing topographic information introduced in sieprog.ch.
TopoXYZ and TopoMNT classes have different methods for 1) reading in files and 2) preprocessing the data to provide x, y, and z arrays for plotting.
Both rely on the plot method of Topo for displaying the topographic map in the same format.
class TopoXYZ(Topo):
    def __init__(self, fname, res):
        xyz = np.genfromtxt(fname, delimiter = ' ', dtype = None)
        self.x = xyz[:, 0]
        self.y = xyz[:, 1]
        self.z = xyz[:, 2]
        self.res = res

    def plot(self, xlabel = None, ylabel = None):
        x = self.x
        y = self.y
        z = self.z
        res = self.res
        xmin = x.min()
        ymin = y.min()
        xidx = ((x - xmin) / res).astype('int32')
        yidx = ((y - ymin) / res).astype('int32')
        xp = xmin + np.arange(0, xidx.max() + 1) * res
        yp = ymin + np.arange(0, yidx.max() + 1) * res
        array = np.full((yidx.max() + 1, xidx.max() + 1), np.nan)
        array[yidx, xidx] = z
        super().plot(xp, yp, array)

class TopoMNT(Topo):
    def __init__(self, fname, nx, ny, res):
        self.values = np.fromfile(fname, np.float64)
        self.nx = nx
        self.ny = ny
        self.res = res

    def plot(self):
        xp = np.arange(self.nx) * self.res
        yp = np.arange(self.ny) * self.res
        array = self.values.reshape(self.ny, self.nx)
        super().plot(xp, yp, array)
Implementing method dispatch:
Class definitions only include attributes.
import numpy as np
import matplotlib.pyplot as plt
from functools import singledispatch
class TopoXYZ(Topo):
    def __init__(self, fname, res):
        xyz = np.genfromtxt(fname, delimiter = ' ', dtype = None)
        self.x = xyz[:, 0]
        self.y = xyz[:, 1]
        self.z = xyz[:, 2]
        self.res = res

class TopoMNT(Topo):
    def __init__(self, fname, nx, ny, res):
        self.values = np.fromfile(fname, np.float64)
        self.nx = nx
        self.ny = ny
        self.res = res
plot_topo is a generic function under which class-specific methods are defined (definitions are same as before)
@singledispatch
def plot_topo(xp, yp, array, xlabel = 'X [m]', ylabel = 'Y [m]'):
    fig, ax = plt.subplots()
    ax.set_aspect('equal')
    ax.pcolormesh(xp, yp, array, shading = 'auto')
    ax.set_xlabel(xlabel)
    ax.set_ylabel(ylabel)

@plot_topo.register(TopoXYZ)
def _(obj, *args, **kwargs):
    x = obj.x
    y = obj.y
    z = obj.z
    res = obj.res
    xmin = x.min()
    ymin = y.min()
    xidx = ((x - xmin) / res).astype('int32')
    yidx = ((y - ymin) / res).astype('int32')
    xp = xmin + np.arange(0, xidx.max() + 1) * res
    yp = ymin + np.arange(0, yidx.max() + 1) * res
    array = np.full((yidx.max() + 1, xidx.max() + 1), np.nan)
    array[yidx, xidx] = z
    plot_topo(xp, yp, array, *args, **kwargs)

@plot_topo.register(TopoMNT)
def _(obj, *args, **kwargs):
    xp = np.arange(obj.nx) * obj.res
    yp = np.arange(obj.ny) * obj.res
    array = obj.values.reshape(obj.ny, obj.nx)
    plot_topo(xp, yp, array, *args, **kwargs)
Plotting method is determined by class of first argument.
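The same dispatch mechanism can be tried without any data files. The sketch below uses only the standard library; the classes `Circle` and `Square` and the generic function `area` are hypothetical stand-ins for the Topo classes and plot_topo.

```python
from functools import singledispatch
from math import pi

class Circle:
    def __init__(self, r):
        self.r = r

class Square:
    def __init__(self, side):
        self.side = side

@singledispatch
def area(shape):
    # Fallback used when no implementation is registered for the argument's class.
    raise NotImplementedError(f"no area method for {type(shape).__name__}")

@area.register(Circle)
def _(shape):
    return pi * shape.r ** 2

@area.register(Square)
def _(shape):
    return shape.side ** 2

print(area(Square(2)))   # 4
print(area(Circle(1)))   # 3.141592653589793
```

As with plot_topo, the classes hold only attributes, and the implementation is selected from the class of the first argument.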
To perform method dispatch, an object need not have a particular type; it only needs to provide the required attributes or methods (duck typing).
Example:
The operator + or, equivalently, operator.add, performs addition on any object that defines an __add__ method.
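A minimal sketch of this behavior, using a hypothetical `Vector2D` class that is not part of the original material:

```python
import operator

class Vector2D:
    """Hypothetical 2-D vector used only to illustrate the __add__ protocol."""
    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __add__(self, other):
        return Vector2D(self.x + other.x, self.y + other.y)

    def __repr__(self):
        return f"Vector2D({self.x}, {self.y})"

print(Vector2D(1, 2) + Vector2D(3, 4))                # Vector2D(4, 6)
print(operator.add(Vector2D(1, 2), Vector2D(3, 4)))   # Vector2D(4, 6)

# Built-in types with their own __add__ work the same way.
print(operator.add("ab", "cd"))                       # abcd
print(operator.add([1], [2]))                         # [1, 2]
```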
Another example: