ENG270 Python topics

Satoshi Takahama

EPFL

2024-09-18

REPL

Read-Eval-Print-Loop (REPL)

Progression in usage:

Type commands in REPL
Copy-paste commands to REPL
Send commands to REPL

Use case:

Don’t rerun the whole script after a small change in your code
Use REPL instead of print commands while developing your script

Next step:

Use integrated debugger

NumPy arrays

Lists vs NumPy arrays

list

# importing required packages
import numpy
import time

# size of arrays and lists
size = 1000000

# declaring lists (in Python 3, range() alone is not a list)
list1 = list(range(size))
list2 = list(range(size))

# list
initialTime = time.time()
resultantList = [(a * b) for a, b in zip(list1, list2)]

# calculating execution time
print("Time taken by Lists :", 
    (time.time() - initialTime),
    "seconds")

Time taken by Lists : 1.1984527111053467 seconds

NumPy array

# importing required packages
import numpy
import time

# size of arrays and lists
size = 1000000

# declaring arrays
array1 = numpy.arange(size)  
array2 = numpy.arange(size)

# NumPy array
initialTime = time.time()
resultantArray = array1 * array2

# calculating execution time
print("Time taken by NumPy Arrays :",
    (time.time() - initialTime),
    "seconds")

Time taken by NumPy Arrays : 0.13434123992919922 seconds

example from geeksforgeeks.org
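
A single time.time() measurement can be noisy, so the comparison above is only indicative. As a minimal sketch (not part of the original example), the same comparison can be repeated with the standard-library timeit module. The function names multiply_lists and multiply_arrays, the smaller size, and the explicit int64 dtype are assumptions made for this illustration.

```python
import timeit

import numpy as np  # conventional alias; the slides use "import numpy"

size = 100_000  # smaller than in the slides so the sketch runs quickly

list1 = list(range(size))
list2 = list(range(size))
# int64 is requested explicitly so the products cannot overflow
array1 = np.arange(size, dtype=np.int64)
array2 = np.arange(size, dtype=np.int64)


def multiply_lists():
    """Element-wise product with a list comprehension."""
    return [a * b for a, b in zip(list1, list2)]


def multiply_arrays():
    """Element-wise product with NumPy arrays."""
    return array1 * array2


# Run each version several times and keep the fastest run
repeats = 5
t_list = min(timeit.repeat(multiply_lists, number=1, repeat=repeats))
t_array = min(timeit.repeat(multiply_arrays, number=1, repeat=repeats))

print(f"list comprehension: {t_list:.6f} seconds")
print(f"NumPy arrays:       {t_array:.6f} seconds")
```

The absolute times depend on the machine; only the ratio between the two is of interest.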

NumPy arrays

list

Sequential collection of items
Heterogeneous data types stored noncontiguously in memory

NumPy array*

Sequential collection of items - but can also be \(N\)-dimensional
Can hold heterogeneous data (dtype=object), but is most commonly used for a single (homogeneous) data type stored contiguously in memory
Many mathematical operations defined for them

*Note that NumPy arrays are not the same as arrays from the built-in array module.
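
The differences listed above can be checked interactively. The following is a small sketch; all variable names are chosen for this illustration only.

```python
import array

import numpy as np  # conventional alias; the slides use "import numpy"

# A list can mix types freely; each element is a separate Python object
mixed_list = [1, 2.5, "three"]

# A NumPy array built from numbers has one element type (dtype);
# here the integers are converted to floating point
as_float = np.array([1, 2.5, 3])

# Heterogeneous data is possible with dtype=object, but the array then
# only stores references to Python objects
heterogeneous = np.array(mixed_list, dtype=object)

# N-dimensional: a 2 x 3 array stored in one contiguous block
matrix = np.arange(6).reshape(2, 3)

# The built-in array module is also typed, but one-dimensional only
builtin_array = array.array("d", [1.0, 2.0, 3.0])

print(as_float.dtype)                # float64
print(heterogeneous.dtype)           # object
print(matrix.shape)                  # (2, 3)
print(matrix.flags["C_CONTIGUOUS"])  # True
print(builtin_array.typecode)        # d
```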

figure source: geeksforgeeks.org

Why NumPy arrays

Faster
1. data retrieval: use of contiguous memory - less overhead on memory access
2. reduced type-checking: don’t need to check types for every calculation
3. calls underlying compiled C / Fortran functions that are ready to be executed
4. can further optimize performance when the same operation is applied across a data set (single instruction, multiple data; SIMD)

More readable (“vectorization” and “broadcasting”)
1. no need to write loops - no looping variables; code is shorter
2. code is closer to mathematical notation (involving vectors, matrices, arrays)*

*Note that the terminology used in mathematics differs from that used in programming:

0D, 1D, 2D, ND arrays in mathematics are scalars, vectors, matrices, etc.
0D arrays in NumPy are still NumPy arrays.
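
A short sketch of vectorization, broadcasting, and 0D arrays, using small example vectors chosen for this illustration:

```python
import numpy as np  # conventional alias; the slides use "import numpy"

x = np.array([1.0, 2.0, 3.0])
y = np.array([4.0, 5.0, 6.0])

# Loop version of the dot product: needs a looping variable
dot_loop = 0.0
for i in range(len(x)):
    dot_loop += x[i] * y[i]

# Vectorized version: no explicit loop, close to the math notation
dot_vec = np.sum(x * y)
print(dot_loop, dot_vec)  # 32.0 32.0

# Broadcasting: the scalars are stretched to the shape of x
scaled = 2.0 * x + 1.0
print(scaled)  # [3. 5. 7.]

# Broadcasting a column of shape (3, 1) against a row of shape (3,)
# gives a 3 x 3 array (the outer product)
outer = x[:, np.newaxis] * y
print(outer.shape)  # (3, 3)

# A 0D array holds a single value but is still a NumPy array
zero_d = np.array(5.0)
print(zero_d.ndim, zero_d.shape)       # 0 ()
print(isinstance(zero_d, np.ndarray))  # True
```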

When to use lists instead of NumPy arrays

Elements have different data types
When the size is not known a priori. Growing a NumPy array is much slower than growing a list (need to find contiguous memory chunks).
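
One common pattern when the final size is unknown is to collect values in a list and convert to a NumPy array once at the end. The sketch below compares this with repeated calls to numpy.append, which returns a new array (a copy) on every call. The value of n and the variable names are assumptions for this illustration, and the measured times will vary by machine.

```python
import time

import numpy as np  # conventional alias; the slides use "import numpy"

n = 10_000  # illustrative number of elements

# Grow a list, then convert once at the end
start = time.perf_counter()
values = []
for i in range(n):
    values.append(i * 0.5)
from_list = np.array(values)
t_list = time.perf_counter() - start

# Grow a NumPy array: every np.append call copies the data
# into a newly allocated array
start = time.perf_counter()
grown = np.empty(0)
for i in range(n):
    grown = np.append(grown, i * 0.5)
t_append = time.perf_counter() - start

print(f"list.append + one conversion: {t_list:.4f} seconds")
print(f"repeated np.append:           {t_append:.4f} seconds")
```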