ENG270 Python topics

Satoshi Takahama

EPFL

2024-09-18

REPL

Read-Eval-Print-Loop (REPL)

Progression in usage:

  1. Type commands in REPL
  2. Copy-paste commands to REPL
  3. Send commands to REPL

Use case:

  • Don’t rerun whole code after a small changes in your code
  • Use REPL instead of print commands while developing your script

Next step:

  • Use integrated debugger

NumPy arrays

Lists vs NumPy arrays

list


# importing required packages
import numpy
import time

# size of arrays and lists
size = 1000000

# declaring lists
list1 = range(size)
list2 = range(size)

# list
initialTime = time.time()
resultantList = [(a * b) for a, b in zip(list1, list2)]

# calculating execution time
print("Time taken by Lists :", 
    (time.time() - initialTime),
    "seconds")
Time taken by Lists : 1.1984527111053467 seconds

NumPy array


# importing required packages
import numpy
import time

# size of arrays and lists
size = 1000000

# declaring arrays
array1 = numpy.arange(size)  
array2 = numpy.arange(size)

# NumPy array
initialTime = time.time()
resultantArray = array1 * array2

# calculating execution time 
print("Time taken by NumPy Arrays :",
    (time.time() - initialTime),
Time taken by NumPy Arrays : 0.13434123992919922 seconds

example from geeksforgeeks.org

NumPy arrays

list

  • Sequential collection of items
  • Heterogeneous data types stored noncontiguously in memory

NumPy array*

  • Sequential collection of items - but can also be \(N\)-dimensional
  • Can be heterogeneous data, but more commonly used for homogeneous data types stored contiguously in memory
  • Many mathematical operations defined for them

*Note that NumPy arrays are not the same as arrays from the built-in array module.

figure source: geeksforgeeks.org

Why NumPy arrays

  1. Faster
    1. data retrieval: use of contiguous memory - less overhead on memory access
    2. reduced type-checking: don’t need to check types for every calculation
    3. call underlying C / Fortran functions that are ready to be executed
    4. can further optimize performance for same operation across data set (single instruction / multiple data)
  1. More readable (“vectorization” and “broadcasting”)
    1. no need to write loops - no looping variables; code is shorter
    2. code is closer to mathematical notation (involving vectors, matrices, arrays)*


*Note that the analogy of terminology in math is different from programming

  • 0D, 1D, 2D, ND arrays in mathematics are scalars, vectors, matrices, etc.
  • 0D arrays in NumPy are still NumPy arrays.




When to use lists instead of NumPy arrays

  • Elements have different data types
  • When the size is not know a priori. Growing a NumPy array is much slower than (need to find contiguous memory chunks).