# Exercise Session 1 – Jupyter Notebook tutorial
### ENV–501 Material and Energy Flow Analysis

September 12, 2024


#### Content of this tutorial:

1. [Introduction to Jupyter notebook](#notebook)
2. [An informal introduction to Python](#python)
3. [Data handling with Pandas](#pandas)

#### Sources:
- Jupyter Notebook documentation: https://jupyter-notebook.readthedocs.io/en/stable/notebook.html
- Jupyter Notebook tutorial: https://www.dataquest.io/blog/jupyter-notebook-tutorial/
- Basic Python tutorial: https://docs.python.org/3/tutorial/introduction.html#using-python-as-a-calculator
- Pandas tutorial: https://www.w3schools.com/python/pandas/default.asp


<a id='notebook'></a>
## Introduction to Jupyter Notebook

### What is a “notebook”?

A notebook integrates code and its output into a single document that combines visualizations, narrative text, mathematical equations, and other rich media.

In other words: it’s a single document where you can run code, display the output, and also add explanations, formulas, charts, and make your work more transparent, understandable, repeatable, and shareable.

### Kernel and Cell
There are two main concepts that should be clarified:

- A **kernel** is a “computational engine” that executes the code contained in a notebook document.
- A **cell** is a container for text to be displayed in the notebook or code to be executed by the notebook’s kernel.

### Types of cells
There are two main cell types that we will use:

- A **code cell** contains code to be executed in the kernel. When the code is run, the notebook displays the output below the code cell that generated it.
- A **Markdown cell** contains text formatted using Markdown and displays its output in-place when the Markdown cell is run.

Markdown is a lightweight, easy-to-learn markup language for formatting plain text.

###  Keyboard shortcuts

In a Jupyter Notebook, there is always one “active” cell highlighted with a border whose color denotes its current mode:

- Green outline — cell is in “edit mode” and you can type into the cells like a normal text editor
- Blue outline — cell is in “command mode” and you can edit the notebook as a whole, but not type into individual cells

You can switch to edit mode with Enter and back to command mode with Esc.

Some useful shortcuts in command mode:
- Basic navigation: k (move up), j (move down)
- Saving the notebook: s
- Changing the cell type: y (code cell), m (Markdown cell)
- Cell creation: a (add cell above), b (add cell below)
- Cell editing: x (cut cell), c (copy cell), v (paste cell), z (undo cell deletion)
- Kernel operations: 0 (press twice to restart the kernel)

A useful shortcut in edit mode:
- Run cell: Shift-Enter (this also works in command mode)

<a id='python'></a>
## An informal introduction to Python

### Numbers

The interpreter acts as a simple calculator: you can type an expression at it and it will write the value.

In [None]:
3 + 1

4

In [None]:
# define some variables and perform some operations
a = 7
b = 2

print('sum:', a + b)
print('product:', a * b)
print('power:', a ** b)
print('division:', a / b)
print('floor division:', a // b)
print('modulus:', a % b)

sum: 9
product: 14
power: 49
division: 3.5
floor division: 3
modulus: 1
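
As a small complementary sketch (the values are only illustrative), note that `/` always returns a float, while the results of `//` and `%` together reconstruct the original integer:

```python
# Division always returns a float, even when the result is a whole number
a = 8
b = 2
print(a / b)        # 4.0
print(type(a / b))  # <class 'float'>

# Floor division and modulus are linked: a == (a // b) * b + (a % b)
a = 7
quotient = a // b
remainder = a % b
print(quotient * b + remainder == a)  # True
```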


### Strings

Besides numbers, Python can also manipulate strings, which can be expressed as "..." or '...'.

In [None]:
# define your string
mystring = "My first string"

In [None]:
# string indexing
print(mystring[4])

i


In [None]:
# string slicing
print(mystring[0:2])

My
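
A few more string operations, shown as a minimal sketch that reuses the `mystring` value from above: `len()`, negative indices, open-ended slices, and the fact that strings cannot be modified in place.

```python
mystring = "My first string"

# len() returns the number of characters
print(len(mystring))      # 15

# negative indices count from the end of the string
print(mystring[-1])       # g

# omitting a slice boundary means "from the start" or "to the end"
print(mystring[:2])       # My
print(mystring[9:])       # string

# strings are immutable: assigning to an index raises a TypeError
try:
    mystring[0] = "m"
except TypeError as error:
    print("cannot modify a string in place:", error)
```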


### Lists

Python knows a number of compound data types, used to group together other values.

The most versatile is the list, which can be written as a list of comma-separated values (items) between square brackets. Lists might contain items of different types, but usually the items all have the same type.

Python lists are 0-indexed.

In [None]:
# define your list
mylist = [1,2,3,4,2,1]
mylist

[1, 2, 3, 4, 2, 1]

In [None]:
# list indexing
print(mylist[2])

3


In [None]:
# list slicing
print(mylist[2:])

[3, 4, 2, 1]


In [None]:
# change an element of a list
mylist[1] = 0
mylist

[1, 0, 3, 4, 2, 1]

In [None]:
# join or concatenate two lists
list1 = [1,4,9]
list2 = [0,2]

list3 = list1 + list2
print(list3)

[1, 4, 9, 0, 2]


In [None]:
# append an element to a list
list1 = [1,4,9]

list1.append(18)
print(list1)

[1, 4, 9, 18]
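
The two operations above behave differently, which the following short sketch tries to make explicit: `+` builds a new list, whereas `append()` changes the existing list in place.

```python
list1 = [1, 4, 9]
list2 = [0, 2]

# '+' builds a new list; list1 and list2 are left unchanged
list3 = list1 + list2
print(list1)   # [1, 4, 9]

# append() modifies the list in place and returns None
result = list1.append(18)
print(list1)   # [1, 4, 9, 18]
print(result)  # None

# negative indices and len() work as they do for strings
print(list1[-1], len(list1))  # 18 4
```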


### Sets, tuples and dictionaries

Besides lists, Python has three other built-in data types for storing collections of data.

- A **set** is a collection which is unordered and unindexed and does not allow duplicates. Its items cannot be modified in place, but items can be added or removed.
- A **tuple** is a collection which is ordered and unchangeable.
- A **dictionary** is used to store data values in key:value pairs. It is ordered (it keeps insertion order since Python 3.7), changeable, and does not allow duplicate keys.
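
The properties listed above can be checked with a short sketch such as the following (the values are illustrative):

```python
# a set silently drops duplicates and has no positional index
myset = {1, 2, 2, 3}
print(sorted(myset))    # [1, 2, 3]
print(2 in myset)       # True
myset.add(4)            # items can be added or removed
print(len(myset))       # 4

# a tuple keeps its order but cannot be modified
mytuple = (1, 2, 3)
try:
    mytuple[0] = 10
except TypeError as error:
    print("tuples are immutable:", error)

# a dictionary maps unique keys to values;
# assigning to an existing key overwrites its value
mydict = {'key1': 1, 'key2': 3}
mydict['key2'] = 30     # update an existing key
mydict['key3'] = 10     # add a new key
print(mydict)           # {'key1': 1, 'key2': 30, 'key3': 10}
```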



In [None]:
# set
myset = {1,2,3}
myset

{1, 2, 3}

In [None]:
# tuple
mytuple = (1,2,3)
mytuple

(1, 2, 3)

In [None]:
# dictionary
mydict = {2:4, 'key2':3, 'key3':10}
mydict

{2: 4, 'key2': 3, 'key3': 10}

In [None]:
# call an element of a dictionary using the key
mydict['key2']

3
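
Looking up a key that does not exist raises a `KeyError`. As a minimal sketch, `get()` offers a safe alternative, and `items()`, `keys()` and `values()` give access to the contents of the dictionary:

```python
mydict = {2: 4, 'key2': 3, 'key3': 10}

# get() returns a default value instead of raising KeyError for a missing key
print(mydict.get('key4', 0))   # 0

# loop over the key/value pairs
for key, value in mydict.items():
    print(key, '->', value)

# keys and values are also available separately
print(list(mydict.keys()))     # [2, 'key2', 'key3']
print(sum(mydict.values()))    # 17
```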

<a id='pandas'></a>
## Data handling with Pandas

Pandas is a Python library used for working with datasets.

It has functions for analyzing, cleaning, exploring, and manipulating data.

Source tutorial: https://www.w3schools.com/python/pandas/default.asp

If you have not installed pandas yet, run the command `pip install pandas` in the terminal, or `conda install pandas` if you are using Anaconda.

In [None]:
# import the library
import pandas as pd

In [None]:
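# create a sample dataset (pd.util.testing.makeMixedDataFrame() was removed in pandas 2.0)
df = pd.DataFrame({'A': [0.0, 1.0, 2.0, 3.0, 4.0],
                   'B': [0.0, 1.0, 0.0, 1.0, 0.0],
                   'C': ['foo1', 'foo2', 'foo3', 'foo4', 'foo5'],
                   'D': pd.bdate_range('2009-01-01', periods=5)})  # business days only
df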


|   | A   | B   | C    | D          |
|---|-----|-----|------|------------|
| 0 | 0.0 | 0.0 | foo1 | 2009-01-01 |
| 1 | 1.0 | 1.0 | foo2 | 2009-01-02 |
| 2 | 2.0 | 0.0 | foo3 | 2009-01-05 |
| 3 | 3.0 | 1.0 | foo4 | 2009-01-06 |
| 4 | 4.0 | 0.0 | foo5 | 2009-01-07 |


In [None]:
# show dataframe shape
df.shape

(5, 4)

In [None]:
# column data types
df.dtypes

A           float64
B           float64
C            object
D    datetime64[ns]
dtype: object

In [None]:
# show basic dataframe info
df.describe()

|       | A        | B        |
|-------|----------|----------|
| count | 5.0      | 5.0      |
| mean  | 2.0      | 0.4      |
| std   | 1.581139 | 0.547723 |
| min   | 0.0      | 0.0      |
| 25%   | 1.0      | 0.0      |
| 50%   | 2.0      | 0.0      |
| 75%   | 3.0      | 1.0      |
| max   | 4.0      | 1.0      |
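
The `std` row above is the sample standard deviation (divisor n - 1) and the `50%` row is the median. As a rough cross-check that does not need pandas, the sketch below recomputes these figures with the standard-library `statistics` module; the lists `col_a` and `col_b` are hand-typed copies of columns A and B.

```python
import statistics

# hand-typed copies of columns A and B of the sample dataset
col_a = [0.0, 1.0, 2.0, 3.0, 4.0]
col_b = [0.0, 1.0, 0.0, 1.0, 0.0]

for name, values in [('A', col_a), ('B', col_b)]:
    mean = statistics.mean(values)
    std = statistics.stdev(values)      # sample standard deviation (n - 1)
    median = statistics.median(values)  # the "50%" row
    print(name, mean, round(std, 6), median)
```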


In [None]:
# print dataframe index
print(df.index)

RangeIndex(start=0, stop=5, step=1)


In [None]:
# print dataframe columns
print(df.columns)

Index(['A', 'B', 'C', 'D'], dtype='object')


In [None]:
# rename the index and the columns
df.index = ['I1', 'I2', 'I3', 'I4', 'I5']
df.columns = ['COL1', 'COL2', 'COL3', 'COL4']
df

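|    | COL1 | COL2 | COL3 | COL4       |
|----|------|------|------|------------|
| I1 | 0.0  | 0.0  | foo1 | 2009-01-01 |
| I2 | 1.0  | 1.0  | foo2 | 2009-01-02 |
| I3 | 2.0  | 0.0  | foo3 | 2009-01-05 |
| I4 | 3.0  | 1.0  | foo4 | 2009-01-06 |
| I5 | 4.0  | 0.0  | foo5 | 2009-01-07 |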


In [None]:
# A DataFrame is like a table with rows and columns.
# pandas uses the loc attribute to select one or more elements of a DataFrame by label
print(df.loc['I2','COL3'])

# alternatively, use iloc to select by the corresponding row and column positions
print(df.iloc[0,3])

foo2
2009-01-01 00:00:00
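
One difference between `loc` and `iloc` that is easy to miss concerns slices: label slices include the end label, while position slices exclude the end position. The sketch below illustrates this on a small hypothetical frame (`df_demo` and its values are not part of the tutorial data).

```python
import pandas as pd

# small illustrative frame (hypothetical data, not from the tutorial)
df_demo = pd.DataFrame({'COL1': [0.0, 1.0, 2.0], 'COL2': ['a', 'b', 'c']},
                       index=['I1', 'I2', 'I3'])

# loc slices by label and INCLUDES the end label
by_label = df_demo.loc['I1':'I2', 'COL1']
print(len(by_label))     # 2

# iloc slices by position and EXCLUDES the end position
by_position = df_demo.iloc[0:1, 0]
print(len(by_position))  # 1

# a list of labels selects several columns at once
print(df_demo.loc['I3', ['COL1', 'COL2']].tolist())  # [2.0, 'c']
```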


In [None]:
# you can also select an entire column
print(df.loc[:,'COL1'])

I1    0.0
I2    1.0
I3    2.0
I4    3.0
I5    4.0
Name: COL1, dtype: float64


In [None]:
# create a new column combining other columns
df.loc[:,'COL5'] = df.loc[:,'COL1'] + df.loc[:,'COL2']
df

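|    | COL1 | COL2 | COL3 | COL4       | COL5 |
|----|------|------|------|------------|------|
| I1 | 0.0  | 0.0  | foo1 | 2009-01-01 | 0.0  |
| I2 | 1.0  | 1.0  | foo2 | 2009-01-02 | 2.0  |
| I3 | 2.0  | 0.0  | foo3 | 2009-01-05 | 2.0  |
| I4 | 3.0  | 1.0  | foo4 | 2009-01-06 | 4.0  |
| I5 | 4.0  | 0.0  | foo5 | 2009-01-07 | 4.0  |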


In [None]:
# DataFrames can also be created from dictionaries
data = {'Name': ['Tom', 'nick', 'krish', 'jack'],
        'Age':  [20, 21, 19, 18],
       }

# create DataFrame from a dictionary
df = pd.DataFrame(data)

# print the output
df

|   | Name  | Age |
|---|-------|-----|
| 0 | Tom   | 20  |
| 1 | nick  | 21  |
| 2 | krish | 19  |
| 3 | jack  | 18  |


In [None]:
# simple problem: find the color of the box with the largest volume

# load dataframe from an excel file
df_boxes = pd.read_excel('data_tutorial.xlsx')
df_boxes



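|   | color  | width | length | height |
|---|--------|-------|--------|--------|
| 0 | blue   | 2     | 12     | 22     |
| 1 | red    | 5     | 3      | 2      |
| 2 | yellow | 1     | 11     | 3      |
| 3 | yellow | 6     | 5      | 3      |
| 4 | red    | 2     | 33     | 3      |
| 5 | black  | 3     | 9      | 12     |
| 6 | orange | 2     | 12     | 8      |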


In [None]:
# calculate the volume of the boxes using the width, length and height columns
df_boxes.loc[:,'volume'] = df_boxes.loc[:,'width'] * df_boxes.loc[:,'length'] * df_boxes.loc[:,'height']
df_boxes

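|   | color  | width | length | height | volume |
|---|--------|-------|--------|--------|--------|
| 0 | blue   | 2     | 12     | 22     | 528    |
| 1 | red    | 5     | 3      | 2      | 30     |
| 2 | yellow | 1     | 11     | 3      | 33     |
| 3 | yellow | 6     | 5      | 3      | 90     |
| 4 | red    | 2     | 33     | 3      | 198    |
| 5 | black  | 3     | 9      | 12     | 324    |
| 6 | orange | 2     | 12     | 8      | 192    |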


In [None]:
# index of the box with the highest volume
index_max_vol = df_boxes.loc[:,'volume'].idxmax()

# pick the corresponding color
color = df_boxes.loc[index_max_vol,'color']
print(f'the {color} box has the largest volume')

the blue box has the largest volume
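
The same question can be answered in other ways. The sketch below is one possible alternative, not the only approach: it rebuilds the table by hand (so that it runs without `data_tutorial.xlsx`), then finds the largest box once by sorting and once with a boolean mask. Unlike `idxmax()`, which returns only the first match, the mask keeps every box that shares the maximum volume.

```python
import pandas as pd

# the same values as in the table above, typed in by hand so that
# the example runs without data_tutorial.xlsx
df_boxes = pd.DataFrame({
    'color':  ['blue', 'red', 'yellow', 'yellow', 'red', 'black', 'orange'],
    'width':  [2, 5, 1, 6, 2, 3, 2],
    'length': [12, 3, 11, 5, 33, 9, 12],
    'height': [22, 2, 3, 3, 3, 12, 8],
})
df_boxes['volume'] = df_boxes['width'] * df_boxes['length'] * df_boxes['height']

# option 1: sort by volume in descending order and take the first row
largest = df_boxes.sort_values('volume', ascending=False).iloc[0]
print(largest['color'], largest['volume'])   # blue 528

# option 2: boolean mask selecting all rows with the maximum volume
mask = df_boxes['volume'] == df_boxes['volume'].max()
print(df_boxes.loc[mask, 'color'].tolist())  # ['blue']
```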
