# Exercise 7: Image Classification with Deep Learning

Last week we covered the basics of PyTorch and its relationship to NumPy, and implemented a linear regression model.

In this exercise, we go a step further and implement a Convolutional Neural Network (CNN) following the AlexNet architecture.


<img src="https://www.researchgate.net/publication/320052364/figure/fig1/AS:543136445198336@1506505227088/Scheme-of-the-AlexNet-network-used.png">

[Image from Llamas et al. (2017), available under the Creative Commons Attribution 4.0 International license](https://www.researchgate.net/publication/320052364_Classification_of_Architectural_Heritage_Images_Using_Deep_Learning_Techniques)

We will:
1. build a torch dataset to load the [UC Merced Dataset](http://weegee.vision.ucmerced.edu/datasets/landuse.html) data
2. implement common data augmentation strategies
3. build a CNN model and load pre-trained weights
4. evaluate a trained CNN model

## 1. Setup

### 1.1 Install dependencies

In [None]:
!pip install -U -q torch==2.5.0 torchvision matplotlib tqdm gdown

### 1.2 Check if GPU available

PyTorch has full GPU support, and we will already make use of it in this exercise. So let's first check whether a GPU is available.

Run the following code block. It should print `False` when running locally (unless you have a CUDA-capable GPU) and `True` on Colab.

If you get `False` on Colab, open the Runtime tab and select a GPU hardware accelerator.

In [None]:
import torch

print(torch.cuda.is_available())

## 2. Data Loading

Here, we get the dataset and write some code to make it accessible to our prospective DL model.

### 2.1 Download data

Let's download and unzip the [UC Merced Land Use dataset](http://weegee.vision.ucmerced.edu/datasets/UCMerced_LandUse.zip) for this exercise.

In [None]:
import urllib.request
import os
if not os.path.exists("UCMerced_LandUse.zip"):
    print(f"downloading UCMerced_LandUse.zip")
    urllib.request.urlretrieve("http://weegee.vision.ucmerced.edu/datasets/UCMerced_LandUse.zip", "UCMerced_LandUse.zip")

import zipfile
with zipfile.ZipFile("UCMerced_LandUse.zip", 'r') as zip_ref:
    zip_ref.extractall()

### 2.3 Write a PyTorch Dataset class

In previous cases, we were able to load all of our data into system memory to train machine learning models.
For DL, this no longer works, because we may have to use millions of images. Furthermore, image files do not come with explicit labels: a model does not know what to do with the information that an image is called "mediumresidential00.tif".

Hence, we need to implement a routine that (i.) keeps track of our many images and only loads them when needed, and (ii.) prepares the dataset for use in our prospective DL model.
In PyTorch, this is done through a [Dataset](https://pytorch.org/tutorials/recipes/recipes/custom_dataset_transforms_loader.html#create-a-dataset-class) class. If you remember last week's exercise, you should know what object classes are.
Let us implement a Dataset class for our UCMerced dataset.

Please take a look at the linked tutorial to understand how this works and complete the code shell below accordingly.

Tips:
*   Check which functions a `torch.utils.data.Dataset` class requires and implement them accordingly.
*   See last week's exercise on how to load images using PIL (the Python Image Library).
*   In the constructor you see an optional variable named `transforms`. This is for data augmentation, which we will address later. At this point all you need to know is that you can use this on the PIL images and it will return a `torch.Tensor` directly. So it suffices to use `data = self.transforms(img)`.
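
To make the tips above concrete, here is a minimal sketch of the two methods a map-style dataset provides, `__len__` and `__getitem__`. It is written in plain Python on toy data so it runs without PyTorch or the image files. The class name `ToyPairs` and its contents are hypothetical; your `UCMerced` class additionally subclasses `torch.utils.data.Dataset` and loads images with PIL.

```python
class ToyPairs:
    """Hypothetical map-style dataset that returns (value, label) pairs."""

    def __init__(self, n_samples, transforms=None):
        self.transforms = transforms
        # only lightweight metadata is stored up front,
        # similar to the (image path, label) list in the exercise
        self.data = [(i, i % 2) for i in range(n_samples)]

    def __len__(self):
        # number of samples in the dataset
        return len(self.data)

    def __getitem__(self, index):
        # look up one sample and apply the optional transform on access
        value, label = self.data[index]
        if self.transforms is not None:
            value = self.transforms(value)
        return value, label


toy = ToyPairs(5, transforms=lambda v: v * 10)
print(len(toy))  # 5
print(toy[3])    # (30, 1)
```

The point of this design is that the expensive work (here the transform, in the exercise the image loading) only happens when a sample is requested.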

In [None]:
from torch.utils.data import Dataset

from PIL import Image

import os
import glob


class UCMerced(Dataset):

    # mapping between label class names and indices
    LABEL_CLASSES = {
      'agricultural':       0,
      'airplane':           1,
      'baseballdiamond':    2,
      'beach':              3,
      'buildings':          4,
      'chaparral':          5,
      'denseresidential':   6,
      'forest':             7,
      'freeway':            8,
      'golfcourse':         9,
      'harbor':             10,
      'intersection':       11,
      'mediumresidential':  12,
      'mobilehomepark':     13,
      'overpass':           14,
      'parkinglot':         15,
      'river':              16,
      'runway':             17,
      'sparseresidential':  18,
      'storagetanks':       19,
      'tenniscourt':        20
    }

    # image indices to use for different splits (file names are zero-based: 00..99)
    SPLITS = {
      'train': list(range(0, 60)),    # use first 60 images of each class (indices 0-59) for training...
      'val':   list(range(60, 70)),   # ...the next 10 images (indices 60-69) for model validation...
      'test':  list(range(70, 100))   # ...and the rest (indices 70-99) for testing
    }

    def __init__(self, transforms=None, split='train'):
        self.transforms = transforms

        # prepare data
        self.data = []  # list of tuples of (image path, label class)
        for labelclass in self.LABEL_CLASSES:
            # get images with correct index according to dataset split
            for imgIndex in self.SPLITS[split]:
                imgName = os.path.join('UCMerced_LandUse/Images', labelclass, f'{labelclass}{str(imgIndex).zfill(2)}.tif')
                # example format: 'baseFolder/agricultural/agricultural07.tif'
                self.data.append((
                    imgName,
                    self.LABEL_CLASSES[labelclass]          # get index for label class
                ))


    #TODO: please provide the remaining functions required for the torch.utils.data.Dataset class.
    def __len__(self):
        return #TODO


    def __getitem__(self, index):
        # TODO retrieve the item from self.data at position index
        imgName, label =

        # TODO load the image array from the imgName
        img =

        # TODO apply transformation
        if self.transforms is not None:
            img =
        return img, label


Nothing to do here. If your implementation works, you should now be able to plot samples.

In [None]:
import matplotlib.pyplot as plt

# initialize the dataset (call the constructor __init__)
dataset = UCMerced()
print(f"dataset of length {len(dataset)}")

# plot individual samples
from ipywidgets import interact
@interact(idx=range(len(dataset)))
def plot_sample(idx=0):
    img, label = dataset[idx]

    plt.imshow(img)

    # swaps keys and values in the dictionary UCMerced.LABEL_CLASSES
    class_mapping = {v: k for k, v in UCMerced.LABEL_CLASSES.items()}

    plt.title(f"classid {label} ({class_mapping[label]})")

### 2.4 Define transforms

We have already come across `transform` operations above, e.g. `ToTensor()`. These are used not only for data conversion, but also for data **augmentation**. This is important for DL because it artificially increases the variability of the dataset. For example, randomly flipping images horizontally is a sensible operation for remote sensing images, which have no canonical orientation; doing it during training exposes the DL model to more variations.

In the following, we will define transforms for training and validation separately.

TODO: add the following augmentations using the [torchvision.transforms documentation](https://pytorch.org/vision/stable/transforms.html)
* RandomResizedCrop
* RandomGrayscale
* RandomHorizontalFlip
* GaussianBlur
* RandomPosterize
* RandomVerticalFlip
* ColorJitter
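
To illustrate what a random augmentation does conceptually, here is a small plain-Python sketch of a random horizontal flip on a nested list. The function `random_horizontal_flip` is a hypothetical stand-in written for this illustration, not the torchvision implementation; in the exercise you would use `T.RandomHorizontalFlip` instead.

```python
import random


def random_horizontal_flip(image_rows, p=0.5, rng=random):
    """Mirror every row left-to-right with probability p, else return the input unchanged."""
    if rng.random() < p:
        return [list(reversed(row)) for row in image_rows]
    return image_rows


tiny_image = [[1, 2, 3],
              [4, 5, 6]]
print(random_horizontal_flip(tiny_image, p=1.0))  # always flipped: [[3, 2, 1], [6, 5, 4]]
print(random_horizontal_flip(tiny_image, p=0.0))  # never flipped:  [[1, 2, 3], [4, 5, 6]]
```

Because the decision is drawn anew on every call, the same image can look different each time it is loaded during training.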

In [None]:
import torchvision.transforms as T
import numpy as np

# mean and standard deviation of the dataset
mean=torch.tensor([0.504, 0.504, 0.503])
std=torch.tensor([0.019 , 0.018, 0.018])

# normalize image [0-1] (or 0-255) to zero-mean unit standard deviation
normalize = T.Normalize(mean, std)
# we invert normalization for plotting later
std_inv = 1 / (std + 1e-7)
unnormalize = T.Normalize(-mean * std_inv, std_inv)

transforms_train = T.Compose([
  #TODO: add your own transforms here

  T.Resize((224, 224)),
  T.ToTensor(),
  normalize
])

# we do not augment the validation dataset (aside from resizing and tensor casting)
transforms_val = T.Compose([
  T.Resize((224, 224)),
  T.ToTensor(),
  normalize
])

Test your transforms by executing the following cell.
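
The plotting cell relies on `unnormalize` undoing `normalize`. The following sketch checks the arithmetic for a single pixel value in plain Python: normalizing with mean `-mean / std` and standard deviation `1 / std` inverts the original normalization. The helper `normalize_value` and the pixel value are made up for illustration; the statistics are the first-channel values defined earlier.

```python
def normalize_value(x, mean, std):
    # same arithmetic as T.Normalize, applied to a single channel value
    return (x - mean) / std


mean, std = 0.504, 0.019   # first-channel statistics from the transforms cell
std_inv = 1 / std          # the notebook adds 1e-7 to std to guard against division by zero
pixel = 0.52               # hypothetical pixel intensity in [0, 1]

z = normalize_value(pixel, mean, std)
restored = normalize_value(z, -mean * std_inv, std_inv)
print(round(z, 4), round(restored, 4))
```

Algebraically, `(z + mean / std) * std = z * std + mean`, which is the original pixel value.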

In [None]:
dataset_index = 500

img, label = dataset[dataset_index]

fig, axs = plt.subplots(1,2, figsize=(12,6))
axs[0].imshow(unnormalize(transforms_val(img)).permute(1,2,0))
axs[0].set_title("validation transform (no augmentation)")

axs[1].imshow(unnormalize(transforms_train(img)).permute(1,2,0))
axs[1].set_title("training transform")
[ax.axis("off") for ax in axs] # removes ticks

Let's pass the transforms to the dataset rather than applying them manually every time.

In [None]:
# TODO add your transforms function to the dataset as argument
train_dataset = UCMerced(#TODO)
val_dataset = UCMerced(#TODO)


# Nothing todo, plotting functions
fig, axs = plt.subplots(1,5, figsize=(5*3, 3))
for ax in axs:
    idx = np.random.randint(len(train_dataset)) # random sample
    image, label = train_dataset[idx]
    ax.imshow(unnormalize(image).permute(1,2,0))
    ax.set_title(f"idx {idx}, {list(UCMerced.LABEL_CLASSES.keys())[label]}")
    ax.axis("off")

fig.suptitle("training samples")
plt.tight_layout()

fig, axs = plt.subplots(1,5, figsize=(5*3, 3))
for ax in axs:
    idx = np.random.randint(len(val_dataset)) # random sample
    image, label = val_dataset[idx]
    ax.imshow(unnormalize(image).permute(1,2,0))
    ax.set_title(f"idx {idx}, {list(UCMerced.LABEL_CLASSES.keys())[label]}")
    ax.axis("off")

fig.suptitle("validation samples")
plt.tight_layout()

## 3. Create model

Let's design a deep learning model for classifying the UC Merced images. In this case, it will be based on AlexNet.

### 3.1 Building blocks

First, get one validation image together with its label.


In [None]:
# TODO: load one val image with label and save into x and y
x, y =

# we add a 1-dimension to build a "batch" of size 1
x = x.unsqueeze(0)

fig, ax = plt.subplots(1,1)
ax.imshow(unnormalize(x.squeeze().detach()).permute(1,2,0).numpy(), cmap="gray")
ax.set_title("input x")

### 3.1.1 Convolution

<img src="https://upload.wikimedia.org/wikipedia/commons/1/19/2D_Convolution_Animation.gif">

animation from  [Wikipedia with Creative Commons Attribution-Share Alike 3.0 Unported License](https://commons.wikimedia.org/wiki/File:2D_Convolution_Animation.gif#filelinks)

**TODO** add a Conv layer following the [torch.nn.Conv2d documentation](https://pytorch.org/docs/stable/generated/torch.nn.Conv2d.html?highlight=conv2d#torch.nn.Conv2d)
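
Before running the convolution, it can help to predict the spatial size of its output. The sketch below implements the output-size formula from the Conv2d documentation for the case of dilation 1. The helper name `conv_output_size` is hypothetical, and the 224-pixel input assumes the image was resized by the transforms above.

```python
def conv_output_size(size, kernel, stride=1, padding=0):
    """Spatial output size of a convolution with dilation 1."""
    return (size + 2 * padding - kernel) // stride + 1


# 5x5 kernel, stride 2, padding 2 on a 224-pixel-wide input
print(conv_output_size(224, kernel=5, stride=2, padding=2))  # 112
# same kernel and padding with stride 1 preserves the size
print(conv_output_size(224, kernel=5, stride=1, padding=2))  # 224
```

With stride 2, the convolved image should therefore have half the width and height of the input.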

In [None]:
from torch import nn

# TODO initialize a 5x5 conv layer with nn.Conv2d that contains 3 input dimensions and 1 output dimension,
# with stride 2 and padding 2
convlayer =

# TODO: make a forward pass through the convolution layer
x_convolved =

fig, ax = plt.subplots(1,1)
ax.imshow(x_convolved.detach().squeeze().numpy(), cmap="gray")
ax.set_title("conv(x)")


### 3.1.2 ReLU: Rectified linear unit activation function

$y = \max(x, 0)$

**TODO** add a ReLU layer following the [torch.nn.ReLU documentation](https://pytorch.org/docs/stable/generated/torch.nn.ReLU.html)

In [None]:
# TODO: add a ReLU layer
relu =

# make forward pass through the relu function
x_relu = relu(x_convolved)

fig, ax = plt.subplots(1,1)
ax.imshow(x_relu.detach().squeeze().numpy(), cmap="gray")
ax.set_title("relu(conv(x))")


### 3.1.3 MaxPooling

<img src="https://upload.wikimedia.org/wikipedia/commons/e/e9/Max_pooling.png">

Image showing 2x2 pooling with stride 2. [Image from Wikipedia under the CC BY-SA 4.0 license](https://en.wikipedia.org/wiki/Convolutional_neural_network#/media/File:Max_pooling.png)

**TODO** add a MaxPool2d layer following the [torch.nn.MaxPool2d documentation](https://pytorch.org/docs/stable/generated/torch.nn.MaxPool2d.html)
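
As a small worked example of the operation, the following plain-Python sketch applies max pooling to a 4x4 feature map stored as nested lists. The function `max_pool_2d` and the input values are chosen for illustration only; `nn.MaxPool2d` does the same on tensors, per channel.

```python
def max_pool_2d(rows, kernel=2, stride=2):
    """Slide a kernel x kernel window over a 2-D list and keep the maximum of each window."""
    out_h = (len(rows) - kernel) // stride + 1
    out_w = (len(rows[0]) - kernel) // stride + 1
    return [
        [
            max(rows[i * stride + di][j * stride + dj]
                for di in range(kernel) for dj in range(kernel))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


feature_map = [[1, 1, 2, 4],
               [5, 6, 7, 8],
               [3, 2, 1, 0],
               [1, 2, 3, 4]]
print(max_pool_2d(feature_map, kernel=2, stride=2))  # [[6, 8], [3, 4]]
print(max_pool_2d(feature_map, kernel=3, stride=2))  # [[7]]
```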

In [None]:
# TODO add a 3x3 maxpooling layer with stride 2
maxpool =

# make forward pass with x_relu through the maxpool layer
x_pool = maxpool(x_relu)

fig, ax = plt.subplots(1,1)
ax.imshow(x_pool.detach().squeeze().numpy(), cmap="gray")
ax.set_title("maxpool(relu(conv(x)))")

### 3.1.4 Dropout

In training mode (default):

In [None]:
# TODO: add a Dropout layer with 50% dropout probability
dropout =

#TODO take the original input x and apply dropout to it
x_dropout =

fig, ax = plt.subplots()
ax.imshow(unnormalize(x_dropout.squeeze().detach()).permute(1,2,0).numpy(), cmap="gray")

In evaluation mode:

**TODO** set dropout to eval mode by calling the `.eval` function on it
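
The difference between the two modes can be sketched in plain Python. In training mode, each element is zeroed with probability `p` and the surviving elements are scaled by `1 / (1 - p)`; in evaluation mode, the input passes through unchanged. The function `dropout_list` is a hypothetical illustration of this behaviour, not PyTorch's implementation.

```python
import random


def dropout_list(values, p=0.5, training=True, rng=random):
    """Zero each element with probability p and rescale the rest; identity when not training."""
    if not training:
        return list(values)
    scale = 1.0 / (1.0 - p)
    return [0.0 if rng.random() < p else v * scale for v in values]


rng = random.Random(0)  # fixed seed so the run is repeatable
ones = [1.0] * 8
train_out = dropout_list(ones, p=0.5, training=True, rng=rng)
eval_out = dropout_list(ones, p=0.5, training=False)
print(train_out)  # a mix of 0.0 and 2.0
print(eval_out)   # unchanged input
```

The rescaling keeps the expected value of each element the same in both modes, which is why no correction is needed at evaluation time.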

In [None]:
# set dropout to eval mode
dropout.eval()

# call dropout again (same code as above)
x_dropout = dropout(x)

fig, ax = plt.subplots()
ax.imshow(unnormalize(x_dropout.squeeze().detach()).permute(1,2,0).numpy(), cmap="gray")

### 3.1.5 Adaptive Average Pooling


<img src="https://production-media.paperswithcode.com/methods/Screen_Shot_2020-06-06_at_12.15.58_PM.png">

image from [paperswithcode.com under CC-BY-SA License](https://paperswithcode.com/method/global-average-pooling)

**TODO** implement adaptive (global) average pooling following the [torch.nn.AdaptiveAvgPool2d documentation](https://pytorch.org/docs/stable/generated/torch.nn.AdaptiveAvgPool2d.html) with output size 1x1
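
With output size 1x1, adaptive average pooling reduces every channel to the mean of all its pixels. A minimal plain-Python sketch with made-up values follows; `global_average_pool` is a hypothetical helper name.

```python
def global_average_pool(channels):
    """channels: list of 2-D maps (one per channel); returns one mean value per channel."""
    return [
        sum(sum(row) for row in fmap) / (len(fmap) * len(fmap[0]))
        for fmap in channels
    ]


rgb = [
    [[1, 1], [1, 1]],  # channel 0
    [[0, 2], [4, 6]],  # channel 1
    [[0, 0], [0, 8]],  # channel 2
]
print(global_average_pool(rgb))  # [1.0, 3.0, 2.0]
```

For the RGB input image `x`, the pooled result therefore holds three numbers per sample, one per colour channel.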

In [None]:
# TODO: implement AdaptiveAvgPool2d with output size 1x1
avgpool =

# TODO: apply adaptive average pooling on the input image x
x_pooled =

print(x_pooled.shape)
# remove (squeeze) the 1x1 dimensions ("-1" means last dimension)
x_pooled = x_pooled.squeeze(-1).squeeze(-1)

print(x_pooled.shape)

print(x_pooled.squeeze())

### 3.1.6 Linear (Dense) Layer

A dense layer is a linear transformation $y = Ax + b$.

**TODO** Implement a linear (dense) layer with bias with 3 input dimensions and 8 output dimensions following [torch.nn.Linear documentation](https://pytorch.org/docs/stable/generated/torch.nn.Linear.html)
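
As a sketch of the arithmetic behind $y = Ax + b$, here is a plain-Python version with a small hand-picked weight matrix (3 inputs, 2 outputs to keep it readable). The function `linear` and all values are illustrative assumptions; `nn.Linear` initializes `A` and `b` randomly.

```python
def linear(x, weight, bias):
    """Compute y = A x + b for one input vector; weight has one row per output."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weight, bias)]


weight = [[1, 0, 0],
          [0, 1, 1]]   # A: 2 outputs x 3 inputs
bias = [0.5, -1.0]     # b: one entry per output
print(linear([1.0, 2.0, 3.0], weight, bias))  # [1.5, 4.0]

# number of learnable parameters of a layer with 3 inputs and 8 outputs
n_params = 8 * 3 + 8
print(n_params)  # 32
```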



In [None]:
# TODO: implement the Linear layer
dense =

# TODO: call the linear layer on x_pooled
x_dense =

print(f"dense output {x_dense}")

# FYI: access the A matrix and the bias b of the equation via
print(f"A matrix: {dense.weight} and bias {dense.bias}")

### 3.2 Build a complete AlexNet

Now we will use those building blocks in a CNN. To this end, PyTorch again implements models as a class, this time `torch.nn.Module`. In there, you define:
1. which layers your model has, in the `__init__` function
2. in which order they are executed, in the `forward` function

We do not worry about training for now. Later, we can save some memory by specifying that we do not need gradients with `@torch.no_grad()`.


Note that we do not implement exactly the model shown in the figure at the top of this notebook. Instead, we stick with the established torchvision AlexNet implementation.

**TODO** Implement the AlexNet

Hint:
* you can either implement it line by line following the instructions
* or look up the AlexNet class in the torchvision [source code](https://github.com/pytorch/vision) and copy-paste the model definition
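
To see why the first linear layer expects `256*6*6` inputs, the following plain-Python sketch traces the spatial size through the feature extractor for a 224x224 input. It uses the layer settings listed in the TODO comments of the next cell and the standard output-size formula for dilation 1; the helper `out_size` is a hypothetical name.

```python
def out_size(size, kernel, stride=1, padding=0):
    """Spatial output size of a convolution or max-pooling layer (dilation 1)."""
    return (size + 2 * padding - kernel) // stride + 1


# (name, kernel, stride, padding) for each layer that changes or keeps the spatial size
layers = [
    ("conv 11x11", 11, 4, 2),
    ("maxpool 3x3", 3, 2, 0),
    ("conv 5x5", 5, 1, 2),
    ("maxpool 3x3", 3, 2, 0),
    ("conv 3x3", 3, 1, 1),
    ("conv 3x3", 3, 1, 1),
    ("conv 3x3", 3, 1, 1),
    ("maxpool 3x3", 3, 2, 0),
]

size = 224
trace = [size]
for name, kernel, stride, padding in layers:
    size = out_size(size, kernel, stride, padding)
    trace.append(size)

print(trace)              # [224, 55, 27, 27, 13, 13, 13, 13, 6]
print(256 * size * size)  # 9216 features after flattening
```

The adaptive average pooling to 6x6 guarantees this flattened size even for inputs of a different resolution.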

In [None]:
class AlexNet(nn.Module):
    def __init__(self, num_classes: int = 1000, dropout: float = 0.5) -> None:
        super().__init__()
        self.features = nn.Sequential(
            # TODO 11x11 convolution from 3 to 64 channels stride 4, padding 2
            # TODO inplace ReLU
            # TODO 3x3 Maxpool with stride 2
            # TODO 5x5 convolution from 64 to 192 channels and padding 2
            # TODO inplace ReLU
            # TODO 3x3 Maxpool with stride 2
            # TODO 3x3 convolution from 192 to 384 channels and padding 1
            # TODO inplace ReLU
            # TODO 3x3 convolution from 384 to 256 channels and padding 1
            # TODO inplace ReLU
            # TODO 3x3 convolution from 256 to 256 channels and padding 1
            # TODO inplace ReLU
            # TODO 3x3 Maxpool with stride 2
        )
        self.avgpool = nn.AdaptiveAvgPool2d((6, 6))
        self.classifier = nn.Sequential(
            # TODO Dropout
            # TODO Linear with 256*6*6 to 4096 channels
            # TODO inplace ReLU
            # TODO Dropout
            # TODO Linear with 4096 to 4096 channels
            # TODO inplace ReLU
            # TODO Linear with 4096 to num_classes channels
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = # TODO apply features sub-module
        x = # TODO apply average pooling sub-module
        x = torch.flatten(x, 1)
        x = # TODO apply classifier sub-module
        return x


In [None]:
# Load the model :
model = AlexNet(num_classes=21)

# Perform a forward pass
logits = model(x)

## 4. Evaluation with pre-trained model

Download model weights `alexnet-epoch=1414-val_accuracy=0.71.pth` by running the next cell.


Alternatively, you can download it here: https://drive.google.com/uc?export=download&id=1S5bI-3cSdyUKw-oknoYI3KKXnK7e96rN

In [None]:
import gdown

url = "https://drive.google.com/uc?export=download&id=1S5bI-3cSdyUKw-oknoYI3KKXnK7e96rN"
gdown.download(url, output="alexnet-epoch=1414-val_accuracy=0.71.pth", quiet=True)

The next cell should output `<All keys matched successfully>`.

In [None]:
# TODO: load the model state dict with torch.load(<path>)
state_dict =

# TODO: update the model weights by calling load_state_dict on the model


### 4.1 Qualitative Evaluation

Let's check whether our model does the right thing. If we feed it an image of the right size it should return a tensor with n elements, where n equals the number of classes we have (21 in our case of the UC Merced land use dataset).

In [None]:
model = model.cpu()

# TODO: initialize the test_dataset with transforms_val and test split
test_dataset =

# TODO take the first image in the dataset
img, label =
img = img.unsqueeze(0) # add 1-dimension (creates a batch of one sample)

# TODO predict the image with the model
logits =

print(f'logits tensor size: {logits.size()}')
print('logits tensor (model output) values:')
print(logits)

# TODO package the lines above into a function
@torch.no_grad()
def predict(dataset, index, model):
    """
    inputs:
        dataset: the test dataset,
        index: index of the sample in the test dataset,
        model: the model
    returns:
        image: the input image
        label: the input label
        logits: a vector of prediction logits
    """
    model.eval()

    return img, label, logits

In [None]:
# Observe the predictions :
test_dataset = UCMerced(transforms=transforms_val, split='test')

@interact(idx=range(len(test_dataset)))
def plot_logits(idx):
    img, label, prediction = predict(test_dataset, idx, model)

    fig, axs = plt.subplots(1,2, figsize=(12,3))
    axs[0].imshow(unnormalize(img).permute(1,2,0))
    axs[0].set_title(list(UCMerced.LABEL_CLASSES.keys())[label])
    axs[0].axis("off")

    ax = axs[1]
    ax.bar(np.arange(21), prediction[0].detach())
    ax.set_xticks(np.arange(21))
    ax.set_xticklabels(UCMerced.LABEL_CLASSES.keys(), rotation=90)
    ax.set_ylabel("logits")

## Normalize Logits to Probabilities with Softmax

Usually, we are interested in output probabilities rather than raw logits.

We normalize the logits $z = (z_1, \dots, z_K)$, where $K$ is the number of classes, to probabilities with the softmax function:

$\sigma(z)_i = \frac{\exp(z_i)}{\sum_{j=1}^{K} \exp(z_j)}$

Hint:
* you can use the `tensor.exp()` function for the exponential
* you can use `tensor.sum()` to sum over all elements
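
For reference, here is the formula worked through in plain Python on three made-up logits. The function `softmax_list` is a hypothetical helper; subtracting the maximum before exponentiating is a common numerical-stability trick that does not change the result and is not required by the exercise.

```python
import math


def softmax_list(logits):
    """Softmax over a list of floats, shifted by the maximum for numerical stability."""
    shift = max(logits)
    exps = [math.exp(z - shift) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


probs = softmax_list([2.0, 1.0, 0.1])
print([round(p, 3) for p in probs])  # [0.659, 0.242, 0.099]
print(round(sum(probs), 6))          # 1.0
```

The outputs are positive and sum to one, and the largest logit receives the largest probability.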

In [None]:
img, label, logits = predict(test_dataset, index=0, model=model)

# TODO implement the softmax function
def softmax(logits):
    val =
    return val

print("implementation check: should be all True (equal to torch softmax function)")
torch.isclose(softmax(logits), nn.functional.softmax(logits, dim=-1))

In [None]:
softmax(logits)

**TODO**: go back to the previous cells and modify the `predict` function to output probabilities rather than logits

### Quantitative Evaluation

TODO: try changing the device from `cpu` to `cuda` and run the code again.

Note that the difference is not very large here, as we only predict. For training, a GPU makes a massive difference.
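
Once predictions and true labels have been collected, overall accuracy is simply the fraction of matching entries. The following plain-Python sketch shows this on made-up labels, together with a counter of (true, predicted) pairs that holds the same information as a confusion matrix. The names `accuracy`, `toy_true`, and `toy_pred` are hypothetical.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted class equals the true class."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


toy_true = [0, 0, 1, 1, 2, 2]
toy_pred = [0, 1, 1, 1, 2, 0]
print(accuracy(toy_true, toy_pred))  # 0.6666666666666666

# counts of (true class, predicted class) pairs, i.e. the confusion matrix entries
confusion = Counter(zip(toy_true, toy_pred))
print(confusion[(1, 1)])  # 2 samples of class 1 predicted correctly
print(confusion[(0, 1)])  # 1 sample of class 0 mistaken for class 1
```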

In [None]:
test_dataset = UCMerced(transforms=transforms_val, split='test')
from torch.utils.data import DataLoader
from tqdm.auto import tqdm
dataloader = DataLoader(test_dataset, batch_size=16)

# TODO change cpu to cuda and see if the prediction is faster
device =
model = model.to(device)

y_preds, y_trues = [], []
with torch.no_grad():
    for x,y in tqdm(dataloader, total=len(dataloader)):
        x = x.to(device)
        y = y.to(device)

        logits = model(x)
        y_pred = logits.argmax(1)
        y_preds.append(y_pred.cpu().numpy())
        y_trues.append(y.cpu().numpy())

y_preds = np.hstack(y_preds)
y_trues = np.hstack(y_trues)

In [None]:
# TODO calculate the overall accuracy
overall_accuracy =
print(overall_accuracy)

In [None]:
from sklearn.metrics import classification_report

# the names of all classes
target_names = list(UCMerced.LABEL_CLASSES.keys())

# TODO print the classification report
print(classification_report( #TODO  ))

In [None]:
from sklearn.metrics import confusion_matrix, ConfusionMatrixDisplay

# from_predictions expects the true labels first, then the predictions
disp = ConfusionMatrixDisplay.from_predictions(y_trues, y_preds,
                      display_labels=target_names, xticks_rotation=90, cmap="Blues")

## Appendix: Training the AlexNet (not included in the exercise)

With the next cells, you can train the AlexNet. This requires a GPU (or a lot of time) and is not part of this exercise.

This is for your interest only; we will talk about neural network training and optimization in detail in the next exercise. Note that in order to save some time, we're only training for 50 epochs here. The resulting model will thus not be as accurate as the one you loaded the pretrained weights into earlier. Feel free to change the number of epochs if you'd like a better model.

In [None]:
!pip install -q -U pytorch_lightning tensorboard
import pytorch_lightning as pl
from pytorch_lightning import loggers as pl_loggers

from pytorch_lightning.callbacks import ModelCheckpoint

import torch.nn as nn
import torch.nn.functional as F

In [None]:
%load_ext tensorboard
%tensorboard --logdir logs

In [None]:
class PLWrapper(pl.LightningModule):
    def __init__(self, model):
        super().__init__()
        self.model = model

    def forward(self, x):
        return self.model(x)

    def training_step(self, batch, batch_idx):
        x, y = batch
        y_hat = self(x)
        loss = F.cross_entropy(y_hat, y)
        self.log("train_loss", loss)
        return loss

    def validation_step(self, batch, batch_idx):
        x, y = batch
        y_hat = self.model(x)
        loss = F.cross_entropy(y_hat, y)
        self.log("val_loss", loss)
        self.log("val_accuracy", (y_hat.argmax(1) == y).float().mean())

    def configure_optimizers(self):
        return torch.optim.SGD(self.parameters(), lr=0.01, weight_decay=0.01)


plmodel = PLWrapper(model)

tb_logger = pl_loggers.TensorBoardLogger(save_dir="logs/")

checkpoint_callback = ModelCheckpoint(
    dirpath='checkpoints',
    filename='alexnet-{epoch}-{val_accuracy:.2f}',
    monitor="val_accuracy",
    mode="max"
    )

# 50 epochs as stated above; increase max_epochs for a more accurate model
trainer = pl.Trainer(max_epochs=50, accelerator="gpu", devices=[0],
                     logger=tb_logger, callbacks=[checkpoint_callback])
train_loader = torch.utils.data.DataLoader(train_dataset, batch_size=256, shuffle=True, num_workers=8)
val_loader = torch.utils.data.DataLoader(val_dataset, batch_size=256, shuffle=False, num_workers=8)
trainer.fit(plmodel, train_dataloaders=train_loader, val_dataloaders=val_loader)

In [None]:
!ls checkpoints
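# adjust the file name to the checkpoint listed by `!ls checkpoints` above
checkpoint = torch.load("checkpoints/alexnet-epoch=1410-val_accuracy=0.75.ckpt")
from collections import OrderedDict
# strip the leading "model." prefix added by PLWrapper so the keys match a plain AlexNet
state_dict = OrderedDict({k.replace("model.", "", 1): v for k, v in checkpoint["state_dict"].items()})
torch.save(state_dict, "alexnet-epoch=1410-val_accuracy=0.75.pth")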