# 2️⃣ Welcome to the second exercise of the "Deep Learning for Natural Language Processing" course.

In this exercise, we will be training a binary sentiment classifier based on a Recurrent Neural Network (RNN) architecture.
There are three major themes:
1. Preprocessing the data: tokenization, stop word removal, lemmatization
2. Training a word embedding model for reuse in a Long Short-Term Memory (LSTM) module
3. Training, testing and improving an RNN

Please work through it until the exercise session on October 4th, where we will discuss a possible solution and our findings.

# Python3 environment requirements

This notebook was tested with Python 3.10.13 and the following library versions:

* numpy==1.26.0
* matplotlib==3.8.0
* scikit-learn==1.3.0
* torch==2.0.1
* spacy==3.6.1
* tqdm==4.66.1
* gensim==4.3.2
* nltk==3.8.1

In [None]:
import torch
import random
import numpy as np

def seed(seed = 1810):
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False

# set seed for reproducibility
SEED = 1810
seed(SEED)
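
As a quick sanity check of what seeding buys us, the sketch below draws random numbers twice from generators created with the same seed. It uses only the standard `random` module with a local generator so that the global state set above is left untouched; the helper name `draw` is an illustrative assumption. The same idea applies to the NumPy and PyTorch generators seeded above.

```python
import random

def draw(seed_value, n = 3):
    # a local generator: the global random state seeded above is not modified
    rng = random.Random(seed_value)
    return [rng.random() for _ in range(n)]

first_run = draw(1810)
second_run = draw(1810)
other_seed = draw(1811)

print(first_run == second_run)  # same seed, same draws
print(first_run == other_seed)  # a different seed gives different draws
```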

# Preprocessing

Deep learning models typically require little feature engineering, which makes them very versatile. This is especially true of many state-of-the-art NLP models.

However, sometimes, especially on low-resource tasks, it can be better to apply some basic preprocessing to the text instead of passing in the raw stream.

By doing so, we hard-code some linguistic knowledge that makes it easier for the model to train.

In this notebook we will once again leverage the reviews dataset used in the previous exercise.

As an initial step we load the dataset using the provided helper function.

In [None]:
from res.exercise2_util import load_dataset
reviews, labels = load_dataset()

We can have a look at one of the reviews and its corresponding label.

In [None]:
review_index = 1810
print("[review text]:\n", reviews[review_index])
print("[review label]:\n", labels[review_index])

# Tokenization

In theory, we could feed the neural network model the input text one character at a time, and there are in fact models that do so and work well.
However, this approach ignores any prior linguistic knowledge, even the knowledge of what constitutes a word.

"Tokenization" means to split up the text into tokens, i.e., the smallest units to be considered by subsequent steps.

The nature of the text varies with the domain and context of use: just think of the contrast between text from Wikipedia and text from Twitter. Luckily, the Natural Language Toolkit ([_NLTK_](https://www.nltk.org/)) library contains several useful tools. One can use or define different tokenizers depending on the task at hand.

For instance, we can use regular expressions to create a tokenizer that identifies words that may or may not contain an apostrophe, e.g., "didn't".

**Nota bene**: depending on the task at hand, distinguishing between upper case and lower case letters can make a difference. In fact, word dictionaries can distinguish between "Apple" and "apple". In this notebook, for simplicity, we use only lower case letters.

In [None]:
from nltk.tokenize import RegexpTokenizer
# a raw string keeps the backslashes intact and avoids invalid-escape warnings
tokenizer = RegexpTokenizer(r"\w+'?\w+|\w+")

def tokenize(text):
    return tokenizer.tokenize(str(text).lower()) # Nota bene: we enforce lower case letters

In [None]:
example = "You shouldn't watch this movie, it's just a waste of time."
print("[example raw]:\n", example)
example_tokenized = tokenize(example)
print("[example tokenized]:\n", example_tokenized)
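
To see how the pattern behaves on its own, the sketch below applies the same regular expression with the standard `re` module. It assumes that collecting all non-overlapping matches reproduces what the tokenizer above returns in its default mode; the name `regex_tokenize` is only illustrative.

```python
import re

# first alternative: a word with an optional inner apostrophe (at least two word characters)
# second alternative: any remaining run of word characters, e.g. the single letter "a"
TOKEN_PATTERN = re.compile(r"\w+'?\w+|\w+")

def regex_tokenize(text):
    return TOKEN_PATTERN.findall(text.lower())

print(regex_tokenize("You shouldn't watch this movie, it's just a waste of time."))
print(regex_tokenize("The dogs' owner"))  # a trailing apostrophe is not part of the token
```

Note that contractions such as "shouldn't" stay in one piece, while punctuation and apostrophes at the end of a word are dropped.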

# Stop words removal

For some applications, certain words are not very useful because they carry little or no information relevant to the task at hand. These words are called stop words.

For (topical) text classification, these often include common words such as articles, conjunctions, and pronouns.

We once again can use [_NLTK_](https://www.nltk.org/) to retrieve a list of English stop words.

In [None]:
import nltk
nltk.download("stopwords")
from nltk.corpus import stopwords

stop_words = stopwords.words("english")

We can look at some of the stop words.

In [None]:
print(stop_words[:50])

We can also check if our tokenized example contains any of those stop words.

In [None]:
example_stop_words = [token for token in example_tokenized if token in stop_words]
print(example_stop_words)

We can also look at the example without the stop words.

In [None]:
example_filtered = [token for token in example_tokenized if token not in stop_words]
print(example_filtered)

However, for sentiment classification many of the stop words are useful because they can have a large impact on the sentiment expressed in the sentence (e.g., "not"). Hence, we don't want to exclude them.

In [None]:
from res.exercise2_util import retrieve_exception_stop_words
exception_stop_words = retrieve_exception_stop_words()

We can look at some of the exception stop words.

In [None]:
print(exception_stop_words[:50])

We can also check if our tokenized example contains any of those exception stop words.

In [None]:
example_exception_stop_words = [token for token in example_tokenized if token in exception_stop_words]
print(example_exception_stop_words)

We can also look at the example without the stop words, but keeping the exception ones.

In [None]:
example_filtered = [token for token in example_tokenized if (token in exception_stop_words) or (token not in stop_words)]
print(example_filtered)

We can also leverage other libraries to extend our list of stop words. Over the years [_spaCy_](https://spacy.io/) has become an essential tool in NLP.

We load its stop words, combine them with the [_NLTK_](https://www.nltk.org/) ones (set union), and exclude the ones that we consider useful for our task.

We also provide you with a function that removes stop words from a tokenized text. 

In [None]:
from spacy.lang.en.stop_words import STOP_WORDS

stop_words = set(stop_words).union(STOP_WORDS)

final_stop_words = stop_words - set(exception_stop_words)

def remove_stop_words(text):
    return [token for token in text if token not in final_stop_words]

Let's make sure that this function provides the same results on our tokenized example.

In [None]:
remove_stop_words(example_tokenized)
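
The set logic behind the final stop word list can be illustrated on a toy example. The three small sets below are made up for illustration and are not the actual NLTK, spaCy, or exception lists; `drop_stop_words` is a hypothetical stand-in for the function above.

```python
# toy stand-ins for the two stop word lists and the exception list
nltk_like = {"the", "is", "not", "a"}
spacy_like = {"the", "very", "not", "whereas"}
keep_for_sentiment = {"not", "very"}

combined = nltk_like | spacy_like            # union: a word is a stop word if either list has it
final_toy = combined - keep_for_sentiment    # difference: exceptions are never removed

def drop_stop_words(tokens, stop_set):
    return [token for token in tokens if token not in stop_set]

print(sorted(final_toy))
print(drop_stop_words(["the", "movie", "is", "not", "very", "good"], final_toy))
```

Using sets rather than lists also makes each membership test a constant-time lookup, which matters when filtering many reviews.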

# Lemmatization

Lemmatization reduces the number of distinct words the model has to consider by mapping the different inflected forms of a word to a common base form, the lemma.

The [_spaCy_](https://spacy.io/) library provides convenient functionality for doing this.

We provide you with a function to lemmatize a tokenized text.

In [None]:
!python -m spacy download en_core_web_sm

import spacy
nlp = spacy.load("en_core_web_sm", disable=["parser", "ner"])

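def lemmatize(text):
    lemma_result = []
    for token in text:
        # each token is processed on its own, so the lemmatizer sees no sentence context
        doc = nlp(token)
        for sub_token in doc:  # spaCy may split one input token into several
            lemma_result.append(sub_token.lemma_)
    return lemma_result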

We can see the result of the lemmatizer.

In [None]:
lemmatize(["playing", "plays", "played"])

Finally, we can preprocess a review as a pipeline of tokenization, stop word removal, and lemmatization.

In [None]:
def pipeline(review):
    review = tokenize(review)
    review = remove_stop_words(review)
    return lemmatize(review)

In [None]:
print("[review text]:\n", reviews[review_index])
print("[review pipeline]:\n", pipeline(reviews[review_index]))

Since this step can take a lot of time, we already did it for you. We will simply load the pipelined reviews from disk.

In [None]:
from res.exercise2_util import retrieve_pipelined_reviews

pipelined_reviews = retrieve_pipelined_reviews()

assert pipeline(reviews[review_index]) == pipelined_reviews[review_index]

## Training a word2vec model

As you know, the neural networks we consider should not be fed high-dimensional one-hot vectors directly. Instead, it is better to map them into a low-dimensional space first (word embeddings).

It is possible to just randomly initialize word embedding tables and train them from scratch jointly with the sentiment classification task. However, it is often better to initialize the word embedding tables with pretrained word embeddings and fine-tune them on the task afterwards.

It is common to use freely available word embeddings pretrained on large corpora, but here we train one from scratch.

**Nota bene (side note)**: Even though it is not the case in this assignment, when you initialize your dictionary with pretrained embeddings, you should tokenize your corpus with the tokenizer that was used to train them. For example: FastText -> the tokenizer from the Europarl preprocessing tools, BERT -> WordPiece

In the previous assignment we used the [_vecto_](http://vecto.space/) library. In this assignment we'll see how to use the [_gensim_](https://radimrehurek.com/gensim/) library.

**Nota bene**: depending on the resources, this can take some time to run, so please be patient.

In [None]:
from gensim.models import Word2Vec

word2vec_embedding_dim = 25

model = Word2Vec(pipelined_reviews,
                 vector_size = word2vec_embedding_dim,
                 window = 3,
                 min_count = 3,
                 seed = SEED,
                 workers = 1)

word_vectors = model.wv
word_vectors["[UNK]"] = np.random.rand(word2vec_embedding_dim)
del model

When inspecting the nearest neighbors of some of the words important to sentiment analysis, we can see that the word embeddings already capture some useful semantics.

In [None]:
word_vectors.similar_by_word(word = "good", topn = 5)

In [None]:
word_vectors.similar_by_word(word = "bad", topn = 5)

In [None]:
word_vectors.similar_by_word(word = "school", topn = 5)

In [None]:
word_vectors.similar_by_word(word = "comedy", topn = 5)

In [None]:
word_vectors.similar_by_word(word = "action", topn = 5)

In [None]:
word_vectors.similar_by_word(word = "sad", topn = 5)

In [None]:
# calculate: (actor - man) + woman = ?
word_vectors.most_similar(positive = ['woman', 'actor'], negative = ['man'], topn = 5)
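
The neighbor queries above rank words by cosine similarity between their vectors. The sketch below reproduces that ranking on a handful of made-up 3-dimensional vectors; the values and the helper names `cosine` and `nearest` are assumptions for illustration and have nothing to do with the trained model.

```python
import numpy as np

# toy embeddings, invented for this example
toy_vectors = {
    "good":  np.array([0.9, 0.1, 0.0]),
    "great": np.array([0.8, 0.2, 0.1]),
    "bad":   np.array([-0.7, 0.3, 0.1]),
    "film":  np.array([0.0, 0.1, 0.9]),
}

def cosine(u, v):
    # cosine of the angle between u and v: 1 = same direction, -1 = opposite
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def nearest(word, topn = 2):
    scores = [(other, cosine(toy_vectors[word], vector))
              for other, vector in toy_vectors.items() if other != word]
    return sorted(scores, key = lambda pair: pair[1], reverse = True)[:topn]

print(nearest("good"))
```

Because only the direction of the vectors matters, frequent words with large vector norms do not automatically dominate the ranking.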

## Preparing data for neural networks

As previously stated, we want to reuse the trained embeddings in our network.

We reuse the word -> index mapping from the Word2Vec model and turn them into PyTorch tensors.

In [None]:
# map all words in a review into a tensor of indices
def word2idx(embedding_model, review):
    index_review = []
    for word in review:
        try:
            index_review.append(embedding_model.key_to_index[word])
        except KeyError:  # out-of-vocabulary word
            index_review.append(embedding_model.key_to_index["[UNK]"])
    return torch.tensor(index_review)

In [None]:
reviews_to_index = list(map(lambda review: word2idx(word_vectors, review), pipelined_reviews))
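
The fallback to the unknown token can be seen in isolation with a tiny hand-written vocabulary. The dictionary `toy_key_to_index` and the function `toy_word2idx` below are hypothetical and return a plain list instead of a tensor to keep the sketch dependency-free.

```python
# toy word -> index mapping, with a dedicated entry for unknown words
toy_key_to_index = {"movie": 0, "good": 1, "bad": 2, "[UNK]": 3}

def toy_word2idx(key_to_index, review):
    unk_index = key_to_index["[UNK]"]
    # dict.get returns the fallback index for words missing from the vocabulary
    return [key_to_index.get(word, unk_index) for word in review]

print(toy_word2idx(toy_key_to_index, ["good", "movie", "zzz"]))
```

Words dropped by `min_count` during word2vec training end up on the unknown index in exactly this way.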

As usual, let's split the data into training, validation, and test sets.

In [None]:
from sklearn.model_selection import train_test_split

labels = [0 if label == 'negative' else 1 for label in labels]

X_train, X_test, y_train, y_test = train_test_split(reviews_to_index, labels, test_size = 0.2)
X_train, X_val, y_train, y_val = train_test_split(X_train, y_train, test_size = 0.2)

print(len(X_train), len(X_test), len(X_val))
print(len(y_train), len(y_test), len(y_val))
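
Applying a 20% split twice does not give three equal parts: the second split only sees the remaining 80%. The arithmetic below works this out, assuming that the held-out size is rounded up when the fraction does not divide evenly; `split_sizes` is an illustrative helper, not part of scikit-learn.

```python
import math

def split_sizes(n_samples, test_fraction = 0.2, val_fraction = 0.2):
    n_test = math.ceil(test_fraction * n_samples)      # first split: test set
    n_remaining = n_samples - n_test
    n_val = math.ceil(val_fraction * n_remaining)      # second split: validation set
    n_train = n_remaining - n_val
    return n_train, n_val, n_test

n_train, n_val, n_test = split_sizes(1000)
print(n_train, n_val, n_test)
```

So the effective proportions are roughly 64% training, 16% validation, and 20% test data.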

## Dealing with batches

Reviews vary in length.

This makes processing data in batches relatively complicated. However, parallelization via batching is crucial to train the neural networks in reasonable time.

To cope with this problem, we need to pad all sentences in a batch to equal length.

Below, we pad all sequences to the length of the longest sequence in the batch. Sometimes, however, it is reasonable to put a hard cap on the sequence length to speed up training.

In [None]:
batch_size = 128

DEVICE = torch.device("cuda" if torch.cuda.is_available() else "cpu")

def iterator_func(X, y):
    size = len(X)
    permutation = np.random.permutation(size)
    iterator = []
    for i in range(0, size, batch_size):
        indices = permutation[i : i + batch_size]
        
        batch = {}
        batch["text"] = [X[i] for i in indices]
        batch["label"] = [y[i] for i in indices]
        
        batch["text"], batch["label"] = zip(*sorted(zip(batch["text"], batch["label"]), key=lambda x: len(x[0]), reverse = True))
        batch["length"] = [len(review) for review in batch["text"]]
        batch["length"] = torch.IntTensor(batch["length"])
        batch["text"] = torch.nn.utils.rnn.pad_sequence(batch["text"], batch_first = True).t() # pads all sequences to max length
        batch["label"] = torch.Tensor(batch["label"])
        
        batch["label"]  = batch["label"].to(DEVICE)
        batch["length"] = batch["length"].to(DEVICE) 
        batch["text"]   = batch["text"].to(DEVICE) 
        
        iterator.append(batch)
        
    return iterator

train_iterator = iterator_func(X_train, y_train)
valid_iterator = iterator_func(X_val, y_val)
test_iterator = iterator_func(X_test, y_test)
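
The padding step is easier to follow on a tiny batch. The sketch below pads three toy index sequences and also applies a hard cap on the length, which the iterator above does not do; `MAX_LEN` and the toy tensors are assumptions for illustration.

```python
import torch
from torch.nn.utils.rnn import pad_sequence

toy_reviews = [torch.tensor([5, 2]), torch.tensor([7, 1, 4, 9]), torch.tensor([3])]

MAX_LEN = 3  # hypothetical hard cap on the sequence length
capped = [review[:MAX_LEN] for review in toy_reviews]

# sort by length (longest first), as expected by pack_padded_sequence by default
capped = sorted(capped, key = len, reverse = True)
lengths = torch.tensor([len(review) for review in capped])

padded = pad_sequence(capped, batch_first = True)  # [batch size, max length], padded with 0
time_major = padded.t()                            # [max length, batch size]

print(padded)
print(lengths)
print(time_major.shape)
```

The true lengths have to be kept alongside the padded tensor, because the padding value 0 is itself a valid word index.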

# Building the RNN

Now we want to implement an RNN architecture to encode our reviews in one vector.

Since we are dealing with variable length sequences which have been padded to fit a batch, we cannot just take the output at the last timestep as the representation of the sequence, because it would incorrectly treat the pad tokens as valid input.

Therefore, we need to extract the output at the last _valid_ timestep, which will be used to represent the text. This representation is the input to a logistic regression classifier that produces the final binary output.

PyTorch provides convenient functionality that makes dealing with variable length sequences easier and faster.

The code below makes use of the `PackedSequence` class, which can be passed directly to RNNs in one go. At the end, the `hidden` variable will automatically contain the outputs at the last _valid_ timestep for each element of the batch.

In [None]:
import torch.nn as nn

In [None]:
class RNN(nn.Module):
    def __init__(self, embedding_dim, hidden_dim, output_dim, embedding_weights):
        super().__init__()
        self.hidden_dim = hidden_dim
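        # from_pretrained freezes the embedding weights by default (freeze = True);
        # pass freeze = False to fine-tune them together with the rest of the model
        self.embedding = nn.Embedding.from_pretrained(embedding_weights)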
        self.rnn = nn.LSTM(embedding_dim, hidden_dim)
        self.fc = nn.Linear(hidden_dim, output_dim)
        
    def forward(self, x, text_lengths):
        # x: [sentence length , batch size]
        embedded = self.embedding(x) #[sentence length, batch size, embedding dim]
        # note that nn.utils.rnn.pack_padded_sequence expects the lengths to be given on the CPU
        packed_embedded = nn.utils.rnn.pack_padded_sequence(embedded, text_lengths.cpu())
        packed_output, (hidden, cell) = self.rnn(packed_embedded) # output[sentence length, batch size, hidden dim*num of directions], [numberlayers*num of directions, batch size, hidden dim]
        return self.fc(hidden.squeeze(0))
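
The claim that `hidden` holds the output at the last valid timestep can be checked on a small random example. The sketch below uses a tiny, untrained single-layer LSTM with made-up dimensions; it is a sanity check under these assumptions, not part of the model above.

```python
import torch
import torch.nn as nn
from torch.nn.utils.rnn import pack_padded_sequence, pad_packed_sequence

torch.manual_seed(0)
lstm = nn.LSTM(input_size = 4, hidden_size = 3)

lengths = torch.tensor([5, 3, 2])   # true lengths, sorted in descending order
inputs = torch.randn(5, 3, 4)       # [max length, batch size, features]; values past each length are ignored

packed_output, (hidden, cell) = lstm(pack_padded_sequence(inputs, lengths))
output, _ = pad_packed_sequence(packed_output)   # [max length, batch size, hidden dim]

# pick the output at timestep (length - 1) for every sequence in the batch
last_valid = torch.stack([output[length - 1, b] for b, length in enumerate(lengths.tolist())])

print(torch.allclose(last_valid, hidden[0]))
print(output[3:, 1])  # positions past the true length are filled with zeros
```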

# Training

Lastly, we need to train the model we just created.

Nothing new happens here - we just need to configure the model and training parameters, set up the training loop, and run the whole thing.

In [None]:
import torch.optim as optim

In [None]:
EMBEDDING_DIM = word2vec_embedding_dim
HIDDEN_DIM = 50
OUTPUT_DIM = 1
N_EPOCHS = 5

embedding_weights = torch.Tensor(word_vectors.vectors)

model = RNN(EMBEDDING_DIM,
            HIDDEN_DIM,
            OUTPUT_DIM,
            embedding_weights)

optimizer = optim.Adam(model.parameters(), lr = 0.001)

print(model)

In [None]:
criterion = nn.BCEWithLogitsLoss()
model = model.to(DEVICE)
criterion = criterion.to(DEVICE)

In [None]:
from tqdm import tqdm

def binary_accuracy(preds, y):
    rounded_preds = torch.round(torch.sigmoid(preds))
    correct = (rounded_preds == y).float()
    acc = correct.sum() / len(correct)
    return acc

def train(model, iterator, optimizer, criterion):
    epoch_loss = 0
    epoch_acc = 0
    model.train()
    for batch in tqdm(iterator):
        optimizer.zero_grad()
        predictions = model(batch["text"], batch["length"]).squeeze(1)
        loss = criterion(predictions, batch["label"])
        acc = binary_accuracy(predictions, batch["label"])
        loss.backward()
        optimizer.step()
        epoch_loss += loss.item()
        epoch_acc += acc.item()
    return epoch_loss / len(iterator), epoch_acc / len(iterator)

def evaluate(model, iterator, criterion):
    epoch_loss = 0
    epoch_acc = 0
    model.eval()
    with torch.no_grad():
        for batch in tqdm(iterator):
            predictions = model(batch["text"],batch["length"]).squeeze(1)
            loss = criterion(predictions, batch["label"])
            acc = binary_accuracy(predictions, batch["label"])
            epoch_loss += loss.item()
            epoch_acc += acc.item()
    return epoch_loss / len(iterator), epoch_acc / len(iterator)
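
The model outputs raw scores (logits), and both the loss and the accuracy apply the sigmoid internally. The sketch below, with made-up logits and labels, checks that the loss on logits matches binary cross-entropy on sigmoid probabilities, and that rounding the sigmoid output is the same as thresholding the logit at 0 for these values.

```python
import torch
import torch.nn as nn

logits = torch.tensor([2.0, -1.0, 0.5, -3.0])   # toy model outputs
targets = torch.tensor([1.0, 0.0, 0.0, 1.0])    # toy labels

loss_from_logits = nn.BCEWithLogitsLoss()(logits, targets)
loss_from_probs = nn.BCELoss()(torch.sigmoid(logits), targets)

rounded = torch.round(torch.sigmoid(logits))    # what binary_accuracy does
thresholded = (logits > 0).float()              # equivalent decision rule on the logits
accuracy = (rounded == targets).float().mean()

print(loss_from_logits.item(), loss_from_probs.item())
print(rounded, thresholded, accuracy.item())
```

Working on logits is generally preferred because it is numerically more stable than applying the sigmoid first and taking the logarithm afterwards.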

**Nota bene**: depending on the resources, this can take some time to run, so please be patient.

In [None]:
for epoch in range(N_EPOCHS):
    train_loss, train_acc = train(model, train_iterator, optimizer, criterion)
    valid_loss, valid_acc = evaluate(model, valid_iterator, criterion)
    print(f'| Epoch: {epoch+1:02} | Train Loss: {train_loss:.3f} | Train Acc: {train_acc*100:.2f}% | Val. Loss: {valid_loss:.3f} | Val. Acc: {valid_acc*100:.2f}% |')

# Testing

In [None]:
test_loss, test_acc = evaluate(model, test_iterator, criterion)
print(f'| Test Loss: {test_loss:.3f} | Test Acc: {test_acc*100:.2f}% |')

In [None]:
def predict_sentiment(sentence):
    model.eval()
    tokenized = pipeline(sentence)
    indexed = word2idx(word_vectors, tokenized)  # already a tensor of word indices
    tensor = indexed.to(DEVICE).unsqueeze(1)     # [sentence length, batch size = 1]
    length = torch.LongTensor([len(indexed)])    # lengths are expected on the CPU
    with torch.no_grad():
        prediction = torch.sigmoid(model(tensor, length))
    return prediction.item()

In [None]:
predict_sentiment("This is an awesome movie.")

In [None]:
predict_sentiment("This is a good movie.")

In [None]:
predict_sentiment("This is an awful movie.")

In [None]:
predict_sentiment("This is a bad movie.")

In [None]:
predict_sentiment("Even if this is not an action movie, I still liked it quite a lot.")

In [None]:
predict_sentiment("Despite the terrible title, the bad credits section in the end, and the low-quality sounds here and there, this is a movie of extraordinary quality.")

# Exercise 1

In the model above, we used the hidden state at the last timestep to represent the whole sequence. But often this is not the best way to aggregate the outputs of an RNN. Average or max pooling often works better.

We ask you to implement and test average (mean) and max pooling aggregators and report the results you obtain.

You can reuse the `RNN` model definition from the **_Building the RNN_** section.

In [None]:
from torch.nn.utils.rnn import pack_padded_sequence, pad_packed_sequence

seq = torch.FloatTensor([[1,2,0],
                         [3,0,0],
                         [4,5,6]])
lens = [2,
        1,
        3]

print("\nseq before pack:\n", seq)
print("\nlens before pack:\n", lens)

packed = pack_padded_sequence(seq, lens, enforce_sorted = False, batch_first = True)

seq_unpacked, lens_unpacked = pad_packed_sequence(packed, batch_first = True)

print("\nseq after unpack:\n", seq_unpacked)
print("\nlens after unpack:\n", lens_unpacked)

print("\nsum over sequence dimension\n", seq_unpacked.sum(dim = 1))

print("\nnaive mean over the sequence dimension (padding zeros count towards the denominator)\n", torch.mean(seq_unpacked, dim = 1))

print("\nmean over the valid timesteps only (sum divided by the true lengths)\n", torch.sum(seq_unpacked, dim = 1) / lens_unpacked)

In [None]:
packed

In [None]:
class RNN_mean(nn.Module):
    def __init__(self, embedding_dim, hidden_dim, output_dim, embedding_weights):
        super().__init__()
        self.hidden_dim = hidden_dim
        self.embedding = nn.Embedding.from_pretrained(embedding_weights)
        self.rnn = nn.LSTM(embedding_dim, hidden_dim)
        self.fc = nn.Linear(hidden_dim, output_dim)
        
    def forward(self, x, text_lengths):
        # x: [sentence length , batch size]
        embedded = self.embedding(x) #[sentence length, batch size, embedding dim]
        # note that nn.utils.rnn.pack_padded_sequence expects the lengths to be given on the CPU
        packed_embedded = nn.utils.rnn.pack_padded_sequence(embedded, text_lengths.cpu())
        packed_output, (hidden, cell) = self.rnn(packed_embedded) # output[sentence length, batch size, hidden dim*num of directions], [numberlayers*num of directions, batch size, hidden dim]
        unpacked_output, unpacked_lengths = torch.nn.utils.rnn.pad_packed_sequence(packed_output)
        unpacked_output = unpacked_output.to(DEVICE)
        unpacked_lengths = unpacked_lengths.to(DEVICE)
        output = unpacked_output.sum(dim = 0) / unpacked_lengths.reshape(-1, 1)
        
        return self.fc(output)

In [None]:
model_mean = RNN_mean(EMBEDDING_DIM,
                      HIDDEN_DIM,
                      OUTPUT_DIM,
                      embedding_weights)

optimizer = optim.Adam(model_mean.parameters(), lr = 0.001)

criterion = nn.BCEWithLogitsLoss()
model_mean = model_mean.to(DEVICE)
criterion = criterion.to(DEVICE)

In [None]:
for epoch in range(N_EPOCHS):
    train_loss, train_acc = train(model_mean, train_iterator, optimizer, criterion)
    valid_loss, valid_acc = evaluate(model_mean, valid_iterator, criterion)
    print(f'| Epoch: {epoch+1:02} | Train Loss: {train_loss:.3f} | Train Acc: {train_acc*100:.2f}% | Val. Loss: {valid_loss:.3f} | Val. Acc: {valid_acc*100:.2f}% |')

In [None]:
test_loss, test_acc = evaluate(model_mean, test_iterator, criterion)
print(f'| Test Loss: {test_loss:.3f} | Test Acc: {test_acc*100:.2f}% |')

In [None]:
class RNN_max(nn.Module):
    def __init__(self, embedding_dim, hidden_dim, output_dim, embedding_weights):
        super().__init__()
        self.hidden_dim = hidden_dim
        self.embedding = nn.Embedding.from_pretrained(embedding_weights)
        self.rnn = nn.LSTM(embedding_dim, hidden_dim)
        self.fc = nn.Linear(hidden_dim, output_dim)
        
    def forward(self, x, text_lengths):
        # x: [sentence length , batch size]
        embedded = self.embedding(x) #[sentence length, batch size, embedding dim]
        packed_embedded = nn.utils.rnn.pack_padded_sequence(embedded, text_lengths.cpu())
        packed_output, (hidden, cell) = self.rnn(packed_embedded) # output[sentence length, batch size, hidden dim*num of directions], [numberlayers*num of directions, batch size, hidden dim]
        
        unpacked_output, unpacked_lengths = torch.nn.utils.rnn.pad_packed_sequence(packed_output)
        unpacked_output = unpacked_output.to(DEVICE)
        
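        # mask padded timesteps via the true lengths; testing for exact zeros could also hide a valid output
        steps = torch.arange(unpacked_output.size(0), device = unpacked_output.device).unsqueeze(1)
        mask = (steps >= unpacked_lengths.to(unpacked_output.device).unsqueeze(0)).unsqueeze(2)

        output = unpacked_output.masked_fill(mask, float("-inf")).max(dim = 0).values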
        
        return self.fc(output)

In [None]:
model_max = RNN_max(EMBEDDING_DIM,
                    HIDDEN_DIM,
                    OUTPUT_DIM,
                    embedding_weights)

optimizer = optim.Adam(model_max.parameters(), lr = 0.001)

criterion = nn.BCEWithLogitsLoss()
model_max = model_max.to(DEVICE)
criterion = criterion.to(DEVICE)

In [None]:
for epoch in range(N_EPOCHS):
    train_loss, train_acc = train(model_max, train_iterator, optimizer, criterion)
    valid_loss, valid_acc = evaluate(model_max, valid_iterator, criterion)
    print(f'| Epoch: {epoch+1:02} | Train Loss: {train_loss:.3f} | Train Acc: {train_acc*100:.2f}% | Val. Loss: {valid_loss:.3f} | Val. Acc: {valid_acc*100:.2f}% |')

In [None]:
test_loss, test_acc = evaluate(model_max, test_iterator, criterion)
print(f'| Test Loss: {test_loss:.3f} | Test Acc: {test_acc*100:.2f}% |')
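
Why the padded positions have to be masked before max pooling becomes clear when all valid outputs are negative: the padding zeros would then win the maximum. The toy tensor below is made up for illustration and uses a mask built from the true lengths.

```python
import torch

# toy LSTM outputs of shape [max length, batch size, hidden dim = 1], zero-padded
outputs = torch.tensor([[[-1.0], [-5.0]],
                        [[-2.0], [ 0.0]],
                        [[-3.0], [ 0.0]]])
lengths = torch.tensor([3, 1])   # the second sequence has a single valid timestep

naive_max = outputs.max(dim = 0).values            # padding zeros take part in the max

steps = torch.arange(outputs.size(0)).unsqueeze(1)  # [max length, 1]
padding_mask = steps >= lengths.unsqueeze(0)        # [max length, batch size], True on padding
masked = outputs.masked_fill(padding_mask.unsqueeze(2), float("-inf"))
masked_max = masked.max(dim = 0).values

print(naive_max)
print(masked_max)
```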

# Exercise 2

We can also change the LSTM architecture to be bidirectional or to have multiple layers.

Try to change the previous LSTM architecture to a BiLSTM with more layers, and see the impact.

Can you obtain better results? Discuss your findings.

In [None]:
class RNN_BiLSTM(nn.Module):
    def __init__(self, embedding_dim, hidden_dim, output_dim, embedding_weights):
        super().__init__()
        self.hidden_dim = hidden_dim
        self.embedding = nn.Embedding.from_pretrained(embedding_weights)
        self.rnn = nn.LSTM(embedding_dim, hidden_dim, num_layers = 2, bidirectional = True)
        self.fc = nn.Linear(hidden_dim * 2, output_dim)
        
    def forward(self, x, text_lengths):
        # x: [sentence length , batch size]
        embedded = self.embedding(x) #[sentence length, batch size, embedding dim]
        packed_embedded = nn.utils.rnn.pack_padded_sequence(embedded, text_lengths.cpu())
        packed_output, (hidden, cell) = self.rnn(packed_embedded) # output[sentence length, batch size, hidden dim*num of directions], [numberlayers*num of directions, batch size, hidden dim]
        
        unpacked_output, unpacked_lengths = torch.nn.utils.rnn.pad_packed_sequence(packed_output)
        unpacked_output = unpacked_output.to(DEVICE)
                    
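        # mask padded timesteps via the true lengths; testing for exact zeros could also hide a valid output
        steps = torch.arange(unpacked_output.size(0), device = unpacked_output.device).unsqueeze(1)
        mask = (steps >= unpacked_lengths.to(unpacked_output.device).unsqueeze(0)).unsqueeze(2)

        output = unpacked_output.masked_fill(mask, float("-inf")).max(dim = 0).values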
        
        return self.fc(output)

In [None]:
model_BiLSTM = RNN_BiLSTM(EMBEDDING_DIM,
                          HIDDEN_DIM,
                          OUTPUT_DIM,
                          embedding_weights)

optimizer = optim.Adam(model_BiLSTM.parameters(), lr = 0.001)

criterion = nn.BCEWithLogitsLoss()
model_BiLSTM = model_BiLSTM.to(DEVICE)
criterion = criterion.to(DEVICE)

In [None]:
for epoch in range(N_EPOCHS):
    train_loss, train_acc = train(model_BiLSTM, train_iterator, optimizer, criterion)
    valid_loss, valid_acc = evaluate(model_BiLSTM, valid_iterator, criterion)
    print(f'| Epoch: {epoch+1:02} | Train Loss: {train_loss:.3f} | Train Acc: {train_acc*100:.2f}% | Val. Loss: {valid_loss:.3f} | Val. Acc: {valid_acc*100:.2f}% |')

In [None]:
test_loss, test_acc = evaluate(model_BiLSTM, test_iterator, criterion)
print(f'| Test Loss: {test_loss:.3f} | Test Acc: {test_acc*100:.2f}% |')
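
With a bidirectional, multi-layer LSTM the `hidden` tensor has one entry per layer and direction, so `hidden.squeeze(0)` from the first model no longer applies. As one possible alternative to pooling, the sketch below concatenates the final forward and backward states of the last layer; the tiny untrained LSTM and its dimensions are assumptions for illustration.

```python
import torch
import torch.nn as nn
from torch.nn.utils.rnn import pack_padded_sequence, pad_packed_sequence

torch.manual_seed(0)
hidden_dim = 3
bilstm = nn.LSTM(input_size = 4, hidden_size = hidden_dim, num_layers = 2, bidirectional = True)

lengths = torch.tensor([5, 2])
inputs = torch.randn(5, 2, 4)    # [max length, batch size, features]

packed_output, (hidden, cell) = bilstm(pack_padded_sequence(inputs, lengths))
output, _ = pad_packed_sequence(packed_output)   # [max length, batch size, 2 * hidden dim]

forward_last = hidden[-2]    # last layer, forward direction: state after the last valid token
backward_last = hidden[-1]   # last layer, backward direction: state after reading back to the first token
sentence_vector = torch.cat([forward_last, backward_last], dim = 1)   # [batch size, 2 * hidden dim]

print(hidden.shape, output.shape, sentence_vector.shape)
```

Either representation has `2 * hidden_dim` features, which is why the linear layer in the bidirectional model takes `hidden_dim * 2` inputs.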