Automatic speech processing
EE-554
XTTS TTS Exercise
Description
The goal of this exercise is to get familiar with modern text-to-speech (TTS) technology and to explore the strengths and limitations of a popular open-source TTS model. You will use the Coqui.ai TTS library, which provides pre-trained neural models that can be downloaded and used with just a few lines of code. We will focus on Coqui's XTTS model, a multilingual TTS model trained on 16 languages. It also supports zero-shot voice cloning, meaning that the model can imitate a person's voice after listening to a short sample, even if it never encountered that voice during training.
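To give a sense of what "a few lines of code" means here, the sketch below shows one possible way to call XTTS for zero-shot voice cloning through the Coqui TTS Python API. It is an illustration under assumptions, not the code from the exercise notebook: the model identifier assumes the XTTS v2 checkpoint, the helper names `build_cloning_kwargs` and `synthesize` are hypothetical, and the file names are placeholders.

```python
# Minimal sketch of zero-shot voice cloning with Coqui TTS.
# Assumptions: the `TTS` package is installed and the XTTS v2 checkpoint
# is the one used; the notebook may use a different version or wrapper.

# Model identifier for XTTS v2 in the Coqui model catalogue (assumed version).
XTTS_MODEL_NAME = "tts_models/multilingual/multi-dataset/xtts_v2"


def build_cloning_kwargs(text, reference_wav, language="en", out_path="output.wav"):
    """Collect the arguments for one synthesis call (hypothetical helper).

    `reference_wav` is a short recording of the voice to imitate.
    """
    if not text.strip():
        raise ValueError("text must not be empty")
    return {
        "text": text,
        "speaker_wav": str(reference_wav),
        "language": language,
        "file_path": str(out_path),
    }


def synthesize(text, reference_wav, language="en", out_path="output.wav"):
    """Download/load XTTS and write the cloned speech to `out_path`."""
    # Imported lazily so that this file can be read and tested
    # without the (large) TTS dependency being installed.
    from TTS.api import TTS

    tts = TTS(XTTS_MODEL_NAME)  # downloads the model on first use
    kwargs = build_cloning_kwargs(text, reference_wav, language, out_path)
    tts.tts_to_file(**kwargs)
    return kwargs["file_path"]


# Example call (placeholder file names, not run here):
# synthesize("Hello, this is a cloned voice.", "my_voice_sample.wav", language="en")
```

The reference recording is passed at inference time only; no fine-tuning step is involved, which is what "zero-shot" refers to in this context.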
The exercise consists of a notebook which you can run on Google Colab.
You will test the model's various capabilities, analyze its strengths and weaknesses, and reason about how the model should be evaluated in each case. By the end of this exercise, you will have a better understanding of the current state of TTS technology and be able to identify areas where models such as XTTS could be improved.