English Text-to-Speech with TensorFlowTTS

TensorFlowTTS is a speech synthesis library for TensorFlow 2. It can generate speech in several languages, including English, French, Korean, Chinese, and German, and it can be adapted to other languages.

In this tip, we will use TensorFlowTTS to generate English speech from an arbitrary piece of text.

First, we need to install the library:

$ pip install git+https://github.com/TensorSpeech/TensorFlowTTS.git
$ pip install git+https://github.com/repodiac/german_transliterate.git#egg=german_transliterate

Then, we import the needed packages:

import tensorflow as tf

import yaml
import numpy as np

import IPython.display as ipd

from tensorflow_tts.inference import TFAutoModel
from tensorflow_tts.inference import AutoConfig
from tensorflow_tts.inference import AutoProcessor

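Now, we load the pretrained English models, which were trained on the LJSpeech corpus. Tacotron 2 converts text into a mel spectrogram, and MelGAN (the vocoder) converts that spectrogram into a waveform: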

tacotron2 = TFAutoModel.from_pretrained("tensorspeech/tts-tacotron2-ljspeech-en", name="tacotron2")
melgan = TFAutoModel.from_pretrained("tensorspeech/tts-melgan-ljspeech-en", name="melgan")

We also need to instantiate the processor, which converts the input text into a sequence of symbol IDs:

processor = AutoProcessor.from_pretrained("tensorspeech/tts-tacotron2-ljspeech-en")

To simplify speech generation, we define the following helper function, which runs inference with both models:

def text2speech(input_text, text2mel_model, vocoder_model):
    # convert the text to symbol IDs (uses the global `processor` defined above)
    input_ids = processor.text_to_sequence(input_text)
    # text2mel part: inputs are the batch of IDs, the sequence lengths, and the speaker IDs
    _, mel_outputs, stop_token_prediction, alignment_history = text2mel_model.inference(
        tf.expand_dims(tf.convert_to_tensor(input_ids, dtype=tf.int32), 0),
        tf.convert_to_tensor([len(input_ids)], tf.int32),
        tf.convert_to_tensor([0], dtype=tf.int32)
    )
    # vocoder part: keep the first batch item and drop the channel axis
    audio = vocoder_model(mel_outputs)[0, :, 0]
    return mel_outputs.numpy(), alignment_history.numpy(), audio.numpy()
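
The mel spectrogram returned by the helper also gives a rough idea of how long the generated audio will be. The sketch below is only an illustration: the function name `estimate_duration_seconds` is hypothetical, and the default hop size of 256 samples per mel frame is an assumption about the LJSpeech configuration, not something stated in this tip. The sample rate of 22050 Hz matches the value used for playback.

```python
def estimate_duration_seconds(num_mel_frames, hop_size=256, sample_rate=22050):
    """Estimate audio length from the number of mel frames.

    hop_size is ASSUMED to be 256 samples per frame; check the
    preprocessing config of the model you actually load.
    """
    if num_mel_frames < 0:
        raise ValueError("num_mel_frames must be non-negative")
    # each mel frame is expanded by the vocoder into `hop_size` audio samples
    return num_mel_frames * hop_size / sample_rate
```

With the helper above, the frame count would be `mel_outputs.shape[1]`, since the first axis is the batch.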

Finally, we call the helper function on a sample sentence to generate the corresponding speech:

story = 'This story is called Breathe in Breathe out-of-Flight, a novel by George Kraszewski.'
mels, alignment_history, audios = text2speech(story, tacotron2, melgan)

ipd.Audio(audios, rate=22050)
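
Outside a notebook, `ipd.Audio` has nowhere to render, so it can be useful to write the waveform to disk instead. Here is a minimal sketch that uses only the Python standard library. It assumes the audio is a mono sequence of floats in the range [-1, 1] sampled at 22050 Hz, as in the playback call above. The function name `save_wav` and the file name are hypothetical.

```python
import sys
import wave
from array import array


def save_wav(samples, path, sample_rate=22050):
    """Write mono float samples (assumed in [-1, 1]) as 16-bit PCM WAV."""
    # clip to [-1, 1] and scale to the signed 16-bit range
    pcm = array("h", (int(max(-1.0, min(1.0, float(s))) * 32767) for s in samples))
    if sys.byteorder == "big":
        pcm.byteswap()  # WAV stores samples in little-endian order
    with wave.open(path, "wb") as wav_file:
        wav_file.setnchannels(1)      # mono
        wav_file.setsampwidth(2)      # 2 bytes = 16 bits per sample
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(pcm.tobytes())
    return len(pcm)  # number of samples written
```

For example, `save_wav(audios, "story.wav")` would store the generated speech next to the script.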

More inference examples for each model are available in the TensorFlowTTS notebooks. To convert a model to TF Lite format, see the corresponding Colab. Language-specific Colab examples are available for English, Korean, Chinese, French, and German.
