Text Generation with Markovify
A simple way to generate random sentences from a corpus of text is to train a Markov chain on it. In Python, we can use the markovify library
to build such models.
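To make the idea concrete, here is a minimal sketch of a word-level Markov chain in plain Python, independent of markovify. The function names build_chain and generate, the toy corpus, and the whitespace tokenization are all assumptions made for illustration; markovify's own implementation handles sentence boundaries and more.

```python
import random
from collections import defaultdict


def build_chain(text, state_size=2):
    """Map each tuple of `state_size` consecutive words to the words seen after it."""
    words = text.split()
    chain = defaultdict(list)
    for i in range(len(words) - state_size):
        state = tuple(words[i:i + state_size])
        chain[state].append(words[i + state_size])
    return chain


def generate(chain, start, max_words=20):
    """Walk the chain from `start`, choosing each next word at random."""
    output = list(start)
    state = tuple(start)
    # Stop at the word limit or when the current state has no known successor
    while len(output) < max_words and state in chain:
        output.append(random.choice(chain[state]))
        state = tuple(output[-len(start):])
    return " ".join(output)


# Toy corpus, purely illustrative
corpus = "the cat sat on the mat and the cat sat on the sofa"
chain = build_chain(corpus)
print(generate(chain, start=("the", "cat")))
```

Each state here is a pair of words, and the only randomness is the choice among the words that followed that pair in the corpus. Repeated successors stay in the list, so more frequent continuations are picked more often.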
pip install markovify
Assuming that the training corpus is a directory of text files, we first build one Markov chain per file as follows:
import os
import markovify
PATH = '...'
chains = []
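for filename in os.listdir(PATH):
    # os.path.join adds the separator, so PATH needs no trailing slash
    with open(os.path.join(PATH, filename), 'r') as f:
        content = f.read()  # pass the whole file as a single string
    markov_chain = markovify.Text(content)
    chains.append(markov_chain)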
Then we combine the different models into one larger Markov Chain as follows:
model = markovify.combine(chains)
Finally, we can start generating random text:
model.make_sentence()