Poetry Generation Using Tensorflow, Keras, and LSTM

Laxmi Kant | KGP Talkie
6 min read · Nov 21, 2020
Image by Evgeni Tcherkasski from Pixabay

Text generation is already used for things like movie scripts and code generation, and it has huge potential in the real world. It works by probabilistically predicting the next word based on the data the model was trained on. Text generation can be seen as a kind of time-series generation, because each predicted word depends on the previously generated words.

For this kind of time-series analysis, LSTMs are commonly used. We will first cover RNNs and their disadvantages; after that, we will see how LSTMs overcome the problems of RNNs. Finally, I will show you the code line by line with an explanation. Leave a comment if you have any doubts, and please like this article if it helps you. Let’s get started.

What is an RNN?

Recurrent Neural Networks were the first of their kind: state-of-the-art algorithms that can memorize/remember previous inputs when a large amount of sequential data is given to them.

These loops make recurrent neural networks seem somewhat mysterious. However, if you think about it a bit more, it turns out they aren’t all that different from a normal neural network: a recurrent neural network can be thought of as multiple copies of the same network, each passing a message to its successor.
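To make that loop concrete, here is a minimal sketch of a single recurrent step in NumPy (toy sizes and random weights, not the article’s model): the same weights are reused at every time step, and the hidden state carries information from previous inputs forward.

import numpy as np

hidden_size, input_size = 4, 3
W_h = np.random.randn(hidden_size, hidden_size) * 0.1   # recurrent weights, shared across steps
W_x = np.random.randn(hidden_size, input_size) * 0.1    # input weights, shared across steps

h = np.zeros(hidden_size)                                # hidden state starts empty
for x_t in np.random.randn(5, input_size):               # a toy sequence of 5 time steps
    h = np.tanh(W_h @ h + W_x @ x_t)                     # new state depends on old state and new input
print(h)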

Different Types of RNNs

There are several ways a recurrent network can map inputs to outputs; a short Keras sketch after this list shows how two of these patterns map onto layer settings.

  • Fixed-size input to fixed-size output, with no sequences involved (e.g. image classification).
  • Sequence output (e.g. image captioning takes an image and outputs a sentence of words).
  • Sequence input (e.g. sentiment analysis, where a given sentence is classified as expressing a positive or negative sentiment).
  • Sequence input and sequence output (e.g. machine translation: an RNN reads a sentence in English and then outputs a sentence in French).
  • Synced sequence input and output (e.g. video classification, where we wish to label each frame of the video).
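As a rough illustration of how these patterns show up in Keras (just a sketch, not part of the article’s model), the same LSTM layer can produce one output per sequence or one output per time step depending on return_sequences:

import tensorflow as tf

inputs = tf.random.normal((1, 10, 8))   # (batch, time steps, features)

many_to_one = tf.keras.layers.LSTM(16)(inputs)                           # shape (1, 16): one vector per sequence
many_to_many = tf.keras.layers.LSTM(16, return_sequences=True)(inputs)   # shape (1, 10, 16): one vector per time step

print(many_to_one.shape, many_to_many.shape)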

The Problem of RNNs: Long-Term Dependencies

  • Vanishing Gradient
  • Exploding Gradient

Vanishing Gradient

If the partial derivative of the error is less than 1 at each time step, these small factors get multiplied together as the gradient flows back through the sequence, and the result is then multiplied by the (already small) learning rate. The weight update therefore becomes tiny compared with the previous iteration, and the network effectively stops learning from the earlier parts of the sequence.
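A toy illustration of the effect (made-up numbers, just to show the scale): if every time step contributes a factor of 0.9 to the gradient, after 50 steps there is almost nothing left to update the earliest weights.

grad = 1.0
for _ in range(50):      # 50 time steps back through the sequence
    grad *= 0.9          # each step multiplies in a factor smaller than 1
print(grad)              # roughly 0.005, so early time steps barely influence learning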

Exploding Gradient

We speak of exploding gradients when the algorithm assigns an unreasonably large importance to the weights without much reason. Fortunately, this problem can be solved fairly easily by truncating or squashing (clipping) the gradients.
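In Keras this squashing is usually done with gradient clipping. A minimal sketch (not part of the article’s code, which uses the default Adam settings):

import tensorflow as tf

# clip gradients whose norm exceeds 1.0 before applying the update
clipped_adam = tf.keras.optimizers.Adam(clipnorm=1.0)
# model.compile(loss='categorical_crossentropy', optimizer=clipped_adam, metrics=['accuracy'])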

Long Short Term Memory (LSTM) Networks

Long Short Term Memory networks — usually just called “LSTMs” — are a special kind of RNN, capable of learning long-term dependencies.

LSTMs are explicitly designed to avoid the long-term dependency problem. Remembering information for long periods of time is practically their default behavior, not something they struggle to learn!

Sequence Generation Scheme
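The scheme is simple: the model is trained on prefix sequences taken from each lyric line, so it learns to predict the next word from the words before it. At generation time we feed in a seed phrase, predict the next word, append it to the seed, and repeat until a line is complete.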

Let’s Code

import tensorflow as tf
import string
import requests
import pandas as pd
response = requests.get('https://raw.githubusercontent.com/laxmimerit/poetry-data/master/adele.txt')
print(response.text)
data = response.text.splitlines()
print('Length of data: ', len(data))

Build the LSTM Model and Prepare X and y

import numpy as np
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.utils import to_categorical
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, LSTM, Embedding
from tensorflow.keras.preprocessing.sequence import pad_sequences

Let’s do the tokenization

token = Tokenizer()
token.fit_on_texts(data)

You can get the full help on tokenization by running

help(token)

Tokenized words can be seen by

token.word_index

Let’s encode the tokenized words. This converts the text data into sequences of numerical tokens. The vocabulary size is the number of unique words plus one, because index 0 is reserved for padding.

encoded_text = token.texts_to_sequences(data)

# vocabulary size should be + 1
vocab_size = len(token.word_counts) + 1

Prepare Training Data

Please watch the embedded video for a detailed description of these functions.

datalist = []
for d in encoded_text:
    if len(d) > 1:
        # build every growing prefix of the line (at least two tokens long)
        for i in range(2, len(d)):
            datalist.append(d[:i])
            print(d[:i])
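To see what this loop produces, here is a tiny illustration with made-up token ids (not real tokens from the dataset):

example = [12, 7, 45, 3]          # a hypothetical encoded line
for i in range(2, len(example)):
    print(example[:i])            # prints [12, 7] and then [12, 7, 45]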

Padding

Padding will make sure all data points have the same length because text sentences could have variable lengths.

max_length = 20
sequences = pad_sequences(datalist, maxlen=max_length, padding='pre')

# input X is everything except the last token; label y is the last token
X = sequences[:, :-1]
y = sequences[:, -1]
y = to_categorical(y, num_classes=vocab_size)
seq_length = X.shape[1]
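For illustration (again with made-up token ids), a short prefix gets zeros added on the left until it reaches max_length; the last column is then split off as the label:

demo = pad_sequences([[12, 7, 45]], maxlen=max_length, padding='pre')
print(demo)   # [[0 0 ... 0 12 7 45]] -> X would be everything except 45, y would be 45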

LSTM Model Training

I am going to stack two LSTM layers, each with 100 units.

model = Sequential()
model.add(Embedding(vocab_size, 50, input_length=seq_length))
model.add(LSTM(100, return_sequences=True))
model.add(LSTM(100))
model.add(Dense(100, activation='relu'))
model.add(Dense(vocab_size, activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])

# it will take some time to complete training
model.fit(X, y, batch_size=32, epochs=50)
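Since training takes a while, you may want to save the trained model so you can reload it later instead of retraining. A quick sketch with a hypothetical filename (not part of the original article):

model.save('poetry_model.h5')                           # hypothetical filename
# model = tf.keras.models.load_model('poetry_model.h5')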

Now we are done with model training, and the model is ready for poetry generation. Let’s do it.

Poetry Generation

poetry_length = 10

def generate_poetry(seed_text, n_lines):
    for i in range(n_lines):
        text = []
        for _ in range(poetry_length):
            encoded = token.texts_to_sequences([seed_text])
            encoded = pad_sequences(encoded, maxlen=seq_length, padding='pre')

            # predict the index of the most likely next word
            y_pred = np.argmax(model.predict(encoded), axis=-1)

            # map the predicted index back to its word
            predicted_word = ""
            for word, index in token.word_index.items():
                if index == y_pred:
                    predicted_word = word
                    break

            seed_text = seed_text + ' ' + predicted_word
            text.append(predicted_word)

        seed_text = text[-1]
        text = ' '.join(text)
        print(text)

This function will take seed text and the number of lines we want to generate.

seed_text = 'i love you'
generate_poetry(seed_text, 5)

Congrats!!! You have successfully built a poetry generation ML model using LSTM. You should get output something like this:

is no and i want to do is wash your
name i set fire to the beat tears are gonna
understand last night she let the sky fall when it
was just like a song i was so scared to
make us grow from the arms of your love to

If you want better poetry generation, you may need more data and longer training for better accuracy. Still, this output is not bad given our small dataset.

Further Reading

If you are looking for beginner-to-advanced courses on NLP, you can enroll in one of my NLP courses on Udemy.
