Implementation of a long short-term memory (LSTM) recurrent neural network using only numpy. This is heavily influenced by Andrej Karpathy's min-char-rnn model, though with many more features.
To get started directly, you can use the pretrained model "model_LSTM_h200_T50_D22", which was trained for 20,000 iterations with sequence length 50, a hidden dimension of 200, an input dimension of 1/4 vocab_size, LSTM cells, and minibatch size 16. We also provide its associated solver object, in case you want to train further or train on a new dataset. To import it into a Python session, run the commands
    import pickle

    # pickle files must be opened in binary mode ('rb')
    with open('model_LSTM_h200_T50_D22', 'rb') as filehandler:
        model = pickle.load(filehandler)
To generate sentences, you must first supply an input character. To produce a sequence, call the method model.sample(char_to_idx[char], T=seq_length), where char is an input character like 'a' and seq_length is the length of the desired sequence. The method currently prints the sequence; to have it return the text instead, modify the sample method of the RNN class in Rnns.py.
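For example (assuming the char_to_idx dictionary built by the notebook, described below):

    # seed the sampler with 'a' and print a 100-character sequence
    model.sample(char_to_idx['a'], T=100)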
The directory layout is as follows:

main.ipynb -> a Jupyter notebook showing the training of a single model. This is the main working file.

Rnns.py -> contains the RNN class, which instantiates either an LSTM recurrent neural network or a vanilla recurrent neural network.

solver.py -> contains the solver class, which takes as arguments an RNN instance and data, and trains the RNN.
The following modules are called by the previous two:

layers.py -> contains all of the forward and backward pass functions. The backpropagation gradients are checked numerically via the num_grad function defined in gradient_checking.py. A full suite of tests on all layer functions can be run with

    python3 test_gradient_checking.py

An output file showing all of these checks is in test_gradient_output.txt.
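For reference, numerical gradient checking is typically done with a centered difference; the following is a minimal sketch of the idea, not the exact num_grad implementation in gradient_checking.py:

    import numpy as np

    def num_grad(f, x, h=1e-5):
        # Centered-difference estimate of df/dx for a scalar-valued f and
        # an array x. Illustrative sketch only; the real num_grad in
        # gradient_checking.py may have a different signature.
        grad = np.zeros_like(x)
        it = np.nditer(x, flags=['multi_index'])
        while not it.finished:
            idx = it.multi_index
            old = x[idx]
            x[idx] = old + h
            fp = f(x)          # f evaluated at x + h in this coordinate
            x[idx] = old - h
            fm = f(x)          # f evaluated at x - h in this coordinate
            x[idx] = old       # restore the original value
            grad[idx] = (fp - fm) / (2 * h)
            it.iternext()
        return grad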
optim.py contains the Optimizer class. This keeps track of the moving averages used in various update rules. Its methods update parameters via gradient descent, provided we supply them with a parameter dictionary (from the model class) and a gradient dictionary. These dictionaries MUST use the same keys.
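A toy illustration of that contract (the names here are hypothetical, not the actual Optimizer API):

    import numpy as np

    # the parameter and gradient dictionaries share the same keys
    params = {'Wx': np.random.randn(3, 4), 'b': np.zeros(4)}
    grads = {'Wx': np.ones((3, 4)), 'b': np.ones(4)}

    learning_rate = 1e-2
    for key in params:
        # plain SGD step; the real Optimizer also maintains moving
        # averages for rules such as RMSProp or Adam
        params[key] -= learning_rate * grads[key]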
To test our model, we train the RNN on the full text of Jane Austen's Pride and Prejudice, available for free on the web from Project Gutenberg. The text is included as the zip file pride.zip, along with the text file pride.txt, consisting of one long string.
To train a model, open the Jupyter notebook. Change the ellipsis under open( ... , 'r') to the name of your file, and run the cell. This will extract all unique characters from the text, create a character-to-integer dictionary (and vice versa), then map the text into an array of such integers. This is the default data input to the solver class.
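A minimal sketch of that preprocessing step (variable names are assumptions; the notebook's exact code may differ):

    import numpy as np

    with open('pride.txt', 'r') as f:
        text = f.read()

    chars = sorted(set(text))                            # unique characters
    char_to_idx = {ch: i for i, ch in enumerate(chars)}  # character -> integer
    idx_to_char = {i: ch for i, ch in enumerate(chars)}  # integer -> character
    data = np.array([char_to_idx[ch] for ch in text])    # solver input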
In general, we may choose any input to the solver class, provided it is an array of integers in the range 0 -> vocab_dim-1. The model has a parameter to reduce the input dimension from vocab_dim to input_dim, representing each one-hot-encoded character as a dense vector. In this way, we can feed the RNN any integer sequence to model.
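For instance, a toy input built by hand (purely illustrative):

    import numpy as np

    # any integer array in [0, vocab_dim-1] is valid solver input;
    # here, a periodic sequence with vocab_dim = 10
    data = np.array([i % 10 for i in range(1000)])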
After running the initial cell, we instantiate our model, choosing 'lstm' or 'vanilla' and setting the sizes of the input dimension and hidden layer. We then instantiate the solver class with the learning rate, sequence length, and batch size. The solver's hidden method "minibatch_generator" automatically converts the array into minibatches of size batch_size containing sequences of length seq_length.
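A hedged sketch of what this might look like, reusing chars and data from the preprocessing sketch above and the pretrained model's settings (the constructor and keyword names are assumptions; check main.ipynb for the exact signatures):

    from Rnns import RNN
    from solver import Solver  # hypothetical class name

    # hypothetical keyword arguments
    model = RNN(cell_type='lstm', input_dim=len(chars) // 4,
                hidden_dim=200, vocab_dim=len(chars))
    solver = Solver(model, data, learning_rate=1e-3,
                    seq_length=50, batch_size=16)
    solver.train()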
After training, you can plot the loss function vs. epochs via the method model.plot_loss(), and inspect some sample sequences generated via model.sample(seed_integer).
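For example:

    model.plot_loss()                      # training loss vs. epochs
    model.sample(char_to_idx['a'], T=200)  # print a 200-character sample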