Passage

A little library for text analysis with RNNs.

Warning: very alpha, work in progress.

Install

via GitHub (version under active development)

git clone http://github.com/IndicoDataSolutions/passage.git
cd passage
python setup.py develop

or via pip

sudo pip install passage

Example

This example uses Passage for binary classification of text. It:

Tokenizes some training text, converting it to a format Passage can use.
Defines the model's structure as a list of layers.
Creates the model with that structure and a cost to be optimized.
Trains the model for one iteration over the training text.
Uses the model and tokenizer to predict on new text.
Saves and loads the model.

from passage.preprocessing import Tokenizer
from passage.layers import Embedding, GatedRecurrent, Dense
from passage.models import RNN
from passage.utils import save, load

tokenizer = Tokenizer()
train_tokens = tokenizer.fit_transform(train_text)

layers = [
	Embedding(size=128, n_features=tokenizer.n_features),
	GatedRecurrent(size=128),
	Dense(size=1, activation='sigmoid')
]

model = RNN(layers=layers, cost='BinaryCrossEntropy')
model.fit(train_tokens, train_labels)

model.predict(tokenizer.transform(test_text))
save(model, 'save_test.pkl')
model = load('save_test.pkl')

Where:

train_text is a list of strings, e.g. ['hello world', 'foo bar']
train_labels is a list of labels, one per string in train_text, e.g. [0, 1]
test_text is another list of strings
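
As a rough illustration of how these three inputs might be prepared, the sketch below splits a list of (text, label) pairs into training and test lists. The helper split_labelled, its parameters, and the toy examples are hypothetical and not part of Passage; it uses only the standard library.

```python
import random

def split_labelled(examples, test_fraction=0.2, seed=0):
    """Shuffle (text, label) pairs and split them into train and test parts.

    Hypothetical helper for illustration only; not part of Passage.
    """
    rng = random.Random(seed)          # fixed seed so the split is repeatable
    shuffled = list(examples)          # copy, so the caller's list is untouched
    rng.shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    test, train = shuffled[:n_test], shuffled[n_test:]
    train_text = [text for text, _ in train]
    train_labels = [label for _, label in train]
    test_text = [text for text, _ in test]
    test_labels = [label for _, label in test]
    return train_text, train_labels, test_text, test_labels

# Toy data in the same shape as the example above: strings with 0/1 labels.
examples = [
    ('hello world', 0),
    ('foo bar', 1),
    ('hello again', 0),
    ('bar baz', 1),
    ('world hello', 0),
]
train_text, train_labels, test_text, test_labels = split_labelled(examples)
```

The resulting train_text, train_labels and test_text can then be passed to the tokenizer and model as in the example; test_labels is kept aside for checking predictions.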

Datasets

Without sizeable datasets, RNNs have difficulty achieving better results than traditional sparse linear models. Below are a few appropriately sized datasets that are useful for experimentation. Hopefully this list will grow over time. Please feel free to propose new datasets for inclusion through either an issue or a pull request.
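
To judge whether an RNN is actually beating a sparse linear model on a given dataset, it can help to have a simple baseline to compare against. The sketch below is one possible baseline, a bag-of-words logistic regression trained with plain SGD in pure Python. It is not part of Passage; all function names, hyperparameters and the toy data are assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict

def featurize(text):
    """Sparse bag-of-words features: token -> count."""
    return Counter(text.lower().split())

def train_logreg(texts, labels, epochs=20, lr=0.5):
    """Fit a sparse logistic regression with per-example SGD updates."""
    weights = defaultdict(float)   # only tokens seen in training get an entry
    bias = 0.0
    for _ in range(epochs):
        for text, y in zip(texts, labels):
            feats = featurize(text)
            z = bias + sum(weights[t] * c for t, c in feats.items())
            p = 1.0 / (1.0 + math.exp(-z))   # predicted probability of label 1
            g = p - y                        # gradient of the log loss w.r.t. z
            bias -= lr * g
            for t, c in feats.items():
                weights[t] -= lr * g * c
    return weights, bias

def predict_proba(weights, bias, text):
    """Probability of label 1; unseen tokens contribute nothing."""
    feats = featurize(text)
    z = bias + sum(weights.get(t, 0.0) * c for t, c in feats.items())
    return 1.0 / (1.0 + math.exp(-z))

# Toy usage with made-up data.
toy_text = ['good great fun', 'great film', 'bad awful', 'awful boring film']
toy_labels = [1, 1, 0, 0]
weights, bias = train_logreg(toy_text, toy_labels)
p_pos = predict_proba(weights, bias, 'great fun')
p_neg = predict_proba(weights, bias, 'awful')
```

On a real dataset, the same held-out examples would be scored by both this baseline and the RNN so that the two are compared on equal terms.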

Note: None of these datasets were created by indico, and their inclusion here does not indicate any kind of endorsement.

Blogger Dataset: http://www.cs.biu.ac.il/~koppel/blogs/blogs.zip (Age and gender data)
