Merge branch 'release/0.3.2'
aolieman committed Jun 9, 2019
2 parents 5530b6e + a15e6f8 commit 3a3ef62
Showing 21 changed files with 690 additions and 80 deletions.
3 changes: 3 additions & 0 deletions .gitignore
@@ -7,6 +7,9 @@ dist/
wayward.egg-info/
.mypy_cache/
.pytest_cache/
docs/_build
docs/_static
docs/_templates

.dmypy.json
.python-version
23 changes: 23 additions & 0 deletions .readthedocs.yml
@@ -0,0 +1,23 @@
# .readthedocs.yml
# Read the Docs configuration file
# See https://docs.readthedocs.io/en/stable/config-file/v2.html for details

# Required
version: 2

# Build documentation in the docs/ directory with Sphinx
sphinx:
  configuration: docs/conf.py

# Build documentation with MkDocs
#mkdocs:
# configuration: mkdocs.yml

# Optionally build your docs in additional formats such as PDF and ePub
formats: all

# Optionally set the version of Python and requirements required to build your docs
python:
  version: 3.7
  install:
    - requirements: docs/requirements.txt
14 changes: 14 additions & 0 deletions CHANGELOG.md
@@ -6,6 +6,20 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0

## [Unreleased]

-

## [0.3.2] - 2019-06-09

### Added

- Package documentation:
    - Transclude basic instructions from README.
    - Generate API documentation.
    - Configuration for Read the Docs.
    - Incorporate changelog via symlink.
    - Add a Dickens example page.
- Docs build status and PyPI version badges in README.

## [0.3.1] - 2019-06-05

### Added
77 changes: 49 additions & 28 deletions README.rst
@@ -1,15 +1,28 @@
Wayward
=======

.. image:: https://readthedocs.org/projects/wayward/badge/?version=latest
    :target: https://wayward.readthedocs.io/en/latest/?badge=latest
    :alt: Documentation status

.. image:: https://badge.fury.io/py/wayward.svg
    :target: https://pypi.org/project/wayward/
    :alt: PyPI package version


.. docs-inclusion-marker
**Wayward** is a Python package that helps to identify characteristic terms from
single documents or groups of documents. It can be used to create word clouds.
single documents or groups of documents. It can be used for keyword extraction
and several related tasks, and can create efficient sparse representations for
classifiers. It was originally created to provide term weights for word clouds.

Rather than use simple term frequency, it weighs terms by statistical models
known as *parsimonious language models*. These models are good at picking up
the terms that distinguish a text document from other documents in a
collection.
Rather than use simple term frequency to estimate the importance of words and
phrases, it weighs terms by statistical models known as *parsimonious language
models*. These models are good at picking up the terms that distinguish a text
document from other documents in a collection.

For this to work, a preferably large amount of documents are needed
For this to work, a preferably large amount of documents is needed
to serve as a background collection, to compare the documents of interest to.
This could be a random sample of newspaper articles, for instance, but for many
applications it works better to take a natural collection, such as a periodical
@@ -29,14 +42,14 @@ Installation

Either install the latest release from PyPI::

    pip install wayward
    $ pip install wayward

or clone the git repository, and use `Poetry <https://poetry.eustace.io/docs/>`_
to install the package in editable mode::

    git clone https://github.com/aolieman/wayward.git
    cd wayward/
    poetry install
    $ git clone https://github.com/aolieman/wayward.git
    $ cd wayward/
    $ poetry install

Usage
-----
@@ -53,7 +66,7 @@ Usage

The ``ParsimoniousLM`` is initialized with all document tokens as a
background corpus, and subsequently takes a single document's tokens
as input. Its ``top`` method returns the top terms and their probabilities:
as input. Its ``top()`` method returns the top terms and their probabilities:

>>> from wayward import ParsimoniousLM
>>> plm = ParsimoniousLM(doc_tokens, w=.1)
@@ -75,23 +88,26 @@ method returns the top terms and their probabilities:

>>> from wayward import SignificantWordsLM
>>> swlm = SignificantWordsLM(doc_tokens, lambdas=(.7, .1, .2))
>>> swlm.group_top(10, doc_tokens[-3:])
[('in', 0.37875318027881),
('is', 0.07195732361699828),
('mortal', 0.07195732361699828),
('nature', 0.07195732361699828),
('all', 0.07110584778711342),
('we', 0.03597866180849914),
('true', 0.03597866180849914),
('lovers', 0.03597866180849914),
('strange', 0.03597866180849914),
('capers', 0.03597866180849914)]

See ``example/dickens.py`` for a running example with more realistic data.

Background
----------
This package started out as `WeighWords <https://github.com/larsmans/weighwords/>`_,
>>> swlm.group_top(10, doc_tokens[-2:], fix_lambdas=True)
[('much', 0.09077675276900632),
('lover', 0.06298706244865138),
('will', 0.06298706244865138),
('you', 0.04538837638450315),
('your', 0.04538837638450315),
('rhymes', 0.04538837638450315),
('speak', 0.04538837638450315),
('neither', 0.04538837638450315),
('rhyme', 0.04538837638450315),
('nor', 0.04538837638450315)]

See |example/dickens.py|_ for a runnable example with more realistic data.

.. |example/dickens.py| replace:: ``example/dickens.py``
.. _example/dickens.py: https://github.com/aolieman/wayward/blob/master/example/dickens.py

Origin and Relaunch
-------------------
This package started out as WeighWords_,
written by Lars Buitinck at the University of Amsterdam. It provides an efficient
parsimonious LM implementation, and a very accessible API.

@@ -104,6 +120,11 @@ the background collection. The parsimonization algorithm discounts terms that ar
already well explained by the background model, until the most wayward terms come
out on top.
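
The discounting described above can be sketched as a small expectation-maximization loop. This is an illustrative sketch only, not Wayward's implementation: the helper name ``parsimonious_weights``, its parameters, and the toy token lists are assumptions made for this example, and it assumes that every document term also occurs in the background collection. The update rule follows the parsimonious language model of Hiemstra et al. (2004), with ``w`` as the weight of the document model.

```python
from collections import Counter


def parsimonious_weights(doc_tokens, background_tokens, w=0.1, iterations=50):
    """Hypothetical helper: EM estimate of a parsimonious document model.

    Assumes every token in doc_tokens also occurs in background_tokens,
    so that the background probability of each document term is non-zero.
    """
    tf = Counter(doc_tokens)
    bg = Counter(background_tokens)
    bg_total = sum(bg.values())

    # Background model P(t|C), restricted to the terms of this document.
    p_bg = {t: bg[t] / bg_total for t in tf}

    # Start from the maximum-likelihood document model P(t|D).
    doc_total = sum(tf.values())
    p_doc = {t: tf[t] / doc_total for t in tf}

    for _ in range(iterations):
        # E-step: expected count of each term attributed to the document
        # model rather than to the background model.
        expected = {
            t: tf[t] * (w * p_doc[t]) / ((1 - w) * p_bg[t] + w * p_doc[t])
            for t in tf
        }
        # M-step: renormalize the expected counts into probabilities.
        norm = sum(expected.values())
        p_doc = {t: e / norm for t, e in expected.items()}

    return p_doc


# Toy data: the background collection includes the document itself.
doc = "the whale the sea the whale".split()
background = doc + "the the the the a the of the sea the".split()
weights = parsimonious_weights(doc, background, w=0.1)
```

In this toy data, the frequent but uninformative token ``the`` loses probability mass to ``whale``, which is rare in the background.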

See the Changelog_ for an overview of the most important changes.

.. _WeighWords: https://github.com/larsmans/weighwords/
.. _Changelog: https://wayward.readthedocs.io/en/develop/changelog.html

References
----------
D. Hiemstra, S. Robertson, and H. Zaragoza (2004). `Parsimonious Language Models
20 changes: 20 additions & 0 deletions docs/Makefile
@@ -0,0 +1,20 @@
# Minimal makefile for Sphinx documentation
#

# You can set these variables from the command line, and also
# from the environment for the first two.
SPHINXOPTS ?=
SPHINXBUILD ?= sphinx-build
SOURCEDIR = .
BUILDDIR = _build

# Put it first so that "make" without argument is like "make help".
help:
	@$(SPHINXBUILD) -M help "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)

.PHONY: help Makefile

# Catch-all target: route all unknown targets to Sphinx using the new
# "make mode" option. $(O) is meant as a shortcut for $(SPHINXOPTS).
%: Makefile
	@$(SPHINXBUILD) -M $@ "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)
7 changes: 7 additions & 0 deletions docs/api_docs/wayward.logsum.rst
@@ -0,0 +1,7 @@
logsum module
=====================

.. automodule:: wayward.logsum
    :members:
    :undoc-members:
    :show-inheritance:
6 changes: 6 additions & 0 deletions docs/api_docs/wayward.parsimonious.rst
@@ -0,0 +1,6 @@
parsimonious module
===========================

.. automodule:: wayward.parsimonious
    :members:
    :show-inheritance:
6 changes: 6 additions & 0 deletions docs/api_docs/wayward.significant_words.rst
@@ -0,0 +1,6 @@
significant\_words module
=================================

.. automodule:: wayward.significant_words
    :members:
    :show-inheritance:
8 changes: 8 additions & 0 deletions docs/api_docs/wayward.specific_term_estimators.rst
@@ -0,0 +1,8 @@
specific\_term\_estimators module
=========================================

.. automodule:: wayward.specific_term_estimators
    :members:
    :undoc-members:
    :show-inheritance:
    :exclude-members: logger
1 change: 1 addition & 0 deletions docs/changelog.md
67 changes: 67 additions & 0 deletions docs/conf.py
@@ -0,0 +1,67 @@
# Configuration file for the Sphinx documentation builder.
#
# This file only contains a selection of the most common options. For a full
# list see the documentation:
# http://www.sphinx-doc.org/en/master/config

# -- Path setup --------------------------------------------------------------

# If extensions (or modules to document with autodoc) are in another directory,
# add these directories to sys.path here. If the directory is relative to the
# documentation root, use os.path.abspath to make it absolute, like shown here.
#
# import os
# import sys
# sys.path.insert(0, os.path.abspath('.'))


# -- Project information -----------------------------------------------------
import pkg_resources

project = 'Wayward'
copyright = (
    '2019, TinQwise Stamkracht, University of Amsterdam'
)
author = 'Alex Olieman'

# The full version, including alpha/beta/rc tags
release = pkg_resources.get_distribution('wayward').version

master_doc = 'index'

# -- General configuration ---------------------------------------------------

# Add any Sphinx extension module names here, as strings. They can be
# extensions coming with Sphinx (named 'sphinx.ext.*') or your custom
# ones.
extensions = [
    'sphinx.ext.napoleon',
    'sphinx.ext.viewcode',
    'recommonmark',
]

# Add any paths that contain templates here, relative to this directory.
templates_path = ['_templates']

# List of patterns, relative to source directory, that match files and
# directories to ignore when looking for source files.
# This pattern also affects html_static_path and html_extra_path.
exclude_patterns = ['_build', 'Thumbs.db', '.DS_Store']


# -- Options for HTML output -------------------------------------------------

# The theme to use for HTML and HTML Help pages. See the documentation for
# a list of builtin themes.
#
html_theme = 'sphinx_rtd_theme'
html_show_sourcelink = True

# Add any paths that contain custom static files (such as style sheets) here,
# relative to this directory. They are copied after the builtin static files,
# so a file named "default.css" will overwrite the builtin "default.css".
html_static_path = ['_static']

html_theme_options = {
    'collapse_navigation': False,
}
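
# Note on the `release` lookup in docs/conf.py: `pkg_resources.get_distribution`
# requires the package to be installed in the build environment. A possible
# alternative, sketched here under the assumption of Python 3.8 or newer (the
# Read the Docs configuration in this commit pins 3.7, where this module is not
# in the standard library), uses `importlib.metadata`. The helper name
# `get_release` and its `default` argument are hypothetical and not part of
# this commit.

```python
from importlib.metadata import PackageNotFoundError, version


def get_release(dist_name, default="unknown"):
    """Hypothetical helper: return the installed version of a distribution.

    Falls back to `default` when the distribution is not installed, so that
    a docs build does not fail on the version lookup alone.
    """
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return default


# Would take the place of the pkg_resources call in conf.py.
release = get_release("wayward")
```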