Update non-API docs #2101

Merged
merged 14 commits into from
Jun 27, 2018
46 changes: 22 additions & 24 deletions docs/src/about.rst
@@ -2,72 +2,72 @@

.. _about:

============
=====
About
============
=====

History
--------
-------

Gensim started off as a collection of various Python scripts for the Czech Digital Mathematics Library `dml.cz <http://dml.cz/>`_ in 2008,
where it served to generate a short list of the most similar articles to a given article (**gensim = "generate similar"**).
I also wanted to try these fancy "Latent Semantic Methods", but the libraries that
realized the necessary computation were `not much fun to work with <http://soi.stanford.edu/~rmunk/PROPACK/>`_.

Naturally, I set out to reinvent the wheel. Our `2010 LREC publication <http://radimrehurek.com/gensim/lrec2010_final.pdf>`_
describes the initial design decisions behind gensim (clarity, efficiency and scalability)
and is fairly representative of how gensim works even today.
describes the initial design decisions behind Gensim: clarity, efficiency and scalability. It is fairly representative of how Gensim works even today.

Later versions of gensim improved this efficiency and scalability tremendously. In fact,
I made algorithmic scalability of distributional semantics the topic of my `PhD thesis <http://radimrehurek.com/phd_rehurek.pdf>`_.

By now, gensim is---to my knowledge---the most robust, efficient and hassle-free piece
By now, Gensim is---to my knowledge---the most robust, efficient and hassle-free piece
of software to realize unsupervised semantic modelling from plain text. It stands
in contrast to brittle homework-assignment-implementations that do not scale on one hand,
and robust java-esque projects that take forever just to run "hello world".

In 2011, I started using `Github <https://github.com/piskvorky/gensim>`_ for source code hosting
and the gensim website moved to its present domain. In 2013, gensim got its current logo and website design.
and the Gensim website moved to its present domain. In 2013, Gensim got its current logo and website design.


Licensing
----------

Gensim is licensed under the OSI-approved `GNU LGPLv2.1 license <http://www.gnu.org/licenses/old-licenses/lgpl-2.1.en.html>`_.
This means that it's free for both personal and commercial use, but if you make any
modification to gensim that you distribute to other people, you have to disclose
modification to Gensim that you distribute to other people, you have to disclose
the source code of these modifications.

Apart from that, you are free to redistribute gensim in any way you like, though you're
Apart from that, you are free to redistribute Gensim in any way you like, though you're
not allowed to modify its license (doh!).

My intent here is, of course, to **get more help and community involvement** with the development of gensim.
My intent here is to **get more help and community involvement** with the development of Gensim.
The legalese is therefore less important to me than your input and contributions.
Contact me if LGPL doesn't fit your bill but you'd still like to use gensim -- we'll work something out.

`Contact me <mailto:me@radimrehurek.com>`_ if LGPL doesn't fit your bill and you'd like the open source restrictions lifted.

.. seealso::

I also host a document similarity package `gensim.simserver`. This is a high-level
interface to `gensim` functionality, and offers transactional remote (web-based)
document similarity queries and indexing. It uses gensim to do the heavy lifting:
you don't need the `simserver` to use gensim, but you do need gensim to use the `simserver`.
Note that unlike gensim, `gensim.simserver` is licensed under `Affero GPL <http://www.gnu.org/licenses/agpl-3.0.html>`_,
which is much more restrictive for inclusion in commercial projects.
We also built a high performance commercial server for NLP, document analysis, indexing, search and clustering: https://scaletext.ai. ScaleText is available both on-prem and as SaaS.

Reach out at info@scaletext.com if you need an industry-grade NLP tool with professional support.


Contributors
--------------
------------

Credit goes to all the people who contributed to gensim, be it in `discussions <http://groups.google.com/group/gensim>`_,
Credit goes to the many people who contributed to Gensim, be it in `discussions <http://groups.google.com/group/gensim>`_,
ideas, `code contributions <https://github.com/piskvorky/gensim/pulls>`_ or `bug reports <https://github.com/piskvorky/gensim/issues>`_.

It's really useful and motivating to get feedback, in any shape or form, so big thanks to you all!

Some honorable mentions are included in the `CHANGELOG.md <https://github.com/piskvorky/gensim/blob/develop/CHANGELOG.md>`_.

Academic citing
----------------
---------------

Gensim has been used in `many students' final theses as well as research papers <https://scholar.google.com/citations?view_op=view_citation&hl=en&user=9vG_kV0AAAAJ&citation_for_view=9vG_kV0AAAAJ:NaGl4SEjCO4C>`_. When citing gensim,
please use `this BibTeX entry <bibtex_gensim.bib>`_::
Gensim has been used in `over a thousand research papers and student theses <https://scholar.google.com/citations?view_op=view_citation&hl=en&user=9vG_kV0AAAAJ&citation_for_view=9vG_kV0AAAAJ:NaGl4SEjCO4C>`_.

When citing Gensim, please use `this BibTeX entry <bibtex_gensim.bib>`_::

@inproceedings{rehurek_lrec,
title = {{Software Framework for Topic Modelling with Large Corpora}},
@@ -83,5 +83,3 @@ please use `this BibTeX entry <bibtex_gensim.bib>`_::
note={\url{http://is.muni.cz/publication/884893/en}},
language={English}
}


8 changes: 4 additions & 4 deletions docs/src/distributed.rst
@@ -1,7 +1,7 @@
.. _distributed:

Distributed Computing
===================================
=====================

Why distributed computing?
---------------------------
@@ -42,15 +42,15 @@ installation is quite painless and only involves copying its `*.py` files somewh

sudo easy_install Pyro4

You don't have to install `Pyro` to run `gensim`, but if you don't, you won't be able
You don't have to install Pyro to run Gensim, but if you don't, you won't be able
to access the distributed features (i.e., everything will always run in serial mode,
the examples on this page don't apply).


Core concepts
-----------------------------------
-------------

As always, `gensim` strives for a clear and straightforward API (see :ref:`design`).
As always, Gensim strives for a clear and straightforward API (see :ref:`design`).
To this end, *you do not need to make any changes in your code at all* in order to
run it over a cluster of computers!

62 changes: 28 additions & 34 deletions docs/src/install.rst
@@ -9,11 +9,11 @@ Quick install

Run in your terminal::

easy_install -U gensim
pip install --upgrade gensim

or, alternatively::

pip install --upgrade gensim
easy_install -U gensim
Contributor

easy_install is no longer relevant (everyone uses pip or conda now); better to replace easy_install with conda install -c conda-forge gensim.
In other places, please always use pip.

Owner Author

OK.


In case that fails, make sure you're installing into a writeable location (or use `sudo`), or read on.

@@ -28,9 +28,6 @@ platform that supports Python 2.6+ and NumPy. Gensim depends on the following so
* `NumPy <http://www.numpy.org>`_ >= 1.3. Tested with version 1.9.0, 1.7.1, 1.7.0, 1.6.2, 1.6.1rc2, 1.5.0rc1, 1.4.0, 1.3.0, 1.3.0rc2.
* `SciPy <http://www.scipy.org>`_ >= 0.7. Tested with version 0.14.0, 0.12.0, 0.11.0, 0.10.1, 0.9.0, 0.8.0, 0.8.0b1, 0.7.1, 0.7.0.

**Windows users** are well advised to try the `Enthought distribution <http://www.enthought.com/products/epd.php>`_,
which conveniently includes Python & NumPy & SciPy in a single bundle, and is free for academic use.


Install Python and `easy_install`
Contributor

Please remove this section

---------------------------------
@@ -50,20 +47,19 @@ Install SciPy & NumPy
----------------------

These are quite popular Python packages, so chances are there are pre-built binary
distributions available for your platform. You can try installing from source using easy_install::
distributions available for your platform. You can try installing from source using `pip` or `easy_install`::

easy_install numpy
easy_install scipy
pip install numpy
pip install scipy

If that doesn't work or if you'd rather install using a binary package, consult
http://www.scipy.org/Download.
If that doesn't work or if you'd rather install using a binary package, consult http://www.scipy.org/Download.

Install `gensim`
-----------------
Install Gensim
--------------

You can now install (or upgrade) `gensim` with::
You can now install (or upgrade) Gensim with::

easy_install --upgrade gensim
easy_install -U gensim

That's it! Congratulations, you can proceed to the :doc:`tutorials <tutorial>`.
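
One way to confirm that the installation steps above succeeded is to check that Python can locate the packages. The following is a minimal sketch using only the standard library; the helper name ``check_importable`` is made up for this illustration and is not part of Gensim:

```python
import importlib.util


def check_importable(names):
    """Map each top-level package name to True if Python can locate it."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


if __name__ == "__main__":
    # Gensim itself plus the two dependencies named in this document.
    for name, found in check_importable(["numpy", "scipy", "gensim"]).items():
        print(f"{name}: {'found' if found else 'MISSING'}")
```

A package reported as missing usually means the install went into a different Python environment than the one running the check.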

@@ -74,53 +70,51 @@ of computers, in :doc:`distributed`, you should install with::

easy_install gensim[distributed]

The optional `distributed` feature installs `Pyro (PYthon Remote Objects) <http://pypi.python.org/pypi/Pyro>`_.
If you don't know what distributed computing means, you can ignore it:
`gensim` will work fine for you anyway.
The optional ``distributed`` feature installs `Pyro (PYthon Remote Objects) <http://pypi.python.org/pypi/Pyro>`_.
If you don't know what distributed computing means, you can ignore it: Gensim will work fine for you anyway.

This optional extension can also be installed separately later with::

easy_install Pyro4
pip install Pyro4

-----

There are also alternative routes to install:

1. If you have downloaded and unzipped the `tar.gz source <http://pypi.python.org/pypi/gensim>`_
for `gensim` (or you're installing `gensim` from `github <https://github.com/piskvorky/gensim/>`_),
for Gensim (or you're installing Gensim from `Github <https://github.com/piskvorky/gensim/>`_),
you can run::

python setup.py install
Contributor

pip install .


to install `gensim` into your ``site-packages`` folder.
2. If you wish to make local changes to the `gensim` code (`gensim` is, after all, a
package which targets research prototyping and modifications), a preferred
way may be installing with::
to install Gensim into your ``site-packages`` folder.
2. If you wish to make local changes to the Gensim code, a preferred way may be installing with::

python setup.py develop

or::

pip install -e .

This will only place a symlink into your ``site-packages`` directory. The actual
files will stay wherever you unpacked them.
3. If you don't have root privileges (or just don't want to put the package into
your ``site-packages``), simply unpack the source package somewhere and that's it! No
compilation or installation needed. Just don't forget to set your PYTHONPATH
(or modify ``sys.path``), so that Python can find the unpacked package when importing.
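
The third route (unpack the source and adjust ``sys.path``) could look roughly like the sketch below. The directory is a placeholder chosen for this example, not a location Gensim prescribes, and the helper name ``add_source_dir`` is hypothetical:

```python
import os
import sys


def add_source_dir(path):
    """Prepend an unpacked source directory to sys.path (once); return its absolute path."""
    path = os.path.abspath(path)
    if path not in sys.path:
        sys.path.insert(0, path)
    return path


# Hypothetical location of the unpacked Gensim source tree; replace with your own.
unpacked = add_source_dir(os.path.join(os.path.expanduser("~"), "src", "gensim"))

# From here on, `import gensim` would be resolved against `unpacked`, provided
# the directory really contains the package and NumPy/SciPy are importable.
```

Setting ``PYTHONPATH`` in the shell has the same effect without touching the code; modifying ``sys.path`` only affects the running interpreter.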


Testing `gensim`
----------------
Testing Gensim
--------------

To test the package, unzip the `tar.gz source <http://pypi.python.org/pypi/gensim>`_ and run::

python setup.py test
Contributor

tox -e {py27,py35,py36}-{win,linux}

Owner Author

What does this mean?

Contributor

tox -e <PYTHON_VERSION>-<OS_VERSION>. For example, to run the tests on Linux with Python 3.6, I should run tox -e py36-linux
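
To illustrate what the brace pattern in this thread stands for, here is a small sketch that expands it by hand. It only mimics the ``<PYTHON_VERSION>-<OS_VERSION>`` naming scheme described in the comments; it does not call tox, and the helper name ``expand_envs`` is made up for this example:

```python
from itertools import product


def expand_envs(pythons, platforms):
    """Combine every Python tag with every platform tag as '<python>-<platform>'."""
    return [f"{py}-{platform}" for py, platform in product(pythons, platforms)]


# The two groups taken from the pattern {py27,py35,py36}-{win,linux}.
envs = expand_envs(["py27", "py35", "py36"], ["win", "linux"])
print(envs)
```

Each resulting name is one environment that can be selected with ``tox -e``, so a single run picks one Python version and one platform.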


Gensim uses Travis CI for continuous integration: |Travis|_
Gensim uses Travis CI for continuous integration, automatically running the full test suite on each pull request and commit: |Travis|_
Contributor

In addition


.. |Travis| image:: https://api.travis-ci.org/piskvorky/gensim.png?branch=develop
.. _Travis: https://travis-ci.org/piskvorky/gensim
.. |Travis| image:: https://travis-ci.org/RaRe-Technologies/gensim.svg?branch=develop
.. _Travis: https://travis-ci.org/RaRe-Technologies/gensim
Contributor

Need to add (convert from Markdown first, of course):

[![Conda-forge Build](https://anaconda.org/conda-forge/gensim/badges/version.svg)](https://anaconda.org/conda-forge/gensim)
[![Wheel](https://img.shields.io/pypi/wheel/gensim.svg)](https://pypi.python.org/pypi/gensim)

Owner Author

I don't think that's really needed or relevant here.



Problems?
---------

Use the `gensim discussion group <http://groups.google.com/group/gensim/>`_ for
questions and troubleshooting. See the :doc:`support page <support>`.
Use the `Gensim discussion group <http://groups.google.com/group/gensim/>`_ for
questions and troubleshooting. See the :doc:`support page <support>` for commercial support.