Update non-API docs #2101

Merged
merged 14 commits into from
Jun 27, 2018
46 changes: 22 additions & 24 deletions docs/src/about.rst
@@ -2,72 +2,72 @@

.. _about:

============
=====
About
============
=====

History
--------
-------

Gensim started off as a collection of various Python scripts for the Czech Digital Mathematics Library `dml.cz <http://dml.cz/>`_ in 2008,
where it served to generate a short list of the most similar articles to a given article (**gensim = "generate similar"**).
I also wanted to try these fancy "Latent Semantic Methods", but the libraries that
realized the necessary computation were `not much fun to work with <http://soi.stanford.edu/~rmunk/PROPACK/>`_.

Naturally, I set out to reinvent the wheel. Our `2010 LREC publication <http://radimrehurek.com/gensim/lrec2010_final.pdf>`_
describes the initial design decisions behind gensim (clarity, efficiency and scalability)
and is fairly representative of how gensim works even today.
describes the initial design decisions behind Gensim: clarity, efficiency and scalability. It is fairly representative of how Gensim works even today.

Later versions of gensim improved this efficiency and scalability tremendously. In fact,
I made algorithmic scalability of distributional semantics the topic of my `PhD thesis <http://radimrehurek.com/phd_rehurek.pdf>`_.

By now, gensim is---to my knowledge---the most robust, efficient and hassle-free piece
By now, Gensim is---to my knowledge---the most robust, efficient and hassle-free piece
of software to realize unsupervised semantic modelling from plain text. It stands
in contrast to brittle homework-assignment-implementations that do not scale on one hand,
and robust java-esque projects that take forever just to run "hello world".

In 2011, I started using `Github <https://github.com/piskvorky/gensim>`_ for source code hosting
and the gensim website moved to its present domain. In 2013, gensim got its current logo and website design.
and the Gensim website moved to its present domain. In 2013, Gensim got its current logo and website design.


Licensing
----------

Gensim is licensed under the OSI-approved `GNU LGPLv2.1 license <http://www.gnu.org/licenses/old-licenses/lgpl-2.1.en.html>`_.
This means that it's free for both personal and commercial use, but if you make any
modification to gensim that you distribute to other people, you have to disclose
modification to Gensim that you distribute to other people, you have to disclose
the source code of these modifications.

Apart from that, you are free to redistribute gensim in any way you like, though you're
Apart from that, you are free to redistribute Gensim in any way you like, though you're
not allowed to modify its license (doh!).

My intent here is, of course, to **get more help and community involvement** with the development of gensim.
My intent here is to **get more help and community involvement** with the development of Gensim.
The legalese is therefore less important to me than your input and contributions.
Contact me if LGPL doesn't fit your bill but you'd still like to use gensim -- we'll work something out.

`Contact me <mailto:me@radimrehurek.com>`_ if LGPL doesn't fit your bill and you'd like the open source restrictions lifted.

.. seealso::

I also host a document similarity package `gensim.simserver`. This is a high-level
interface to `gensim` functionality, and offers transactional remote (web-based)
document similarity queries and indexing. It uses gensim to do the heavy lifting:
you don't need the `simserver` to use gensim, but you do need gensim to use the `simserver`.
Note that unlike gensim, `gensim.simserver` is licensed under `Affero GPL <http://www.gnu.org/licenses/agpl-3.0.html>`_,
which is much more restrictive for inclusion in commercial projects.
We also built a high performance commercial server for NLP, document analysis, indexing, search and clustering: https://scaletext.ai. ScaleText is available both on-prem and as SaaS.

Reach out at info@scaletext.com if you need an industry-grade NLP tool with professional support.


Contributors
--------------
------------

Credit goes to all the people who contributed to gensim, be it in `discussions <http://groups.google.com/group/gensim>`_,
Credit goes to the many people who contributed to Gensim, be it in `discussions <http://groups.google.com/group/gensim>`_,
ideas, `code contributions <https://github.com/piskvorky/gensim/pulls>`_ or `bug reports <https://github.com/piskvorky/gensim/issues>`_.

It's really useful and motivating to get feedback, in any shape or form, so big thanks to you all!

Some honorable mentions are included in the `CHANGELOG.md <https://github.com/piskvorky/gensim/blob/develop/CHANGELOG.md>`_.

Academic citing
----------------
---------------

Gensim has been used in `many students' final theses as well as research papers <https://scholar.google.com/citations?view_op=view_citation&hl=en&user=9vG_kV0AAAAJ&citation_for_view=9vG_kV0AAAAJ:NaGl4SEjCO4C>`_. When citing gensim,
please use `this BibTeX entry <bibtex_gensim.bib>`_::
Gensim has been used in `over a thousand research papers and student theses <https://scholar.google.com/citations?view_op=view_citation&hl=en&user=9vG_kV0AAAAJ&citation_for_view=9vG_kV0AAAAJ:NaGl4SEjCO4C>`_.

When citing Gensim, please use `this BibTeX entry <bibtex_gensim.bib>`_::

@inproceedings{rehurek_lrec,
title = {{Software Framework for Topic Modelling with Large Corpora}},
@@ -83,5 +83,3 @@ please use `this BibTeX entry <bibtex_gensim.bib>`_::
note={\url{http://is.muni.cz/publication/884893/en}},
language={English}
}


12 changes: 6 additions & 6 deletions docs/src/distributed.rst
@@ -1,7 +1,7 @@
.. _distributed:

Distributed Computing
===================================
=====================

Why distributed computing?
---------------------------
@@ -37,20 +37,20 @@ Prerequisites

For communication between nodes, `gensim` uses `Pyro (PYthon Remote Objects)
<http://pypi.python.org/pypi/Pyro4>`_, version >= 4.27. This is a library for low-level socket communication
and remote procedure calls (RPC) in Python. `Pyro` is a pure-Python library, so its
and remote procedure calls (RPC) in Python. `Pyro4` is a pure-Python library, so its
installation is quite painless and only involves copying its `*.py` files somewhere onto your Python's import path::

sudo easy_install Pyro4
pip install Pyro4

You don't have to install `Pyro` to run `gensim`, but if you don't, you won't be able
You don't have to install Pyro to run Gensim, but if you don't, you won't be able
to access the distributed features (i.e., everything will always run in serial mode,
the examples on this page don't apply).
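
Whether the distributed features are usable can be checked up front. The following is a minimal sketch, not part of Gensim itself: the helper name ``has_module`` is hypothetical, and the only assumption taken from the text above is that the package is imported as ``Pyro4``.

```python
import importlib


def has_module(name):
    """Return True if the module `name` can be imported, False otherwise."""
    try:
        importlib.import_module(name)
    except ImportError:
        return False
    return True


# Assumption from the text above: the Pyro package is imported as "Pyro4".
pyro_available = has_module("Pyro4")

if pyro_available:
    print("Pyro4 found: distributed mode can be used")
else:
    print("Pyro4 not found: everything will run in serial mode")
```

A check like this only reports whether the import succeeds; it does not verify the ``>= 4.27`` version requirement.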


Core concepts
-----------------------------------
-------------

As always, `gensim` strives for a clear and straightforward API (see :ref:`design`).
As always, Gensim strives for a clear and straightforward API (see :ref:`design`).
To this end, *you do not need to make any changes in your code at all* in order to
run it over a cluster of computers!

2 changes: 1 addition & 1 deletion docs/src/gensim_theme/layout.html
@@ -174,7 +174,7 @@ <h3>Get Expert Help From The Gensim Authors</h3>

<div class="tweetodsazeni">
<div class="tweet">
<a href="https://twitter.com/radimrehurek" target="_blank" style="color: white">Tweet @RadimRehurek</a>
<a href="https://twitter.com/gensim_py" target="_blank" style="color: white">Tweet @Gensim_py</a>
</div>
</div>

127 changes: 33 additions & 94 deletions docs/src/install.rst
@@ -7,120 +7,59 @@ Installation
Quick install
--------------

Run in your terminal::

easy_install -U gensim

or, alternatively::
Run in your terminal (recommended)::

pip install --upgrade gensim

In case that fails, make sure you're installing into a writeable location (or use `sudo`), or read on.

-----

Dependencies
-------------
Gensim is known to run on Linux, Windows and Mac OS X and should run on any other
platform that supports Python 2.6+ and NumPy. Gensim depends on the following software:

* `Python <http://www.python.org>`_ >= 2.6. Tested with versions 2.6, 2.7, 3.3, 3.4 and 3.5. Support for Python 2.5 was discontinued starting gensim 0.10.0; if you *must* use Python 2.5, install gensim 0.9.1.
* `NumPy <http://www.numpy.org>`_ >= 1.3. Tested with version 1.9.0, 1.7.1, 1.7.0, 1.6.2, 1.6.1rc2, 1.5.0rc1, 1.4.0, 1.3.0, 1.3.0rc2.
* `SciPy <http://www.scipy.org>`_ >= 0.7. Tested with version 0.14.0, 0.12.0, 0.11.0, 0.10.1, 0.9.0, 0.8.0, 0.8.0b1, 0.7.1, 0.7.0.

**Windows users** are well advised to try the `Enthought distribution <http://www.enthought.com/products/epd.php>`_,
which conveniently includes Python & NumPy & SciPy in a single bundle, and is free for academic use.


Install Python and `easy_install`
---------------------------------

Check what version of Python you have with::

python --version

You can download Python from http://python.org/download.

.. note:: Gensim requires Python 2.6 / 3.3 or greater, and will not run under earlier versions.

Next, install the `easy_install utility <http://pypi.python.org/pypi/setuptools>`_,
which will make installing other Python programs easier.

Install SciPy & NumPy
----------------------

These are quite popular Python packages, so chances are there are pre-built binary
distributions available for your platform. You can try installing from source using easy_install::
or, alternatively for `conda` environments::

easy_install numpy
easy_install scipy

If that doesn't work or if you'd rather install using a binary package, consult
http://www.scipy.org/Download.

Install `gensim`
-----------------

You can now install (or upgrade) `gensim` with::

easy_install --upgrade gensim
conda install -c conda-forge gensim

That's it! Congratulations, you can proceed to the :doc:`tutorials <tutorial>`.

-----

If you also want to run the algorithms over a cluster
of computers, in :doc:`distributed`, you should install with::

easy_install gensim[distributed]

The optional `distributed` feature installs `Pyro (PYthon Remote Objects) <http://pypi.python.org/pypi/Pyro>`_.
If you don't know what distributed computing means, you can ignore it:
`gensim` will work fine for you anyway.
This optional extension can also be installed separately later with::

easy_install Pyro4
In case that failed, make sure you're installing into a writeable location (or use `sudo`).

-----

There are also alternative routes to install:

1. If you have downloaded and unzipped the `tar.gz source <http://pypi.python.org/pypi/gensim>`_
for `gensim` (or you're installing `gensim` from `github <https://github.com/piskvorky/gensim/>`_),
you can run::

python setup.py install

to install `gensim` into your ``site-packages`` folder.
2. If you wish to make local changes to the `gensim` code (`gensim` is, after all, a
package which targets research prototyping and modifications), a preferred
way may be installing with::
Code dependencies
-----------------

python setup.py develop
Gensim runs on Linux, Windows and Mac OS X, and should run on any other
platform that supports Python 2.7+ and NumPy. Gensim depends on the following software:

This will only place a symlink into your ``site-packages`` directory. The actual
files will stay wherever you unpacked them.
3. If you don't have root privileges (or just don't want to put the package into
your ``site-packages``), simply unpack the source package somewhere and that's it! No
compilation or installation needed. Just don't forget to set your PYTHONPATH
(or modify ``sys.path``), so that Python can find the unpacked package when importing.
* `Python <http://www.python.org>`_ >= 2.7 (tested with versions 2.7, 3.5 and 3.6)
* `NumPy <http://www.numpy.org>`_ >= 1.11.3
* `SciPy <http://www.scipy.org>`_ >= 0.18.1
* `Six <https://pypi.org/project/six/>`_ >= 1.5.0
* `smart_open <https://pypi.org/project/smart_open/>`_ >= 1.2.1
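
The minimum versions listed above can be compared against what is installed. This is a rough sketch under stated assumptions: the helper names (``version_tuple``, ``meets_minimum``, ``check_dependencies``) are hypothetical, each package is assumed to expose a ``__version__`` attribute (reported as unknown if it does not), and version strings are compared only by their leading numeric components.

```python
import importlib
import re

# Minimum versions copied from the dependency list above.
MINIMUMS = {
    "numpy": "1.11.3",
    "scipy": "0.18.1",
    "six": "1.5.0",
    "smart_open": "1.2.1",
}


def version_tuple(version):
    """Convert a version string to a tuple of its leading integer components."""
    parts = []
    for piece in version.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break  # stop at the first non-numeric component
        parts.append(int(match.group()))
    return tuple(parts)


def meets_minimum(installed, minimum):
    """Return True if `installed` is at least `minimum` (numeric parts only)."""
    return version_tuple(installed) >= version_tuple(minimum)


def check_dependencies(minimums=MINIMUMS):
    """Map each package name to a short status string."""
    report = {}
    for name, minimum in minimums.items():
        try:
            module = importlib.import_module(name)
        except ImportError:
            report[name] = "missing"
            continue
        # Assumption: the package exposes __version__; not all releases do.
        installed = getattr(module, "__version__", None)
        if installed is None:
            report[name] = "unknown version"
        elif meets_minimum(installed, minimum):
            report[name] = "ok ({})".format(installed)
        else:
            report[name] = "too old ({} < {})".format(installed, minimum)
    return report


if __name__ == "__main__":
    for name, status in sorted(check_dependencies().items()):
        print("{}: {}".format(name, status))
```

The comparison is deliberately simple: pre-release suffixes such as ``rc1`` are ignored, so a dedicated version parser is a better choice when exact ordering matters.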

Testing Gensim
--------------

Testing `gensim`
----------------
Gensim uses continuous integration, automatically running a full test suite on each pull request with the following services:

To test the package, unzip the `tar.gz source <http://pypi.python.org/pypi/gensim>`_ and run::
+------------+-----------------------------------------------------------------------------------------+--------------+
| CI service | Task | Build badge |
+============+=========================================================================================+==============+
| Travis | Run tests on Linux and check `code-style <https://www.python.org/dev/peps/pep-0008/?>`_ | |Travis|_ |
+------------+-----------------------------------------------------------------------------------------+--------------+
| AppVeyor | Run tests on Windows | |AppVeyor|_ |
+------------+-----------------------------------------------------------------------------------------+--------------+
| CircleCI | Build documentation | |CircleCI|_ |
+------------+-----------------------------------------------------------------------------------------+--------------+

python setup.py test
.. |Travis| image:: https://travis-ci.org/RaRe-Technologies/gensim.svg?branch=develop
.. _Travis: https://travis-ci.org/RaRe-Technologies/gensim

Gensim uses Travis CI for continuous integration: |Travis|_
.. |CircleCI| image:: https://circleci.com/gh/RaRe-Technologies/gensim/tree/develop.svg?style=shield
.. _CircleCI: https://circleci.com/gh/RaRe-Technologies/gensim

.. |Travis| image:: https://api.travis-ci.org/piskvorky/gensim.png?branch=develop
.. _Travis: https://travis-ci.org/piskvorky/gensim
.. |AppVeyor| image:: https://ci.appveyor.com/api/projects/status/r2au32ucpn8gr0tl/branch/develop?svg=true
.. _AppVeyor: https://ci.appveyor.com/api/projects/status/r2au32ucpn8gr0tl/branch/develop?svg=true


Problems?
---------

Use the `gensim discussion group <http://groups.google.com/group/gensim/>`_ for
questions and troubleshooting. See the :doc:`support page <support>`.
Use the `Gensim discussion group <http://groups.google.com/group/gensim/>`_ for
questions and troubleshooting. See the :doc:`support page <support>` for commercial support.