diff --git a/docs/notebooks/Corpora_and_Vector_Spaces.ipynb b/docs/notebooks/Corpora_and_Vector_Spaces.ipynb index bccfa9cc91..6c09bcf052 100644 --- a/docs/notebooks/Corpora_and_Vector_Spaces.ipynb +++ b/docs/notebooks/Corpora_and_Vector_Spaces.ipynb @@ -609,7 +609,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "For a complete reference (want to prune the dictionary to a smaller size? Optimize converting between corpora and NumPy/SciPy arrays?), see the [API documentation](https://radimrehurek.com/gensim/apiref.html). Or continue to the next tutorial on Topics and Transformations ([notebook](https://github.com/piskvorky/gensim/tree/develop/docs/notebooks/Topics_and_Transformations.ipynb) \n", + "For a complete reference (want to prune the dictionary to a smaller size? Optimize converting between corpora and NumPy/SciPy arrays?), see the [API documentation](https://radimrehurek.com/gensim/apiref.html). Or continue to the next tutorial on Topics and Transformations ([notebook](Topics_and_Transformations.ipynb) \n", "or [website](https://radimrehurek.com/gensim/tut2.html))." ] } diff --git a/docs/notebooks/FastText_Tutorial.ipynb b/docs/notebooks/FastText_Tutorial.ipynb index 96a977ab0e..7b98dffc97 100644 --- a/docs/notebooks/FastText_Tutorial.ipynb +++ b/docs/notebooks/FastText_Tutorial.ipynb @@ -21,7 +21,7 @@ "## When to use FastText?\n", "The main principle behind FastText is that the morphological structure of a word carries important information about the meaning of the word. Traditional word embeddings do not take this into account, as they train a unique embedding for every individual word. This is especially significant for morphologically rich languages (German, Turkish) in which a single word can have a large number of morphological forms, each of which might occur rarely, thus making it hard to train good word embeddings. \n", "FastText attempts to solve this by treating each word as the aggregation of its subwords. For the sake of simplicity and language-independence, subwords are taken to be the character ngrams of the word. The vector for a word is simply taken to be the sum of all vectors of its component char-ngrams. \n", - "According to a detailed comparison of Word2Vec and FastText in [this notebook](https://github.com/RaRe-Technologies/gensim/blob/develop/docs/notebooks/Word2Vec_FastText_Comparison.ipynb), FastText does significantly better on syntactic tasks as compared to the original Word2Vec, especially when the size of the training corpus is small. Word2Vec slightly outperforms FastText on semantic tasks though. The differences grow smaller as the size of training corpus increases. \n", + "According to a detailed comparison of Word2Vec and FastText in [this notebook](Word2Vec_FastText_Comparison.ipynb), FastText does significantly better on syntactic tasks than the original Word2Vec, especially when the size of the training corpus is small. Word2Vec slightly outperforms FastText on semantic tasks, though the differences grow smaller as the size of the training corpus increases. \n", "Training time for FastText is significantly higher than for the Gensim version of Word2Vec (`15min 42s` vs `6min 42s` on text8, 17 mil tokens, 5 epochs, and a vector size of 100). \n", "FastText can be used to obtain vectors for out-of-vocabulary (OOV) words, by summing up vectors for their component char-ngrams, provided at least one of the char-ngrams was present in the training data."
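A minimal sketch of the subword behavior described in this hunk, assuming a recent gensim release (the `vector_size`/`epochs` parameter names follow gensim 4.x; older versions use `size`/`iter`) and a toy corpus invented purely for illustration:

```python
from gensim.models import FastText

# Toy corpus invented for illustration; any iterable of tokenized
# sentences works here.
sentences = [
    ["machine", "learning", "is", "fun"],
    ["deep", "learning", "learns", "representations"],
]

# Parameter names follow gensim 4.x; older releases use `size`/`iter`.
model = FastText(sentences, vector_size=100, min_count=1, epochs=10)

# "learnings" never occurs in the corpus, but it shares char-ngrams
# with "learning" and "learns", so FastText can still build a vector
# for it by summing its component char-ngram vectors.
oov_vector = model.wv["learnings"]
print(oov_vector.shape)  # (100,)

# Syntactically related forms share many char-ngrams, so their
# similarity tends to be high.
print(model.wv.similarity("learning", "learns"))
```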
] @@ -314,7 +314,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "Syntactically similar words generally have high similarity in FastText models, since a large number of the component char-ngrams will be the same. As a result, FastText generally does better at syntactic tasks than Word2Vec. A detailed comparison is provided [here](https://github.com/RaRe-Technologies/gensim/blob/develop/docs/notebooks/Word2Vec_FastText_Comparison.ipynb).\n", + "Syntactically similar words generally have high similarity in FastText models, since a large number of the component char-ngrams will be the same. As a result, FastText generally does better at syntactic tasks than Word2Vec. A detailed comparison is provided [here](Word2Vec_FastText_Comparison.ipynb).\n", "\n", "Other similarity operations -" ] diff --git a/docs/notebooks/WordRank_wrapper_quickstart.ipynb b/docs/notebooks/WordRank_wrapper_quickstart.ipynb index dfea2d81ad..f830e71506 100644 --- a/docs/notebooks/WordRank_wrapper_quickstart.ipynb +++ b/docs/notebooks/WordRank_wrapper_quickstart.ipynb @@ -8,7 +8,7 @@ "source": [ "# WordRank wrapper tutorial on Lee Corpus\n", "\n", - "WordRank is a new word embedding algorithm which captures the semantic similarities in a text data well. See this [notebook](https://github.com/RaRe-Technologies/gensim/blob/develop/docs/notebooks/Wordrank_comparisons.ipynb) for it's comparisons to other popular embedding models. This tutorial will serve as a guide to use the WordRank wrapper in gensim. You need to install [WordRank](https://bitbucket.org/shihaoji/wordrank) before proceeding with this tutorial.\n", + "WordRank is a new word embedding algorithm which captures semantic similarities in text data well. See this [notebook](Wordrank_comparisons.ipynb) for its comparisons to other popular embedding models. This tutorial will serve as a guide to using the WordRank wrapper in gensim. You need to install [WordRank](https://bitbucket.org/shihaoji/wordrank) before proceeding with this tutorial.\n", "\n", "\n", "# Train model\n", diff --git a/docs/notebooks/Wordrank_comparisons.ipynb b/docs/notebooks/Wordrank_comparisons.ipynb index 61ddf99756..7bb7fd22c6 100644 --- a/docs/notebooks/Wordrank_comparisons.ipynb +++ b/docs/notebooks/Wordrank_comparisons.ipynb @@ -1200,7 +1200,7 @@ "source": [ "# References\n", "1. [WordRank: Learning Word Embeddings via Robust Ranking](https://arxiv.org/pdf/1506.02761v3.pdf)\n", - "2. [Word2Vec and FastText comparison notebook](https://github.com/jayantj/gensim/blob/9f3e275ddad22afd54b7986654f3033f9baf8983/docs/notebooks/Word2Vec_FastText_Comparison.ipynb)\n", + "2. [Word2Vec and FastText comparison notebook](Word2Vec_FastText_Comparison.ipynb)\n", "3. [Similarity test data](https://www.cl.cam.ac.uk/~fh295/simlex.html)" ] } diff --git a/docs/notebooks/annoytutorial-text8.ipynb b/docs/notebooks/annoytutorial-text8.ipynb index 61fa6a8508..b151a9b742 100644 --- a/docs/notebooks/annoytutorial-text8.ipynb +++ b/docs/notebooks/annoytutorial-text8.ipynb @@ -165,7 +165,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "See the [Word2Vec tutorial](https://github.com/RaRe-Technologies/gensim/blob/develop/docs/notebooks/word2vec.ipynb) for how to initialize and save this model." + "See the [Word2Vec tutorial](word2vec.ipynb) for how to initialize and save this model."
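For the annoy tutorial's prerequisite above, a minimal sketch of initializing and saving such a model, assuming a local copy of the unzipped text8 corpus (the file paths and parameter values here are illustrative assumptions, not taken from the notebook):

```python
from gensim.models import Word2Vec
from gensim.models.word2vec import Text8Corpus

# Assumes the unzipped text8 corpus sits in the working directory.
sentences = Text8Corpus("text8")

# Parameter names follow gensim 4.x; older releases use `size`.
model = Word2Vec(sentences, vector_size=100, min_count=5)
model.save("word2vec_text8.model")

# Reload later, e.g. before building an Annoy index on model.wv.
model = Word2Vec.load("word2vec_text8.model")
```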
] }, { diff --git a/docs/notebooks/annoytutorial.ipynb b/docs/notebooks/annoytutorial.ipynb index 61fa6a8508..b151a9b742 100644 --- a/docs/notebooks/annoytutorial.ipynb +++ b/docs/notebooks/annoytutorial.ipynb @@ -165,7 +165,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "See the [Word2Vec tutorial](https://github.com/RaRe-Technologies/gensim/blob/develop/docs/notebooks/word2vec.ipynb) for how to initialize and save this model." + "See the [Word2Vec tutorial](word2vec.ipynb) for how to initialize and save this model." ] }, { diff --git a/docs/notebooks/atmodel_tutorial.ipynb b/docs/notebooks/atmodel_tutorial.ipynb index 5be0c259a0..80bb993537 100644 --- a/docs/notebooks/atmodel_tutorial.ipynb +++ b/docs/notebooks/atmodel_tutorial.ipynb @@ -16,7 +16,7 @@ "* Gentle introduction to the LDA model: http://blog.echen.me/2011/08/22/introduction-to-latent-dirichlet-allocation/\n", "* Gensim's LDA API documentation: https://radimrehurek.com/gensim/models/ldamodel.html\n", "* Topic modelling in Gensim: http://radimrehurek.com/topic_modeling_tutorial/2%20-%20Topic%20Modeling.html\n", - "* Pre-processing and training LDA: https://github.com/RaRe-Technologies/gensim/blob/develop/docs/notebooks/lda_training_tips.ipynb\n", + "* [Pre-processing and training LDA](lda_training_tips.ipynb)\n", "\n", "\n", "> **NOTE:**\n", @@ -33,7 +33,7 @@ "\n", "## Analyzing scientific papers\n", "\n", - "The data we will be using consists of scientific papers about machine learning, from the Neural Information Processing Systems conference (NIPS). It is the same dataset used in the [Pre-processing and training LDA](https://github.com/RaRe-Technologies/gensim/blob/develop/docs/notebooks/lda_training_tips.ipynb) tutorial, mentioned earlier.\n", + "The data we will be using consists of scientific papers about machine learning, from the Neural Information Processing Systems conference (NIPS). It is the same dataset used in the [Pre-processing and training LDA](lda_training_tips.ipynb) tutorial mentioned earlier.\n", "\n", "We will be performing qualitative analysis of the model, and at times this will require an understanding of the subject matter of the data. If you try running this tutorial on your own, consider applying it to a dataset with subject matter that you are familiar with. For example, try one of the [StackExchange datadump datasets](https://archive.org/details/stackexchange).\n", "\n", diff --git a/docs/notebooks/doc2vec-lee.ipynb b/docs/notebooks/doc2vec-lee.ipynb index 92d01aa133..21c17a51d3 100644 --- a/docs/notebooks/doc2vec-lee.ipynb +++ b/docs/notebooks/doc2vec-lee.ipynb @@ -52,7 +52,7 @@ "* [Doc2Vec Paper](https://cs.stanford.edu/~quocle/paragraph_vector.pdf)\n", "* [Dr. Michael D. Lee's Website](http://faculty.sites.uci.edu/mdlee)\n", "* [Lee Corpus](http://faculty.sites.uci.edu/mdlee/similarity-data/)\n", - "* [IMDB Doc2Vec Tutorial](https://github.com/piskvorky/gensim/blob/develop/docs/notebooks/doc2vec-IMDB.ipynb)" + "* [IMDB Doc2Vec Tutorial](doc2vec-IMDB.ipynb)" ] }, { diff --git a/docs/notebooks/gensim Quick Start.ipynb b/docs/notebooks/gensim Quick Start.ipynb index 5f25162cb6..6d769e8e55 100644 --- a/docs/notebooks/gensim Quick Start.ipynb +++ b/docs/notebooks/gensim Quick Start.ipynb @@ -292,7 +292,7 @@ "source": [ "The `tfidf` model again returns a list of tuples, where the first entry is the token ID and the second entry is the tf-idf weighting.
Note that the ID corresponding to \"system\" (which occurred 4 times in the original corpus) has been weighted lower than the ID corresponding to \"minors\" (which only occurred twice).\n", "\n", - "`gensim` offers a number of different models/transformations. See [Transformations and Topics](https://github.com/RaRe-Technologies/gensim/blob/develop/docs/notebooks/Topics_and_Transformations.ipynb) for details." + "`gensim` offers a number of different models/transformations. See [Topics and Transformations](Topics_and_Transformations.ipynb) for details." ] } ], diff --git a/docs/notebooks/ldaseqmodel.ipynb b/docs/notebooks/ldaseqmodel.ipynb index 9714adea15..1c417b9ecf 100644 --- a/docs/notebooks/ldaseqmodel.ipynb +++ b/docs/notebooks/ldaseqmodel.ipynb @@ -617,7 +617,7 @@ "source": [ "As expected, the value is very high, meaning the topic distributions are far apart.\n", "\n", - "For more information on how to use the gensim distance metrics, check out [this notebook](https://github.com/RaRe-Technologies/gensim/blob/develop/docs/notebooks/distance_metrics.ipynb)." + "For more information on how to use the gensim distance metrics, check out [this notebook](distance_metrics.ipynb)." ] }, { diff --git a/docs/notebooks/word2vec.ipynb b/docs/notebooks/word2vec.ipynb index 61679cea4f..ad0dbba5c4 100644 --- a/docs/notebooks/word2vec.ipynb +++ b/docs/notebooks/word2vec.ipynb @@ -867,7 +867,7 @@ "metadata": {}, "source": [ "## Online training / Resuming training\n", - "Advanced users can load a model and continue training it with more sentences and [new vocabulary words](https://github.com/RaRe-Technologies/gensim/blob/develop/docs/notebooks/online_w2v_tutorial.ipynb):" + "Advanced users can load a model and continue training it with more sentences and [new vocabulary words](online_w2v_tutorial.ipynb):" ] }, {
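To make the tf-idf behavior in the `gensim Quick Start` hunk above concrete, a minimal sketch with a toy corpus (these documents are invented for illustration; the real notebook uses its own corpus):

```python
from gensim import corpora, models

# Toy documents invented for illustration.
texts = [
    ["human", "interface", "computer"],
    ["survey", "user", "computer", "system", "response"],
    ["system", "human", "system", "eps"],
]

dictionary = corpora.Dictionary(texts)
bow_corpus = [dictionary.doc2bow(text) for text in texts]

# Fit IDF statistics on the bag-of-words corpus.
tfidf = models.TfidfModel(bow_corpus)

# Applying the model to a document yields (token_id, weight) tuples;
# tokens frequent across documents (like "system") get lower weights.
print(tfidf[bow_corpus[2]])
```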
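The high distance value discussed in the `ldaseqmodel.ipynb` hunk can be reproduced in miniature with gensim's Hellinger distance; a sketch on two made-up topic distributions:

```python
from gensim.matutils import hellinger

# Two toy topic distributions that put their mass on opposite topics.
# Hellinger distance lies in [0, 1]; values near 1 mean the
# distributions are far apart, as in the notebook's example.
topic_a = [0.90, 0.05, 0.05]
topic_b = [0.05, 0.05, 0.90]
print(hellinger(topic_a, topic_b))  # a high value (roughly 0.7)
```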
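And for the online-training hunk in `word2vec.ipynb`, a minimal sketch of the resume-training pattern, assuming gensim 4.x parameter names and toy sentences invented for illustration:

```python
from gensim.models import Word2Vec

old_sentences = [["cat", "say", "meow"], ["dog", "say", "woof"]]
new_sentences = [["fox", "say", "ring"], ["cat", "say", "meow"]]

model = Word2Vec(old_sentences, vector_size=50, min_count=1)

# update=True grows the existing vocabulary instead of replacing it,
# so previously unseen words like "fox" become trainable.
model.build_vocab(new_sentences, update=True)
model.train(new_sentences, total_examples=len(new_sentences),
            epochs=model.epochs)
```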