diff --git a/docs/notebooks/word2vec.ipynb b/docs/notebooks/word2vec.ipynb
index 6e7845e1f8..f953feb3ca 100644
--- a/docs/notebooks/word2vec.ipynb
+++ b/docs/notebooks/word2vec.ipynb
@@ -1295,6 +1295,107 @@
     "print(train_times_table)"
    ]
   },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### Adding Word2Vec \"model to dict\" method to production pipeline\n",
+    "Suppose we want to improve performance further in production.\n",
+    "One good way is to cache the most similar words for every vocabulary word in a dictionary.\n",
+    "The next time a query word comes in, we look it up in the dictionary first.\n",
+    "On a hit, we return the result directly from the dictionary.\n",
+    "Otherwise, we query the model and cache the result so that the same word does not miss next time."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 1,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "most_similars_precalc = {word: model.wv.most_similar(word) for word in model.wv.index2word}\n",
+    "print(most_similars_precalc)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### Comparison with and without caching"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "For the time being, let's take 4 words at random."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 1,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import time\n",
+    "words = ['voted', 'few', 'their', 'around']"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Without caching"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "start = time.time()\n",
+    "for word in words:\n",
+    "    result = model.wv.most_similar(word)  # full similarity query on every call\n",
+    "    print(result)\n",
+    "end = time.time()\n",
+    "print(end-start)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Now with caching"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": 
[],
+   "source": [
+    "start = time.time()\n",
+    "for word in words:\n",
+    "    if word in most_similars_precalc:  # cache hit: reuse the stored result\n",
+    "        result = most_similars_precalc[word]\n",
+    "        print(result)\n",
+    "    else:  # cache miss: query the model, then store the result\n",
+    "        result = model.wv.most_similar(word)\n",
+    "        most_similars_precalc[word] = result\n",
+    "        print(result)\n",
+    "    \n",
+    "end = time.time()\n",
+    "print(end-start)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "You can see the improvement, and the difference will be even larger when more words are taken into consideration."
+   ]
+  },
   {
    "cell_type": "markdown",
    "metadata": {},
@@ -1336,7 +1437,7 @@
    "name": "python",
    "nbconvert_exporter": "python",
    "pygments_lexer": "ipython2",
-   "version": "2.7.13"
+   "version": "2.7.10"
   }
  },
  "nbformat": 4,