Add model_to_dict one-liner to word2vec notebook. Fix #1269 (#1776)
* Add model to dict method

* add documentation and oneliner code

* Add benchmark
kakshay21 authored and menshikh-iv committed Dec 12, 2017
1 parent 6248d33 commit bf1b865
Showing 1 changed file with 102 additions and 1 deletion.
103 changes: 102 additions & 1 deletion docs/notebooks/word2vec.ipynb
@@ -1295,6 +1295,107 @@
"print(train_times_table)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Adding Word2Vec \"model to dict\" method to production pipeline\n",
"Suppose we want to speed up similarity queries in production even further.\n",
"One good way is to cache the most similar words for every vocabulary word in a dictionary.\n",
"The next time a query word comes in, we look it up in the dict first.\n",
"If it is a hit, we return the result directly from the dictionary.\n",
"Otherwise, we query the model and cache the result so that the same word does not miss next time."
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [],
"source": [
"most_similars_precalc = {word: model.wv.most_similar(word) for word in model.wv.index2word}\n",
"print(most_similars_precalc)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Comparison with and without caching"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"For the time being, let's take four arbitrary words."
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [],
"source": [
"import time\n",
"words = ['voted','few','their','around']"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Without caching"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"start = time.time()\n",
"for word in words:\n",
"    result = model.wv.most_similar(word)\n",
"    print(result)\n",
"end = time.time()\n",
"print(end-start)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now with caching"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"start = time.time()\n",
"for word in words:\n",
"    if word in most_similars_precalc:\n",
"        result = most_similars_precalc[word]\n",
"        print(result)\n",
"    else:\n",
"        result = model.wv.most_similar(word)\n",
"        most_similars_precalc[word] = result\n",
"        print(result)\n",
"\n",
"end = time.time()\n",
"print(end-start)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"You should see a clear improvement with caching, and the difference will be even larger when more words are queried."
]
},
{
"cell_type": "markdown",
"metadata": {},
@@ -1336,7 +1437,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython2",
-"version": "2.7.13"
+"version": "2.7.10"
}
},
"nbformat": 4,
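The lookup-then-fill pattern added by the notebook cells in this commit can be sketched without a trained model. This is only an illustration under stated assumptions: `make_cached_lookup` and `fake_most_similar` are hypothetical names that do not appear in the commit, and `fake_most_similar` stands in for `model.wv.most_similar`.

```python
def make_cached_lookup(query_fn, precalc=None):
    """Wrap query_fn so each distinct word is computed at most once.

    precalc is an optional dict of already computed results,
    e.g. the most_similars_precalc dict built in the notebook.
    """
    cache = dict(precalc or {})

    def lookup(word):
        # Cache miss: run the expensive query and remember the result.
        if word not in cache:
            cache[word] = query_fn(word)
        return cache[word]

    return lookup, cache


# Hypothetical stand-in for model.wv.most_similar; records each real query.
calls = []

def fake_most_similar(word):
    calls.append(word)
    return [(word + "_neighbour", 0.9)]


lookup, cache = make_cached_lookup(fake_most_similar)
for word in ['voted', 'few', 'voted']:
    result = lookup(word)

# 'voted' was requested twice but queried only once.
print(calls)
```

With a real model, passing `model.wv.most_similar` as `query_fn` and the precomputed dict as `precalc` would presumably give the same behaviour as the "with caching" cell, with the membership test keyed on the current word.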
