diff --git a/CHANGELOG.md b/CHANGELOG.md
index 8817d0e29c..b658cb84b4 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -17,7 +17,7 @@ See the [method documentation](https://github.com/RaRe-Technologies/gensim/blob/
 * Explicit epochs and corpus size in word2vec train(). (@gojomo, @robotcator, [#1139](https://github.com/RaRe-Technologies/gensim/pull/1139), [#1237](https://github.com/RaRe-Technologies/gensim/pull/1237))
 
 New features:
-
+* Add modified save_word2vec_format for Doc2Vec, to save document vectors. (@parulsethi,[#1256](https://github.com/RaRe-Technologies/gensim/pull/1256))
 * Add output word prediction in word2vec. Only for negative sampling scheme. See [ipynb]( https://github.com/RaRe-Technologies/gensim/blob/develop/docs/notebooks/word2vec.ipynb) (@chinmayapancholi13,[#1209](https://github.com/RaRe-Technologies/gensim/pull/1209))
 * scikit_learn wrapper for LSI Model in Gensim (@chinmayapancholi13,[#1244](https://github.com/RaRe-Technologies/gensim/pull/1244))
 * Add the 'keep_tokens' parameter to 'filter_extremes'. (@toliwa,[#1210](https://github.com/RaRe-Technologies/gensim/pull/1210))
diff --git a/docs/notebooks/Tensorboard.png b/docs/notebooks/Tensorboard.png
new file mode 100644
index 0000000000..651a23e689
Binary files /dev/null and b/docs/notebooks/Tensorboard.png differ
diff --git a/docs/notebooks/Tensorboard_doc2vec.ipynb b/docs/notebooks/Tensorboard_doc2vec.ipynb
new file mode 100644
index 0000000000..aa12646f32
--- /dev/null
+++ b/docs/notebooks/Tensorboard_doc2vec.ipynb
@@ -0,0 +1,884 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "# Visualizing Doc2Vec with TensorBoard\n"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {
+    "collapsed": true
+   },
+   "source": [
+    "\n",
+    "\n",
+    "\n",
+    "\n",
+    "In this tutorial, I will explain how to visualize Doc2Vec embeddings, also known as [Paragraph Vectors](), with TensorBoard, a data visualization framework for inspecting TensorFlow runs and graphs. We will use TensorBoard's built-in *Embedding Projector*, which lets you interactively visualize and analyze high-dimensional data such as embeddings.\n",
+    "\n",
+    "This tutorial uses a transformed MovieLens dataset[1] from this [repository](https://github.com/RaRe-Technologies/movie-plots-by-genre), with the movie titles added afterwards. You can download the prepared CSV from [here](https://github.com/parulsethi/DocViz/blob/master/movie_plots.csv). The input documents are the movie synopses, on which the Doc2Vec model is trained.\n",
+    "\n",
+    "The visualization will be a scatterplot, as seen in the image above, where each data point is labelled with the movie title and colored by its corresponding genre. You can also visit this [Projector link](http://projector.tensorflow.org/?config=https://raw.githubusercontent.com/parulsethi/DocViz/master/movie_plot_config.json), which is configured with my embeddings for the above-mentioned dataset."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "# Define a Function to Read and Preprocess Text"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 11,
+   "metadata": {
+    "collapsed": false
+   },
+   "outputs": [
+    {
+     "data": {
+      "text/html": [
+       "
|  | MovieID | Titles | Plots | Genres |
|---|---|---|---|---|
| 0 | 1 | Toy Story (1995) | A little boy named Andy loves to be in his roo... | animation |
| 1 | 2 | Jumanji (1995) | When two kids find and play a magical board ga... | fantasy |
| 2 | 3 | Grumpier Old Men (1995) | Things don't seem to change much in Wabasha Co... | comedy |
| 3 | 6 | Heat (1995) | Hunters and their prey--Neil and his professio... | action |
| 4 | 7 | Sabrina (1995) | An ugly duckling having undergone a remarkable... | romance |
| 5 | 9 | Sudden Death (1995) | Some terrorists kidnap the Vice President of t... | action |
| 6 | 10 | GoldenEye (1995) | James Bond teams up with the lone survivor of ... | action |
| 7 | 15 | Cutthroat Island (1995) | Morgan Adams and her slave, William Shaw, are ... | action |
| 8 | 17 | Sense and Sensibility (1995) | When Mr. Dashwood dies, he must leave the bulk... | romance |
| 9 | 18 | Four Rooms (1995) | This movie features the collaborative director... | comedy |
| 10 | 19 | Ace Ventura: When Nature Calls (1995) | Ace Ventura, emerging from self-imposed exile ... | comedy |
| 11 | 29 | City of Lost Children, The (Cité des enfants p... | Krank (Daniel Emilfork), who cannot dream, kid... | sci-fi |
| 12 | 32 | Twelve Monkeys (a.k.a. 12 Monkeys) (1995) | In a future world devastated by disease, a con... | sci-fi |
| 13 | 34 | Babe (1995) | Farmer Hoggett wins a runt piglet at a local f... | fantasy |
| 14 | 39 | Clueless (1995) | A rich high school student tries to boost a ne... | romance |
| 15 | 44 | Mortal Kombat (1995) | Based on the popular video game of the same na... | action |
| 16 | 48 | Pocahontas (1995) | Capt. John Smith leads a rag-tag band of Engli... | animation |
| 17 | 50 | Usual Suspects, The (1995) | Following a truck hijack in New York, five con... | comedy |
| 18 | 57 | Home for the Holidays (1995) | After losing her job, making out with her soon... | comedy |
| 19 | 69 | Friday (1995) | Two homies, Smokey and Craig, smoke a dope dea... | comedy |
| 20 | 70 | From Dusk Till Dawn (1996) | Two criminals and their hostages unknowingly s... | action |
| 21 | 76 | Screamers (1995) | (SIRIUS 6B, Year 2078) On a distant mining pla... | sci-fi |
| 22 | 82 | Antonia's Line (Antonia) (1995) | In an anonymous Dutch village, a sturdy, stron... | fantasy |
| 23 | 88 | Black Sheep (1996) | Comedy about the prospective Washington State ... | comedy |
| 24 | 95 | Broken Arrow (1996) | "Broken Arrow" is the term used to describe a ... | action |
| 25 | 104 | Happy Gilmore (1996) | A rejected hockey player puts his skills to th... | comedy |
| 26 | 105 | Bridges of Madison County, The (1995) | Photographer Robert Kincaid wanders into the l... | romance |
| 27 | 110 | Braveheart (1995) | When his secret bride is executed for assaulti... | action |
| 28 | 141 | Birdcage, The (1996) | Armand Goldman owns a popular drag nightclub i... | comedy |
| 29 | 145 | Bad Boys (1995) | Marcus Burnett is a hen-pecked family man. Mik... | action |
| ... | ... | ... | ... | ... |
| 1813 | 122902 | Fantastic Four (2015) | FANTASTIC FOUR, a contemporary re-imagining of... | sci-fi |
| 1814 | 127098 | Louis C.K.: Live at The Comedy Store (2015) | Comedian Louis C.K. performs live at the Comed... | comedy |
| 1815 | 127158 | Tig (2015) | An intimate, mixed media documentary that foll... | comedy |
| 1816 | 127202 | Me and Earl and the Dying Girl (2015) | Seventeen-year-old Greg has managed to become ... | comedy |
| 1817 | 129354 | Focus (2015) | In the midst of veteran con man Nicky's latest... | action |
| 1818 | 129428 | The Second Best Exotic Marigold Hotel (2015) | The Second Best Exotic Marigold Hotel is the e... | comedy |
| 1819 | 129937 | Run All Night (2015) | Professional Brooklyn hitman Jimmy Conlon is m... | action |
| 1820 | 130490 | Insurgent (2015) | One choice can transform you-or it can destroy... | sci-fi |
| 1821 | 130520 | Home (2015) | An alien on the run from his own people makes ... | animation |
| 1822 | 130634 | Furious 7 (2015) | Dominic and his crew thought they'd left the c... | action |
| 1823 | 131013 | Get Hard (2015) | Kevin Hart plays the role of Darnell--a family... | comedy |
| 1824 | 132046 | Tomorrowland (2015) | Bound by a shared destiny, a bright, optimisti... | sci-fi |
| 1825 | 132480 | The Age of Adaline (2015) | A young woman, born at the turn of the 20th ce... | romance |
| 1826 | 132488 | Lovesick (2014) | Lovesick is the comic tale of Charlie Darby (M... | fantasy |
| 1827 | 132796 | San Andreas (2015) | In San Andreas, California is experiencing a s... | action |
| 1828 | 132961 | Far from the Madding Crowd (2015) | In Victorian England, the independent and head... | romance |
| 1829 | 133195 | Hitman: Agent 47 (2015) | An assassin teams up with a woman to help her ... | action |
| 1830 | 133645 | Carol (2015) | In an adaptation of Patricia Highsmith's semin... | romance |
| 1831 | 134130 | The Martian (2015) | During a manned mission to Mars, Astronaut Mar... | sci-fi |
| 1832 | 134368 | Spy (2015) | A desk-bound CIA analyst volunteers to go unde... | comedy |
| 1833 | 134783 | Entourage (2015) | Movie star Vincent Chase, together with his bo... | comedy |
| 1834 | 134853 | Inside Out (2015) | After young Riley is uprooted from her Midwest... | comedy |
| 1835 | 135518 | Self/less (2015) | A dying real estate mogul transfers his consci... | sci-fi |
| 1836 | 135861 | Ted 2 (2015) | Months after John's divorce, Ted and Tami-Lynn... | comedy |
| 1837 | 135887 | Minions (2015) | Ever since the dawn of time, the Minions have ... | comedy |
| 1838 | 136016 | The Good Dinosaur (2015) | In a world where dinosaurs and humans live sid... | animation |
| 1839 | 139855 | Anomalisa (2015) | Michael Stone, an author that specializes in c... | animation |
| 1840 | 142997 | Hotel Transylvania 2 (2015) | The Drac pack is back for an all-new monster c... | animation |
| 1841 | 145935 | Peanuts Movie, The (2015) | Charlie Brown, Lucy, Snoopy, and the whole gan... | animation |
| 1842 | 149406 | Kung Fu Panda 3 (2016) | Continuing his "legendary adventures of awesom... | comedy |

1843 rows × 4 columns
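
The source of the notebook's read-and-preprocess cell is not visible in this part of the diff, so the following is only a minimal sketch of what such a step could look like, not the author's function. It assumes a CSV with the `Titles`, `Plots` and `Genres` columns shown in the table above. The helper names `tokenize` and `read_corpus`, the regex-based tokenization and the two sample rows are all hypothetical and exist purely for illustration. It uses only the standard library, whereas the notebook output suggests a pandas DataFrame.

```python
import csv
import io
import re

# Assumption: simple lowercase alphanumeric tokens are good enough for a sketch.
TOKEN_RE = re.compile(r"[a-z0-9]+")


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return TOKEN_RE.findall(text.lower())


def read_corpus(csv_file):
    """Yield (title, genre, tokens) for each row of an open CSV file.

    Assumes the header row contains the columns Titles, Plots and Genres.
    """
    for row in csv.DictReader(csv_file):
        yield row["Titles"], row["Genres"], tokenize(row["Plots"])


# Tiny in-memory stand-in for movie_plots.csv (made-up rows, not real data).
sample = io.StringIO(
    "MovieID,Titles,Plots,Genres\n"
    "1,Example Movie (2000),A robot learns to paint.,sci-fi\n"
    "2,Another Film (2001),Two friends open a bakery!,comedy\n"
)
corpus = list(read_corpus(sample))
```

With the real file, the same generator could be fed `open("movie_plots.csv", newline="", encoding="utf-8")`. Each token list could then be wrapped in gensim's `TaggedDocument` before training Doc2Vec, and the titles and genres kept aside as labels for the projector.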