[MRG] Added Jensen Shannon metric and dendrogram visualization #1484
Hanging indent everywhere please -- vertical indent makes the code hard to read and hard to maintain.
@parulsethi It's Jensen-Shannon (after Johan Jensen and Claude Shannon).
…into jenson_shannon
@piskvorky Can you please point out which line should be changed to a hanging indent? If I understand correctly, hanging/vertical indent is used for long argument lists, and there weren't any in this PR, so I didn't use any line breaks in the code.
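For reference, the two styles under discussion look roughly like this. The sketch is illustrative only: `make_dendrogram` and its parameters are hypothetical stand-ins, not functions from this PR.

```python
def make_dendrogram(topic_dist, distfun=None, labels=None, annotation=None):
    """Hypothetical stand-in that simply echoes its arguments."""
    return {
        "data": topic_dist,
        "distfun": distfun,
        "labels": labels,
        "annotation": annotation,
    }


topic_dist = [[0.5, 0.5], [0.9, 0.1]]

# Vertical (visual) indent: continuation lines align with the opening bracket,
# so they all have to be re-indented whenever the call or variable is renamed.
vertical = make_dendrogram(topic_dist,
                           distfun=None,
                           labels=[1, 2],
                           annotation=["a", "b"])

# Hanging indent: break right after the opening bracket and indent one level.
hanging = make_dendrogram(
    topic_dist,
    distfun=None,
    labels=[1, 2],
    annotation=["a", "b"],
)
```

Both calls are equivalent; only the layout of the continuation lines differs.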
Unfortunately, GitHub says the notebook is too large and its diff cannot be displayed, so I cannot review it directly. But the indentation is broken around … There are also a few minor code style inconsistencies, like spaces around … @menshikh-iv is there a way to somehow review the notebook? I'd like to point out some constructs which, while not wrong, could be improved a little for better readability (like …
@piskvorky I'll remove the cell outputs once the notebook is complete. It will then have a much smaller diff, containing only the input code cells, which GitHub can display for code review.
@parulsethi let's fix that Jenson everywhere, before it propagates too far through the code/notebooks.
I've removed the plotly code and cell outputs for a smaller diff on GitHub, and will add them back once the code review of the notebook is complete. If needed, the output visualization can be seen here in a previous commit. @piskvorky I corrected the name in the notebook comment where it was used. Elsewhere it's the function name.
The name is incorrect and needs to be fixed, that's what I mean.
Oh, the spelling. Sorry, I misunderstood earlier.
@piskvorky About the notebook: nbdime is a good tool for viewing diffs between notebooks, but it doesn't integrate with GitHub. Also, we can use …
"outputs": [],
"source": [
"from gensim.models.ldamodel import LdaModel\n",
"from gensim.corpora import Dictionary, MmCorpus\n",
Unused imports: gensim.corpora.MmCorpus, scipy.spatial.distance.squareform, plotly.figure_factory
"texts = []\n",
"for line in df_fake.text:\n",
" lowered = line.lower()\n",
" words = re.findall(r'\\w+', lowered, flags = re.UNICODE | re.LOCALE)\n",
Unnecessary spaces: flags = re.UNICODE | re.LOCALE
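A minimal sketch of the tokenization line with PEP 8 keyword spacing follows. It assumes Python 3, where `re.LOCALE` cannot be combined with a `str` pattern (the call raises `ValueError`), so only `re.UNICODE` is kept. The `tokenize` helper and the sample sentence are illustrative and not part of the notebook.

```python
import re


def tokenize(line):
    """Lowercase a line and split it into word tokens (hypothetical helper)."""
    lowered = line.lower()
    # No spaces around "=" when passing a keyword argument (PEP 8).
    return re.findall(r'\w+', lowered, flags=re.UNICODE)


tokens = tokenize("Breaking: Fake news spreads FAST in 2017!")
```

On Python 3 the `re.UNICODE` flag is already the default for `str` patterns, so it could be dropped entirely; it is kept here only to stay close to the line under review.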
"source": [ | ||
"from gensim.matutils import jensen_shannon\n", | ||
"\n", | ||
"from random import sample\n", |
Unused imports: random.sample, scipy
"\n",
"# Plot dendrogram\n",
"dendro = create_dendrogram(topic_dist, distfun=js_dist, labels=range(1, 36), annotation=annotation)\n",
"dendro['layout'].update({'width':1000, 'height':600})\n",
Missing spaces after :
"annotation = text_annotation(topic_dist, topic_terms, n_ann_terms)\n",
"\n",
"# Initialize figure by creating upper dendrogram\n",
"figure = create_dendrogram(topic_dist, distfun=js_dist, labels = range(1, 36), annotation=annotation)\n",
Unneeded spaces: labels = range(1, 36)
"# Add Heatmap Data to Figure\n",
"figure['data'].extend(heatmap)\n",
"\n",
"dendro_leaves = [x+1 for x in dendro_leaves]\n",
Missing whitespace: x+1
"dendro_leaves = [x+1 for x in dendro_leaves]\n",
"\n",
"# Edit Layout\n",
"figure['layout'].update({'width':800, 'height':800,\n",
Missing spaces after : (and below)
" 'showline': False,\n",
" \"showticklabels\": True, \n",
" \"tickmode\": \"array\",\n",
" \"ticktext\" : dendro_leaves,\n",
Unnecessary space before : (and below)
" 'showline': False,\n",
" 'zeroline': False,\n",
" 'showticklabels': False,\n",
" 'ticks':\"\"}})\n",
missing space
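Taken together, the spacing comments above amount to the following style. This is a sketch on a plain dictionary: the keys are the ones quoted in the diff, but the nesting under `'xaxis'` and the sample `dendro_leaves` values are assumptions made for illustration.

```python
# Sample leaf indices (assumed values); note the whitespace around "+".
dendro_leaves = [2, 0, 1]
dendro_leaves = [x + 1 for x in dendro_leaves]

# One space after each colon, none before it, and one quoting style throughout.
layout_update = {
    'width': 800,
    'height': 800,
    'xaxis': {  # assumed nesting; the full structure is not shown in the diff
        'showline': False,
        'zeroline': False,
        'showticklabels': True,
        'tickmode': 'array',
        'ticktext': dendro_leaves,
        'ticks': '',
    },
}
```

A dictionary built this way could then be passed to the `figure['layout'].update(...)` call quoted in the diff.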
@parulsethi please fix the PEP 8 issues, restore all images, and resolve the merge conflict. LGTM from me.
Typo.
"# no. of terms to display in annotation\n",
"n_ann_terms = 10\n",
"\n",
"# use Jenson-Shannon distance metric in dendrogram\n",
One more "Jenson".
Good job @parulsethi 🥇
* add notebook with js heatmap
* separate out vector conversion function
* modify notebook
* add cluster heatmap notebook
* print y-axis values
* correct y labels
* add nb details
* few more nb details
* add text labels
* print linkage/dendo data
* add annotations for all hierarchy levels
* make diff annotations symmetric
* add Plotly's dendrogram code
* train for more passes
* remove outputs
* remove plotly code and imports
* err.. re-run cells
* fix jensen spelling
* made requested changes
Adds Jensen-Shannon distance metric and topic dendrogram-heatmap visualization notebook.
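For readers unfamiliar with the metric, here is a minimal pure-Python sketch of the Jensen-Shannon divergence between two discrete distributions, defined as the average of the Kullback-Leibler divergences of each distribution from their midpoint. It illustrates the standard definition only and is not the code added to `gensim.matutils` in this PR, which may differ in input handling and normalization. The helper names and the sample topic distributions are hypothetical.

```python
import math


def kl_divergence(p, q):
    """KL(p || q) in nats; terms where p_i == 0 contribute nothing."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def jensen_shannon_divergence(p, q):
    """0.5 * KL(p || m) + 0.5 * KL(q || m), where m is the midpoint of p and q."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


# Two made-up topic-term distributions over a three-word vocabulary.
topic_a = [0.7, 0.2, 0.1]
topic_b = [0.1, 0.2, 0.7]

divergence = jensen_shannon_divergence(topic_a, topic_b)
# The square root of the divergence is the Jensen-Shannon distance.
distance = math.sqrt(divergence)
```

Unlike plain KL divergence, this quantity is symmetric and, with natural logarithms, bounded above by ln 2, which makes it convenient as a pairwise distance for clustering topics.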