Author-topic model #893
Merged: tmylk merged 103 commits into piskvorky:develop from olavurmortensen:author-topic_model on Jan 17, 2017
Commits
2e8f3cb Initial commit. Very early stages of algorithm development. (olavurmortensen)
a21059e Fixed some errors. (olavurmortensen)
a9bddaa Added online algorithm, removed batch algorithm. (olavurmortensen)
7ea76f2 Using max change instead of mean change criterion. Computing a differ… (olavurmortensen)
839a8b3 Fixed some things with var_mu. Also was passing the wrong arguments t… (olavurmortensen)
bd13c60 Added 'offline' algorithm, and notebook for experiments. (olavurmortensen)
ebc808c Fixed log normalization. Also changed symmetric initilization of hype… (olavurmortensen)
16b26f7 Removed offline algorithm class as it is no longer necessary. (olavurmortensen)
10d2b36 Changed name of online algorithm class and file. (olavurmortensen)
c94f516 Made some changes to how the likelihood is computed. (olavurmortensen)
a1d758f Changed the name of the online algorithm again. (olavurmortensen)
46cc8bf Brought the offline algorithm back. (olavurmortensen)
3e53655 Working on bound computation. (olavurmortensen)
09666c4 Changed the way the data structure is prepared and how the model acce… (olavurmortensen)
a562fca Cleaned the code up a bit. Added a simple method to get author topics. (olavurmortensen)
0de43a5 Removed some comments, mostly TODOs. (olavurmortensen)
a892564 Ran some very successful experiments on 286 documents. Offline algori… (olavurmortensen)
388a5e9 Changed the online algorithm according to all the changes that have h… (olavurmortensen)
2b2a896 Fixed mistake with mu variable. (olavurmortensen)
3756435 Fixed lambda update, multiplication by size of corpus was missing. Re… (olavurmortensen)
ed3416d Added a loop for passing over entire corpus. Discarded use of log_nor… (olavurmortensen)
994f212 Moved bound computation out of corpus-wide loop. (olavurmortensen)
956fbd5 Updated notebook. (olavurmortensen)
40bbabf Computing rho in a different way. Added the possibility to evaluate o… (olavurmortensen)
ed96b23 Implemented hyperparam MLE for eta and alpha in offline algo. Removed… (olavurmortensen)
a225399 Made it possible to sample a subset of documents in lambda update to … (olavurmortensen)
938daff Now, if LDA topics are supplied lambda is not estimated at all. Added… (olavurmortensen)
b43d344 Updating notebook. (olavurmortensen)
1dc7e6a Working on line search for hyperparam MLE. (olavurmortensen)
910c626 Made some structural changes to bound and log probability computation. (olavurmortensen)
7dbd01f In process of updating online algo w.r.t. changes in offline algo. (olavurmortensen)
9a04533 Mostly updated the online algorithm according to changes that have be… (olavurmortensen)
b450609 Fixed a critical mistake in the online algorithm. (olavurmortensen)
d3ca917 Removed a redundancy in lambda update. Updated notebook. (olavurmortensen)
ba5ba63 Making sure that the model is evaluated after the last iteration, if … (olavurmortensen)
afa747d Updated notebook. (olavurmortensen)
693b70b Fixed mistake in interpolating gamma. Moved lambda update outside of … (olavurmortensen)
7783261 Working on an algorithm that tries to process each 'disjoint' set of … (olavurmortensen)
868b174 Working on a minibatch algorithm. Updated notebook. (olavurmortensen)
1cfd00f Only updating the necessary expected log theta. Changed the name of O… (olavurmortensen)
edd5025 Implemented a new algorithm. It is 5 times faster, more memory effici… (olavurmortensen)
fafc20a Moved all algorithms except the new online one to a 'temp' folder. Ve… (olavurmortensen)
32e750d Changed the name of the main algorithm (and file). Made a new noteboo… (olavurmortensen)
4286e90 Cleaning up code. Removed or changed a lot of comments. Removed optio… (olavurmortensen)
12f231c Was computing the norm of phi incorrectly, fixed that, speed-up not a… (olavurmortensen)
76764ff Working on numerically stable phi update and bound computation. Is no… (olavurmortensen)
4cb3ee9 Made a separate file for unvectorized code and stored it in , in case… (olavurmortensen)
7c14f61 Implemented mini-batch algorithm. (olavurmortensen)
eade3e1 Some minor changes to old code. (olavurmortensen)
6fe4c0e Updated notebook. (olavurmortensen)
1975321 In mini-batch algo, only the terms seen in the current chunk are upda… (olavurmortensen)
e4a0e4b Updated notebook. Finally getting decent results on the entire NIPS d… (olavurmortensen)
526a3bb Optimized phinorm computation by taking expElogbeta out of the loop. (olavurmortensen)
5ee9a95 Computing the bound more efficiently (much faster now). Now not passi… (olavurmortensen)
e0d7367 Removed unnecessary temp file. Updated notebook. (olavurmortensen)
2f621e2 Merged upstream develop branch into my feature branch. (olavurmortensen)
df11bb4 In the process of refactoring (atmodel2.py will become the new atmode… (olavurmortensen)
054d37c Merge branch 'develop' into author-topic_model (olavurmortensen)
9d9da44 Refactoring the code. A lot left to do. (olavurmortensen)
336ff92 The refactored code now runs, converges almost exactly as the old cod… (olavurmortensen)
e5e7722 Refactoring. Various docstring and commenting. Made methods for const… (olavurmortensen)
861e81a New refactored code now in atmodel.py. Old code is in atmodelold.py, … (olavurmortensen)
e911aed Implemented 'continued training' (call update multiple times) and __g… (olavurmortensen)
ff7f8e6 A lot of changes. Most notably, added docstrings, and made it possibl… (olavurmortensen)
bdac93a Added unit tests. Basically a retrofit of LDA test; some new tests, s… (olavurmortensen)
9429c0a Updated unit tests. Fixed some mistakes. Added some tests; testing up… (olavurmortensen)
e0dc2d9 Forgot to add num_docs to ids of new authors in id2author. Some comme… (olavurmortensen)
aabc0f4 Made it possible to use serialized corpora (MmCorpus), and made unit … (olavurmortensen)
e526cbc Removed code in unit tests that silence logging (useful when doing lo… (olavurmortensen)
8cb404f Just removed a comment. (olavurmortensen)
6cf4e75 Reverted some changes that were made to ldamodel.py that were no long… (olavurmortensen)
94956fa get_author_topics now takes author name instead of integer ID; change… (olavurmortensen)
ebd9679 Logging silencing again causing unit test failures. Fixed. (olavurmortensen)
bafb5ef Updated docstring. Changed __getitem__ method. (olavurmortensen)
8cd90cf Added a new notebook where a stackexchange dataset is used. Started w… (olavurmortensen)
ac9ecd4 Updated notebooks (just to trigger rebuild). (olavurmortensen)
aa08b49 Updated code w.r.t. comments from Lev (@tmylk). (olavurmortensen)
9ce1fd5 Updated all notebooks. (olavurmortensen)
f1f9f50 Two algorithms in 'temp' used to test the difference between blocking… (olavurmortensen)
cad8f26 Added the deepcopy again. Without it, the program can fail and the sy… (olavurmortensen)
6caefd7 Removed minimum_phi_value test (was already commented out). (olavurmortensen)
48b6c1a Comments and docstrings. Responding to comments from Lev, and working… (olavurmortensen)
d03e020 Added the author-topic model to the API reference. Also slight change… (olavurmortensen)
7ac77b7 Added a test for gamma in persistency. (olavurmortensen)
cab716d Removed test for single author in persistency test (test is simplifie… (olavurmortensen)
be7bddf Removed save and load methods, using LdaModel's methods directly work… (olavurmortensen)
ffadaf1 Removed all temporary files. (olavurmortensen)
e218883 Made changes to model and test in preperation for a merge with upstream. (olavurmortensen)
7d2994f Merge remote-tracking branch 'upstream/develop' into author-topic_model (olavurmortensen)
616a965 Modified the bound method; it was somewhat confusing, and there were … (olavurmortensen)
7f98e3a Simplified sum in phi norm computation. (olavurmortensen)
ddfc8f7 Fixed some mistakes introduced in bound method in recent commit. (olavurmortensen)
7d03608 Updated algorithm and tests w.r.t. comments from Lev. Other changes a… (olavurmortensen)
661e7e5 Updated tutorial. Removed test notebook. (olavurmortensen)
85123c0 Updated notebook. (olavurmortensen)
91675a5 Updated tutorial. (olavurmortensen)
6d961a5 Updated tutorial (introduction). (olavurmortensen)
13fa9ee Changes w.r.t. change requests from @tmylk, plus some other changes. (olavurmortensen)
018896c Added the URL to view notebook in HTML to tutorial. (olavurmortensen)
a0a9832 Telling users to view the notebook in nbviewer instead. Docstring lin… (olavurmortensen)
5d6944a Removed a fixme about tutorial link. (olavurmortensen)
8e56e9e Fixed a small mistake in bound method. (olavurmortensen)
aecaecb Added further explanation of tutorial goal in notebook. (olavurmortensen)
@@ -0,0 +1,9 @@
+:mod:`models.atmodel` -- Author-topic models
+======================================================
+
+.. automodule:: gensim.models.atmodel
+    :synopsis: Author-topic model
+    :members:
+    :inherited-members:
+    :undoc-members:
+    :show-inheritance:
Does it generate well with Sphinx and appear in the documentation index page?
I don't really understand how this works; I just made it the same way as the LdaModel rst file. What should I do?
See how to test RST in #906
Check that the HTML is generated, following https://github.com/RaRe-Technologies/gensim/wiki/Developer-page#documentation
The code appears to generate the Sphinx documentation properly, and the module appears in the API reference as well.
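For a quick sanity check before running a full docs build, a stub like the one in this diff can be linted statically. The sketch below is not part of the PR and does not replace building the HTML with Sphinx. `check_stub` and `STUB` are hypothetical names, and the 4-space option indentation is an assumption (the diff view above does not preserve it). It checks only three things: that the title underline is long enough, that there is exactly one `automodule` directive, and that the directive options are indented.

```python
import re

# Copy of the stub from the diff; the option indentation is assumed.
STUB = """\
:mod:`models.atmodel` -- Author-topic models
======================================================

.. automodule:: gensim.models.atmodel
    :synopsis: Author-topic model
    :members:
    :inherited-members:
    :undoc-members:
    :show-inheritance:
"""


def check_stub(text):
    """Return a list of problems found in an automodule stub (empty if none)."""
    problems = []
    lines = text.splitlines()

    # docutils warns when a title underline is shorter than the title text.
    if len(lines) < 2 or len(lines[1]) < len(lines[0]):
        problems.append("title underline is shorter than the title")
    elif set(lines[1]) != {"="}:
        problems.append("title underline is not made of '=' characters")

    # Expect exactly one automodule directive naming a dotted module path.
    targets = re.findall(r"^\.\. automodule:: ([\w.]+)\s*$", text, flags=re.M)
    if len(targets) != 1:
        problems.append("expected exactly one automodule directive")

    # Options such as :members: must be indented under the directive.
    in_directive = False
    for line in lines:
        if line.startswith(".. automodule::"):
            in_directive = True
        elif in_directive and line.startswith(":"):
            problems.append("unindented option: " + line.strip())
    return problems


if __name__ == "__main__":
    for problem in check_stub(STUB):
        print(problem)
```

An empty result only means the stub is well-formed; whether the page shows up in the API reference index still has to be confirmed in the generated HTML.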