Find keywords using entropy with Montemurro and Zanette algorithm #665
Comments
However, I feel that it will need a few modifications to its API to fit in nicely with the rest of Gensim, and I would like some advice from the core Gensim developers.
Once I have answers to these questions, it shouldn't take too long to modify my code accordingly.
Nice meeting you again yesterday. I will put this algo on our student page.
I'll be happy to advise any student who takes this project on.
If there is interest in this and no one else wishes to take it up, I would like to give it a shot. :)
@bhargavvader sounds good, thanks! @tmylk can you add some context to this ticket? What is "Montemurro and Zanette algorithm"?
Here's a link to a paper describing the algorithm.
@tmylk ticket context still missing, please update.
@piskvorky Could you please suggest a way to add context? The context is clear to me, with relevant links. There is even a volunteer contributor.
Sure -- something along the lines of "Here's a problem / motivation; here's what we could do to solve it". The first part is missing -- from the link it's not apparent to me what "Montemurro and Zanette algorithm" does, and the linked implementation doesn't explain it either (that I can see). If this is implemented in gensim, what will it actually do? Who is it for?
The algorithm identifies words that are significant to the structure of the document - these often correspond to the major themes. It does so independently of a corpus.
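To make the description above a bit more concrete, here is a minimal sketch of the general idea, not the implementation proposed in this ticket. It assumes the text is already tokenized, splits it into equal-sized blocks, and scores each word by how much lower the entropy of its distribution over blocks is than in randomly shuffled copies of the same text. The function names `block_entropies` and `entropy_keyword_scores`, the parameters, and the Monte Carlo shuffling used for the baseline are all assumptions made for illustration.

```python
import math
import random
from collections import Counter, defaultdict


def block_entropies(tokens, n_blocks):
    """Entropy (in bits) of each word's distribution over equal-sized blocks."""
    block_size = len(tokens) // n_blocks
    counts = defaultdict(Counter)  # word -> Counter(block index -> occurrences)
    # Tokens that do not fill a whole block at the end are ignored.
    for i, tok in enumerate(tokens[: block_size * n_blocks]):
        counts[tok][i // block_size] += 1
    entropies = {}
    for word, per_block in counts.items():
        total = sum(per_block.values())
        entropies[word] = -sum(
            (c / total) * math.log2(c / total) for c in per_block.values()
        )
    return entropies


def entropy_keyword_scores(tokens, n_blocks=8, n_shuffles=20, seed=0):
    """Score words by frequency * (shuffled entropy - observed entropy).

    Hypothetical helper: the shuffled baseline is estimated by averaging
    over random shuffles, which is an assumption of this sketch.
    """
    rng = random.Random(seed)
    block_size = len(tokens) // n_blocks
    used = tokens[: block_size * n_blocks]
    observed = block_entropies(used, n_blocks)

    baseline = Counter()  # word -> mean entropy over shuffled copies
    shuffled = list(used)
    for _ in range(n_shuffles):
        rng.shuffle(shuffled)
        for word, h in block_entropies(shuffled, n_blocks).items():
            baseline[word] += h / n_shuffles

    freq = Counter(used)
    n = len(used)
    return {w: (freq[w] / n) * (baseline[w] - observed[w]) for w in observed}
```

Words that cluster in a few blocks get a positive score, while words spread evenly through the text score at or below zero. Only the document itself is needed, which matches the point that the method works independently of a corpus.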
Aha, thanks @PeteBleackley. So this is a candidate to replace the [...]? It would be interesting to compare them side-by-side, see which algo works better (and deprecate the other one -- we don't want to maintain dead weight in gensim). Or if the algorithms have non-overlapping strengths/weaknesses, document what they are. When should users use one or the other? Is there a standard benchmark? (@tmylk Qs for the incubator project)
I've implemented this in #1738. However, there is a merge conflict in summarization/__init__.py that needs to be resolved.
The algorithm identifies words that are significant to the structure of the document - these often correspond to the major themes. It does so independently of a corpus.
Dr Peter J. Bleackley has kindly offered his implementation.