Find keywords using entropy with Montemurro and Zanette algorithm #665

tmylk · 2016-04-12T13:18:14Z

The algorithm identifies words that are significant to the structure of the document - these often correspond to the major themes. It does so independently of a corpus.

Dr Peter J. Bleackley has kindly suggested his implementation.

PeteBleackley · 2016-04-12T13:36:27Z

However, I feel that it will need a few modifications to its API to fit in nicely with the rest of Gensim, and I would like some advice from the core Gensim developers.
My questions are:

1. Where would be the best place to fit this algorithm into the Gensim project structure?
2. In what format should the algorithm ingest data? The current implementation is designed for XML, mainly for historic reasons.
3. In what format should the algorithm return its results?

Once I have answers to these questions, it shouldn't take too long to modify my code accordingly.

tmylk · 2016-07-06T07:27:40Z

Nice meeting you again yesterday. I will put this algo on our student page.

PeteBleackley · 2016-07-07T15:04:02Z

I'll be happy to advise any student who takes this project on.

bhargavvader · 2016-09-29T14:01:28Z

If there is interest for this and no one else wishes to take it up, I would like to give it a shot. :)

piskvorky · 2016-11-04T05:28:08Z

@bhargavvader sounds good, thanks!

@tmylk can you add some context to this ticket? What is "Montemurro and Zanette algorithm"?

PeteBleackley · 2016-11-11T13:15:36Z

Here's a link to a paper describing the algorithm.

https://arxiv.org/abs/0907.1558

piskvorky · 2016-12-13T08:08:53Z

@tmylk ticket context still missing, update.

tmylk · 2016-12-28T21:06:14Z

@piskvorky Could you please suggest a way to add context? The context is clear to me, with relevant links. There is even a volunteer contributor.

piskvorky · 2017-01-06T11:25:42Z

Sure -- something along the lines of "Here's a problem / motivation; here's what we could do to solve it".

The first part is missing -- from the link it's not apparent to me what "Montemurro and Zanette algorithm" does, and the linked implementation doesn't explain it either (that I can see).

If this is implemented in gensim, what will it actually do? Who is it for?

PeteBleackley · 2017-01-06T12:59:27Z

The algorithm identifies words that are significant to the structure of the document - these often correspond to the major themes. It does so independently of a corpus.
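
To make that description a little more concrete, here is a rough Python sketch of the general idea. It is not the implementation discussed in this ticket. A word that clusters in a few parts of the document has a lower entropy across blocks than it would have if the text were shuffled, and that gap, weighted by the word's frequency, serves as its score. As far as I understand the paper, it derives the shuffled baseline analytically. This sketch estimates it by actually shuffling the tokens, which is a simplification. The names `block_entropies`, `keyword_scores`, `n_blocks` and `n_shuffles` are made up for illustration.

```python
import math
import random
from collections import Counter, defaultdict


def block_entropies(tokens, n_blocks):
    """Entropy (in bits) of each word's distribution over equal-sized blocks."""
    block_size = max(1, math.ceil(len(tokens) / n_blocks))
    per_word = defaultdict(Counter)
    for i, tok in enumerate(tokens):
        per_word[tok][i // block_size] += 1
    entropies = {}
    for word, blocks in per_word.items():
        total = sum(blocks.values())
        entropies[word] = sum(
            (c / total) * math.log2(total / c) for c in blocks.values()
        )
    return entropies


def keyword_scores(tokens, n_blocks=8, n_shuffles=20, seed=0):
    """Score = relative frequency * (shuffled entropy - observed entropy).

    Assumption: the shuffled baseline is a Monte Carlo average, not the
    analytic expression from the paper.
    """
    rng = random.Random(seed)
    observed = block_entropies(tokens, n_blocks)

    # Average each word's entropy over several random shuffles of the text.
    baseline = defaultdict(float)
    shuffled = list(tokens)
    for _ in range(n_shuffles):
        rng.shuffle(shuffled)
        for word, h in block_entropies(shuffled, n_blocks).items():
            baseline[word] += h / n_shuffles

    freq = Counter(tokens)
    n = len(tokens)
    return {
        word: (freq[word] / n) * (baseline[word] - observed[word])
        for word in freq
    }
```

Only the single tokenised document is needed as input, which is what "independently of a corpus" means here. Words spread evenly through the text (typically function words) score near zero or below, while words concentrated in particular sections score higher.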

piskvorky · 2017-01-08T09:50:15Z

Aha, thanks @PeteBleackley. So this is a candidate to replace the summarization.keywords package, if I understand correctly, @tmylk.

It would be interesting to compare them side-by-side, see which algo works better (and deprecate the other one -- we don't want to maintain dead weight in gensim).

Or if the algorithms have non-overlapping strengths/weaknesses, document what they are. When should users use one or the other? Is there a standard benchmark? (@tmylk Qs for the incubator project)

PeteBleackley · 2017-11-23T17:22:56Z

I've implemented this in #1738. However, there is a merge conflict in summarization/__init__.py that needs to be resolved.

tmylk added the feature and wishlist labels Apr 12, 2016

tmylk changed the title ~~Add Montemurro and Zanette algorithm~~ Find keywords using entropy with Montemurro and Zanette algorithm Nov 8, 2016

menshikh-iv added the difficulty medium label Oct 2, 2017

PeteBleackley mentioned this issue Nov 23, 2017

Implementation of Montemurro and Zanette's entropy based keyword extraction algorithm #1738

Merged

menshikh-iv closed this as completed in c462bd0 Dec 1, 2017
