Coreference Resolution #565

oaguy1 · 2019-02-10T02:29:08Z

Hello! I am looking into using coreference resolution in a project I am working on. There exists a reasonably easy (read: does not require a neural network and training) algorithm to do just this, and I was thinking of adding it to this library. I read the contributing guide and wanted to make an issue to test the waters before spending a lot of time working on this.

oaguy1 · 2019-02-10T02:33:13Z

A simpler explanation and sample implementation of the algorithm I mentioned can be found here.

spencermountain · 2019-02-11T17:30:28Z

YESSSSS
go for it!
any ideas about how you'd like to handle the api for it? I'd be happy to help.

something like this?

let doc=nlp('Carrots are orange. They are delicious.')
doc.pronouns().data()
// [{text:'they', normal:'they', reference:'carrots'}]

doc.nouns().data()
//[{text:'carrots', normal:'carrots', references:['they']}]

something like that?
there is a term-id property, (i think) that you could use, too.
anyways, yeah. sounds great. go for it.

spencermountain · 2019-02-11T17:45:32Z

it may be desirable too, to actually fetch the reference word(s), so that people can do whatever they want to the results, like replace them or something.

The only tricky-part i can imagine is tracking-down the reference word(s), and packing them into a Text object, so that a person can do doc.match('#Vegetable').nouns().references().match('#whatever').toUpperCase()... and so on.
This could get a little complicated. I'm happy to help

oaguy1 · 2019-02-12T18:12:21Z

Glad you are excited! The algorithm I linked to can track down the references of the pronouns in a manner that is right most of the time (80%).

The way I was thinking about approaching this was adding an additional tagging step where we look at each of the pronouns and then use Hobbs’ algorithm to find the best guess at the antecedent. With that in mind, my initial plan for the API was something like this:

// grabbing the antecedent of a pronoun
doc.match('#Pronoun').get(0).antecedent();

// grabbing the pronouns for a person
doc.people().get(0).pronouns();

I think once we have the additional API built out for Terms, we could add something closer to what you initially suggested at the more macro/document level.

Let me know what you think! I plan on sitting down and putting some more time into this tomorrow.

spencermountain · 2019-02-12T21:51:50Z

yeah cool!
to make it feel like the other methods, i'd do it like this

doc.match('#Pronoun').antecedents(0);
doc.people().pronouns(0);

either way, happy to see this in-action, then we can shove it around after.

been thinking past few weeks about breaking-up compromise into more micro-libraries, like d3 did. If we end up doing that, this work will end-up in a named-entity-plugin, or something like that
(just a heads-up)
thanks, lemme know if I can help with anything.

oaguy1 · 2019-02-22T01:36:59Z

Sounds good! I definitely want to try to keep the API as close to the rest of the library as possible. I am hacking on this when I have time, but still won't have much to share for a while. Once I have a good working MVP with tests I will make a PR and we can really play with it.

oaguy1 · 2019-03-07T17:35:54Z

@spencermountain As part of the algorithm I am implementing, I am trying to start from an individual Term (an instance of a pronoun) and then move to the previous Term in the sentence to see if it matches some criteria. Once the beginning of the sentence is reached, the "previous" term would be the last Term in the previous sentence. It would also be good to know if the previous term came from another sentence or paragraph. Is there support for such movement within the text currently in the lib? If not, where would be a good place to start for adding it?

spencermountain · 2019-03-07T17:41:44Z

hey David, yeah you may want to just use the internal arrays of sentences, and terms.

let doc = nlp(myText)
doc.list //arrays of sentences
doc.list[0].terms // terms in each sentence

we don't have any support for paragraphs (right now)
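
As a rough illustration of the backward walk asked about above, here is a minimal sketch that uses plain nested arrays as a stand-in for the internal sentence and term lists. The `previousTerms` generator and the `crossedSentence` flag are hypothetical names for illustration and are not part of compromise.

```javascript
// Stand-in for the library's internals: an array of sentences,
// each of which is an array of term strings (an assumption for illustration).
const sentences = [
  ['Carrots', 'are', 'orange'],
  ['They', 'are', 'delicious'],
]

// Walk backwards from a given term, yielding each earlier term.
// `crossedSentence` is true once the walk has left the starting sentence.
function* previousTerms(sentences, sentenceIndex, termIndex) {
  let s = sentenceIndex
  let t = termIndex - 1
  while (s >= 0) {
    if (t < 0) {
      // start of sentence reached: jump to the end of the previous one
      s -= 1
      if (s < 0) return
      t = sentences[s].length - 1
      continue
    }
    yield {
      term: sentences[s][t],
      sentence: s,
      index: t,
      crossedSentence: s !== sentenceIndex,
    }
    t -= 1
  }
}

// Start from 'They' (sentence 1, term 0) and collect everything before it.
const walked = [...previousTerms(sentences, 1, 0)]
console.log(walked.map(w => w.term)) // [ 'orange', 'are', 'Carrots' ]
```

Because the walk is a generator, a caller can stop as soon as a candidate matches, without visiting the rest of the document.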

oaguy1 · 2019-03-08T01:41:17Z

That is helpful, thank you so much for responding so quickly.

I was thinking, if you only have sentences of terms, how do you feel about adding some sort of index to the Term objects, so they are aware of their position within the document? I could add this during the build process, as an attribute named something like refPosition holding a two-item array [index of sentence, index of term].

Let me know what you think, I don't want to be too crazy adding things w/o checking in.
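
For concreteness, a minimal sketch of the `refPosition` idea, assuming plain objects in place of real Term instances. `addRefPositions` and `comparePositions` are hypothetical helpers made up for this example.

```javascript
// Plain-object stand-ins for Term objects (the real Term class is not used here).
const sentences = [
  [{ text: 'Carrots' }, { text: 'are' }, { text: 'orange' }],
  [{ text: 'They' }, { text: 'are' }, { text: 'delicious' }],
]

// Stamp each term with [index of sentence, index of term].
function addRefPositions(sentences) {
  sentences.forEach((terms, s) => {
    terms.forEach((term, t) => {
      term.refPosition = [s, t]
    })
  })
  return sentences
}

// Compare two positions: negative if `a` comes before `b` in the document.
function comparePositions(a, b) {
  return a[0] - b[0] || a[1] - b[1]
}

addRefPositions(sentences)
console.log(sentences[1][0].refPosition) // [ 1, 0 ]
```

One caveat with this approach: the stored positions go stale as soon as terms are inserted, removed, or cloned, so they would need to be recomputed after any edit.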

spencermountain · 2019-03-08T14:42:12Z

hey David, yeah this has been the hard-part of making compromise, that 'position within the document' changes considerably, and depends on where the user is zooming-in, cloning, etc.

I've started working on a major re-write, for v12, that you may be interested in, over here <https://github.com/spencermountain/compromise/tree/linked-list>. It uses a linked-list model, so references, and indexes are more 'postmodern', and don't suffer any of the awkwardness you're going through.
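
To make the linked-list idea concrete, here is a minimal sketch in which each term only keeps `prev`/`next` pointers. This is an illustration of the general technique, not the v12 implementation; `makeTerm`, `linkTerms`, `insertAfter`, and `textBefore` are hypothetical names.

```javascript
// Each term keeps pointers to its neighbours instead of a numeric index.
function makeTerm(text) {
  return { text, prev: null, next: null }
}

// Link an array of words into a doubly-linked list and return the nodes.
function linkTerms(words) {
  const terms = words.map(makeTerm)
  terms.forEach((term, i) => {
    term.prev = terms[i - 1] || null
    term.next = terms[i + 1] || null
  })
  return terms
}

// Insert a new term after an existing one; only the two neighbours change.
function insertAfter(term, text) {
  const added = makeTerm(text)
  added.prev = term
  added.next = term.next
  if (term.next) term.next.prev = added
  term.next = added
  return added
}

// Collect the text of every term before `term`, nearest first.
function textBefore(term) {
  const out = []
  for (let t = term.prev; t; t = t.prev) out.push(t.text)
  return out
}

const terms = linkTerms(['Carrots', 'are', 'orange'])
const orange = terms[2]
insertAfter(terms[1], 'very')
console.log(textBefore(orange)) // [ 'very', 'are', 'Carrots' ]
```

The reference to `orange` stays valid after the insert, whereas a stored numeric index for it would now be off by one.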

I'm also concerned that adding in co-reference resolution to v11 may be more complicated than it would be in v12. It's not very solid yet, and still moving-around in some circles..

How would you feel about me creating a compromise-coreference repo, and us working on it there?

That would give us an opportunity to implement that Hobbs paper, without worrying about api changes:

const nlp=require('compromise')
const ccr=require('compromise-coreference')

let doc=nlp(myText)
let json = ccr(doc)
/* {whatever json-structure you'd like} */

how's that?

oaguy1 · 2019-03-08T16:41:37Z

That would be a great stop-gap between APIs and a good place to get the algorithm down before (possibly) adding it to the full lib. If it would be easier to implement the paper in the new API, I am happy to lend a hand to speed that along, too. Thank you for all the help so far! It is nice to have a positive contribution experience on such a cool project.

spencermountain · 2019-03-11T22:23:26Z

hey, i've added you to a basic version of this here <https://github.com/nlp-compromise/compromise-coreference>.
take it for a ride - feel free to commit directly to it, it's pretty-rough!
cheers

oaguy1 · 2019-03-11T22:25:26Z

Awesome, thanks!

oaguy1 · 2019-03-14T11:20:24Z

Also, have you considered a tree structure over a linked list? That seems to be the data structure that many nlp libs use, and it makes adding additional depths (paragraphs, etc) and exact document position more doable.
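
A small sketch of what the tree suggestion could look like, assuming a document > paragraph > sentence > term hierarchy built from plain objects. The node shape and the `termPaths` helper are assumptions for illustration, not an existing compromise structure.

```javascript
// A toy document tree: document > paragraphs > sentences > terms.
const doc = {
  type: 'document',
  children: [
    {
      type: 'paragraph',
      children: [
        { type: 'sentence', children: [{ type: 'term', text: 'John' }, { type: 'term', text: 'sings' }] },
        { type: 'sentence', children: [{ type: 'term', text: 'He' }, { type: 'term', text: 'smiles' }] },
      ],
    },
    {
      type: 'paragraph',
      children: [
        { type: 'sentence', children: [{ type: 'term', text: 'She' }, { type: 'term', text: 'listens' }] },
      ],
    },
  ],
}

// Depth-first walk that records each term's path: [paragraph, sentence, term].
function termPaths(node, path = [], out = []) {
  if (node.type === 'term') {
    out.push({ text: node.text, path })
    return out
  }
  node.children.forEach((child, i) => termPaths(child, [...path, i], out))
  return out
}

const paths = termPaths(doc)
console.log(paths.find(p => p.text === 'She').path) // [ 1, 0, 0 ]
```

With paths like these, checking whether two terms share a sentence or a paragraph comes down to comparing path prefixes.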

spencermountain · 2019-03-14T13:14:18Z

i'd love to hear more about this idea, how do you imagine it working?

oaguy1 · 2019-03-14T13:51:04Z

Would love to discuss. I created a slack channel so we can go back and forth without polluting this issue. https://compromisenlp.slack.com/

spencermountain · 2019-03-14T14:08:32Z

wanna just join the existing slack group <https://slackin-kyzvclgjlg.now.sh/>?

oaguy1 · 2019-03-14T14:09:48Z

Yes! I will delete the group I created (should have searched first)

au-re · 2023-01-06T20:30:25Z

hi! sorry to comment on an old issue, but I was wondering if coreference resolution eventually did become part of compromise?

spencermountain · 2023-01-06T20:45:47Z

hey Aurélien - on my new-years resolutions this year.

There's actually an undocumented api for it here - i wouldn't recommend using it yet though.

will update this issue when it lands. Would love some help.
cheers

spencermountain · 2023-01-06T20:50:20Z

if you, (or anybody) was interested in working on it, the current implementation is here

it's a pretty-tricky problem. current version looks back 2 sentences for a 'he' or 'she'. i think i started to try 'they' and got overwhelmed. 'it' is most-likely the hardest.
it should also chain, so 'he' looks for previous 'he' references, etc.
cheers
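
The look-back heuristic described above can be sketched roughly as follows. This is not the actual implementation: the term shape (a `person` flag plus a guessed `gender`) and the `findAntecedent` helper are assumptions for illustration, and chaining through earlier pronouns is left out.

```javascript
// Each sentence is a list of terms; people carry a guessed gender (hypothetical shape).
const sentences = [
  [{ text: 'Mary', person: true, gender: 'female' }, { text: 'arrived' }],
  [{ text: 'John', person: true, gender: 'male' }, { text: 'waved' }],
  [{ text: 'She' }, { text: 'smiled' }],
  [{ text: 'He' }, { text: 'left' }],
]

const PRONOUN_GENDER = { he: 'male', she: 'female' }

// Find the nearest earlier person of matching gender, searching the pronoun's
// own sentence first and then at most `maxBack` sentences before it.
function findAntecedent(sentences, sentenceIndex, termIndex, maxBack = 2) {
  const pronoun = sentences[sentenceIndex][termIndex]
  const gender = PRONOUN_GENDER[pronoun.text.toLowerCase()]
  if (!gender) return null // 'they' and 'it' are not handled in this sketch
  const firstSentence = Math.max(0, sentenceIndex - maxBack)
  for (let s = sentenceIndex; s >= firstSentence; s -= 1) {
    const terms = sentences[s]
    // within the pronoun's own sentence, only look at terms before it
    const start = s === sentenceIndex ? termIndex - 1 : terms.length - 1
    for (let t = start; t >= 0; t -= 1) {
      if (terms[t].person && terms[t].gender === gender) return terms[t]
    }
  }
  return null
}

console.log(findAntecedent(sentences, 2, 0).text) // Mary
console.log(findAntecedent(sentences, 3, 0).text) // John
```

The two-sentence window is what lets 'She' reach 'Mary' here; with `maxBack` set to 1 the same call returns `null`.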

au-re · 2023-01-26T01:19:16Z

I've been reading a bit about the topic, turns out co-reference resolution is a whole field of research 😅 I found a paper describing a nice rule-based algorithm that might be a good starting point: https://aclanthology.org/J13-4004.pdf

It describes a series of sieves that are applied until all mentions in a text refer to some entity.

Maybe it could work something like this:

const text = "John is a musician. He played a new song. A girl was listening to the song. 'It is my favorite', John said to her."

nlp(text).coreference().json()
[
    { terms: [...], text: "John", coreference: { refs: [1] } },
    { terms: [...], text: "He", coreference: { refs: [1] } },
    { terms: [...], text: "a new song", coreference: { refs: [2] } },
    { terms: [...], text: "It", coreference: { refs: [2] } },
    { terms: [...], text: "A girl", coreference: { refs: [3] } },
    { terms: [...], text: "the song", coreference: { refs: [2] } },
    { terms: [...], text: "my", coreference: { refs: [1] } },
    { terms: [...], text: "her", coreference: { refs: [3] } },
]

Keeping an array of references might be useful for cases where one word might refer to several entities (e.g. "they")
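
As a small illustration of why an array of refs is convenient, the sketch below groups mentions by entity id. The data mirrors the proposed output shape with the `terms` field omitted; the final "They" entry is added here only to show a mention that points at two entities, and `groupByEntity` is a hypothetical helper.

```javascript
// Mentions shaped like the proposed output; the final "They" entry is an
// added illustration of a mention that points at two entities.
const mentions = [
  { text: 'John', coreference: { refs: [1] } },
  { text: 'He', coreference: { refs: [1] } },
  { text: 'A girl', coreference: { refs: [3] } },
  { text: 'her', coreference: { refs: [3] } },
  { text: 'They', coreference: { refs: [1, 3] } },
]

// Build a map of entity id -> texts of every mention that refers to it.
function groupByEntity(mentions) {
  const groups = new Map()
  for (const mention of mentions) {
    for (const id of mention.coreference.refs) {
      if (!groups.has(id)) groups.set(id, [])
      groups.get(id).push(mention.text)
    }
  }
  return groups
}

const groups = groupByEntity(mentions)
console.log(groups.get(1)) // [ 'John', 'He', 'They' ]
console.log(groups.get(3)) // [ 'A girl', 'her', 'They' ]
```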

Here are some of the sieves described in the paper:

Mention Detection
Speaker Identification
Exact Match
Pronominal Coreference Resolution (I think this is what you have started working on)

For each mention, we then try to find a matching antecedent by running it through every sieve. A sieve either resolves the match or leaves it for a later sieve.
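
The per-mention flow described above could be sketched like this. It is a simplification for illustration rather than the paper's full algorithm: only two toy sieves are shown, and the mention shape (`type`, `gender`, `entity`) and the `resolve` function are assumptions.

```javascript
// Mentions in document order (hypothetical shape; `entity` is filled in by resolve).
const mentions = [
  { text: 'John', type: 'name', gender: 'male' },
  { text: 'He', type: 'pronoun', gender: 'male' },
  { text: 'a girl', type: 'noun', gender: 'female' },
  { text: 'John', type: 'name', gender: 'male' },
  { text: 'her', type: 'pronoun', gender: 'female' },
]

// Each sieve returns an earlier mention to link to, or null to pass.
// Exact match: a non-pronoun links to an earlier mention with the same text.
const exactMatch = (m, earlier) =>
  m.type === 'pronoun'
    ? null
    : earlier.find(e => e.text.toLowerCase() === m.text.toLowerCase()) || null

// Pronoun match: a pronoun links to the nearest earlier non-pronoun of the same gender.
const pronounMatch = (m, earlier) =>
  m.type !== 'pronoun'
    ? null
    : [...earlier].reverse().find(e => e.type !== 'pronoun' && e.gender === m.gender) || null

// Run every mention through the sieves in order; the first sieve that
// returns an antecedent wins, otherwise the mention starts a new entity.
function resolve(mentions, sieves) {
  let nextEntity = 1
  mentions.forEach((m, i) => {
    const earlier = mentions.slice(0, i)
    for (const sieve of sieves) {
      const antecedent = sieve(m, earlier)
      if (antecedent) {
        m.entity = antecedent.entity
        return
      }
    }
    m.entity = nextEntity++
  })
  return mentions
}

const resolved = resolve(mentions, [exactMatch, pronounMatch])
console.log(resolved.map(m => m.entity)) // [ 1, 1, 2, 1, 2 ]
```

Ordering the sieves from most to least precise means a cheap, reliable rule gets the first chance at each mention before a looser one is tried.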

Some additional methods might be useful to build the sieves:

nlp(text).mentions().json()
// [{ terms: [...], text: "John" }, { terms: [...], text: "It" }, { terms: [...], text: "A girl" }, { terms: [...], text: "my" }, ...]

nlp(text).speakers().json()
// [{ terms: [...], text: "John", speaker: { quote: "It is my favorite" } }]
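
As a very rough idea of what a speaker sieve might start from, the sketch below pulls out a quote and its speaker with a single regular expression. `findSpeakers` is a hypothetical function, and the pattern is an assumption that only covers a single-quoted quote followed by a capitalized name and the word "said".

```javascript
const text =
  "John is a musician. He played a new song. A girl was listening to the song. 'It is my favorite', John said to her."

// Match: 'quoted text' , Name said
// Only this one phrasing is covered; real text needs many more patterns.
function findSpeakers(text) {
  const pattern = /'([^']+)',?\s+([A-Z][a-z]+) said/g
  return [...text.matchAll(pattern)].map(match => ({
    text: match[2],
    speaker: { quote: match[1] },
  }))
}

const speakers = findSpeakers(text)
console.log(speakers)
// [ { text: 'John', speaker: { quote: 'It is my favorite' } } ]
```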

spencermountain · 2023-01-26T16:23:28Z

hey Aurélien, thank you for sharing this. I'll read that paper this week, it looks really helpful. It would be great to work on this problem with someone.

I've got a few changes on the dev branch in advance of doing coreference. I can talk through them if you'd like, but it should land as a release next week. Mostly changes to .nouns() responses, for weird noun-phrases. There's also an awkwardly named people().guessGender() 😬.

I'm also trying to build-up a tag for people referred to not by name, called #Actor - for things like 'the bartender ... he ..', or 'my grandma ... she'. Right now it's just a bunch of professions, mostly.

i like the sketchup for the api. Let me read that paper and release these fixes then I'll ping you next week.
cheers

spencermountain · 2023-02-04T19:48:27Z

okay, #Actor stuff is released in 14.8.2. Ready to start reproducing this paper, if you wanted.
The api right now is this:

doc.pronouns().forEach(p=>{
  p.refersTo().debug()
})

The logic lives here and the half-passing tests are here

Lots to do! You're welcome to try something in a branch, or make a pr to dev or something. cheers

oaguy1 changed the title ~~Coreference Detection~~ Coreference Resolution Feb 10, 2019

spencermountain added yesss feature-request labels Feb 11, 2019

ProfJanetDavis mentioned this issue Aug 11, 2019

Some pronouns should be replaced with names to reduce ambiguity glam-lab/degender-the-web#64

Open

spencermountain pinned this issue Jan 26, 2023

spencermountain mentioned this issue Feb 4, 2023

Dev #995

Merged

spencermountain unpinned this issue Nov 16, 2023
