Coreference Resolution #565
A simpler explanation and sample implementation of the algorithm I mentioned can be found here. |
YESSSSS something like this? let doc=nlp('Carrots are orange. They are delicious.')
doc.pronouns().data()
// [{text:'they', normal:'they', reference:'carrots'}]
doc.nouns().data()
//[{text:'carrots', normal:'carrots', references:['they']}] something like that? |
it may be desirable too, to actually fetch the reference word(s), so that people can do whatever they want to the results, like replace them or something. The only tricky-part i can imagine is tracking-down the reference word(s), and packing them into a Text object, so that a person can do |
Glad you are excited! The algorithm I linked to can track down the references of the pronouns in a manner that is right most of the time (80%). The way I was thinking about approaching this was adding an additional tagging step where we look at each of the pronouns and then use Hobbs' algorithm to find the best guess at the antecedent. With that in mind, my initial plan for the API was something like this: // grabbing the antecedent to a pronoun
doc.match("#Pronoun").get(0).antecedent();
// grabbing the pronouns for person
doc.people().get(0).pronouns(); I think once we have the additional API built out for Terms, something closer to what you initially suggested on the more macro/document level. Let me know what you think! I plan on sitting down and putting some more time on this tomorrow. |
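A minimal sketch of the backward-search idea behind the antecedent lookup (this is a big simplification of Hobbs' parse-tree algorithm, and the token shape here is invented for illustration, not compromise's internal format):

```javascript
// Walk backwards from a pronoun and return the nearest noun that agrees
// in plurality. Real Hobbs traverses the parse tree; this is the flat,
// naive version of the same intuition.
function findAntecedent(tokens, pronounIndex) {
  const pronoun = tokens[pronounIndex];
  for (let i = pronounIndex - 1; i >= 0; i--) {
    const t = tokens[i];
    if (t.tags.includes('Noun') && t.plural === pronoun.plural) {
      return t;
    }
  }
  return null; // no agreeing noun found earlier in the text
}

const tokens = [
  { text: 'Carrots', tags: ['Noun'], plural: true },
  { text: 'are', tags: ['Verb'], plural: true },
  { text: 'orange', tags: ['Adjective'], plural: false },
  { text: 'They', tags: ['Pronoun'], plural: true },
  { text: 'are', tags: ['Verb'], plural: true },
  { text: 'delicious', tags: ['Adjective'], plural: false },
];
console.log(findAntecedent(tokens, 3).text); // → 'Carrots'
```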
yeah cool! doc.match("#Pronoun").antecedents(0);
doc.people().pronouns(0); either way, happy to see this in-action, then we can shove it around after. been thinking past few weeks about breaking-up compromise into more micro-libraries, like d3 did. If we end up doing that, this work will end-up in a named-entity-plugin, or something like that |
Sounds good! I definitely want to try to keep the API as close to the rest of the library as possible. I am hacking on this when I have time, but still won't have much to share for a while. Once I have a good working MVP with tests I will make a PR and we can really play with it. |
@spencermountain As part of the algorithm I am implementing, I am trying to start from an individual Term (an instance of a pronoun) and then move to the previous Term in the sentence to see if it matches some criteria. Once the beginning of the sentence is reached, the "previous" term would be the last Term in the previous sentence. It would also be good to know if the previous term came from another sentence or paragraph. Is there support for such movement within the text currently in the lib? If not, where would be a good place to start for adding it? |
hey David, yeah you may want to just use the internal arrays of sentences, and terms. let doc = nlp(myText)
doc.list //arrays of sentences
doc.list[0].terms // terms in each sentence we don't have any support for paragraphs (right now) |
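The "previous term" lookup described above can be sketched over plain nested arrays shaped like v11's doc.list (sentences of terms). This is an illustration only, not compromise's API:

```javascript
// Given sentences-of-terms, return the term before (sentIndex, termIndex),
// plus a flag saying whether we crossed a sentence boundary to reach it.
function previousTerm(sentences, sentIndex, termIndex) {
  if (termIndex > 0) {
    return { term: sentences[sentIndex][termIndex - 1], crossedSentence: false };
  }
  if (sentIndex > 0) {
    const prev = sentences[sentIndex - 1];
    return { term: prev[prev.length - 1], crossedSentence: true };
  }
  return null; // already at the start of the document
}

const sentences = [
  ['Carrots', 'are', 'orange'],
  ['They', 'are', 'delicious'],
];
previousTerm(sentences, 1, 0);
// → { term: 'orange', crossedSentence: true }
```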
That is helpful, thank you so much for responding so quickly. I was thinking, if you only have sentences of terms, how do you feel about adding some sort of index to the Term objects, so they are aware of their position within the document? I could add this during the build process, as an attribute on each Term. Let me know what you think; I don't want to be too crazy adding things w/o checking in. |
hey David, yeah this has been the hard-part of making compromise, that 'position within the document' changes considerably, and depends on where the user is zooming-in, cloning, etc. I've started working on a major re-write, for v12, that you may be interested in, over here: https://github.com/spencermountain/compromise/tree/linked-list. It uses a linked-list model, so references and indexes are more 'postmodern', and don't suffer any of the awkwardness you're going through. I'm also concerned that adding in co-reference resolution to v11 may be more complicated than it would be in v12. It's not very solid yet, and still moving-around in some circles.. How would you feel about me creating a compromise-coreference repo, and us working on it there? That would give us an opportunity to implement that Hobbs paper, without worrying about api changes: const nlp=require('compromise')
const ccr=require('compromise-coreference')
let doc=nlp(myText)
let json = ccr(doc)
/* {whatever json-structure you'd like} */
how's that? |
That would be a great stop-gap between APIs and a good place to get the
algorithm down before (possibly) adding it to the full lib. If it would be
easier to implement the paper in the new API, I am happy to lend a hand to
speed that along, too.
Thank you for all the help so far! It is nice to have a positive
contribution experience on such a cool project.
|
hey, i've added you to a basic version of this here: https://github.com/nlp-compromise/compromise-coreference. take it for a ride - feel free to commit directly to it, it's pretty-rough! cheers |
Awesome, thanks!
|
Also, have you considered a tree structure over a linked list? That seems to be the data structure that many nlp libs use, and it makes adding additional depths (paragraphs, etc.) and exact document positions more doable.
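The tree idea could be sketched roughly like this, with plain nested arrays standing in for real node objects (none of this is compromise's actual structure): exact document position falls out of the path from root to leaf.

```javascript
// Document → Paragraph → Sentence → Term, as nested arrays.
// Finding a word's position is just recording the indexes on the way down.
function positionOf(docTree, word) {
  for (let p = 0; p < docTree.length; p++) {
    for (let s = 0; s < docTree[p].length; s++) {
      const t = docTree[p][s].indexOf(word);
      if (t !== -1) return { paragraph: p, sentence: s, term: t };
    }
  }
  return null;
}

const doc = [
  [['Carrots', 'are', 'orange'], ['They', 'are', 'delicious']], // paragraph 0
  [['I', 'ate', 'one']],                                        // paragraph 1
];
positionOf(doc, 'They'); // → { paragraph: 0, sentence: 1, term: 0 }
```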
|
i'd love to hear more about this idea, how do you imagine it working? |
Would love to discuss. I created a slack channel so we can go back and
forth without polluting this issue.
https://compromisenlp.slack.com/
|
wanna just join the existing slack group? https://slackin-kyzvclgjlg.now.sh/ |
Yes! I will delete the group I created (should have searched first)
|
hi! sorry to comment on an old issue, but I was wondering if coreference resolution eventually did become part of compromise? |
hey Aurélien - it's on my new-year's resolutions this year. There's actually an undocumented api for it here - i wouldn't recommend using it yet though. will update this issue when it lands. Would love some help. |
if you (or anybody) were interested in working on it, the current implementation is here. it's a pretty-tricky problem. the current version looks back 2 sentences for a 'he' or 'she'. i think i started to try 'they' and got overwhelmed. 'it' is most-likely the hardest. |
I've been reading a bit about the topic, turns out co-reference resolution is a whole field of research 😅 I found a paper describing a nice rule-based algorithm that might be a good starting point: https://aclanthology.org/J13-4004.pdf It describes a series of sieves that are applied until all mentions in a text refer to some entity. Maybe it could work something like this: const text = "John is a musician. He played a new song. A girl was listening to the song. 'It is my favorite', John said to her."
nlp(text).coreference().json()
[
{ terms: [...], text: "John", coreference: { refs: [1] } },
{ terms: [...], text: "he", coreference: { refs: [1] } },
{ terms: [...], text: "a new song", coreference: { refs: [2] } },
{ terms: [...], text: "It", coreference: { refs: [2] } },
{ terms: [...], text: "A girl", coreference: { refs: [3] } },
{ terms: [...], text: "the song", coreference: { refs: [2] } },
{ terms: [...], text: "my", coreference: { refs: [1] } },
{ terms: [...], text: "her", coreference: { refs: [3] } },
] Keeping an array of references might be useful for cases where one word might refer to several entities (e.g. "they"). Here are some of the sieves described in the paper:
For each mention we then try to find a matching antecedent by running it through every sieve; a sieve either resolves the match or leaves it for a later sieve. Some additional methods might be useful to build the sieves: nlp(text).mentions().json()
// [{ terms: [...], text: "John" }, { terms: [...], text: "It" }, { terms: [...], text: "A girl" }, { terms: [...], text: "my" }, ...]
nlp(text).speakers().json()
// [{ terms: [...], text: "John", speaker: { quote: "It is my favorite" } }] |
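The sieve pipeline described above could be sketched like this. The sieve logic and mention shapes are toy illustrations, not the paper's actual rules or compromise's API:

```javascript
// Run a mention through each sieve in order of precision; the first sieve
// that finds an antecedent resolves the mention, otherwise it stays
// unresolved for a later pass.
function resolve(mention, antecedents, sieves) {
  for (const sieve of sieves) {
    const match = sieve(mention, antecedents);
    if (match) return match;
  }
  return null; // unresolved; a later pass may pick it up
}

// Sieve 1: exact string match ("the song" ↔ "the song")
const exactMatch = (m, ants) => ants.find(a => a.text === m.text) || null;
// Sieve 2: pronoun agreement on gender (very simplified)
const pronounMatch = (m, ants) =>
  m.isPronoun ? ants.find(a => a.gender === m.gender) || null : null;

const antecedents = [
  { text: 'John', gender: 'male' },
  { text: 'a new song', gender: 'neutral' },
];
resolve({ text: 'He', isPronoun: true, gender: 'male' },
        antecedents, [exactMatch, pronounMatch]);
// → { text: 'John', gender: 'male' }
```

Ordering sieves from most to least precise is the key design choice in the paper: high-precision decisions get made first, so later, looser sieves only see mentions nothing better could claim.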
hey Aurélien, thank you for sharing this. I'll read that paper this week, it looks really helpful. It would be great to work on this problem with someone. I've got a few changes on the dev branch in advance of doing coreference. I can talk through them if you'd like, but it should land as a release next week. Mostly changes to .nouns() responses, for weird noun-phrases. There's also an awkwardly named method for this, and I'm also trying to build-up a tag for people referred to not by name. i like the sketch-up for the api. Let me read that paper and release these fixes, then I'll ping you next week. |
okay, doc.pronouns().forEach(p=>{
p.refersTo().debug()
}) The logic lives here and the half-passing tests are here. Lots to do! You're welcome to try something in a branch, or make a pr to dev or something. cheers |
Hello! I am looking into using coreference resolution in a project I am working on. There exists a reasonably easy (read: does not require a neural network and training) algorithm to do just this, and I was thinking of adding it to this library. I read the contributing guide and wanted to make an issue to test the water before spending a lot of time working on this.