Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SRL Annotator #473

Open
wants to merge 29 commits into
base: master
Choose a base branch
from
Open

SRL Annotator #473

wants to merge 29 commits into from

Conversation

bhargav
Copy link
Contributor

@bhargav bhargav commented Apr 14, 2017

Annotator for the Semantic Role Labeler

  • Change the DataModel to be an object
  • Add Annotator class
  • Remove redundant SRLConfigurator.java
  • Train models for Predicate Sense and populate Sense information in annotator
  • Unit Test for Annotator
  • Move constants used in Annotator to someplace better (Dropping this)
  • Retrain and deploy all the models.

I'm dropping the requirement for training Predicate Sense for this PR. I can see if the illinois-verbsense package can be integrated as a separate PR.

@kordjamshidi
Copy link
Member

@bhargav this sounds great. However, do you think it is possible to populate all the data in a single datamodel object efficiently? the reason that I defined a class for SRL datamodel was to be able to to have small graphs and then integrate them.

@bhargav
Copy link
Contributor Author

bhargav commented Apr 17, 2017

@kordjamshidi Population works fine for small datasets (examples the test set has ~2000 sentences) but it takes much longer to populate large number of sentences.

I did some profiling and most of the time during population of large graphs is spent in the textAnnotationToRelationMatch matching function. Trying to debug the issue with this behavior.

@kordjamshidi
Copy link
Member

Yes, right. I am sure the issue was related to establishing matching edges. I hope you can see what is the actual issue.

@bhargav bhargav changed the title [WIP] SRL Annotator SRL Annotator May 2, 2017
sentencesToRelations.addSensor(textAnnotationToRelationMatch _)

// TODO - Verify if we really need this matching sensor. This is very inefficient.
// sentencesToRelations.addSensor(textAnnotationToRelationMatch _)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kordjamshidi Commenting out this matching sensor mitigates the data model population inefficiency. I have tried training models after commenting this out and training/testing works fine. Let me know if you have any concerns about this or any use-case that I didn't verify.

The root problem is still not fixed. The matching logic iterates over all possible instantiations of the edges while adding a single instance. Source

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@bhargav we populate the relations two times, one with generating sensor that populates relations in the very beginning when we populate the sentences. Later we add candidate relations from XuePalmer candidate generator and we added matching sensor to make sure the generated Xue palmer candidates are also connected to the right sentences. I am not sure why it should work correctly without the matching sensor, especially in the case that we use constraints.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay. Let me have a look again. I ran aTr, bTr, cTr and dTr and the results were similar to the ones in the readme. Can check again.

Will investigate the matching sensor issue.

@bhargav
Copy link
Contributor Author

bhargav commented May 2, 2017

I have trained and deployed models trained with PARSE_STANFORD.

@kordjamshidi
Copy link
Member

@bhargav thanks, I will review this, but I hoped we can find a better solution for the population with the matching sensor. Maybe in another PR then.

@bhargav
Copy link
Contributor Author

bhargav commented May 2, 2017

Semaphore failed with the following exception related to MapDB.

[error] Uncaught exception when running edu.illinois.cs.cogcomp.saulexamples.nlp.SemanticRoleLabeling.ModelsTest: org.mapdb.DBException$DataCorruption: Header checksum broken. Store was not closed correctly, or is corrupted
sbt.ForkMain$ForkError: Header checksum broken. Store was not closed correctly, or is corrupted

import edu.illinois.cs.cogcomp.saul.classifier.{ ConstrainedClassifier, Learnable }
import edu.illinois.cs.cogcomp.saul.constraint.ConstraintTypeConversion._
import edu.illinois.cs.cogcomp.saul.datamodel.DataModel
import edu.illinois.cs.cogcomp.saulexamples.nlp.CommonSensors._
import edu.illinois.cs.cogcomp.saulexamples.nlp.SemanticRoleLabeling.SRLClassifiers.argumentTypeLearner
import edu.illinois.cs.cogcomp.saulexamples.nlp.SemanticRoleLabeling.SRLSensors._
import org.scalatest.{ FlatSpec, Matchers }

import scala.collection.JavaConverters._
import scala.collection.JavaConversions._
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not very important for now: we should try NOT to use JavaConversions, and instead use JavaConverters

@danyaljj
Copy link
Member

danyaljj commented May 3, 2017

Thanks! Looks good to me!

Merge call with @kordjamshidi

@kordjamshidi
Copy link
Member

Did any result change? @bhargav

@bhargav bhargav changed the title SRL Annotator [WIP] SRL Annotator May 3, 2017
@bhargav bhargav changed the title [WIP] SRL Annotator SRL Annotator May 5, 2017
@bhargav
Copy link
Contributor Author

bhargav commented May 10, 2017

Reverting to the previous data models. Models trained with PARSE_STANFORD were couple of points lower than PARSE_GOLD on per-argument evaluation. But performance of the pipeline model using the PredicateArgumentEvaluator leads to similar results.

Model Precision Recall F1
VerbSRL (PARSE_GOLD) 65.56 64.08 64.81
Verb SRL (PARSE_STANFORD) 65.52 63.48 64.48

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants