-
Notifications
You must be signed in to change notification settings - Fork 18
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
SRL Annotator #473
base: master
Are you sure you want to change the base?
SRL Annotator #473
Conversation
@bhargav this sounds great. However, do you think it is possible to populate all the data in a single datamodel object efficiently? the reason that I defined a class for SRL datamodel was to be able to to have small graphs and then integrate them. |
@kordjamshidi Population works fine for small datasets (examples the test set has ~2000 sentences) but it takes much longer to populate large number of sentences. I did some profiling and most of the time during population of large graphs is spent in the textAnnotationToRelationMatch matching function. Trying to debug the issue with this behavior. |
Yes, right. I am sure the issue was related to establishing |
sentencesToRelations.addSensor(textAnnotationToRelationMatch _) | ||
|
||
// TODO - Verify if we really need this matching sensor. This is very inefficient. | ||
// sentencesToRelations.addSensor(textAnnotationToRelationMatch _) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@kordjamshidi Commenting out this matching sensor mitigates the data model population inefficiency. I have tried training models after commenting this out and training/testing works fine. Let me know if you have any concerns about this or any use-case that I didn't verify.
The root problem is still not fixed. The matching
logic iterates over all possible instantiations of the edges while adding a single instance. Source
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@bhargav we populate the relations two times, one with generating sensor that populates relations in the very beginning when we populate the sentences. Later we add candidate relations from XuePalmer candidate generator and we added matching sensor to make sure the generated Xue palmer candidates are also connected to the right sentences. I am not sure why it should work correctly without the matching sensor, especially in the case that we use constraints.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay. Let me have a look again. I ran aTr
, bTr
, cTr
and dTr
and the results were similar to the ones in the readme. Can check again.
Will investigate the matching
sensor issue.
I have trained and deployed models trained with PARSE_STANFORD. |
@bhargav thanks, I will review this, but I hoped we can find a better solution for the population with the matching sensor. Maybe in another PR then. |
Semaphore failed with the following exception related to MapDB. [error] Uncaught exception when running edu.illinois.cs.cogcomp.saulexamples.nlp.SemanticRoleLabeling.ModelsTest: org.mapdb.DBException$DataCorruption: Header checksum broken. Store was not closed correctly, or is corrupted |
import edu.illinois.cs.cogcomp.saul.classifier.{ ConstrainedClassifier, Learnable } | ||
import edu.illinois.cs.cogcomp.saul.constraint.ConstraintTypeConversion._ | ||
import edu.illinois.cs.cogcomp.saul.datamodel.DataModel | ||
import edu.illinois.cs.cogcomp.saulexamples.nlp.CommonSensors._ | ||
import edu.illinois.cs.cogcomp.saulexamples.nlp.SemanticRoleLabeling.SRLClassifiers.argumentTypeLearner | ||
import edu.illinois.cs.cogcomp.saulexamples.nlp.SemanticRoleLabeling.SRLSensors._ | ||
import org.scalatest.{ FlatSpec, Matchers } | ||
|
||
import scala.collection.JavaConverters._ | ||
import scala.collection.JavaConversions._ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not very important for now: we should try NOT to use JavaConversions
, and instead use JavaConverters
Thanks! Looks good to me! Merge call with @kordjamshidi |
Did any result change? @bhargav |
Reverting to the previous data models. Models trained with PARSE_STANFORD were couple of points lower than PARSE_GOLD on per-argument evaluation. But performance of the pipeline model using the
|
Annotator for the Semantic Role Labeler
Change the DataModel to be anobject
SRLConfigurator.java
Train models for Predicate Sense and populate Sense information in annotatorMove constants used in Annotator to someplace better(Dropping this)I'm dropping the requirement for training Predicate Sense for this PR. I can see if the illinois-verbsense package can be integrated as a separate PR.