Hi, thanks for this amazing work. I have some questions.

The annotation is done at the abstract level, but when you use the PubMedBERT model for relation extraction, how does the tokenizer handle sentence segmentation? As far as I know, BERT's maximum input length is 512 tokens, so how do you proceed when an abstract is longer than 512 tokens?

Another question concerns coreference in the annotation. Did you also annotate pronouns such as 'it' and 'this' as entities? Do they become noise for the NER task? Before running the RE task, do you replace them with the original entity names, keep them as-is, or use some other strategy?
Hi @Meiling-Sun,
We don't handle abstracts longer than 512 tokens in the PubMedBERT model. If you would like to do this, you may consider using the "stride" parameter of Hugging Face's tokenizer.
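For reference, here is a minimal sketch of that stride-based windowing, assuming the Hugging Face `transformers` library; the PubMedBERT checkpoint name and the stride value are illustrative assumptions, not something prescribed by this repository:

```python
# Sketch: split a long abstract into overlapping <=512-token windows
# instead of silently truncating it.
from transformers import AutoTokenizer

# Assumed checkpoint name; substitute whichever PubMedBERT variant you use.
tokenizer = AutoTokenizer.from_pretrained(
    "microsoft/BiomedNLP-PubMedBERT-base-uncased-abstract-fulltext"
)

abstract = "..."  # a PubMed abstract that may exceed 512 tokens

encoding = tokenizer(
    abstract,
    max_length=512,
    truncation=True,
    stride=128,                      # overlap between consecutive windows
    return_overflowing_tokens=True,  # emit the extra windows instead of dropping them
)

# Each entry in encoding["input_ids"] is one window of at most 512 tokens.
for window_ids in encoding["input_ids"]:
    print(len(window_ids))
```

You would then run the model on each window and aggregate the predictions, e.g. by taking the union of extracted relations across windows.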
No, our BioRED corpus doesn't contain pronoun annotations, so pronouns are not used in NER or RE. In our dataset, coreference cases are entities that share the same database identifier, e.g. a MeSH or Entrez ID. For the RE task, I don't normalize the entities in the text; instead, I insert special tokens to tag those entities in the text.
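To illustrate the idea (this is a simplified sketch, not the repository's exact code, and the marker strings such as `@GeneSrc$` are assumptions), entity tagging with special tokens might look like this:

```python
# Sketch: wrap the two entity mentions of a candidate pair in marker tokens,
# leaving the original entity names in the text untouched.
def insert_entity_markers(text, head_span, tail_span,
                          head_marker="@GeneSrc$", tail_marker="@DiseaseTgt$"):
    """head_span / tail_span are (start, end) character offsets into `text`."""
    spans = sorted(
        [(head_span, head_marker), (tail_span, tail_marker)],
        key=lambda x: x[0][0],
        reverse=True,  # insert from the right so earlier offsets stay valid
    )
    for (start, end), marker in spans:
        text = f"{text[:start]}{marker} {text[start:end]} {marker}{text[end:]}"
    return text

sentence = "BRCA1 mutations increase the risk of breast cancer."
print(insert_entity_markers(sentence, (0, 5), (37, 50)))
# -> @GeneSrc$ BRCA1 @GeneSrc$ mutations increase the risk of
#    @DiseaseTgt$ breast cancer @DiseaseTgt$.
```

If you use such markers, they would typically be registered with the tokenizer (e.g. via `tokenizer.add_tokens(...)`) and the model's embedding matrix resized accordingly.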