"invalid whitespace entity spans" error while validating training and test data for NER #13689

Open
abrarsharif66 opened this issue Nov 11, 2024 · 1 comment

@abrarsharif66

How to reproduce the behaviour

I used the following code to convert my JSON annotations to spaCy's binary .spacy format. When I validate the data with spacy debug data, I get an "invalid whitespace entity spans" error:

[screenshot of the error output]

How can I resolve this?

for text, annot in tqdm(TRAIN_DATA['annotations']):
    doc = nlp.make_doc(text)
    ents = []
    for start, end, label in annot["entities"]:
        span = doc.char_span(start, end, label=label, alignment_mode="contract")
        if span is None:
            print("Skipping entity")
        else:
            ents.append(span)
    doc.ents = ents
    db.add(doc)
db.to_disk("train_data.spacy")

Info about spaCy

  • spaCy version: 3.7.5
  • Platform: Linux-6.1.85+-x86_64-with-glibc2.35
  • Python version: 3.10.12
  • Pipelines: en_core_web_lg (3.7.1), en_core_web_sm (3.7.1)
@abrarsharif66
Author

Here is a sample of my training-data JSON to illustrate the schema:

{"classes":["SOFTWARE_NAME","JOB_TYPE","EDUCATION","UNIVERSITY","DEGREE","YEARS_OF_EXPERIENCE","STATE","CITY","COUNTRY","PROGRAMING_CONCEPT","COMPANY_NAME","PROGRAMMING_LANGUAGE","FRAMEWORKS","SOFT_SKILLS","JOB_TITLE","NAME","EMAIL","PH.NO"],"annotations":[["Zixuan Wu zixwu@ucdavis.edu",{"entities":[[0,9,"NAME"],[10,27,"EMAIL"]]}],["1363 Briones Ct | Pleasanton, CA 94588 | (510) 676-7461",{"entities":[[41,55,"PH.NO"]]}]]}
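One possible cause, offered as an assumption rather than a diagnosis: some annotated offsets elsewhere in the data may start or end on a whitespace character such as a newline or a run of spaces, so the resulting span begins or ends on a whitespace token. The two sample records above have clean offsets, so this would apply to other records. A minimal sketch of trimming the offsets before building the span follows; the helper name trim_entity_offsets and the example text are hypothetical, not part of spaCy or of the original data.

```python
def trim_entity_offsets(text, start, end):
    """Shrink [start, end) so the span neither starts nor ends on whitespace.

    Hypothetical helper, not a spaCy API. Returns None when only
    whitespace is left inside the span.
    """
    # Clamp to the text bounds so out-of-range offsets cannot raise IndexError.
    start = max(0, start)
    end = min(len(text), end)
    # Move the start forward past leading whitespace.
    while start < end and text[start].isspace():
        start += 1
    # Move the end backward past trailing whitespace.
    while end > start and text[end - 1].isspace():
        end -= 1
    return (start, end) if start < end else None


# Made-up example: the annotation includes the newline after the name.
text = "Zixuan Wu\nzixwu@ucdavis.edu"
print(trim_entity_offsets(text, 0, 10))  # (0, 9)
print(trim_entity_offsets(text, 9, 10))  # None
```

In the conversion loop, the helper would be called on each (start, end) pair before doc.char_span, and the entity skipped when it returns None. Whether this removes the warning for a given dataset depends on where the whitespace actually comes from, so it is worth re-running the validation afterwards.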
