wjbmattingly/biospacy

BioSpaCy is a spaCy pipeline for processing biological texts. Currently, the pipeline uses rulers and heuristics to identify the following taxonomic labels:

  • DOMAIN
  • KINGDOM
  • PHYLUM
    • PHYLUM-ANIMALIA
    • PHYLUM-BACTERIA
    • PHYLUM-FUNGI
    • PHYLUM-PLANTS
    • PHYLUM-PROTISTA
  • CLASS (NOT INCLUDED YET)
  • FAMILY (Plants only)
  • SUBFAMILY (Plants only)
  • ORDER (Plants only)
  • GENUS (Plants only)
  • SPECIES (Plants only)
  • BINOMINA (Plants only)
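
As a rough illustration of what a ruler-based approach can look like, the sketch below builds a tiny stand-in pipeline with spaCy's built-in `span_ruler` component (available in spaCy 3.3.1 and later). The three patterns are hypothetical examples written for this sketch; they are not taken from BioSpaCy's actual rule set, and the real pipeline may be assembled differently.

```python
import spacy

# Assumption: a blank English pipeline stands in for en_biospacy here.
nlp = spacy.blank("en")

# span_ruler writes its matches to doc.spans["ruler"] by default.
ruler = nlp.add_pipe("span_ruler")

# Hypothetical patterns for illustration only.
ruler.add_patterns([
    {"label": "FAMILY", "pattern": "Lomariopsidaceae"},
    {"label": "GENUS", "pattern": "Nephrolepis"},
    {
        "label": "BINOMINA",
        "pattern": [{"LOWER": "nephrolepis"}, {"LOWER": "exaltata"}],
    },
])

doc = nlp("Nephrolepis exaltata is a fern in the family Lomariopsidaceae.")

# Collect (text, label) pairs; a set avoids relying on match order.
found = {(span.text, span.label_) for span in doc.spans["ruler"]}
print(sorted(found))
```

Span groups may contain overlapping spans, which is why a single token such as "Nephrolepis" can carry a GENUS label while also being part of a longer BINOMINA span.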

Installation

pip install en_biospacy

Usage

import spacy

# Sample passage about the Boston fern.
text = """
Nephrolepis exaltata, known as the sword fern[1] or Boston fern,
is a species of fern in the family Lomariopsidaceae
(sometimes treated in the families Davalliaceae or Oleandraceae,
or in its own family, Nephrolepidaceae).
"""

# Load the installed BioSpaCy pipeline and process the text.
nlp = spacy.load("en_biospacy")
doc = nlp(text)

# Matches are stored as spans under the "ruler" key.
for span in doc.spans["ruler"]:
    print(span.text, span.label_)

Expected Output

Nephrolepis GENUS
exaltata SPECIES
Lomariopsidaceae FAMILY
Davalliaceae FAMILY
Oleandraceae FAMILY
Nephrolepis exaltata BINOMINA
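
For downstream use it is often handy to collect the matches by label. The helper below is a minimal sketch: `group_by_label` is a hypothetical name, not part of BioSpaCy, and it only assumes that each item exposes `text` and `label_` attributes, as spaCy spans do. The stand-in objects mirror a few of the matches listed above so the snippet runs without the model installed.

```python
from collections import defaultdict
from types import SimpleNamespace


def group_by_label(spans):
    """Map each label to the matched texts, in order of appearance."""
    grouped = defaultdict(list)
    for span in spans:
        grouped[span.label_].append(span.text)
    return dict(grouped)


# Stand-in objects that mimic spaCy spans (illustration only).
example_spans = [
    SimpleNamespace(text="Nephrolepis", label_="GENUS"),
    SimpleNamespace(text="exaltata", label_="SPECIES"),
    SimpleNamespace(text="Lomariopsidaceae", label_="FAMILY"),
    SimpleNamespace(text="Davalliaceae", label_="FAMILY"),
]

grouped = group_by_label(example_spans)
print(grouped["FAMILY"])
```

With the real pipeline loaded, the same helper could be called as `group_by_label(doc.spans["ruler"])`.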

Data for domains, kingdoms, and phyla came from Wikipedia. Data for plant families, subfamilies, orders, genera, and species came from World Flora Online.

Citation to Data

"WFO (2022): World Flora Online. Version 2022.07. Published on the Internet; http://www.worldfloraonline.org. Accessed on: 1 January 2023".
