Previously this project was released as grobid-superconductors-tools, born as a sister project of grobid-superconductors. It contains a web service that interfaces with Python libraries (e.g. spaCy).
The service provides the following functionalities:
- Convert material name to formula (e.g. Lead -> Pb, Hydrogen -> H):
/convert/name/formula
- Decompose a formula into a structured dict of elements (e.g. La x Fe 1-x O7 -> {La: x, Fe: 1-x, O: 7}):
/convert/formula/composition
- Classify materials into classes (from the superconductors domain) using a rule-based table (e.g. "La Cu Fe" -> Cuprates):
/classify/formula
- Tc classification (Tc, not-Tc):
/classify/tc
For information, please open an issue.
- Relation extraction given a sentence and two entities:
/process/link
For information, please open an issue.
- Material processing using deep learning models and rule-based processing:
/process/material
The service is deployed on Hugging Face Spaces and can be used right away. To install the service in your own environment, see below.
Example:
curl --location 'https://lfoppiano-grobid-superconductors-tools.hf.space/convert/name/formula' \
--form 'input="Hydrogen"'
output:
{"composition": {"H": "1"}, "name": "Hydrogen", "formula": "H"}
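The curl calls can also be issued from Python. The following is a minimal sketch using only the standard library; the helper names `build_form_request` and `call` are hypothetical and not part of this project. It sends the same kind of multipart/form-data POST that `curl --form` produces, assuming (as with curl's quoting) that the double quotes around the value in the examples are not part of the value itself.

```python
import urllib.request
import uuid

# Base URL of the public demo, taken from the examples in this README.
BASE_URL = "https://lfoppiano-grobid-superconductors-tools.hf.space"


def build_form_request(endpoint: str, fields: dict, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build a multipart/form-data POST request, like `curl --form name=value`."""
    boundary = uuid.uuid4().hex  # random boundary, practically never present in short text values
    parts = []
    for name, value in fields.items():
        parts.append(
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="{name}"\r\n'
            "\r\n"
            f"{value}\r\n"
        )
    parts.append(f"--{boundary}--\r\n")  # closing boundary
    return urllib.request.Request(
        base_url + endpoint,
        data="".join(parts).encode("utf-8"),
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
        method="POST",
    )


def call(endpoint: str, fields: dict, base_url: str = BASE_URL, timeout: float = 60.0) -> str:
    """Send the request and return the raw response body as text."""
    request = build_form_request(endpoint, fields, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

For example, `json.loads(call("/convert/name/formula", {"input": "Hydrogen"}))` should give the dictionary shown above. The sketch returns raw text because not every output shown in this README is valid JSON (e.g. `['Alloys']`); for the material processing endpoint, pass its own host as `base_url` and use the field name `text`.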
Example:
curl --location 'https://lfoppiano-grobid-superconductors-tools.hf.space/convert/formula/composition' \
--form 'input="CaBr2-x"'
output:
{"composition": {"Ca": "1", "Br": "2-x"}}
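To illustrate what the decomposition does, here is a toy parser written for this README; it is not the service's implementation and ignores many real-world notations (for instance, a variable whose letter forms an element symbol with the preceding element, such as "Cs", is read as caesium). It splits a formula into element symbols and raw amounts, keeps symbolic amounts such as `2-x` as strings, and multiplies numeric amounts inside a parenthesised group by the group's factor, which reproduces the Mo and Zr amounts in the material processing example of this README.

```python
from decimal import Decimal, InvalidOperation

# All element symbols; used to decide whether e.g. "Nb" is niobium or "N" + "b".
ELEMENTS = set("""
H He Li Be B C N O F Ne Na Mg Al Si P S Cl Ar K Ca Sc Ti V Cr Mn Fe Co Ni Cu Zn
Ga Ge As Se Br Kr Rb Sr Y Zr Nb Mo Tc Ru Rh Pd Ag Cd In Sn Sb Te I Xe Cs Ba La
Ce Pr Nd Pm Sm Eu Gd Tb Dy Ho Er Tm Yb Lu Hf Ta W Re Os Ir Pt Au Hg Tl Pb Bi Po
At Rn Fr Ra Ac Th Pa U Np Pu Am Cm Bk Cf Es Fm Md No Lr Rf Db Sg Bh Hs Mt Ds Rg
Cn Nh Fl Mc Lv Ts Og
""".split())


def _fmt(value: Decimal) -> str:
    return format(value.normalize(), "f")  # Decimal("0.8160") -> "0.816"


def _multiply(amount: str, factor: str) -> str:
    try:
        return _fmt(Decimal(amount) * Decimal(factor))
    except InvalidOperation:  # symbolic amount such as "1-x": keep it as text
        return amount if factor == "1" else f"({amount})*{factor}"


def _accumulate(comp: dict, element: str, amount: str) -> None:
    if element in comp:  # element repeated, e.g. "CH3CH3"
        try:
            amount = _fmt(Decimal(comp[element]) + Decimal(amount))
        except InvalidOperation:
            amount = f"{comp[element]}+{amount}"
    comp[element] = amount


def _read_amount(text: str, pos: int) -> tuple:
    start = pos
    while pos < len(text) and not (text[pos].isupper() or text[pos] in "()"):
        pos += 1
    return text[start:pos] or "1", pos  # a missing amount means 1


def _parse_group(text: str, pos: int) -> tuple:
    comp = {}
    while pos < len(text):
        ch = text[pos]
        if ch == ")":
            break  # end of a parenthesised group; the caller consumes ")"
        if ch == "(":
            inner, pos = _parse_group(text, pos + 1)
            if pos >= len(text):
                raise ValueError("missing ')'")
            factor, pos = _read_amount(text, pos + 1)
            for element, amount in inner.items():
                _accumulate(comp, element, _multiply(amount, factor))
        elif ch.isupper():
            two = text[pos:pos + 2]
            symbol = two if len(two) == 2 and two[1].islower() and two in ELEMENTS else ch
            if symbol not in ELEMENTS:
                raise ValueError(f"unknown element {symbol!r}")
            amount, pos = _read_amount(text, pos + len(symbol))
            _accumulate(comp, symbol, amount)
        else:
            raise ValueError(f"unexpected {ch!r} at position {pos}")
    return comp, pos


def parse_composition(formula: str) -> dict:
    """Toy decomposition of a formula into {element: amount as string}."""
    text = "".join(formula.split())  # "La x Fe 1-x O7" -> "LaxFe1-xO7"
    comp, pos = _parse_group(text, 0)
    if pos != len(text):
        raise ValueError(f"unbalanced ')' at position {pos}")
    return comp
```

Amounts are kept as strings, mirroring the service's output, and numeric arithmetic uses `Decimal` so that 0.96 × 0.85 comes out as exactly `0.816` rather than a binary floating-point approximation.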
Example:
curl --location 'https://lfoppiano-grobid-superconductors-tools.hf.space/classify/formula' \
--form 'input="(Mo 0.96 Zr 0.04 ) 0.85 B x "'
output:
['Alloys']
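Assuming the rule-based table matches on the elements present in a material, the mechanism can be sketched as an ordered list of rules checked against the element set. The rules in `RULES` are hypothetical, made up only to show the shape of such a table; they are not the table used by the service (for example, they return an empty list for the (Mo 0.96 Zr 0.04 ) 0.85 B x example, whereas the service returns ['Alloys']).

```python
from typing import Iterable, List

# Hypothetical rules, for illustration only:
# (class name, elements that must all be present, elements of which at least one
#  must be present; an empty set means "no constraint").
RULES = [
    ("Cuprates", {"Cu"}, set()),
    ("Iron-based", {"Fe"}, {"As", "Se", "P"}),
    ("Hydrides", {"H"}, set()),
]


def classify(elements: Iterable[str], rules=RULES) -> List[str]:
    """Return the names of all rules matched by the given element symbols."""
    present = set(elements)
    return [
        name
        for name, required, any_of in rules
        if required <= present and (not any_of or any_of & present)
    ]
```

Returning every matching class (rather than the first one) mirrors the list-shaped output of /classify/formula.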
The material processing endpoint combines all of the steps listed above, applied after the material sequence has been passed through a deep learning (DL) model.
Example:
curl --location 'https://lfoppiano-material-parsers.hf.space/process/material' \
--form 'text="(Mo 0.96 Zr 0.04 ) 0.85 B x "'
output:
[
  {
    "formula": {
      "rawValue": "(Mo 0.96 Zr 0.04 ) 0.85 B x"
    },
    "resolvedFormulas": [
      {
        "rawValue": "(Mo 0.96 Zr 0.04 ) 0.85 B x",
        "formulaComposition": {
          "Mo": "0.816",
          "Zr": "0.034",
          "B": "x"
        }
      }
    ]
  }
]
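The response is a list with one entry per material. The sketch below (the function name `compositions` is hypothetical) reads only the fields visible in the example output, `formula.rawValue` and `resolvedFormulas[].formulaComposition`, and uses `.get` defaults on the assumption that other materials may lack some of these fields.

```python
import json
from typing import Dict, List, Tuple


def compositions(response_text: str) -> List[Tuple[str, Dict[str, str]]]:
    """Return (raw formula, composition) pairs from a /process/material response."""
    pairs = []
    for material in json.loads(response_text):
        fallback_raw = material.get("formula", {}).get("rawValue", "")
        for resolved in material.get("resolvedFormulas", []):
            raw = resolved.get("rawValue", fallback_raw)
            pairs.append((raw, resolved.get("formulaComposition", {})))
    return pairs
```

Materials without resolved formulas are skipped, so the result only contains entries that carry an element composition.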
The DL model is DeLFT's BidLSTM_CRF.
Evaluation results (23/12/25):
                  precision  recall  f1-score  support
<doping>             0.6926  0.6377    0.6640      265
<fabrication>        0.3333  0.0909    0.1429       44
<formula>            0.8348  0.8459    0.8403     2569
<name>               0.7346  0.7935    0.7629      949
<shape>              0.9089  0.9608    0.9341      841
<substrate>          0.5875  0.3176    0.4123      148
<value>              0.8844  0.8920    0.8882      463
<variable>           0.9645  0.9710    0.9677      448
all (micro avg.)     0.8321  0.8385    0.8353     5727
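The f1-score column is the harmonic mean of precision and recall, F1 = 2PR / (P + R), so every row can be checked from its first two columns. The sketch below does that for the values in the table (agreement holds up to the four-decimal rounding) and also checks that the per-class supports add up to the micro-average total.

```python
# Values copied from the evaluation table: label -> (precision, recall, f1, support).
TABLE = {
    "<doping>": (0.6926, 0.6377, 0.6640, 265),
    "<fabrication>": (0.3333, 0.0909, 0.1429, 44),
    "<formula>": (0.8348, 0.8459, 0.8403, 2569),
    "<name>": (0.7346, 0.7935, 0.7629, 949),
    "<shape>": (0.9089, 0.9608, 0.9341, 841),
    "<substrate>": (0.5875, 0.3176, 0.4123, 148),
    "<value>": (0.8844, 0.8920, 0.8882, 463),
    "<variable>": (0.9645, 0.9710, 0.9677, 448),
    "all (micro avg.)": (0.8321, 0.8385, 0.8353, 5727),
}


def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0 when both are 0)."""
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0


for label, (p, r, reported, _) in TABLE.items():
    print(f"{label:18} reported={reported:.4f} recomputed={f1(p, r):.4f}")

per_class_support = sum(s for label, (_, _, _, s) in TABLE.items() if label.startswith("<"))
print("support sum:", per_class_support)  # 5727, same as the micro-average row
```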
To run the service in your own environment, use the Docker image:
docker run -it lfoppiano/grobid-superconductors-tools:2.1
If you use our work and write about it, please cite our paper:
@article{doi:10.1080/27660400.2022.2153633,
author = {Luca Foppiano and Pedro Baptista Castro and Pedro Ortiz Suarez and Kensei Terashima and Yoshihiko Takano and Masashi Ishii},
title = {Automatic extraction of materials and properties from superconductors scientific literature},
journal = {Science and Technology of Advanced Materials: Methods},
volume = {3},
number = {1},
pages = {2153633},
year = {2023},
publisher = {Taylor & Francis},
doi = {10.1080/27660400.2022.2153633},
URL = {
https://doi.org/10.1080/27660400.2022.2153633
},
eprint = {
https://doi.org/10.1080/27660400.2022.2153633
}
}
- Converters: TSV to/from Grobid XML file conversion
- Linking module: a rule-based Python algorithm to link entities
- Commons libraries: common code shared between the various components. The Grobid client was borrowed from here and the tokenizer from there.
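The linking module is rule-based. As an illustration of what one such rule can look like (not the module's actual rules), the sketch below links each critical-temperature value to the nearest material mention that precedes it in the sentence, falling back to the nearest one after it. The `Entity` type, the type labels `material` and `tc_value` and the rule itself are assumptions.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Entity:
    text: str
    type: str   # hypothetical labels: "material" or "tc_value"
    start: int  # character offsets in the sentence
    end: int


def link(entities: List[Entity]) -> List[Tuple[Entity, Entity]]:
    """Pair each tc_value with the closest preceding material, else the closest following one."""
    materials = [e for e in entities if e.type == "material"]
    links = []
    for value in (e for e in entities if e.type == "tc_value"):
        before = [m for m in materials if m.end <= value.start]
        after = [m for m in materials if m.start >= value.end]
        if before:
            links.append((max(before, key=lambda m: m.end), value))
        elif after:
            links.append((min(after, key=lambda m: m.start), value))
    return links
```

In "MgB2 becomes superconducting at 39 K, while NbSe2 has Tc = 7 K." this rule pairs 39 K with MgB2 and 7 K with NbSe2; a plain "closest mention" rule would instead attach 39 K to NbSe2, which is why the preceding mention is preferred here.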
On macOS, install the dependencies as follows:
conda install -c apple tensorflow-deps
pip install -r requirements.macos.txt
conda install scikit-learn=1.0.1
Before installing DeLFT, remove tensorflow, h5py and scikit-learn from the DeLFT dependencies in its setup.py, then install it from a local checkout:
pip install -e ../../delft
pip install -r requirements.txt
Finally, install the spaCy model:
python -m spacy download en_core_web_sm
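A quick way to confirm that the main packages are installed is to look them up without importing them. The module list below is an assumption based on the installation steps above (`sklearn` is the import name of scikit-learn, and `spacy download` installs `en_core_web_sm` as a regular package).

```python
import importlib.util
from typing import Iterable, List

# Import names to check (assumed from the installation steps above).
REQUIRED = ["tensorflow", "sklearn", "h5py", "delft", "spacy", "en_core_web_sm"]


def missing_modules(names: Iterable[str]) -> List[str]:
    """Return the top-level module names that cannot be found on the current path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED)
    print("missing:", ", ".join(missing) if missing else "none")
```

`find_spec` only locates the modules, so the check is fast and does not trigger TensorFlow's start-up cost.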
To bump the project version, run the following, choosing one of patch, minor or major:
bump-my-version bump patch|minor|major
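`patch`, `minor` and `major` select which part of a MAJOR.MINOR.PATCH version number is incremented, with the lower parts reset to zero. The function below only illustrates that convention for a plain three-part version string; it is not how bump-my-version is implemented.

```python
def bump(version: str, part: str) -> str:
    """Increment one part of a MAJOR.MINOR.PATCH version, resetting the lower parts."""
    major, minor, patch = (int(x) for x in version.split("."))
    if part == "major":
        return f"{major + 1}.0.0"
    if part == "minor":
        return f"{major}.{minor + 1}.0"
    if part == "patch":
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"unknown part: {part!r}")
```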