ChemDataExtractor v2 is a toolkit for extracting chemical information from the scientific literature. It supports Python 3.5 through 3.8.
conda create -n cde2 python=3.8
conda activate cde2
pip install chemdataextractor2
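Because only Python 3.5 through 3.8 is supported, it can be worth confirming that the activated environment runs an interpreter in that range. The following is a minimal sketch using only the standard library; the helper name `is_supported` is hypothetical and is not part of ChemDataExtractor.

```python
import sys

# Supported range taken from the statement above: Python 3.5 to 3.8.
MIN_SUPPORTED = (3, 5)
MAX_SUPPORTED = (3, 8)


def is_supported(version_info=sys.version_info):
    """Return True if the (major, minor) version lies in the supported range.

    Hypothetical helper for illustration; not part of ChemDataExtractor.
    """
    major_minor = tuple(version_info[:2])
    return MIN_SUPPORTED <= major_minor <= MAX_SUPPORTED


if __name__ == "__main__":
    current = "{}.{}".format(*sys.version_info[:2])
    if is_supported():
        print("Python " + current + " is within the supported range.")
    else:
        print("Python " + current + " is outside the supported range (3.5 to 3.8).")
```

Running this inside the `cde2` environment created above should report a supported version, since that environment pins Python 3.8.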
- HTML, XML and PDF document readers
- Chemistry-aware natural language processing pipeline
- Chemical named entity recognition
- Rule-based parsing grammars for property and spectra extraction
- Table parser for extracting tabulated data
- Document processing to resolve data interdependencies
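To give a feel for what rule-based property extraction means, here is a deliberately simplified sketch built on Python's standard `re` module. It does not use ChemDataExtractor's API or its parsing grammars; the rule, the function name `extract_melting_points`, and the sample sentence are all assumptions made for illustration.

```python
import re

# Hypothetical rule: "<Compound> melts at <value> <units>" or
# "<Compound> has a melting point of <value> <units>".
# Real parsing grammars are far more general than this single pattern.
MELTING_POINT_RULE = re.compile(
    r"(?P<compound>[A-Z][\w\-(),]*)\s+"
    r"(?:melts at|has a melting point of)\s+"
    r"(?P<value>\d+(?:\.\d+)?)\s*"
    r"(?P<units>°C|K)"
)


def extract_melting_points(text):
    """Return one record (a dict) per sentence fragment matching the rule."""
    return [
        {
            "compound": match.group("compound"),
            "value": float(match.group("value")),
            "units": match.group("units"),
        }
        for match in MELTING_POINT_RULE.finditer(text)
    ]


sample = (
    "Naphthalene melts at 80.2 °C. "
    "Benzophenone has a melting point of 48.5 °C."
)
records = extract_melting_points(sample)
for record in records:
    print(record)
```

A single regular expression like this breaks down quickly on real text (compound names with spaces, ranges, values reported in tables), which is the gap that chemistry-aware tokenization, named entity recognition, and dedicated table parsing are meant to fill.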
Please read the documentation for instructions on contributing to the project.
ChemDataExtractor v2 is licensed under the MIT license, a permissive, business-friendly license for open source software.
MIT license: https://github.com/CambridgeMolecularEngineering/ChemDataExtractor/blob/master/LICENSE