CDE_MSE_Databases

Premise

Accurate and comprehensive material databases extracted from research papers are critical for materials science and engineering but require significant human effort to develop. In this paper we present a simple method of extracting materials data from the full texts of research papers, suitable for quickly developing modest-sized databases. The method requires minimal to no coding, prior knowledge about the extracted property, or model training, and provides high recall and almost perfect precision in the resultant database. The method is fully automated except for one human-assisted step, which typically requires just a few hours of human labor. The method builds on top of natural language processing and large general language models but can work with almost any such model. The language models GPT-3/3.5, BART, and DeBERTaV3 are evaluated here for comparison. We provide a detailed analysis of the method's performance in extracting bulk modulus data, obtaining up to 90% precision at 96% recall, depending on the amount of human effort involved. We then demonstrate the method's broader effectiveness by developing a database of critical cooling rates for metallic glasses.

Find paper at https://arxiv.org/abs/2302.04914
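
The precision and recall figures quoted in the premise follow the standard definitions. As a minimal sketch, the helper below computes both from counts of correct, spurious, and missed extractions. The function name `precision_recall` and the counts are hypothetical and only illustrate the arithmetic; they are not taken from the paper or from this repository.

```python
def precision_recall(true_positives, false_positives, false_negatives):
    """Return (precision, recall) for a set of extracted data points.

    precision = correct extractions / all extractions
    recall    = correct extractions / all data points that should be found
    """
    extracted = true_positives + false_positives
    relevant = true_positives + false_negatives
    # Guard against division by zero when nothing was extracted or expected.
    precision = true_positives / extracted if extracted else 0.0
    recall = true_positives / relevant if relevant else 0.0
    return precision, recall


# Toy counts, chosen only for illustration (not results from the paper).
p, r = precision_recall(true_positives=8, false_positives=2, false_negatives=2)
print(f"precision={p:.2f} recall={r:.2f}")
```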

How to Run

1

Run git pull to fetch the PaperDownload directory.

2

Run get_papers_1.py to download the papers as XML. Edit the directory path inside the .py file first.

3

Run process_xml_2.sh, then process_xml_3.sh.

4

Run prepare_property.sh to finish.
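
The four steps above can be chained from a single driver. The sketch below is one possible way to do it, not part of the repository. It assumes that all scripts live inside the PaperDownload directory, that they take no arguments, and that the shell scripts run under `bash`. The names `build_commands` and `run_pipeline` are hypothetical. The directory path inside get_papers_1.py still has to be edited by hand before running.

```python
import subprocess
import sys
from pathlib import Path


def build_commands(paper_download_dir):
    """List the steps in order as (command, working_directory) pairs.

    Assumption: every script sits in paper_download_dir and needs no arguments.
    """
    cwd = Path(paper_download_dir)
    return [
        (["git", "pull"], cwd),                      # step 1
        ([sys.executable, "get_papers_1.py"], cwd),  # step 2
        (["bash", "process_xml_2.sh"], cwd),         # step 3
        (["bash", "process_xml_3.sh"], cwd),         # step 3
        (["bash", "prepare_property.sh"], cwd),      # step 4
    ]


def run_pipeline(paper_download_dir, runner=subprocess.run):
    """Run each step in sequence and stop at the first failing command."""
    for command, cwd in build_commands(paper_download_dir):
        # check=True makes subprocess.run raise CalledProcessError on failure.
        runner(command, cwd=cwd, check=True)
```

Calling `run_pipeline("PaperDownload")` from the repository root would then execute the steps in the order listed. The `runner` parameter exists only so the sequence can be inspected or dry-run without executing anything.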

ayandeephazra/Natural_Language_Processing_Research
