Entropy-Targeted Active Learning

This repository contains an implementation of entropy-targeted active learning (ET-AL) for materials data bias mitigation, associated with our paper.

Copyright

This code is open-sourced under the MIT license. Feel free to use all or portions for your research or related projects so long as you provide the following citation information:

Hengrui Zhang, Wei (Wayne) Chen, James M. Rondinelli, and Wei Chen, ET-AL: Entropy-targeted active learning for bias mitigation in materials data, Applied Physics Reviews 10, 021403 (2023).

@article{zhang2023etal,
    author = {Zhang, Hengrui and Chen, Wei Wayne and Rondinelli, James M. and Chen, Wei},
    title = {ET-AL: entropy-targeted active learning for bias mitigation in materials data},
    journal = {Applied Physics Reviews},
    volume = {10},
    number = {2},
    pages = {021403},
    year = {2023},
    doi = {10.1063/5.0138913},
    url = {https://doi.org/10.1063/5.0138913}
}

Descriptions

etal_main.py implements the ET-AL algorithm and demonstrates it on the Jarvis CFID dataset.

ML_comparison.ipynb compares several ML models on different training sets.

plot_data.ipynb is used for creating relevant plots for visualization.

datasets/ provides data required for reproducing the results in our paper.

results/ contains data generated in the ET-AL demonstration on the Jarvis CFID dataset.

utils/ contains tools for data pre-processing:

Jarvis_data.ipynb retrieves and cleans the Jarvis CFID data and generates graph embeddings.
Jarvis_featurize.ipynb generates physical descriptors for the Jarvis CFID data.
compound_featurizer.py is an automatic tool for generating physical descriptors.
cgcnn/ contains the CGCNN model used for graph embeddings.

Usage

Set up environment

Navigate to the code directory and create the environment:

conda env create -f environment.yml

Then activate the new environment:

conda activate gp-torch
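
Before running anything, it can help to confirm that the environment is actually active. The snippet below is a minimal sketch, not part of this repository: the helper names `active_conda_env` and `missing_packages` are hypothetical, and it relies only on the `CONDA_DEFAULT_ENV` variable that conda sets on activation.

```python
import importlib.util
import os


def active_conda_env():
    """Return the name of the active conda environment, or None if unset."""
    return os.environ.get("CONDA_DEFAULT_ENV")


def missing_packages(names):
    """Return the top-level package names that cannot be found for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]
```

For example, `active_conda_env()` should return `"gp-torch"` after activation. Which packages to pass to `missing_packages` depends on environment.yml; names such as `torch` or `gpytorch` would only be a guess based on the environment name.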

Data preparation

Organize the dataset in a DataFrame and update the data paths in etal_main.py. For demonstration purposes, a dataset derived from the Jarvis CFID data is provided in datasets/: the crystal structures and properties are in data_cleaned.pkl, and the graph embeddings are in cgcnn_embeddings.pkl.

Note: Git LFS is required to download data_cleaned.pkl properly. If you do not have Git LFS, please download the file manually.
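
If the repository was cloned without Git LFS, data_cleaned.pkl may be a small text pointer rather than the real data, and unpickling it fails with a confusing error. The following is a minimal sketch of a guard against that case. The helper names `is_lfs_pointer` and `load_pickle_checked` are hypothetical and not part of this repository, and the sketch assumes the .pkl files are ordinary Python pickles.

```python
import pickle
from pathlib import Path

# Git LFS pointer files are small text files that begin with this line.
LFS_POINTER_PREFIX = b"version https://git-lfs.github.com/spec/v1"


def is_lfs_pointer(path):
    """Return True if the file is a Git LFS pointer stub, not the real data."""
    with open(path, "rb") as f:
        return f.read(len(LFS_POINTER_PREFIX)) == LFS_POINTER_PREFIX


def load_pickle_checked(path):
    """Unpickle a file, failing early with a clear message on an LFS stub."""
    path = Path(path)
    if is_lfs_pointer(path):
        raise RuntimeError(
            f"{path} is a Git LFS pointer; fetch it with Git LFS "
            "or download the file manually."
        )
    with open(path, "rb") as f:
        return pickle.load(f)
```

A call such as `load_pickle_checked("datasets/data_cleaned.pkl")` would then either return the stored object or report that the file still needs to be downloaded. Unpickling the provided files requires the packages from the environment above to be installed.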

Run code

Set up the experimental parameters in etal_main.py: n_iter is the maximum number of ET-AL iterations, n_test is the number of data points held out as the test set, and n_unlabeled is the number of data points left unlabeled. Edit the corresponding part of etal_main.py to change the selection of unlabeled data.
Run the ET-AL model:

python etal_main.py

Run ML_comparison.ipynb to compare ML models on training sets generated by ET-AL sampling and by random sampling.
Use plot_data.ipynb to visualize the results and reproduce the plots in the paper.
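
To illustrate how n_test and n_unlabeled from the first step partition a dataset, here is a minimal sketch that splits row indices into labeled, unlabeled, and test subsets. It is an assumption for illustration only: the function `split_indices` is hypothetical, it uses a uniform random shuffle, and the selection logic in etal_main.py may differ.

```python
import random


def split_indices(n_total, n_test, n_unlabeled, seed=0):
    """Randomly partition range(n_total) into labeled, unlabeled, and test lists.

    Hypothetical helper: a plain random split, not the repository's own logic.
    """
    if n_test < 0 or n_unlabeled < 0 or n_test + n_unlabeled >= n_total:
        raise ValueError(
            "n_test and n_unlabeled must be non-negative and leave "
            "at least one labeled data point"
        )
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    indices = list(range(n_total))
    rng.shuffle(indices)
    test = indices[:n_test]
    unlabeled = indices[n_test:n_test + n_unlabeled]
    labeled = indices[n_test + n_unlabeled:]  # the remainder starts out labeled
    return labeled, unlabeled, test
```

In this sketch, whatever is not reserved for the test or unlabeled sets forms the initial labeled set, so increasing n_unlabeled shrinks the starting training data and leaves more candidates to sample from.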
