
Add Organic Materials Database band gap dataset (OMDB-GAP1) #26

Merged: 8 commits, Dec 26, 2018

Conversation

bartolsthoorn
Contributor

This PR adds a script and dataset that make it easy to train for band gap prediction with the OMDB-GAP1 dataset (organic crystal structures and their PBE band gaps).

The paper of the dataset: https://arxiv.org/abs/1810.12814
I originally used the TensorFlow SchNet implementation, where I added my own xyz loading script, but I see that this new PyTorch implementation is better. Great work!

I am still training the model right now and will let you know how it performs. I expect to reach an MAE of 0.38 eV, as I did with the old implementation.


@nicoliKim nicoliKim left a comment


Great work!
In any case, I would expect that you can get below an MAE of 0.38 eV with the new PyTorch setup compared to the TensorFlow one.

@ktschuett previously approved these changes Nov 15, 2018
@bartolsthoorn
Contributor Author

Thank you! I would like to complete the training session (and check the performance on the test set) just to make sure everything is OK, maybe there are some issues later on in this script. After that, if everything is OK, it can be merged. I will post the results here as soon as I have them.

@bartolsthoorn
Contributor Author

I trained it for 8 hours on a K80 GPU, with --interactions 3 --batch_size 32 --cutoff 4.0 --cuda:
https://gist.github.com/bartolsthoorn/30e75612b9a6faca3305783698187504

The validation loss / MAE / RMSE seems to flatten out too early, and the learning rate is not reduced even though perhaps it should be. I will check why and try some other parameters, such as the number of interaction blocks.

@ktschuett
Contributor

The patience for reducing the learning rate is set to 50 epochs. Perhaps this is too much?

I used these settings recently, but this of course depends on your dataset:
lr_patience = 15
lr_decay = 0.8
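For readers unfamiliar with these two settings: they correspond to patience-based learning-rate decay, the scheme implemented by e.g. PyTorch's `torch.optim.lr_scheduler.ReduceLROnPlateau`. A minimal plain-Python sketch of the logic (illustrative only; this is not SchNetPack's actual training loop, and the function name is made up):

```python
def reduce_lr_on_plateau(val_losses, lr, patience=15, decay=0.8):
    """Sketch of patience-based LR decay: if the validation loss has not
    improved for more than `patience` epochs, multiply the learning rate
    by `decay`. Returns the learning rate used at each epoch."""
    best = float("inf")
    bad_epochs = 0
    schedule = []
    for loss in val_losses:
        if loss < best:          # new best validation loss: reset the counter
            best = loss
            bad_epochs = 0
        else:                    # no improvement this epoch
            bad_epochs += 1
            if bad_epochs > patience:
                lr *= decay      # decay the learning rate and start over
                bad_epochs = 0
        schedule.append(lr)
    return schedule

# Toy run: the loss stalls at 0.9, so with patience=2 the LR drops
# once the third non-improving epoch is seen.
print(reduce_lr_on_plateau([1.0, 0.9, 0.9, 0.9, 0.9], lr=1.0,
                           patience=2, decay=0.5))
```

With `patience=50` and a loss curve that plateaus early, the decay can simply never trigger within the training budget, which would match the behaviour seen in the gist above.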

@bartolsthoorn
Contributor Author

bartolsthoorn commented Nov 23, 2018

The evaluation/test results are strangely off compared to the validation results. All the details:

python schnetpack_omdb.py train schnet OMDB-GAP1.tar.gz models/64_3 --interactions 3 --cutoff 4.0 --lr_patience 15 --lr_decay 0.8 --cuda

python schnetpack_omdb.py train schnet OMDB-GAP1.tar.gz models/64_6 --interactions 6 --cutoff 4.0 --lr_patience 15 --lr_decay 0.8 --cuda

These ran for 24 hours and were then killed by a timeout on our cluster, but the best_model files were saved anyway, so this is OK. The output: https://gist.github.com/bartolsthoorn/3cca1b366287ccefed5610801aad54e9

For evaluation:

python schnetpack_omdb.py eval schnet OMDB-GAP1.tar.gz models/64_3 --cuda

python schnetpack_omdb.py eval schnet OMDB-GAP1.tar.gz models/64_6 --cuda

The output:

# Subset,band_gap MAE,band_gap RMSE
test,0.68293,0.92913
# Subset,band_gap MAE,band_gap RMSE
test,0.62133,0.84027

So the test MAE is around 0.62 eV, but it should be around 0.42 eV (judging from the validation MAE). Even that would still be worse than the 0.38 eV I reached with the TensorFlow SchNet, but at least quite close. Any ideas why this is happening?

(Table 2 lists the tensorflow SchNet results: https://arxiv.org/pdf/1810.12814.pdf)
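For reference, the two columns in the eval output above are the standard error metrics; a minimal sketch with hypothetical band-gap values (the numbers below are made up, not from OMDB-GAP1):

```python
import math

def mae(y_true, y_pred):
    # Mean absolute error, in the same units as the target (eV here)
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true, y_pred):
    # Root mean squared error; penalises large errors more than MAE
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
                     / len(y_true))

band_gap_true = [1.2, 2.5, 3.1]   # hypothetical PBE band gaps (eV)
band_gap_pred = [1.0, 2.9, 3.0]   # hypothetical model predictions (eV)

print(round(mae(band_gap_true, band_gap_pred), 4))   # 0.2333
print(round(rmse(band_gap_true, band_gap_pred), 4))  # 0.2646
```

Because RMSE weighs large errors more heavily, it is always at least as large as MAE, consistent with the 0.62 / 0.84 pair reported above.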

@ktschuett
Contributor

I have no idea; the models should be identical. Perhaps an unlucky split? Or you could try a larger cutoff?

@ktschuett
Contributor

Did you figure out what was wrong?

@bartolsthoorn
Contributor Author

I did not have time to work on this last week unfortunately, but I will probably start some training sessions with different splits and different cutoffs today.

@bartolsthoorn
Contributor Author

Yes, with the cutoff set to 5.0 I am starting to get good test results 🎉. Now I will just do some further small tuning to make the learning a bit faster, and then I will commit the best set of default parameters here.
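The effect of the cutoff can be illustrated with a toy neighbour count: enlarging the cutoff radius lets each atom see more of its environment, which is presumably why 5.0 works better than 4.0 here. A sketch under simplifying assumptions (no periodic boundary conditions, unlike a real crystal; the positions and spacing are made up):

```python
import itertools
import math

def neighbours_within(positions, cutoff):
    """Count, per atom, how many other atoms fall inside the cutoff sphere.
    Toy illustration only: ignores the periodic boundary conditions a real
    crystal dataset like OMDB-GAP1 would require."""
    counts = [0] * len(positions)
    for (i, a), (j, b) in itertools.combinations(enumerate(positions), 2):
        if math.dist(a, b) <= cutoff:  # pairwise distance check
            counts[i] += 1
            counts[j] += 1
    return counts

# A 1D chain of 5 atoms spaced 1.5 apart (arbitrary units)
chain = [(1.5 * k, 0.0, 0.0) for k in range(5)]
print(neighbours_within(chain, 4.0))  # [2, 3, 4, 3, 2]
print(neighbours_within(chain, 5.0))  # [3, 4, 4, 4, 3]
```

With the larger cutoff each atom gains an extra shell of neighbours, so every interaction block mixes in information from a wider environment.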

@ktschuett
Contributor

Anything new?

@bartolsthoorn
Contributor Author

Now the PR is ready for merging. I made a small change to data.py to allow for dataset creation with only a split file. (I don't know what the "dismissed stale review" above is about; it happened automatically.)

We will soon release an improved version 2 of the OMDB-GAP1 dataset, submit the update to arXiv, and submit the results obtained with SchNetPack to a journal. In any case, this code is complete and ready.

@ktschuett ktschuett merged commit 9caa03c into atomistic-machine-learning:master Dec 26, 2018
@bartolsthoorn bartolsthoorn deleted the add-omdb-gap1-dataset branch January 3, 2019 11:14