
Add Organic Materials Database band gap dataset (OMDB-GAP1) #26

Merged: 8 commits, Dec 26, 2018

Conversation

bartolsthoorn
Contributor

This PR adds a script and dataset that make it easy to train for band gap prediction with the OMDB-GAP1 dataset (organic crystal structures and their PBE band gaps).

The paper of the dataset: https://arxiv.org/abs/1810.12814
I originally used the TensorFlow SchNet implementation, where I added my own xyz loading script, but I see that this new PyTorch implementation is better. Great work!

I am still training the model right now and will let you know how it performs. I expect to reach an MAE of 0.38 eV, as I did with the old implementation.


@nicoliKim nicoliKim left a comment


Great work!
In any case, I would expect that you can get below an MAE of 0.38 eV with the new PyTorch setup compared to the TensorFlow one.

@ktschuett previously approved these changes Nov 15, 2018
@bartolsthoorn
Contributor Author

Thank you! I would like to complete the training session (and check the performance on the test set) just to make sure everything is OK, maybe there are some issues later on in this script. After that, if everything is OK, it can be merged. I will post the results here as soon as I have them.

@bartolsthoorn
Contributor Author

I trained it for 8 hours on a K80 GPU, with --interactions 3 --batch_size 32 --cutoff 4.0 --cuda:
https://gist.github.com/bartolsthoorn/30e75612b9a6faca3305783698187504

The validation loss / MAE / RMSE seems to flatten out too early, and the learning rate is not reduced even though perhaps it should be. I will check why and try some other parameters, such as the number of interaction blocks.

@ktschuett
Contributor

The patience for reducing the learning rate is set to 50 epochs. Perhaps this is too much?

I used these settings recently, but this of course depends on your dataset:
lr_patience = 15
lr_decay = 0.8
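For readers unfamiliar with these two settings: they correspond to patience-based learning-rate decay, the scheme implemented by e.g. PyTorch's `torch.optim.lr_scheduler.ReduceLROnPlateau`. A minimal plain-Python sketch of the logic (illustrative only; this is not SchNetPack's actual training loop, and the function name is made up):

```python
def reduce_lr_on_plateau(val_losses, lr, patience=15, decay=0.8):
    """Sketch of patience-based LR decay: if the validation loss has not
    improved for more than `patience` epochs, multiply the learning rate
    by `decay`. Returns the learning rate used at each epoch."""
    best = float("inf")
    bad_epochs = 0
    schedule = []
    for loss in val_losses:
        if loss < best:          # new best validation loss: reset the counter
            best = loss
            bad_epochs = 0
        else:                    # no improvement this epoch
            bad_epochs += 1
            if bad_epochs > patience:
                lr *= decay      # decay the learning rate and start over
                bad_epochs = 0
        schedule.append(lr)
    return schedule

# Toy run: the loss stalls at 0.9, so with patience=2 the LR drops
# once the third non-improving epoch is seen.
print(reduce_lr_on_plateau([1.0, 0.9, 0.9, 0.9, 0.9], lr=1.0,
                           patience=2, decay=0.5))
```

With `patience=50` and a loss curve that plateaus early, the decay can simply never trigger within the training budget, which would match the behaviour seen in the gist above.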

@bartolsthoorn
Contributor Author

bartolsthoorn commented Nov 23, 2018

The evaluation/test results are strangely off compared to the validation results. All the details:

python schnetpack_omdb.py train schnet OMDB-GAP1.tar.gz models/64_3 --interactions 3 --cutoff 4.0 --lr_patience 15 --lr_decay 0.8 --cuda

python schnetpack_omdb.py train schnet OMDB-GAP1.tar.gz models/64_6 --interactions 6 --cutoff 4.0 --lr_patience 15 --lr_decay 0.8 --cuda

These ran for 24 hours and were then killed by a timeout on our cluster, but the best_model files were saved anyway, so this is OK. The output: https://gist.github.com/bartolsthoorn/3cca1b366287ccefed5610801aad54e9

For evaluation:

python schnetpack_omdb.py eval schnet OMDB-GAP1.tar.gz models/64_3 --cuda

python schnetpack_omdb.py eval schnet OMDB-GAP1.tar.gz models/64_6 --cuda

The output:

# Subset,band_gap MAE,band_gap RMSE
test,0.68293,0.92913
# Subset,band_gap MAE,band_gap RMSE
test,0.62133,0.84027

So the test MAE is around 0.62 eV, but it should be around 0.42 eV (judging from the validation MAE). Even that would still be worse than the 0.38 eV I reached with the TensorFlow SchNet, but at least quite close. Any ideas why this is happening?

(Table 2 lists the tensorflow SchNet results: https://arxiv.org/pdf/1810.12814.pdf)
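For reference, the two columns in the eval output above are the standard error metrics; a minimal sketch with hypothetical band-gap values (the numbers below are made up, not from OMDB-GAP1):

```python
import math

def mae(y_true, y_pred):
    # Mean absolute error, in the same units as the target (eV here)
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true, y_pred):
    # Root mean squared error; penalises large errors more than MAE
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
                     / len(y_true))

band_gap_true = [1.2, 2.5, 3.1]   # hypothetical PBE band gaps (eV)
band_gap_pred = [1.0, 2.9, 3.0]   # hypothetical model predictions (eV)

print(round(mae(band_gap_true, band_gap_pred), 4))   # 0.2333
print(round(rmse(band_gap_true, band_gap_pred), 4))  # 0.2646
```

Because RMSE weighs large errors more heavily, it is always at least as large as MAE, consistent with the 0.62 / 0.84 pair reported above.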

@ktschuett
Contributor

I have no idea; the models should be identical. Perhaps an unlucky split? Or you could try a larger cutoff?

@ktschuett
Contributor

Did you figure out what was wrong?

@bartolsthoorn
Contributor Author

I did not have time to work on this last week unfortunately, but I will probably start some training sessions with different splits and different cutoffs today.

@bartolsthoorn
Contributor Author

Yes, with the cutoff set to 5.0 I am starting to get good test results 🎉. Now I will just do some further small tuning to make the learning a bit faster, and then I will commit the best set of default parameters here.
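The effect of the cutoff can be illustrated with a toy neighbour count: enlarging the cutoff radius lets each atom see more of its environment, which is presumably why 5.0 works better than 4.0 here. A sketch under simplifying assumptions (no periodic boundary conditions, unlike a real crystal; the positions and spacing are made up):

```python
import itertools
import math

def neighbours_within(positions, cutoff):
    """Count, per atom, how many other atoms fall inside the cutoff sphere.
    Toy illustration only: ignores the periodic boundary conditions a real
    crystal dataset like OMDB-GAP1 would require."""
    counts = [0] * len(positions)
    for (i, a), (j, b) in itertools.combinations(enumerate(positions), 2):
        if math.dist(a, b) <= cutoff:  # pairwise distance check
            counts[i] += 1
            counts[j] += 1
    return counts

# A 1D chain of 5 atoms spaced 1.5 apart (arbitrary units)
chain = [(1.5 * k, 0.0, 0.0) for k in range(5)]
print(neighbours_within(chain, 4.0))  # [2, 3, 4, 3, 2]
print(neighbours_within(chain, 5.0))  # [3, 4, 4, 4, 3]
```

With the larger cutoff each atom gains an extra shell of neighbours, so every interaction block mixes in information from a wider environment.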

@ktschuett
Contributor

Anything new?

@bartolsthoorn
Contributor Author

Now the PR is ready for merging. I made a small change to data.py to allow for dataset creation with only a split file. (I don't know what the "dismissed stale review" above is about; it happened automatically.)

We will soon release an improved version 2 of the OMDB-GAP1 dataset, submit the update to arXiv, and submit the results obtained with SchNetPack to a journal. In any case, this code is complete and ready.

@ktschuett ktschuett merged commit 9caa03c into atomistic-machine-learning:master Dec 26, 2018
@bartolsthoorn bartolsthoorn deleted the add-omdb-gap1-dataset branch January 3, 2019 11:14