CFConv layer for SchNet #18
Conversation
This is ready to review. I posted benchmark results at #14 (comment).
 * @param w2 an array of shape [width][width] containing the weights of the second dense layer
 * @param b2 an array of length [width] containing the biases of the second dense layer
 */
CFConv(int numAtoms, int width, int numGaussians, float cutoff, bool periodic, float gaussianWidth) :
w1, b1, etc. are described above, but they are missing here.
Good catch, I should remove those. The subclass constructors take those arguments, but they don't pass them on to the parent class.
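To make the layout concrete, here is a minimal sketch of what that looks like: the base CFConv constructor takes no weight arguments, while a subclass constructor accepts them and stores them itself rather than passing them to the parent. The subclass name, the shape of w1, and the storage details below are my assumptions for illustration, not code from this PR.

```cpp
#include <vector>

class CFConv {
public:
    // Base class constructor: no weight/bias arguments, so the w1/b1/w2/b2
    // @param entries do not belong in its documentation block.
    CFConv(int numAtoms, int width, int numGaussians, float cutoff, bool periodic, float gaussianWidth) :
        numAtoms(numAtoms), width(width), numGaussians(numGaussians),
        cutoff(cutoff), gaussianWidth(gaussianWidth), periodic(periodic) {}
    virtual ~CFConv() = default;
protected:
    int numAtoms, width, numGaussians;
    float cutoff, gaussianWidth;
    bool periodic;
};

// Hypothetical CPU subclass: the dense-layer weights and biases are taken
// here and copied into subclass-owned storage; they are not forwarded to
// the parent class.  The [numGaussians][width] shape assumed for w1 is an
// illustration only.
class CpuCFConv : public CFConv {
public:
    CpuCFConv(int numAtoms, int width, int numGaussians, float cutoff, bool periodic,
              float gaussianWidth, const float* w1, const float* b1,
              const float* w2, const float* b2) :
        CFConv(numAtoms, width, numGaussians, cutoff, periodic, gaussianWidth) {
        this->w1.assign(w1, w1 + numGaussians*width); // first dense layer weights
        this->b1.assign(b1, b1 + width);              // first dense layer biases
        this->w2.assign(w2, w2 + width*width);        // second dense layer weights
        this->b2.assign(b2, b2 + width);              // second dense layer biases
    }
private:
    std::vector<float> w1, b1, w2, b2;
};
```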
I've written the CPU implementation so far. The calculation matches the behavior of SchNetPack, which is different from what's described in the paper. It's less flexible than SchNetPack, though: for example, it always applies a cutoff and always uses a cosine cutoff function, whereas SchNetPack lets you choose an arbitrary cutoff function.
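For reference, the cosine cutoff in question is the standard smoothly decaying form, equal to 1 at r = 0 and falling to 0 at the cutoff distance. A minimal sketch of that function (the name and exact code are mine, not taken from this PR):

```cpp
#include <cmath>

// Cosine cutoff: decays smoothly from 1 at r = 0 to 0 at r = cutoff,
// and is exactly 0 beyond the cutoff.
float cosineCutoff(float r, float cutoff) {
    const float pi = 3.14159265358979f;
    if (r >= cutoff)
        return 0.0f;                             // interactions beyond the cutoff contribute nothing
    return 0.5f * (std::cos(pi*r/cutoff) + 1.0f); // smooth decay inside the cutoff
}
```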
The way I've implemented the derivatives is less than ideal. The standard method for doing backpropagation assumes you've saved intermediate quantities during the forward pass. But I don't want to save any per-interaction quantities, so instead I have to repeat nearly all the calculations from the forward pass in the backward pass. This slows it down a bit, but I think it will still end up being much faster on the GPU than if we save and reload per-interaction data. Once the CUDA implementation is written, we can test that assumption and see whether it holds.
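As a schematic illustration of that trade-off, the sketch below recomputes a per-interaction quantity in the backward pass instead of loading a saved copy. Everything here (the names and the stand-in filter function) is hypothetical and only meant to show the pattern, not the actual CFConv math.

```cpp
#include <vector>
#include <cmath>

struct Interaction { int atom1, atom2; float r; };

// Forward pass: per-interaction filter values are computed, used, and discarded.
void forward(const std::vector<Interaction>& pairs, std::vector<float>& atomFeatures) {
    for (const auto& p : pairs) {
        float filter = std::exp(-p.r);    // stand-in for the per-interaction quantity
        atomFeatures[p.atom1] += filter;  // accumulate into the per-atom output
        atomFeatures[p.atom2] += filter;
        // nothing per-interaction is stored for later use
    }
}

// Backward pass: rather than reloading saved filters, recompute them from the
// same inputs, then apply the chain rule to the incoming gradients.
void backward(const std::vector<Interaction>& pairs, const std::vector<float>& gradOutput,
              std::vector<float>& gradR) {
    for (size_t i = 0; i < pairs.size(); i++) {
        const Interaction& p = pairs[i];
        float filter = std::exp(-p.r);    // repeated forward-pass work
        float dFilter_dR = -filter;       // derivative of the recomputed quantity
        gradR[i] = (gradOutput[p.atom1] + gradOutput[p.atom2]) * dFilter_dR;
    }
}
```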