Hi Henry, thanks a lot for your suggestion! You are right that the current way of computing example-wise gradients is unnecessarily inefficient. Using autograd-hacks could be a good workaround, but it looks like it currently only supports Linear and Conv2d layers (but not, e.g., BatchNorm).
Another possibility would be to implement the AND-mask in JAX, where vmap makes it easy to compute example-wise gradients natively.
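For illustration, here is a minimal JAX sketch of that idea: wrapping a per-example loss in `jax.grad` and then `jax.vmap` gives example-wise gradients without a Python loop. The `loss_fn`, parameter pytree, and shapes below are placeholders for illustration only, not taken from the repo.

```python
import jax
import jax.numpy as jnp

def loss_fn(params, x, y):
    # Hypothetical single-example loss: linear model with squared error.
    pred = jnp.dot(x, params["w"]) + params["b"]
    return jnp.sum((pred - y) ** 2)

# Gradient w.r.t. params for one example, vmapped over the batch axis of
# x and y. Params are shared across examples, hence in_axes=(None, 0, 0).
per_example_grads = jax.vmap(jax.grad(loss_fn), in_axes=(None, 0, 0))

params = {"w": jnp.zeros(3), "b": jnp.zeros(())}
xs = jnp.ones((8, 3))  # batch of 8 examples
ys = jnp.ones((8,))

grads = per_example_grads(params, xs, ys)
# grads["w"].shape == (8, 3): one gradient per example, ready for the AND-mask.
```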
Hi! This is really interesting work and it's great that you released the code like this. I just thought it would be worth mentioning: it seems like the way you calculate the gradients for the different environments is a bit inefficient (basically using a for loop, right?). It might be worth checking out https://github.com/cybertronai/autograd-hacks#per-example-gradients (which I came to from a thread here: https://discuss.pytorch.org/t/how-to-efficiently-compute-gradient-for-each-training-sample/60001), which in theory should allow you to compute per-example gradients efficiently.
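In case it helps, this is roughly how the autograd-hacks README describes its usage (a hedged sketch: `model`, `data`, `targets`, and `loss_fn` are placeholders here, and only layer types the library supports, such as Linear and Conv2d, get per-example gradients):

```python
import autograd_hacks  # https://github.com/cybertronai/autograd-hacks

# Register hooks that record the per-layer activations and backprops.
autograd_hacks.add_hooks(model)

output = model(data)                   # ordinary forward pass over the batch
loss_fn(output, targets).backward()    # ordinary backward pass

# Populate param.grad1 with per-example gradients for supported layers.
autograd_hacks.compute_grad1(model)
autograd_hacks.disable_hooks()

for param in model.parameters():
    if hasattr(param, "grad1"):
        # param.grad1[i] is the gradient w.r.t. example i;
        # averaging over the batch dimension recovers the usual param.grad.
        print(param.grad1.shape)  # (batch_size, *param.shape)
```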