Add support for the parameter-shift hessian to the Torch interface #1129
Conversation
Hello. You may have forgotten to update the changelog!
```python
if self.output_dim is not None:
    hessian = np.zeros(
        (len(params), len(params), np.prod(self.output_dim)), dtype=float
    )
else:
    hessian = np.zeros((len(params), len(params)), dtype=float)
```
A small modification to allow Hessians to be computed for vector-valued QNodes
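As a quick illustration of the resulting shapes, here is a standalone NumPy sketch (`allocate_hessian` is a hypothetical stand-in for the attribute accesses in the diff above, not PennyLane code):

```python
import numpy as np

def allocate_hessian(params, output_dim=None):
    # Scalar QNode -> shape (n, n); vector-valued QNode -> (n, n, prod(output_dim)),
    # mirroring the branch added in the diff above.
    if output_dim is not None:
        return np.zeros(
            (len(params), len(params), np.prod(output_dim)), dtype=float
        )
    return np.zeros((len(params), len(params)), dtype=float)

print(allocate_hessian([0.1, 0.2]).shape)        # (2, 2)
print(allocate_hessian([0.1, 0.2], (4,)).shape)  # (2, 2, 4)
```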
Codecov Report
```diff
@@            Coverage Diff            @@
##           master    #1129    +/-   ##
=========================================
  Coverage   98.11%   98.12%
=========================================
  Files         144      144
  Lines       10794    10813     +19
=========================================
+ Hits        10591    10610     +19
  Misses        203      203
```
Thanks @josh146! I have just left some questions, mainly for my understanding; once they're clarified I'll be happy to approve.
```
>>> jacobian(circuit, x)
tensor([[ 0.1751, -0.2456],
        [-0.1751,  0.2456]], grad_fn=<ViewBackward>)
>>> hessian(circuit, x)
```
This is really cool!
So when it's the hessian of a vector output, are we just lining up the individual Hessian matrices? E.g. like here. I guess this is pretty standard.
Yep! With the disclaimer that I'm not entirely sure how PyTorch is ordering the dimensions here.
The output below agrees with TF and Autograd, but we should double-check this.
```
@@ -2,6 +2,39 @@

<h3>New features since last release</h3>

* Computing second derivatives and Hessians of QNodes is now supported when
  using the PyTorch interface.
```
So for my understanding: we'd also like to add this for the other interfaces, but Torch is the first one we're doing (maybe because it's easier)? So this is a continuation of #961?
pennylane/tape/interfaces/torch.py
Outdated
```
* Takes advantage of closure, to cache computed gradient matrices via
  the ctx.saved_grad_matrices attribute, to avoid gradient matrices being
  computed multiple redundant times.
```
Cool!
pennylane/tape/interfaces/torch.py
Outdated
```
and *not* the full GradMatrix, differentiating vector-valued
functions will result in multiple backward passes.
"""
if grad_matrix_fn in ctx.saved_grad_matrices:
```
Why do we make `saved_grad_matrices` an attribute of `ctx` rather than standalone? Is `ctx.saved_grad_matrices` used somewhere behind the scenes?
I tried this; for some reason, without using the context, PyTorch doesn't 'see' the `saved_grad_matrices` closure variable 🤔 I chalked it up to PyTorch doing something funky with the class behind the scenes.
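The ctx-based caching pattern being discussed can be sketched with a toy `torch.autograd.Function` (a hypothetical example for illustration, not the actual PennyLane interface code):

```python
import torch

class Square(torch.autograd.Function):
    # Toy custom Function illustrating the ctx-based caching pattern:
    # gradient quantities are stored on ctx so each is computed at most
    # once per forward/backward pair.

    @staticmethod
    def forward(ctx, x):
        ctx.save_for_backward(x)
        ctx.saved_grad_matrices = {}  # cache hangs off ctx, as in the PR
        return x ** 2

    @staticmethod
    def backward(ctx, grad_output):
        (x,) = ctx.saved_tensors

        def grad_matrix(name, fn):
            # compute each gradient quantity at most once per context
            if name not in ctx.saved_grad_matrices:
                ctx.saved_grad_matrices[name] = fn()
            return ctx.saved_grad_matrices[name]

        jac = grad_matrix("jacobian", lambda: 2 * x)
        return grad_output * jac
```

Attaching the cache to `ctx` scopes it to a single forward/backward pair, which sidesteps whatever PyTorch does to the `Function` class internals that hides plain closure variables.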
pennylane/tape/interfaces/torch.py
Outdated
```
return jacobian

@staticmethod
def backward(ctx_, ddy):  # pragma: no cover
```
Why do we turn off coverage?
Even though we have gradient tests for the torch interface, for some reason coverage can't see that the `backward()` method is called 🤔 So it would otherwise show up as uncovered.
```
@@ -527,6 +528,149 @@ def circuit():
    assert isinstance(res[0], torch.Tensor)
    assert isinstance(res[1], torch.Tensor)

def test_hessian(self, dev_name, diff_method, mocker, tol):
    """Test hessian calculation of a scalar valued QNode"""
    if diff_method not in {"parameter-shift", "backprop"}:
```
So if `diff_method="backprop"`, does it still calculate the Hessian using parameter-shift? Or does it use a backprop Hessian?
The torch interface doesn't yet support backprop, but this should be possible soon (since Torch 1.8 supports differentiating complex matrix multiplication, see #929).

Note that this class is not parametrized over `diff_method="backprop"`, so that case never occurs here. I left the `if` statement in just as a reminder that `backprop` should also be tested once #929 is merged.

Do you think we should delete this if statement for now?
Ok, so in this test we'll currently not see `diff_method="backprop"`, but hopefully will soon? It kind of feels like the case where we should set up a separate test using `"backprop"` and then mark it as a strict xfail.
Can we set the xfail flag inside an if statement?
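For reference, pytest does support triggering an xfail imperatively from inside an if statement via `pytest.xfail` (a hypothetical sketch of what the test could look like, not code from this PR):

```python
import pytest

def test_hessian_backprop(diff_method="backprop"):
    # Hypothetical test body: imperatively xfail when the diff method
    # is not yet supported by the torch interface.
    if diff_method == "backprop":
        pytest.xfail("Torch backprop Hessians pending #929")
    # ... the actual Hessian assertions would run for other methods
```

An alternative is to express it in the parametrization itself, e.g. `pytest.param("backprop", marks=pytest.mark.xfail(strict=True))`, which also gives the strict behaviour suggested above.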
```python
expected_hess = [
    [
        [-np.cos(a) * np.cos(b), np.sin(a) * np.sin(b)],
        [np.sin(a) * np.sin(b), -np.cos(a) * np.cos(b)]
    ],
    [
        [-0.5 * np.cos(a) * np.cos(b), 0.5 * np.sin(a) * np.sin(b)],
        [0.5 * np.sin(a) * np.sin(b), -0.5 * np.cos(a) * np.cos(b)]
    ],
    [
        [0.5 * np.cos(a) * np.cos(b), -0.5 * np.sin(a) * np.sin(b)],
        [-0.5 * np.sin(a) * np.sin(b), 0.5 * np.cos(a) * np.cos(b)]
    ]
]
```
So the QNode is ragged, but the Hessian is a more standard `(3, 2, 2)` shape? Is this what PyTorch would expect? E.g., not something like `[hessian1, hessian2]`, with `hessian1` of shape `(2, 2)` and `hessian2` of shape `(2, 2, 2)`? I don't mind either way, just wondering.
This is a very good point.

Currently, if the QNode returns 'ragged' output (e.g., expval + probs, or multiple probs on different wires), we flatten the output due to a very ancient restriction that QNodes always return NumPy arrays. This restriction was made back when we only had an Autograd interface to support, and no `probs`.

However, it's probably time we updated this:

- If the QNode `return` statement returns a tuple, the user should get a tuple as output, not a NumPy array.
- This avoids the whole issue of ragged arrays altogether.
This would actually be a cool thing to sort out now-ish, before usage of `return expval, probs` etc. becomes entrenched (at the moment, it's quite rare).
I got curious and coded this up.
TensorFlow
```python
import pennylane as qml
import tensorflow as tf

dev = qml.device("default.qubit", wires=2)

@qml.qnode(dev, interface="tf")
def circuit(x):
    qml.RX(x, wires=0)
    return qml.expval(qml.PauliZ(0)), qml.probs(wires=[0, 1])

x = tf.Variable(0.5, dtype=tf.float64)

with tf.GradientTape() as tape:
    res = circuit(x)
```
where:
```
>>> print(res)
(<tf.Tensor: shape=(), dtype=float64, numpy=0.8775825618903726>, <tf.Tensor: shape=(4,), dtype=float64, numpy=array([0.93879128, 0. , 0.06120872, 0. ])>)
```
Unfortunately, TF does not let you call Jacobian on a tuple :(
```
Traceback (most recent call last):
  File "test.py", line 54, in <module>
    grad = tape.jacobian(res, x)
  File "/home/josh/miniconda3/envs/37/lib/python3.7/site-packages/tensorflow/python/eager/backprop.py", line 1166, in jacobian
    target_static_shape = target.shape
AttributeError: 'tuple' object has no attribute 'shape'
```
You can do `grad = [tape.jacobian(r, x) for r in res]`, but it is slower since you lose vectorization.
If your cost function is scalar valued though, everything works well:
```python
with tf.GradientTape() as tape:
    expval, probs = circuit(x)
    res = expval + tf.reduce_sum(probs)
```

```
>>> print(res)
tf.Tensor(1.8775825618903725, shape=(), dtype=float64)
>>> grad = tape.jacobian(res, x)
>>> print(grad)
tf.Tensor(-0.479425538604203, shape=(), dtype=float64)
```
Torch
I think this is fully possible, but I can't get the backwards call working. It sounds like it would result in a much more complicated torch `backward(grad_output)` method, since we can no longer assume that `grad_output` is flat - it will now have to match the 'shape' of the forward pass output tuple.
Thanks @josh146, interesting investigation! Looks like returning a tuple is more expected, but harder to support across the different interfaces. In that case, I'm happy to keep it as is here, and maybe we come back to think about this question later.

Do we expect this use case to become common, e.g., users regularly requesting `return expval, probs`? In practice, I've typically requested `probs` in isolation.
I think returning a tuple can be supported on all interfaces, it just requires a bit more work. But definitely doable.

> I've typically requested probs in isolation.

Agreed.

There is one case I am worried about though:

```python
return expval(..), expval(..)
```

Currently this returns a length-2 NumPy array. Under the new suggestion above, this would return a tuple of NumPy scalars. I think this is more intuitive from a UI perspective, but it would be a very invasive change.

If the user wants an array as output, they could do something like:

```python
return np.array([expval(...), expval(...)])
```
> There is one case I am worried about though

Good point, although I'd worry that making such a change may be annoying, since probably most use cases involve treating the output as a NumPy array even if it was technically specified as a tuple return.
> I'd worry that making such a change may be annoying

It might not be... indexing `res[0]` will continue to work with a tuple, and NumPy functions such as `np.sum(res)` also work with tuples. It might... be okay?

Definitely, having the QNode always return a tuple will be better for long-term flexibility.
Thanks @josh146! 💯
Co-authored-by: Christina Lee <christina@xanadu.ai>
Very excited to have this functionality!
It passes tests, adds functionality, and has no obvious issues, so I approve.
Co-authored-by: Christina Lee <christina@xanadu.ai>
Context:
The Torch custom gradient interface currently returns a non-differentiable gradient.
Description of the Change:
Modifies the Torch interface so that the Jacobian also defines a custom gradient --- the vector-Hessian product --- where the Hessian is computed by an efficient implementation of the parameter-shift rule.
Torch QNodes can now be doubly differentiated, on both simulators and hardware.
In addition, a slight tweak was made to the Jacobian/Hessian evaluation logic in order to avoid redundant Jacobian/Hessian computations.
Benefits:

- Torch QNodes now support double derivatives, on both simulators and hardware.
- The Jacobian/Hessian is now only computed once for given input parameters, and re-used for multiple VJPs/VHPs.
Possible Drawbacks:

- 3rd derivatives and higher are not supported. We could potentially support higher derivatives using recursion.
- Currently, Jacobian parameter-shifts are not being re-used for the Hessian parameter-shifts. This requires more thinking.
I realised while writing this PR that the parameter-shift Hessian logic is based on the formula given here: https://arxiv.org/abs/2008.06517. However, this formula assumes all gates support the 2-term parameter-shift rule; this is not the case for the controlled rotations. Therefore, computing the Hessian of the controlled rotations will give an incorrect result!
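To make the 2-term assumption concrete: for a gate whose expectation has the form `A + B*cos(theta + C)` (e.g. the single-qubit rotations), applying the pi/2 shift rule twice gives the diagonal Hessian entry `f''(theta) = (f(theta + pi) - 2*f(theta) + f(theta - pi)) / 4`. This identity fails for 4-term gates like the controlled rotations. A standalone NumPy check (illustrative, not PR code):

```python
import numpy as np

def expval(theta):
    # <Z> after RX(theta) on |0>: a 2-term parameter-shift gate
    return np.cos(theta)

def second_derivative(f, theta):
    # Parameter-shift second derivative, valid only for 2-term gates
    return (f(theta + np.pi) - 2 * f(theta) + f(theta - np.pi)) / 4

theta = 0.37
print(second_derivative(expval, theta))  # equals -cos(theta)
```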
Long term, we should consider implementing parameter-shift logic that computes the `vjp` and `vhp` directly. This avoids redundant parameter-shift evaluations.

Related GitHub Issues: n/a