Add support for the parameter-shift hessian to the Torch interface #1129
Conversation
Hello. You may have forgotten to update the changelog!
```python
if self.output_dim is not None:
    hessian = np.zeros(
        (len(params), len(params), np.prod(self.output_dim)), dtype=float
    )
else:
    hessian = np.zeros((len(params), len(params)), dtype=float)
```
A small modification to allow Hessians to be computed for vector-valued QNodes
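As a quick illustration of the resulting shapes, here is a standalone NumPy sketch (`allocate_hessian` is a hypothetical stand-in for the attribute accesses in the diff above, not PennyLane code):

```python
import numpy as np

def allocate_hessian(params, output_dim=None):
    # Scalar QNode -> shape (n, n); vector-valued QNode -> (n, n, prod(output_dim)),
    # mirroring the branch added in the diff above.
    if output_dim is not None:
        return np.zeros(
            (len(params), len(params), np.prod(output_dim)), dtype=float
        )
    return np.zeros((len(params), len(params)), dtype=float)

print(allocate_hessian([0.1, 0.2]).shape)        # (2, 2)
print(allocate_hessian([0.1, 0.2], (4,)).shape)  # (2, 2, 4)
```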
Codecov Report
```diff
@@            Coverage Diff            @@
##           master    #1129    +/-   ##
=========================================
  Coverage   98.11%   98.12%
=========================================
  Files         144      144
  Lines       10794    10813     +19
=========================================
+ Hits        10591    10610     +19
  Misses        203      203
```
Thanks @josh146! I have just left some questions, mainly for my understanding; once they're clarified I'll be happy to approve.
```
>>> jacobian(circuit, x)
tensor([[ 0.1751, -0.2456],
        [-0.1751,  0.2456]], grad_fn=<ViewBackward>)
>>> hessian(circuit, x)
```
This is really cool!
So when it's the hessian of a vector output, are we just lining up the individual Hessian matrices? E.g. like here. I guess this is pretty standard.
Yep! With the disclaimer that I'm not entirely sure how PyTorch is ordering the dimensions here.
The output below agrees with TF and Autograd, but we should double-check this.
```
@@ -2,6 +2,39 @@

<h3>New features since last release</h3>

* Computing second derivatives and Hessians of QNodes is now supported when
  using the PyTorch interface.
```
So for my understanding: we'd also like to add this for the other interfaces, but Torch is the first one we're doing (maybe because it's easier)? So this is a continuation of #961?
pennylane/tape/interfaces/torch.py
Outdated
```
* Takes advantage of closure, to cache computed gradient matrices via
  the ctx.saved_grad_matrices attribute, to avoid gradient matrices being
  computed multiple redundant times.
```
Cool!
pennylane/tape/interfaces/torch.py
Outdated
```
and *not* the full GradMatrix, differentiating vector-valued
functions will result in multiple backward passes.
"""
if grad_matrix_fn in ctx.saved_grad_matrices:
```
Why do we make `saved_grad_matrices` an attribute of `ctx` rather than standalone? Is `ctx.saved_grad_matrices` used somewhere behind the scenes?
I tried this; for some reason, without using the context, PyTorch doesn't 'see' the `saved_grad_matrices` closure variable 🤔 I chalked it up to PyTorch doing something funky with the class behind the scenes.
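The ctx-based caching pattern being discussed can be sketched with a toy `torch.autograd.Function` (a hypothetical example for illustration, not the actual PennyLane interface code):

```python
import torch

class Square(torch.autograd.Function):
    # Toy custom Function illustrating the ctx-based caching pattern:
    # gradient quantities are stored on ctx so each is computed at most
    # once per forward/backward pair.

    @staticmethod
    def forward(ctx, x):
        ctx.save_for_backward(x)
        ctx.saved_grad_matrices = {}  # cache hangs off ctx, as in the PR
        return x ** 2

    @staticmethod
    def backward(ctx, grad_output):
        (x,) = ctx.saved_tensors

        def grad_matrix(name, fn):
            # compute each gradient quantity at most once per context
            if name not in ctx.saved_grad_matrices:
                ctx.saved_grad_matrices[name] = fn()
            return ctx.saved_grad_matrices[name]

        jac = grad_matrix("jacobian", lambda: 2 * x)
        return grad_output * jac
```

Attaching the cache to `ctx` scopes it to a single forward/backward pair, which sidesteps whatever PyTorch does to the `Function` class internals that hides plain closure variables.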
pennylane/tape/interfaces/torch.py
Outdated
```
return jacobian

@staticmethod
def backward(ctx_, ddy):  # pragma: no cover
```
Why do we turn off coverage?
Even though we have gradient tests for the torch interface, for some reason coverage can't see that the `backward()` method is called 🤔 So it would otherwise show up as uncovered.
```
@@ -527,6 +528,149 @@ def circuit():
    assert isinstance(res[0], torch.Tensor)
    assert isinstance(res[1], torch.Tensor)

def test_hessian(self, dev_name, diff_method, mocker, tol):
    """Test hessian calculation of a scalar valued QNode"""
    if diff_method not in {"parameter-shift", "backprop"}:
```
So if `diff_method="backprop"`, does it still calculate the Hessian using parameter-shift? Or does it use a backprop Hessian?
The torch interface doesn't yet support backprop, but this should be possible soon (since Torch 1.8 supports differentiating complex matrix multiplication, see #929).

Note that this class is not parametrized over `diff_method="backprop"`, so that case never occurs here. I left the `if` statement in just as a reminder that `backprop` should also be tested once #929 is merged.

Do you think we should delete this if statement for now?
Ok, so in this test we'll currently not see `diff_method="backprop"`, but hopefully will soon? It kind of feels like the case where we should set up a separate test using `"backprop"` and then mark it as a strict xfail.
Can we set the xfail flag inside an if statement?
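For reference, pytest does support triggering an xfail imperatively from inside an if statement via `pytest.xfail` (a hypothetical sketch of what the test could look like, not code from this PR):

```python
import pytest

def test_hessian_backprop(diff_method="backprop"):
    # Hypothetical test body: imperatively xfail when the diff method
    # is not yet supported by the torch interface.
    if diff_method == "backprop":
        pytest.xfail("Torch backprop Hessians pending #929")
    # ... the actual Hessian assertions would run for other methods
```

An alternative is to express it in the parametrization itself, e.g. `pytest.param("backprop", marks=pytest.mark.xfail(strict=True))`, which also gives the strict behaviour suggested above.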
```python
expected_hess = [
    [
        [-np.cos(a) * np.cos(b), np.sin(a) * np.sin(b)],
        [np.sin(a) * np.sin(b), -np.cos(a) * np.cos(b)]
    ],
    [
        [-0.5 * np.cos(a) * np.cos(b), 0.5 * np.sin(a) * np.sin(b)],
        [0.5 * np.sin(a) * np.sin(b), -0.5 * np.cos(a) * np.cos(b)]
    ],
    [
        [0.5 * np.cos(a) * np.cos(b), -0.5 * np.sin(a) * np.sin(b)],
        [-0.5 * np.sin(a) * np.sin(b), 0.5 * np.cos(a) * np.cos(b)]
    ]
]
```
So the QNode is ragged, but the Hessian is a more standard `(3, 2, 2)` shape? Is this what PyTorch would expect? E.g., not something like `[hessian1, hessian2]`, with `hessian1` of shape `(2, 2)` and `hessian2` of shape `(2, 2, 2)`? I don't mind either way, just wondering.
This is a very good point.

Currently, if the QNode returns 'ragged' output (e.g., expval + probs, or multiple probs on different wires), we flatten the output due to a very ancient restriction that QNodes always return NumPy arrays. This restriction was made back when we only had an Autograd interface to support, and no `probs`.

However, it's probably time we updated this:

- If the QNode `return` statement returns a tuple, the user should get a tuple as output, not a NumPy array.
- This avoids the whole issue of ragged arrays altogether.
This would actually be a cool thing to sort out now-ish, before usage of `return expval, probs` etc. becomes entrenched (at the moment, it's quite rare).
I got curious and coded this up.
TensorFlow
```python
import pennylane as qml
import tensorflow as tf

dev = qml.device("default.qubit", wires=2)

@qml.qnode(dev, interface="tf")
def circuit(x):
    qml.RX(x, wires=0)
    return qml.expval(qml.PauliZ(0)), qml.probs(wires=[0, 1])

x = tf.Variable(0.5, dtype=tf.float64)

with tf.GradientTape() as tape:
    res = circuit(x)
```
where:
```
>>> print(res)
(<tf.Tensor: shape=(), dtype=float64, numpy=0.8775825618903726>, <tf.Tensor: shape=(4,), dtype=float64, numpy=array([0.93879128, 0. , 0.06120872, 0. ])>)
```
Unfortunately, TF does not let you call Jacobian on a tuple :(
```
Traceback (most recent call last):
  File "test.py", line 54, in <module>
    grad = tape.jacobian(res, x)
  File "/home/josh/miniconda3/envs/37/lib/python3.7/site-packages/tensorflow/python/eager/backprop.py", line 1166, in jacobian
    target_static_shape = target.shape
AttributeError: 'tuple' object has no attribute 'shape'
```
You can do `grad = [tape.jacobian(r, x) for r in res]`, but it is slower since you lose vectorization.
If your cost function is scalar valued though, everything works well:
```python
with tf.GradientTape() as tape:
    expval, probs = circuit(x)
    res = expval + tf.reduce_sum(probs)
```

```
>>> print(res)
tf.Tensor(1.8775825618903725, shape=(), dtype=float64)
>>> grad = tape.jacobian(res, x)
>>> print(grad)
tf.Tensor(-0.479425538604203, shape=(), dtype=float64)
```
Torch
I think this is fully possible, but I can't get the backwards call working. It sounds like it would result in a much more complicated torch `backward(grad_output)` method, since we can no longer assume that `grad_output` is flat - it will now have to match the 'shape' of the forward pass output tuple.
Thanks @josh146, interesting investigation! Looks like returning a tuple is more expected, but harder to support across the different interfaces. In that case, I'm happy to keep it as is here, and maybe we come back to think about this question later.

Do we expect this use case to become common, e.g., users regularly requesting `return expval, probs`? In practice, I've typically requested `probs` in isolation.
I think returning a tuple can be supported on all interfaces, it just requires a bit more work. But definitely doable.

> I've typically requested probs in isolation.

Agreed.

There is one case I am worried about though:

```python
return expval(..), expval(..)
```

Currently this returns a length-2 NumPy array. Under the new suggestion above, this would return a tuple of NumPy scalars. I think this is more intuitive from a UI perspective, but it would be a very invasive change.

If the user wants an array as output, they could do something like:

```python
return np.array([expval(...), expval(...)])
```
> There is one case I am worried about though

Good point, although I'd worry that making such a change may be annoying, since probably most use cases involve treating the output as a NumPy array even if it was technically specified as a tuple return.
> I'd worry that making such a change may be annoying

It might not be... indexing `res[0]` will continue to work with a tuple, and NumPy functions such as `np.sum(res)` also work with tuples. It might... be okay?

Definitely, having the QNode always return a tuple will be better for long-term flexibility.
Thanks @josh146! 💯
Co-authored-by: Christina Lee <christina@xanadu.ai>
Very excited to have this functionality!
It passes tests, adds functionality, and has no obvious issues, so I approve.
Co-authored-by: Christina Lee <christina@xanadu.ai>
Context:
The Torch custom gradient interface currently returns a non-differentiable gradient.
Description of the Change:
Modifies the Torch interface so that the Jacobian also defines a custom gradient --- the vector-Hessian product --- where the Hessian is computed by an efficient implementation of the parameter-shift rule.
Torch QNodes can now be doubly differentiated, on both simulators and hardware.
In addition, a slight tweak was made to the Jacobian/Hessian evaluation logic in order to avoid redundant Jacobian/Hessian computations.
Benefits:

- Torch QNodes now support double derivatives, on both simulators and hardware.
- The Jacobian/Hessian is now only computed once for given input parameters, and re-used for multiple VJPs/VHPs.
Possible Drawbacks:

- 3rd derivatives and higher are not supported. We could potentially support higher derivatives using recursion.
- Currently, Jacobian parameter-shifts are not being re-used for the Hessian parameter-shifts. This requires more thinking.
I realised while writing this PR that the parameter-shift Hessian logic is based on the formula given here: https://arxiv.org/abs/2008.06517. However, this formula assumes all gates support the 2-term parameter-shift rule; this is not the case for the controlled rotations. Therefore, computing the Hessian of the controlled rotations will give an incorrect result!
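To make the 2-term assumption concrete: for a gate whose expectation has the form `A + B*cos(theta + C)` (e.g. the single-qubit rotations), applying the pi/2 shift rule twice gives the diagonal Hessian entry `f''(theta) = (f(theta + pi) - 2*f(theta) + f(theta - pi)) / 4`. This identity fails for 4-term gates like the controlled rotations. A standalone NumPy check (illustrative, not PR code):

```python
import numpy as np

def expval(theta):
    # <Z> after RX(theta) on |0>: a 2-term parameter-shift gate
    return np.cos(theta)

def second_derivative(f, theta):
    # Parameter-shift second derivative, valid only for 2-term gates
    return (f(theta + np.pi) - 2 * f(theta) + f(theta - np.pi)) / 4

theta = 0.37
print(second_derivative(expval, theta))  # equals -cos(theta)
```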
Long term, we should consider implementing parameter-shift logic that computes the `vjp` and `vhp` directly. This avoids redundant parameter-shift evaluations.

Related GitHub Issues: n/a