Incorrect value of gradients at 0 after taking gradient twice #1125

Closed
glassnotes opened this issue Mar 8, 2021 · 12 comments · Fixed by #2057
Labels
bug 🐛 Something isn't working

Comments

@glassnotes
Contributor

Issue description

A colleague noticed that when running qml.grad twice on a particular circuit, to compute a second derivative, the output value of the gradient is incorrect when the input value is precisely 0. The specific circuit is effectively the function cos(x) (see below for the code to reproduce).

  • Expected behavior: The value of the second gradient at 0 for this function should be -1.

  • Actual behavior: The value of the second gradient at 0 is computed to be -0.5.

  • Reproduces how often: Always

  • System information (output of import pennylane as qml; qml.about()):

Name: PennyLane
Version: 0.15.0.dev0
Summary: PennyLane is a Python quantum machine learning library by Xanadu Inc.
Home-page: https://github.com/XanaduAI/pennylane
Author: None
Author-email: None
License: Apache License 2.0
Location: /home/olivia/Software/anaconda3/envs/xanadu/lib/python3.8/site-packages
Requires: appdirs, networkx, scipy, autograd, toml, semantic-version, numpy
Required-by: PennyLane-SF, pennylane-qulacs, PennyLane-qsharp, PennyLane-qiskit, PennyLane-Qchem, PennyLane-Forest, PennyLane-Cirq
Platform info:           Linux-5.4.0-66-generic-x86_64-with-glibc2.10
Python version:          3.8.5
Numpy version:           1.18.5
Scipy version:           1.4.1
Installed devices:
- default.gaussian (PennyLane-0.15.0.dev0)
- default.mixed (PennyLane-0.15.0.dev0)
- default.qubit (PennyLane-0.15.0.dev0)
- default.qubit.autograd (PennyLane-0.15.0.dev0)
- default.qubit.jax (PennyLane-0.15.0.dev0)
- default.qubit.tf (PennyLane-0.15.0.dev0)
- default.tensor (PennyLane-0.15.0.dev0)
- default.tensor.tf (PennyLane-0.15.0.dev0)
- strawberryfields.fock (PennyLane-SF-0.13.0.dev0)
- strawberryfields.gaussian (PennyLane-SF-0.13.0.dev0)
- strawberryfields.gbs (PennyLane-SF-0.13.0.dev0)
- strawberryfields.remote (PennyLane-SF-0.13.0.dev0)
- strawberryfields.tf (PennyLane-SF-0.13.0.dev0)
- qulacs.simulator (pennylane-qulacs-0.12.0)
- microsoft.QuantumSimulator (PennyLane-qsharp-0.8.0)
- qiskit.aer (PennyLane-qiskit-0.14.0.dev0)
- qiskit.basicaer (PennyLane-qiskit-0.14.0.dev0)
- qiskit.ibmq (PennyLane-qiskit-0.14.0.dev0)
- forest.numpy_wavefunction (PennyLane-Forest-0.13.0.dev0)
- forest.qpu (PennyLane-Forest-0.13.0.dev0)
- forest.qvm (PennyLane-Forest-0.13.0.dev0)
- forest.wavefunction (PennyLane-Forest-0.13.0.dev0)
- cirq.mixedsimulator (PennyLane-Cirq-0.13.0)
- cirq.pasqal (PennyLane-Cirq-0.13.0)
- cirq.qsim (PennyLane-Cirq-0.13.0)
- cirq.qsimh (PennyLane-Cirq-0.13.0)
- cirq.simulator (PennyLane-Cirq-0.13.0)

Source code and tracebacks

Code to reproduce:

import pennylane as qml
from pennylane import numpy as np

dev = qml.device("default.qubit", wires=1)

# Produces function cos(x)
@qml.qnode(dev)
def circuit(x):
    qml.RY(x, wires=0)
    return qml.expval(qml.PauliZ(0))

gradcos  = qml.grad(circuit)  # -sin(x)
gradcos2 = qml.grad(gradcos)  # -cos(x)

print(gradcos2(0.)) # Should be -1, but outputs -0.5

The issue seems to be at 0 specifically, as evaluating the gradient at 1e-6 yields a result very close to -1.
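A quick way to see this, reusing gradcos2 and np from the snippet above (just a sketch):

# Compare the computed second derivative against the analytic value -cos(x)
# at zero, near zero, and away from zero.
for val in [0.0, 1e-6, 0.5]:
    print(val, gradcos2(val), -np.cos(val))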

I tried another example as well:

# Produces function -cos(x)
@qml.qnode(dev)
def circuit(x):
    qml.PauliX(wires=0)
    qml.RX(x, wires=0)
    return qml.expval(qml.PauliZ(0))

gradcos  = qml.grad(circuit)  # sin(x)
gradcos2 = qml.grad(gradcos)  # cos(x)

print(gradcos2(0.)) # Should be 1, but outputs 0.5
@glassnotes glassnotes added the bug 🐛 Something isn't working label Mar 8, 2021
@josh146
Member

josh146 commented Mar 9, 2021

🤔 This is very strange, since the above code should be using the autograd/backprop pipeline. I wonder if this is a bug in Autograd?

@glassnotes if you set diff_method="parameter-shift", do you get the correct gradient?
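In code, that check would look roughly as follows (a sketch; circuit_ps is simply a renamed copy of the circuit above with the differentiation method overridden):

# Same cos(x) circuit, but differentiated with the parameter-shift rule
@qml.qnode(dev, diff_method="parameter-shift")
def circuit_ps(x):
    qml.RY(x, wires=0)
    return qml.expval(qml.PauliZ(0))

print(qml.grad(qml.grad(circuit_ps))(0.))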

@mariaschuld
Contributor

I remember reading discussions on PyTorch and TensorFlow about these singularity issues before. I think the verdict was just that one needs to know that this can happen, because numerical stability cannot be guaranteed...

@glassnotes
Contributor Author

Trying any other differentiation method gives me TypeError: must be real number, not ArrayBox.

Here's the full traceback, if that's helpful:

------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-5-7c53faf1e5f7> in <module>
     14 gradcos2 = qml.grad(gradcos)  # -cos(x)
     15 
---> 16 print(gradcos2(1e-6)) # Should be -1, but outputs -0.5

~/Software/anaconda3/envs/xanadu/lib/python3.8/site-packages/pennylane/_grad.py in __call__(self, *args, **kwargs)
     97         """Evaluates the gradient function, and saves the function value
     98         calculated during the forward pass in :attr:`.forward`."""
---> 99         grad_value, ans = self._get_grad_fn(args)(*args, **kwargs)
    100         self._forward = ans
    101         return grad_value

~/Software/anaconda3/envs/xanadu/lib/python3.8/site-packages/autograd/wrap_util.py in nary_f(*args, **kwargs)
     18             else:
     19                 x = tuple(args[i] for i in argnum)
---> 20             return unary_operator(unary_f, x, *nary_op_args, **nary_op_kwargs)
     21         return nary_f
     22     return nary_operator

~/Software/anaconda3/envs/xanadu/lib/python3.8/site-packages/pennylane/_grad.py in _grad_with_forward(fun, x)
    114         difference being that it returns both the gradient *and* the forward pass
    115         value."""
--> 116         vjp, ans = _make_vjp(fun, x)
    117 
    118         if not vspace(ans).size == 1:

~/Software/anaconda3/envs/xanadu/lib/python3.8/site-packages/autograd/core.py in make_vjp(fun, x)
      8 def make_vjp(fun, x):
      9     start_node = VJPNode.new_root()
---> 10     end_value, end_node =  trace(start_node, fun, x)
     11     if end_node is None:
     12         def vjp(g): return vspace(x).zeros()

~/Software/anaconda3/envs/xanadu/lib/python3.8/site-packages/autograd/tracer.py in trace(start_node, fun, x)
      8     with trace_stack.new_trace() as t:
      9         start_box = new_box(x, t, start_node)
---> 10         end_box = fun(start_box)
     11         if isbox(end_box) and end_box._trace == start_box._trace:
     12             return end_box._value, end_box._node

~/Software/anaconda3/envs/xanadu/lib/python3.8/site-packages/autograd/wrap_util.py in unary_f(x)
     13                 else:
     14                     subargs = subvals(args, zip(argnum, x))
---> 15                 return fun(*subargs, **kwargs)
     16             if isinstance(argnum, int):
     17                 x = args[argnum]

~/Software/anaconda3/envs/xanadu/lib/python3.8/site-packages/pennylane/_grad.py in __call__(self, *args, **kwargs)
     97         """Evaluates the gradient function, and saves the function value
     98         calculated during the forward pass in :attr:`.forward`."""
---> 99         grad_value, ans = self._get_grad_fn(args)(*args, **kwargs)
    100         self._forward = ans
    101         return grad_value

~/Software/anaconda3/envs/xanadu/lib/python3.8/site-packages/autograd/wrap_util.py in nary_f(*args, **kwargs)
     18             else:
     19                 x = tuple(args[i] for i in argnum)
---> 20             return unary_operator(unary_f, x, *nary_op_args, **nary_op_kwargs)
     21         return nary_f
     22     return nary_operator

~/Software/anaconda3/envs/xanadu/lib/python3.8/site-packages/pennylane/_grad.py in _grad_with_forward(fun, x)
    122             )
    123 
--> 124         grad_value = vjp(vspace(ans).ones())
    125         return grad_value, ans
    126 

~/Software/anaconda3/envs/xanadu/lib/python3.8/site-packages/autograd/core.py in vjp(g)
     12         def vjp(g): return vspace(x).zeros()
     13     else:
---> 14         def vjp(g): return backward_pass(g, end_node)
     15     return vjp, end_value
     16 

~/Software/anaconda3/envs/xanadu/lib/python3.8/site-packages/autograd/core.py in backward_pass(g, end_node)
     19     for node in toposort(end_node):
     20         outgrad = outgrads.pop(node)
---> 21         ingrads = node.vjp(outgrad[0])
     22         for parent, ingrad in zip(node.parents, ingrads):
     23             outgrads[parent] = add_outgrads(outgrads.get(parent), ingrad)

~/Software/anaconda3/envs/xanadu/lib/python3.8/site-packages/autograd/core.py in <lambda>(g)
     65                     "VJP of {} wrt argnum 0 not defined".format(fun.__name__))
     66             vjp = vjpfun(ans, *args, **kwargs)
---> 67             return lambda g: (vjp(g),)
     68         elif L == 2:
     69             argnum_0, argnum_1 = argnums

~/Software/anaconda3/envs/xanadu/lib/python3.8/site-packages/pennylane/tape/interfaces/autograd.py in gradient_product(g)
    200             # pass, so we do not need to re-unwrap the parameters.
    201             self.set_parameters(self._all_params_unwrapped, trainable_only=False)
--> 202             jac = self.jacobian(device, params=params, **self.jacobian_options)
    203             self.set_parameters(self._all_parameter_values, trainable_only=False)
    204 

~/Software/anaconda3/envs/xanadu/lib/python3.8/site-packages/pennylane/tape/tapes/qubit_param_shift.py in jacobian(self, device, params, **options)
    122         self._append_evA_tape = True
    123         self._evA_result = None
--> 124         return super().jacobian(device, params, **options)
    125 
    126     def parameter_shift(self, idx, params, **options):

~/Software/anaconda3/envs/xanadu/lib/python3.8/site-packages/pennylane/tape/tapes/jacobian_tape.py in jacobian(self, device, params, **options)
    563 
    564         # execute all tapes at once
--> 565         results = device.batch_execute(all_tapes)
    566 
    567         # post-process the results with the appropriate function to fill jacobian columns with gradients

~/Software/anaconda3/envs/xanadu/lib/python3.8/site-packages/pennylane/_qubit_device.py in batch_execute(self, circuits)
    279             self.reset()
    280 
--> 281             res = self.execute(circuit)
    282             results.append(res)
    283 

~/Software/anaconda3/envs/xanadu/lib/python3.8/site-packages/pennylane/_qubit_device.py in execute(self, circuit, **kwargs)
    203 
    204         # apply all circuit operations
--> 205         self.apply(circuit.operations, rotations=circuit.diagonalizing_gates, **kwargs)
    206 
    207         # generate computational basis samples

~/Software/anaconda3/envs/xanadu/lib/python3.8/site-packages/pennylane/devices/default_qubit.py in apply(self, operations, rotations, **kwargs)
    189                 self._apply_basis_state(operation.parameters[0], operation.wires)
    190             else:
--> 191                 self._state = self._apply_operation(self._state, operation)
    192 
    193         # store the pre-rotated state

~/Software/anaconda3/envs/xanadu/lib/python3.8/site-packages/pennylane/devices/default_qubit.py in _apply_operation(self, state, operation)
    214             return self._apply_ops[operation.base_name](state, axes, inverse=operation.inverse)
    215 
--> 216         matrix = self._get_unitary_matrix(operation)
    217 
    218         if isinstance(operation, DiagonalOperation):

~/Software/anaconda3/envs/xanadu/lib/python3.8/site-packages/pennylane/devices/default_qubit.py in _get_unitary_matrix(self, unitary)
    404             return unitary.eigvals
    405 
--> 406         return unitary.matrix
    407 
    408     @classmethod

~/Software/anaconda3/envs/xanadu/lib/python3.8/site-packages/pennylane/operation.py in matrix(self)
    742     @property
    743     def matrix(self):
--> 744         op_matrix = self._matrix(*self.parameters)
    745 
    746         if self.inverse:

~/Software/anaconda3/envs/xanadu/lib/python3.8/site-packages/pennylane/ops/qubit.py in _matrix(cls, *params)
    603     def _matrix(cls, *params):
    604         theta = params[0]
--> 605         c = math.cos(theta / 2)
    606         js = 1j * math.sin(-theta / 2)
    607 

TypeError: must be real number, not ArrayBox

@josh146
Member

josh146 commented Mar 9, 2021

Oh, that is because parameter-shift does not yet support the hessian 😆 Just working on that now.

Note that TensorFlow outputs the same value, also using backprop:

import pennylane as qml
import tensorflow as tf

dev = qml.device("default.qubit", wires=1)

# Produces function cos(x)
@qml.qnode(dev, interface="tf")
def circuit(x):
    qml.RY(x, wires=0)
    return qml.expval(qml.PauliZ(0))

x = tf.Variable(0., dtype=tf.float64)

with tf.GradientTape() as tape1:
    with tf.GradientTape() as tape2:
        res = circuit(x)
    grad = tape2.gradient(res, x)
grad2 = tape1.gradient(grad, x)

print(grad2) # Should be -1, but outputs -0.5

@josh146
Member

josh146 commented Mar 9, 2021

Running the following using #1129,

import pennylane as qml
import torch

dev = qml.device("default.qubit", wires=1)

# Produces function cos(x)
@qml.qnode(dev, interface="torch", diff_method="parameter-shift")
def circuit(x):
    qml.RY(x, wires=0)
    return qml.expval(qml.PauliZ(0))

x = torch.tensor(0., requires_grad=True)
grad2 = torch.autograd.functional.hessian(circuit, x)
print(grad2) # outputs -1

it works.

I think that one of the gate operations we are using must have a singularity/be ill-conditioned for x=0 when using backpropagation.

@antalszava
Contributor

@josh146 suggested exploring the computation of the expectation value further, and doing so turned up an odd behaviour when using np.abs:

import pennylane as qml
from pennylane import numpy as np

def RY(x):
    return np.array([[np.cos(x/2), -np.sin(x/2)],[np.sin(x/2), np.cos(x/2)]])

for func in [np.abs, lambda x: x]:
    def circuit(x):
        state = np.array([1,0])
        rot = RY(*x) @ state

        eigs = np.array([1, -1])
        probs = func(rot) ** 2
        return np.dot(eigs, probs)

    gradcos2 = qml.grad(qml.grad(circuit))  # -cos(x)

    print(gradcos2(np.array([0.], requires_grad=True)))  # should be -1 at x=0

The loop prints

[-0.5]
[-1.]

i.e. the np.abs version reproduces the incorrect -0.5 seen with the QNode, while the version without np.abs gives the correct -1.

For x != 0, both versions match the values reported when using a QNode.

@glassnotes
Contributor Author

Is this something that will be fixed through #1129, then, or is it a more global issue with numpy? This seems very strange 😕

@josh146
Member

josh146 commented Mar 15, 2021

@glassnotes this will be fixed in #1129 insofar as the code above will run with diff_method="parameter-shift". However, it will not fix it for diff_method="backprop" (the default), since the problem is that Autograd does not properly differentiate np.abs at x=0 🤔
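The Autograd behaviour can be isolated without any circuit at all (a minimal sketch; analytically, the second derivative of abs(x)**2 is 2 everywhere):

import autograd.numpy as anp
from autograd import grad

# abs(x)**2 is perfectly smooth, but the second derivative computed through
# np.abs is expected to disagree with the analytic value 2 exactly at x = 0.
d2 = grad(grad(lambda x: anp.abs(x) ** 2))

print(d2(0.5))  # 2.0
print(d2(0.0))  # analytic value is 2.0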

We have two potential solutions:

  • Remove usage of np.abs() from default.qubit --- this will likely be quite difficult though
  • 'Monkeypatch' np.abs() so that it provides the correct derivative at x=0 inside of the pennylane.numpy subpackage

@co9olguy
Member

Technically the "correct" derivative of np.abs() at x=0 is "undefined" 😉

@dwierichs
Contributor

I looked at this a bit, and a fix would be to add a _real function to the qubit devices, analogous to _imag, which already exists. Then we could replace L792 in pennylane/devices/default_qubit.py,

    prob = self.marginal_prob(self._abs(self._flatten(self._state)) ** 2, wires)

by

    prob = self.marginal_prob(self._flatten(self._real(self._state) ** 2 + self._imag(self._state) ** 2), wires)

This does fix the issue in all frameworks.
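Since the RY state is real, the proposed real**2 + imag**2 reduces to a plain square there; a sketch comparing it against abs()**2 directly in Autograd (the probability of |1> is sin²(x/2), with analytic second derivative cos(x)/2 = 0.5 at x = 0):

import autograd.numpy as anp
from autograd import grad

prob_abs = lambda x: anp.abs(anp.sin(x / 2)) ** 2  # the current abs()**2 formulation
prob_sq  = lambda x: anp.sin(x / 2) ** 2           # real**2 + imag**2 for a real state

print(grad(grad(prob_abs))(0.0))  # expected to miss the analytic value 0.5
print(grad(grad(prob_sq))(0.0))   # 0.5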

However, the performance discussion is properly complex (pun not intended):

  • Even for standard NumPy, the performance comparison between onp.abs(x)**2 and onp.real(x)**2 + onp.imag(x)**2 depends on the system architecture.
  • There is currently a discussion about whether and where a function abs2(x) = abs(x)**2 should be included in the array API standard (the link above, #13179 on NumPy and #153 on the array API). I assume we don't want to wait for such a new function to propagate through Autoray, TF, Torch and JAX, or to implement our own version in Autograd...

A simple circuit evaluation on 2-10 qubits seemed to show a small degradation in performance with the proposed change above and variants thereof (e.g. first flattening, storing the result, and extracting the real and imaginary parts from there, so that flat arrays are squared). For larger systems and Autograd, the above seems to be a bit faster.
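For reference, a rough micro-benchmark sketch of the raw NumPy comparison mentioned above (timings are only illustrative and vary with array size and architecture):

import timeit
import numpy as onp

# A random complex state vector with 2**20 amplitudes (~20 qubits).
state = onp.random.randn(2**20) + 1j * onp.random.randn(2**20)

t_abs = timeit.timeit(lambda: onp.abs(state) ** 2, number=100)
t_ri  = timeit.timeit(lambda: onp.real(state) ** 2 + onp.imag(state) ** 2, number=100)

print(f"abs(x)**2: {t_abs:.3f} s    real(x)**2 + imag(x)**2: {t_ri:.3f} s")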

@josh146
Member

josh146 commented Dec 22, 2021

Thanks for the deep dive on this @dwierichs!

A simple circuit evaluation on 2-10 qubits seemed to show a small degradation in performance with the proposed change above and variants thereof (e.g. first flattening, storing the result, and extracting the real and imaginary parts from there, so that flat arrays are squared).

How much of a performance degradation do you notice?

For larger systems and Autograd, the above seems to be a bit faster.

This makes me lean somewhat towards making this change, as the overhead does not appear to scale with qubit number.

@dwierichs
Contributor

How much of a performance degradation do you notice?

It does not seem to grow significantly with the system size and is on the order of tens of milliseconds. However, it should be noted that analytic_probability is used for all expectation values, so it is executed many, many times in, say, a full optimization workflow.

This bug is admittedly an edge case, but I still think it would be good to fix it at the cost of the small performance decrease above.
