Differentiable finite-difference gradient transform #1476
Conversation
Hello. You may have forgotten to update the changelog!
Codecov Report
@@ Coverage Diff @@
## master #1476 +/- ##
==========================================
+ Coverage 98.30% 98.31% +0.01%
==========================================
Files 170 172 +2
Lines 12483 12579 +96
==========================================
+ Hits 12271 12367 +96
Misses 212 212
Continue to review full report at Codecov.
Very slick transform! 😎
Mostly minor points within, then two more general suggestions:
- How user-facing is this transform? Is it something that would also be nice to have a QNode transform wrapper for so users don't have to work directly with tapes?
- While there is a parameter `n` to set the order of the gradient, I mostly noticed `n=1` and a few `n=2` cases in the tests. It may be valuable to show a larger example; also, the transform can be applied to something multiple times, right? If so, it would be cool to see an example of that as well.
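A rough sketch of what such an example could look like, assuming the tape-level `qml.gradients.finite_diff` API added in this PR and the `n` keyword referenced above (argument names may differ in the final version):

```python
import pennylane as qml

dev = qml.device("default.qubit", wires=1)

# Build a tape whose derivatives we want via finite differences
with qml.tape.JacobianTape() as tape:
    qml.RX(0.543, wires=0)
    qml.RY(-0.654, wires=0)
    qml.expval(qml.PauliZ(0))
tape.trainable_params = {0, 1}

# First-order rule (the n=1 case that dominates the tests)
tapes, fn = qml.gradients.finite_diff(tape, n=1)
jac = fn(dev.batch_execute(tapes))

# Second-order rule: same transform with n=2, which (as I read it) returns the
# second derivative with respect to each trainable parameter
tapes2, fn2 = qml.gradients.finite_diff(tape, n=2)
d2 = fn2(dev.batch_execute(tapes2))
```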
""" | ||
# pylint: disable=protected-access,too-many-arguments | ||
import numpy as np | ||
from scipy.special import factorial |
Oh cool, I didn't know about this function 😁 (or that it would have applications here!)
😆 actually terrifying now that I realize there is a factorial scaling in here somewhere lol
def test_autograd(self, order, form, tol):
    """Tests that the output of the finite-difference transform
    can be differentiated using autograd, yielding second derivatives."""
I'm a bit confused about where the second derivative comes in since `n=1` in the transform. Is it because the base tape is a `JacobianTape`?
Nope, it's because the device is `default.qubit.autograd`!
- I am writing a cost function that computes the finite-difference gradient of this QNode using this transform.
- Since this transform is differentiable, I can differentiate the cost function.
- When the autodiff framework performs backpropagation, it will end up needing to compute various `d tape/d parameters` as part of the chain rule. However, since the device is written using autograd, more chain rules are applied to simply get to `d state/d parameters`!
So it's a mixture of (a) the transform being differentiable, and (b) the device execution being differentiable.
Basically, you can differentiate any `qfunc(qnode)` cost function to get the gradient `d qfunc(qnode)/d parameters`. In this case, since `qfunc` itself is the first derivative, we end up with the second derivative 😆
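Concretely, the pattern being tested looks roughly like the following (a sketch only, assuming the `qml.gradients.finite_diff` and `dev.batch_execute` usage shown elsewhere in this PR; exact names may differ):

```python
import pennylane as qml
from pennylane import numpy as np

# Device written in autograd, so device execution itself is differentiable
dev = qml.device("default.qubit.autograd", wires=2)

def cost_fn(x):
    # Build the tape from the trainable parameters
    with qml.tape.JacobianTape() as tape:
        qml.RX(x[0], wires=0)
        qml.RY(x[1], wires=1)
        qml.CNOT(wires=[0, 1])
        qml.expval(qml.PauliZ(0))
    tape.trainable_params = {0, 1}

    # The cost is the finite-difference gradient of the tape (a first derivative)
    tapes, fn = qml.gradients.finite_diff(tape)
    return fn(dev.batch_execute(tapes))

params = np.array([0.543, -0.654], requires_grad=True)
jac = cost_fn(params)                 # first derivative, via finite differences
hess = qml.jacobian(cost_fn)(params)  # autograd differentiates through it: second derivative
```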
Co-authored-by: Olivia Di Matteo <2068515+glassnotes@users.noreply.github.com>
At some point I would like to make it more user-friendly, but at the moment it is purely a port of the existing `JacobianTape.jacobian` logic.
I have to admit, I planned to only port the code in `JacobianTape`. However... I went a bit down a rabbit hole while porting it, and instead implemented support for arbitrary finite-diff rules 🙁
@mariaschuld these were good catches. I've updated the PR accordingly, hopefully it should be easier to follow now!
@josh146 updated version looks great, new parameter names and examples are really helpful!
... qml.RX(params[2], wires=0)
... qml.expval(qml.PauliZ(0))
... qml.var(qml.PauliZ(0))
>>> tape.trainable_params = {0, 1, 2}
Just noticed this, why is it `{ }` instead of `[ ]` for `trainable_params`?
Ah - `trainable_params` is a set! Since trainable parameters must be unique, and order doesn't matter.
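A quick illustration of that point (a hypothetical snippet, using the same tape API as the example above):

```python
import pennylane as qml

with qml.tape.JacobianTape() as tape:
    qml.RX(0.1, wires=0)
    qml.RY(0.2, wires=0)
    qml.expval(qml.PauliZ(0))

# trainable_params holds a set of parameter indices: duplicates collapse,
# and the order in which the indices are written does not matter
tape.trainable_params = {1, 0}
print(tape.trainable_params)  # {0, 1}
```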
Co-authored-by: Olivia Di Matteo <2068515+glassnotes@users.noreply.github.com>
Awesome, looks good to me! 👍
def finite_diff_coeffs(n, approx, strategy):
    r"""Generate the finite difference shift values and corresponding
    term coefficients for a given derivative order, approximation accuracy,
    and strategy.
This is also optional, but you may make future developers' lives sweet and easy if you mention the equation for the coefficients here - at least to me this piece of theory is not obvious.
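For reference, the standard construction of such coefficients (which I assume is roughly what is done here, given the `factorial` import) is: for shift values $s_1, \dots, s_N$ and derivative order $n$, Taylor-expand $f(x + s_i h)$ and require the linear combination to isolate the $n$-th derivative, giving the linear system

$$\sum_{i=1}^{N} c_i\, s_i^{k} = n!\, \delta_{k,n}, \qquad k = 0, 1, \dots, N-1,$$

i.e. $Ac = b$ with $A_{ki} = s_i^{k}$ and $b_k = n!\,\delta_{k,n}$; the resulting rule is $f^{(n)}(x) \approx h^{-n} \sum_i c_i\, f(x + s_i h)$.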
""" | ||
# TODO: replace the JacobianTape._grad_method_validation | ||
# functionality before deprecation. | ||
diff_methods = tape._grad_method_validation("numeric") |
Just to learn about this - would it return a list like `["F", "F", ...]`? Isn't it clear at this stage that all parameters should do finite-diff?
this method also does two other things:
- Returns `"0"` if the output is independent of the parameter
- Raises an error if the parameter has `grad_method=None`.
This is purely to retain backwards compatibility with `JacobianTape.jacobian`. My main goal for this PR was that
>>> tapes, fn = finite_diff(tape)
>>> j = fn(dev.batch_execute(tapes))
should behave exactly like
>>> j = tape.jacobian(dev)
does in master 🙂
Very nice, the code now explains itself!
The approval is of course conditioned on fixing CodeFactor issues and testing those two new untested lines, but I trust you :)
Co-authored-by: Maria Schuld <mariaschuld@gmail.com>
Context: As part of the roadmap for supporting differentiable batch execution, gradient logic will be moved out of the subclasses and into a module of pure functions. This is the first such PR; here, we move the finite-difference logic out of `JacobianTape` and into a new `gradients` package.

Description of the Change:

- Adds a function `qml.gradients.finite_diff_stencil`, for generating the coefficients and shifts required for finite-difference rules of various order, accuracy, and form.
- Adds a function `qml.gradients.finite_diff` for generating the finite-difference tapes of an input tape. The transform is fully differentiable, and may be differentiated to get higher-order derivatives.

Benefits: The finite-difference logic is now much more user and dev accessible, and supports higher-order derivatives.

Possible Drawbacks:

Currently, when executing a device, the output is always cast to a tensor. For example, `expval(), expval() -> tensor[2]`, and `probs[0], probs[1] -> tensor[2, 2]`. However, this results in ragged arrays in cases such as `expval(), probs()`. This has two drawbacks:

In the current codebase, we get around the latter issue by always `hstack`-ing the ragged arrays when we compute the Jacobian. E.g., for N input parameters and output `expval(), probs([0, 1]) -> tensor[[1], [4]]`, the Jacobian will be returned as a `[5, N]` array (the 1 and 4 dimensions of the output have been concatenated).

However, long term, we should move to treating the output as a tuple: `expval(), probs([0, 1]) -> tuple[tensor[1], tensor[4]]`. This avoids the issues listed in the bullet points above, and also makes the Jacobian computation easier; we simply return `tuple[tensor[1, N], tensor[4, N]]`.

Related GitHub Issues: n/a
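As a concrete sketch of the `hstack`-ed behaviour described above (illustrative only; the printed shape follows the description in this PR rather than a verified run):

```python
import pennylane as qml

dev = qml.device("default.qubit", wires=2)

with qml.tape.JacobianTape() as tape:
    qml.RX(0.1, wires=0)
    qml.RY(0.2, wires=1)
    qml.CNOT(wires=[0, 1])
    qml.expval(qml.PauliZ(0))  # 1 output value
    qml.probs(wires=[0, 1])    # 4 output values
tape.trainable_params = {0, 1}

tapes, fn = qml.gradients.finite_diff(tape)
jac = fn(dev.batch_execute(tapes))
print(jac.shape)  # expected (5, 2): the 1- and 4-dimensional outputs are concatenated
```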