
Differentiable finite-difference gradient transform #1476

Merged: 38 commits into master, Jul 29, 2021

Conversation

@josh146 (Member) commented Jul 26, 2021

Context: As part of the roadmap for supporting differentiable batch execution, gradient logic will be moved out of the subclasses and into a module of pure functions. This is the first such PR; here, we move the finite-difference logic out of JacobianTape and into a new gradients package.

Description of the Change:

  • Adds a function qml.gradients.finite_diff_stencil for generating the coefficients and shifts required for finite-difference rules of varying order, accuracy, and form.

  • Adds a function qml.gradients.finite_diff for generating the finite-difference tapes of an input tape, along with a post-processing function to assemble the results. The transform is fully differentiable, and may itself be differentiated to obtain higher-order derivatives (a usage sketch follows below).
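A minimal usage sketch of the new transform, based on the pattern that appears later in this thread (the circuit is made up for illustration, and keyword arguments for step size and approximation order are not shown and may differ from the merged API):

>>> import pennylane as qml
>>> dev = qml.device("default.qubit", wires=2)
>>> with qml.tape.JacobianTape() as tape:
...     qml.RX(0.1, wires=0)
...     qml.RY(0.2, wires=1)
...     qml.CNOT(wires=[0, 1])
...     qml.expval(qml.PauliZ(0))
>>> tape.trainable_params = {0, 1}
>>> tapes, fn = qml.gradients.finite_diff(tape)  # shifted tapes + post-processing function
>>> jac = fn(dev.batch_execute(tapes))           # finite-difference Jacobian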

Benefits: The finite-difference logic is now much more accessible to both users and developers, and supports higher-order derivatives.

Possible Drawbacks:

Currently, when executing tapes on a device, the output is always cast to a single tensor. For example, expval(), expval() -> tensor[2], and probs[0], probs[1] -> tensor[2, 2]. However, this results in ragged arrays in cases such as expval(), probs().

This has two drawbacks:

  • NumPy is deprecating ragged arrays shortly (!!)
  • Some ML frameworks don't support ragged arrays

In the current codebase, we get around the latter issue by always hstack-ing the ragged arrays when we compute the Jacobian. E.g., for N input parameters and output expval(), probs([0, 1]) -> tensor[[1], [4]], the Jacobian will be returned as a [5, N] array (the 1 and 4 dimensions of the output have been concatenated).

However, long term, we should move to treating the output as a tuple: expval(), probs([0, 1]) -> tuple[tensor[1], tensor[4]]. This avoids the issues listed in the bullet points above, and also makes the Jacobian computation easier; we simply return tuple[tensor[1, N], tensor[4, N]].
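As a rough illustration of the shapes involved (the arrays below are placeholders, and N = 3 is a hypothetical number of trainable parameters):

>>> import numpy as np
>>> jac_expval = np.zeros((1, 3))  # d expval / d params
>>> jac_probs = np.zeros((4, 3))   # d probs([0, 1]) / d params
>>> np.concatenate([jac_expval, jac_probs]).shape  # current behaviour: one stacked array
(5, 3)
>>> (jac_expval.shape, jac_probs.shape)  # proposed behaviour: a tuple of per-measurement Jacobians
((1, 3), (4, 3))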

Related GitHub Issues: n/a

@josh146 josh146 added the WIP 🚧 Work-in-progress label Jul 26, 2021
@github-actions (Contributor)

Hello. You may have forgotten to update the changelog!
Please edit .github/CHANGELOG.md with:

  • A one-to-two sentence description of the change. You may include a small working example for new features.
  • A link back to this PR.
  • Your name (or GitHub username) in the contributors section.

@josh146 josh146 changed the title from "[WIP] Differentiable finite-difference gradient transform" to "Differentiable finite-difference gradient transform" on Jul 26, 2021
@josh146 josh146 added the review-ready 👌 label (PRs which are ready for review by someone from the core team) and removed the WIP 🚧 Work-in-progress label on Jul 26, 2021
@codecov (bot) commented Jul 26, 2021

Codecov Report

Merging #1476 (49fe608) into master (e8925e5) will increase coverage by 0.01%.
The diff coverage is 100.00%.


@@            Coverage Diff             @@
##           master    #1476      +/-   ##
==========================================
+ Coverage   98.30%   98.31%   +0.01%     
==========================================
  Files         170      172       +2     
  Lines       12483    12579      +96     
==========================================
+ Hits        12271    12367      +96     
  Misses        212      212              
Impacted Files Coverage Δ
pennylane/__init__.py 98.59% <100.00%> (+0.02%) ⬆️
pennylane/gradients/__init__.py 100.00% <100.00%> (ø)
pennylane/gradients/finite_difference.py 100.00% <100.00%> (ø)
pennylane/math/single_dispatch.py 99.43% <100.00%> (+0.01%) ⬆️

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data

@glassnotes (Contributor) left a comment

Very slick transform! 😎

Mostly minor points within, then two more general suggestions:

  • How user-facing is this transform? Is it something that would also be nice to have a QNode transform wrapper for so users don't have to work directly with tapes?
  • While there is a parameter n to set the order of the gradient, I mostly noticed n=1 and a few n=2 cases in the tests. It may be valuable to show a larger example; also, the transform can be applied multiple times, right? If so, it would be cool to see an example of that as well.

"""
# pylint: disable=protected-access,too-many-arguments
import numpy as np
from scipy.special import factorial
Contributor:

Oh cool, I didn't know about this function 😁 (or that it would have applications here!)

@josh146 (Member, Author):

😆 actually terrifying now that I realize there is a factorial scaling in here somewhere lol


def test_autograd(self, order, form, tol):
"""Tests that the output of the finite-difference transform
can be differentiated using autograd, yielding second derivatives."""
Contributor:

I'm a bit confused about where the second derivative comes in since n=1 in the transform. Is it because the base tape is a JacobianTape?

@josh146 (Member, Author):

Nope, it's because the device is default.qubit.autograd!

  1. I am writing a cost function that computes the finite-difference gradient of this QNode using this transform.
  2. Since this transform is differentiable, I can differentiate the cost function.
  3. When the autodiff framework performs backpropagation, it ends up needing to compute various d tape/d parameters terms as part of the chain rule. However, since the device is written using autograd, the chain rule simply continues all the way down to d state/d parameters!

So it's a mixture of (a) the transform being differentiable, and (b) the device execution being differentiable.
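For the record, a rough sketch of the pattern being described (my own illustration, assuming the tape/transform API in this PR; the circuit, parameter values, and names are made up):

import pennylane as qml
from pennylane import numpy as np

dev = qml.device("default.qubit.autograd", wires=1)

def fd_gradient(params):
    # cost function that itself returns the finite-difference gradient of the tape
    with qml.tape.JacobianTape() as tape:
        qml.RX(params[0], wires=0)
        qml.RY(params[1], wires=0)
        qml.expval(qml.PauliZ(0))
    tape.trainable_params = {0, 1}
    tapes, fn = qml.gradients.finite_diff(tape)
    return fn(dev.batch_execute(tapes))

params = np.array([0.3, 0.7], requires_grad=True)
# differentiating the gradient yields second derivatives, because both
# the transform and the device execution are differentiable
second_derivs = qml.jacobian(fd_gradient)(params)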

@josh146 (Member, Author):

Basically, you can differentiate any qfunc(qnode) cost function, to get the gradient d qfunc(qnode)/d parameters. In this case, since qfunc itself is the first derivative, we end up with the second derivative 😆

josh146 and others added 2 commits July 27, 2021 21:15
Co-authored-by: Olivia Di Matteo <2068515+glassnotes@users.noreply.github.com>
@josh146 (Member, Author) commented Jul 27, 2021

How user-facing is this transform? Is it something that would also be nice to have a QNode transform wrapper for so users don't have to work directly with tapes?

At some point I would like to make it more user-friendly, but at the moment it is purely a port of JacobianTape.jacobian from a method to a transform. The plan is:

  1. The new, differentiable batch_execute will use this transform internally when computing gradients during backpropagation.
  2. Later, wrap this up as a qnode transform so it can also be used by users. However, this is blocked by us putting the QNode transform API on hold!

While there is a parameter n to set the order of the gradient, I mostly noticed n=1 and a few n=2 cases in the tests. It may be valuable to show a larger example; also, the transform can be applied multiple times, right? If so, it would be cool to see an example of that as well.

I have to admit, I planned to only port the code in JacobianTape.jacobian as-is to reduce my workload. It currently only supports first derivatives, and only 2nd-order center and 1st-order forward.

However.... I went a bit down a rabbit hole while porting it, and instead implemented support for arbitrary finite-diff rules 🙁

@josh146 (Member, Author) commented Jul 28, 2021

but I requested changes because I was often quite confused, and ideally we want to make the code base self-explanatory.

@mariaschuld these were good catches. I've updated the PR accordingly, hopefully it should be easier to follow now!

@josh146 josh146 requested a review from mariaschuld July 28, 2021 12:20
@glassnotes (Contributor) left a comment

@josh146 the updated version looks great; the new parameter names and examples are really helpful!

... qml.RX(params[2], wires=0)
... qml.expval(qml.PauliZ(0))
... qml.var(qml.PauliZ(0))
>>> tape.trainable_params = {0, 1, 2}
Contributor:

Just noticed this, why is it { } instead of [ ] for trainable_params?

@josh146 (Member, Author):

Ah, trainable_params is a set! Trainable parameters must be unique, and order doesn't matter.
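Continuing the docstring example quoted above, a small sketch (my own illustration) of the set behaviour:

>>> tape.trainable_params = {0, 2}   # only parameters 0 and 2 will be differentiated
>>> sorted(tape.trainable_params)    # duplicates are impossible and order is irrelevant
[0, 2]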

Co-authored-by: Olivia Di Matteo <2068515+glassnotes@users.noreply.github.com>
@josh146 josh146 requested a review from glassnotes July 28, 2021 13:49
@glassnotes (Contributor) left a comment

Awesome, looks good to me! 👍

def finite_diff_coeffs(n, approx, strategy):
r"""Generate the finite difference shift values and corresponding
term coefficients for a given derivative order, approximation accuracy,
and strategy.
Contributor:

This is also optional, but you would make future developers' lives much easier if you mention the equation for the coefficients here; at least to me this piece of theory is not obvious.

@josh146 (Member, Author) commented Jul 29, 2021

I have added the following underneath the examples:

[image: screenshot of the finite-difference coefficient equation added beneath the docstring examples]
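For reference, the standard way to obtain such coefficients is by matching Taylor expansions around the shifted points; the sketch below is my own illustration of that linear system (not necessarily the exact formulation used in finite_diff_coeffs):

import numpy as np
from scipy.special import factorial

def stencil_coeffs(shifts, n):
    # coefficients c_i such that f^(n)(x) ~ sum_i c_i f(x + s_i * h) / h**n,
    # obtained by matching Taylor expansions up to len(shifts) terms
    shifts = np.asarray(shifts, dtype=float)
    A = shifts ** np.arange(len(shifts)).reshape(-1, 1)  # A[k, i] = s_i ** k
    b = np.zeros(len(shifts))
    b[n] = factorial(n)
    return np.linalg.solve(A, b)

# second-order accurate central difference for the first derivative:
# stencil_coeffs([-1, 0, 1], n=1) -> approximately [-0.5, 0.0, 0.5]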

"""
# TODO: replace the JacobianTape._grad_method_validation
# functionality before deprecation.
diff_methods = tape._grad_method_validation("numeric")
Contributor:

Just to learn about this: would it return a list like ["F", "F", ...]? Isn't it clear at this stage that all parameters should use finite differences?

@josh146 (Member, Author):

This method also does two other things:

  • Returns "0" if the output is independent of the parameter
  • Raises an error if the parameter has grad_method=None.

This is purely to retain backwards compatibility with JacobianTape.jacobian. My main goal for this PR was that

>>> tapes, fn = finite_diff(tape)
>>> j = fn(dev.batch_execute(tapes))

should behave exactly like

>>> j = tape.jacobian(dev)

does in master 🙂

@mariaschuld (Contributor) left a comment

Very nice, the code now explains itself!

The approval is of course conditional on fixing the CodeFactor issues and testing those two new untested lines, but I trust you :)

@josh146 josh146 merged commit a562747 into master Jul 29, 2021
@josh146 josh146 deleted the finite-diff branch July 29, 2021 08:19