Add caching to the autograd batch interface #1508
Conversation
Co-authored-by: Nathan Killoran <co9olguy@users.noreply.github.com>
Hello. You may have forgotten to update the changelog!
…nto autograd-caching
Thanks @josh146, this is great! 🚀 I've left a few questions for my understanding.
# disable caching on the forward pass
execute_fn = cache_execute(device.batch_execute, cache=None)

# replace the backward gradient computation
gradient_fn = device.gradients
gradient_fn = cache_execute(
    device.gradients, cache, pass_kwargs=True, return_tuple=False
)
Probably my unfamiliarity with the recent changes, but do we expect to need caching for device-based gradients? I thought this was mainly for parameter shift.
Caching is only needed for device-based gradients if mode="backwards". Backwards mode essentially means:
- On the forward pass, only the cost function is computed
- The gradients are only requested during backpropagation
This means that there will always be 1 additional eval required -- caching therefore reduces the number of evals by 1 😆
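For concreteness, a hedged sketch of what such a backwards-mode call could look like, mirroring the forward-mode example further down this thread (the cache keyword and the exact mode string are assumptions based on this PR's description, not code from the PR):

# Sketch only: with device gradients in backwards mode, the gradient call
# triggers one extra forward evaluation, which caching turns into a cache hit.
execute(
    tapes,
    dev,
    gradient_fn="device",
    interface="torch",
    gradient_kwargs={"method": "adjoint_jacobian"},
    mode="backward",  # the "backwards" mode discussed above; exact string assumed
    cache=True,
)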
Worth it?
I mean, I'd expect 99% of users to use device gradients with mode="forward".
Sounds good!
Does this supersede #1341?
No, this complements it for now 🙂

#1341 added the use_device_state keyword argument, which instructs QubitDevice.adjoint_jacobian() to use the existing device state and avoid a redundant forward pass. When mode="forward", we can pass this option:
execute(
    tapes,
    dev,
    gradient_fn="device",
    interface="torch",
    gradient_kwargs={"method": "adjoint_jacobian", "use_device_state": True},
    mode="forward"
)
mode="best", | ||
gradient_kwargs=None, | ||
cache=True, | ||
cachesize=10000, |
Do we have an idea of the memory implications of this? 🤔
Assuming you do not pass a cache object manually to the execute function, the cache will be created inside execute. What this means is that, as soon as execute has exited, the cache is out of scope and will be garbage collected by Python.
I am 99.99% sure of this, but don't know how to sanity check 😖
This is from the last time I tried to explore this: #1131 (comment)
Do you have any ideas on how to double check that the cache is deleted after execution?
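One possible way to sanity-check this kind of thing (a sketch only, using a weakref finalizer on a stand-in cache object, since the actual cache created inside execute is not directly reachable from outside):

import gc
import weakref

class Cache(dict):
    """Stand-in for the cache created inside execute (plain dicts cannot be weak-referenced)."""

def run():
    cache = Cache()
    finalizer = weakref.finalize(cache, lambda: print("cache garbage collected"))
    # ... execute the tapes here, populating `cache` ...
    return finalizer

finalizer = run()
gc.collect()
print(finalizer.alive)  # False once the cache has been collected

If the finalizer fires as soon as run() returns, the cache really is being freed when it goes out of scope.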
Co-authored-by: Tom Bromley <49409390+trbromley@users.noreply.github.com>
Thanks @josh146 for the updates and comments! Looks great 💯
qml.RX(np.array(a), wires=[0])
qml.RY(np.array(b), wires=[1])
Is the np.array() left over from the previous test? Though I guess it doesn't matter, because the hash should be the same.
oh this was semi-intentional - I was trying to ensure that the datatype of the parameter doesn't affect hashing 😆
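A sketch of the kind of check this implies, assuming tape.hash behaves as described in this PR (the circuit here is illustrative only, not taken from the test suite):

import numpy as np
import pennylane as qml

# two tapes that are identical up to the datatype of the gate parameters
with qml.tape.QuantumTape() as tape1:
    qml.RX(0.4, wires=0)
    qml.RY(0.6, wires=1)
    qml.expval(qml.PauliZ(0))

with qml.tape.QuantumTape() as tape2:
    qml.RX(np.array(0.4), wires=0)
    qml.RY(np.array(0.6), wires=1)
    qml.expval(qml.PauliZ(0))

# float vs. np.array parameters should not change the hash
assert tape1.hash == tape2.hash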
"""Tests that the circuit hash of circuits with single-qubit | ||
rotations differing by multiples of 2pi have identical hash""" |
Oh wow, that's cool, didn't realise we'd support that!
It's required in order to reduce the number of Hessian evals to the optimum number (I don't think the autodiff frameworks are smart enough to do this cancelling out themselves).

Currently it's hardcoded in for the R and CR gates, but it would be cool to add this as an operation property:

class Rot(Operation):
    periodicity = [2 * np.pi, 2 * np.pi, 2 * np.pi]
Co-authored-by: Tom Bromley <49409390+trbromley@users.noreply.github.com>
Context: In #1501, batch_execute was made differentiable in the Autograd interface using the new qml.gradients subpackage. However, since the new qml.gradients subpackage is itself differentiable, this allows for out-of-the-box higher-order derivatives, as long as the new Autograd interface is recursive. Thus, not only do we have 3rd order and higher derivatives, we are able to: […] (expval).

However, the recursive evaluation is not smart; the autodiff frameworks will traverse the recursive structure naively, resulting in redundant evaluations.
This PR is a result of thinking about the following two questions:

- How fast is the new batch_execute pipeline compared to master?
- How can we reduce the number of redundant evaluations?

Benchmarking

To test the performance of #1501 vs. master, I ran the following benchmark:
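(The exact snippet is not reproduced here. A minimal sketch of this kind of benchmark, assuming a small parameter-shift QNode cost function and the autograd interface; the circuit, sizes, and timing harness are placeholders rather than the code that was actually run:)

import timeit

import pennylane as qml
from pennylane import numpy as np

dev = qml.device("default.qubit", wires=2)

@qml.qnode(dev, diff_method="parameter-shift")
def cost(weights):
    qml.RX(weights[0], wires=0)
    qml.RY(weights[1], wires=1)
    qml.CNOT(wires=[0, 1])
    return qml.expval(qml.PauliZ(0) @ qml.PauliZ(1))

weights = np.array([0.1, 0.2], requires_grad=True)

# time the cost and its gradient; run once on this branch and once on master
print("cost:", timeit.timeit(lambda: cost(weights), number=100))
print("gradient:", timeit.timeit(lambda: qml.grad(cost)(weights), number=100))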
With the following results:

Interestingly:

- The batch_execute pipeline is ~ the same speed when the recursive evaluation is turned on.
- The batch_execute pipeline is slower when recursive evaluation is turned on.

Description of the changes
A new argument max_diff is added, which allows the user to specify the 'depth'/'order' at which the recursive evaluation ends. E.g., setting max_diff=1 completely deactivates the recursive evaluation.

Caching is added to the qml.interfaces.execute() function by way of a decorator. This decorator makes use of tape.hash to identify unique tapes.

- If a tape does not match a hash in the cache, then the tape has not been previously executed. It is executed, and the result is added to the cache.
- If a tape matches a hash in the cache, then the tape has been previously executed. The corresponding cached result is extracted, and the tape is not passed to the execution function.
- Finally, it may be the case that two or more tapes in the current set of tapes to be executed share a hash. If so, duplicates are removed to avoid redundant evaluations.
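A hedged sketch of how these arguments fit together, using the names described above (the exact execute signature and import path are assumptions, not verbatim from this PR):

import pennylane as qml

dev = qml.device("default.qubit", wires=2)

with qml.tape.QuantumTape() as tape:
    qml.RX(0.1, wires=0)
    qml.RY(0.2, wires=1)
    qml.CNOT(wires=[0, 1])
    qml.expval(qml.PauliZ(0))

res = qml.interfaces.execute(  # the qml.interfaces.execute() described above
    [tape],
    dev,
    gradient_fn=qml.gradients.param_shift,
    interface="autograd",
    cache=True,        # hash-based tape caching via tape.hash
    cachesize=10000,   # maximum number of cached results
    max_diff=2,        # stop the recursive expansion at second order (e.g. Hessians)
)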
Benefits
Caching has a significant effect. E.g., consider the benchmarking example above, modified to compute the Hessian and display the number of executions:
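(As an illustration only, continuing the benchmark sketch from the Benchmarking section above, with cost, dev, and weights as defined there, one way to display the execution count would be:)

# Hessian of the sketch cost function via autograd (jacobian of the gradient),
# then inspect how many device executions were needed.
hessian = qml.jacobian(qml.grad(cost))(weights)
print("device executions:", dev.num_executions)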
By using a cache, we can reduce the number of evaluations below the minimum we currently have in master.
Questions

- What should the default max_diff be? max_diff>1 is probably fine?