-
Notifications
You must be signed in to change notification settings - Fork 2.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improve parameter-binding performance of large instructions #10284
Conversation
Previously, the parameter-assignment methods of `QuantumCircuit` had poor performance when an instruction had a complex definition that involved many of the parameters being bound. The strategy of binding each parameter separately led to each definition being copied and rebound multiple times, with each rebinding being recursive all the way down. This commit makes the definition rebinding happen only once per instruction, and updates the data model used to make it a complete recursion through `QuantumCircuit.assign_parameters`. This has the side effect of fixing an issue where internal global phases would not be updated. The algorithmic change that enables this (just rebind the definition at the end) is rather simpler than the length of this patch suggests. This is just because the previous structure of separating out a single `_assign_parameter` method made it harder to restructure the logic without introducing unpleasant stateful coupling between the driver and helper methods. Instead, I inlined most of the helper functions into the driver body, so we can treat some components of the binding in a per-parameter way and some in a per-operation way, in whatever way is better.
One or more of the the following people are requested to review this:
|
test/python/qasm3/test_export.py
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The changed tests in this file are indicative of a problem in the parameter binding of prior internal custom operations. The old tests were enforcing behaviour that shouldn't have been expected, given the current difficulties in looking at parametrised definitions after they've been bound - the p0
and p1
names shouldn't have been visible in the current data model (though in our preferred lazily bound definitions model, they would actually become visible).
qiskit/circuit/equivalence.py
Outdated
def _rebind_equiv(equiv, query_params): | ||
equiv_params, equiv_circuit = equiv | ||
param_map = dict(zip(equiv_params, query_params)) | ||
equiv = equiv_circuit.assign_parameters(param_map, inplace=False) | ||
param_map = dict((x, y) for x, y in zip(equiv_params, query_params) if isinstance(x, Parameter)) | ||
equiv = equiv_circuit.assign_parameters(param_map, inplace=False, flat_input=True) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was a logical bug in the EquivalenceLibrary
code that previously was being masked by a side-effect of QuantumCircuit._unroll_params_dict
- it didn't raise errors on bad parameters in the map, it just silently dropped them. For equivalences that have a mixture of symbolic and concrete parameters (there's at least one explicitly tested), this resulted in passing a map that looked like {Parameter: float, float: float}
, which should be a typing error - a float
isn't a valid Parameter
and can't be bound.
Pull Request Test Coverage Report for Build 5610825454
💛 - Coveralls |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a massive speedup for nested circuits 👍🏻 But for already unrolled circuits, that need no rebinding, this change seems to be a bit slower than before. We could run some more benchmarks, but for PEC-style circuits I observed the following
Rebinding needed (circuit not manually decomposed)
----------------
main: 321.2s
this PR: 30.3s (10x faster)
Circuit unrolled
----------------
main: 10.4s +- 0.3
this PR: 12.9s +- 0.4 (~20% slower)
Do you have an idea where this slowdown might come from?
For the PEC slowdown: thanks for reminding me. I had seen a small slowdown, but not of the same magnitude as you. The reason is likely because the PEC code passes a vector of parameters to bind rather than a dictionary, and there's a bit more overhead in the new code to normalise the input formats. I should be able to fix that. |
This reduces several costs in the input normalisation, and makes the calculation of some properties lazy for the sequence-like inputs; for many close-to-hardware circuits such as those used in PEC, the sequence form of parameter input is more natural, and almost no instructions will have internally parametrised definitions, nor will there be a parametrised global phase or calibrations. In these cases, we can avoid overhead from eagerly normalising the input into the forms that's easier for these less-common assignment operations to use. As a side-effect of the abstraction, we can also avoid making several dictionary copies, and just use the mapping abstraction to filter the dictionary during iteration on the fly. This also takes the opportunity to improve the performance of sorting large vectors of parameters. In practice, I don't think this had a huge impact on performance, but in principle it's rather more efficient now and results in many fewer Python function calls during a sort.
I've done a little bit of lazy-evaluation trickery to remove the overhead from input normalisation, so nothing gets calculated until it's actually used. Stemming from that abstraction, I was able to remove some overhead from the dictionary input form as well (no need to make a separate filtered dictionary; the filtering is just done as part of the iteration). I've improved the sort speed dramatically for large lists, though in theory the sort cost should only ever get paid on the first circuit assignment and cached. An oversight in the original form of this PR caused the cache to be defeated, which had a particularly harsh impact on the sequence form (this was probably the main effect that Julien was seeing as a slowdown). I'm interested to see what Julien's measurements are now. From my side, given two circuits that look like: import numpy as np
from qiskit.circuit import QuantumCircuit, ParameterVector
from qiskit.circuit.library import ECRGate
# Reduce some copy overhead.
ecr_singleton = ECRGate()
def twirled_circuit(num_qubits, depth):
qc = QuantumCircuit(num_qubits)
ps = iter(ParameterVector("theta", 3 * num_qubits * (depth + 1)))
for _ in range(depth):
for i in range(num_qubits):
qc.rz(next(ps), i)
qc.ry(next(ps), i)
qc.rz(next(ps), i)
for a in range(0, num_qubits - 1, 2):
qc.append(ecr_singleton, [a, a+1], [])
for a in range(1, num_qubits - 1, 2):
qc.append(ecr_singleton, [a, a+1], [])
for i in range(num_qubits):
qc.rz(next(ps), i)
qc.ry(next(ps), i)
qc.rz(next(ps), i)
return qc
qc = twirled_circuit(100, 1000)
wrapped = QuantumCircuit(qc.num_qubits)
wrapped.append(qc.to_gate(), wrapped.qubits, [])
params = np.random.rand(qc.num_parameters) I measure these timings ("decomposed" is the flat
I didn't both waiting for The improvements in the binding of the decomposed circuit are relatively slight (we're dominated by circuit copying and the actual |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, it's great that this also provides a speedup in the unrolled case now 🙂
Edit: It would be good to document the flat_input
and strict
arguments in the reno
…0284) * Improve parameter-binding performance of large instructions Previously, the parameter-assignment methods of `QuantumCircuit` had poor performance when an instruction had a complex definition that involved many of the parameters being bound. The strategy of binding each parameter separately led to each definition being copied and rebound multiple times, with each rebinding being recursive all the way down. This commit makes the definition rebinding happen only once per instruction, and updates the data model used to make it a complete recursion through `QuantumCircuit.assign_parameters`. This has the side effect of fixing an issue where internal global phases would not be updated. The algorithmic change that enables this (just rebind the definition at the end) is rather simpler than the length of this patch suggests. This is just because the previous structure of separating out a single `_assign_parameter` method made it harder to restructure the logic without introducing unpleasant stateful coupling between the driver and helper methods. Instead, I inlined most of the helper functions into the driver body, so we can treat some components of the binding in a per-parameter way and some in a per-operation way, in whatever way is better. * Fix lint * Reduce overhead from input normalisation This reduces several costs in the input normalisation, and makes the calculation of some properties lazy for the sequence-like inputs; for many close-to-hardware circuits such as those used in PEC, the sequence form of parameter input is more natural, and almost no instructions will have internally parametrised definitions, nor will there be a parametrised global phase or calibrations. In these cases, we can avoid overhead from eagerly normalising the input into the forms that's easier for these less-common assignment operations to use. As a side-effect of the abstraction, we can also avoid making several dictionary copies, and just use the mapping abstraction to filter the dictionary during iteration on the fly. This also takes the opportunity to improve the performance of sorting large vectors of parameters. In practice, I don't think this had a huge impact on performance, but in principle it's rather more efficient now and results in many fewer Python function calls during a sort. * Address Ruff's generator concern * Add comment on new keyword arguments
Summary
Previously, the parameter-assignment methods of
QuantumCircuit
had poor performance when an instruction had a complex definition that involved many of the parameters being bound. The strategy of binding each parameter separately led to each definition being copied and rebound multiple times, with each rebinding being recursive all the way down.This commit makes the definition rebinding happen only once per instruction, and updates the data model used to make it a complete recursion through
QuantumCircuit.assign_parameters
. This has the side effect of fixing an issue where internal global phases would not be updated.The algorithmic change that enables this (just rebind the definition at the end) is rather simpler than the length of this patch suggests. This is just because the previous structure of separating out a single
_assign_parameter
method made it harder to restructure the logic without introducing unpleasant stateful coupling between the driver and helper methods. Instead, I inlined most of the helper functions into the driver body, so we can treat some components of the binding in a per-parameter way and some in a per-operation way, in whatever way is better.Details and comments
Fix #10282
Fix #10283
As an example, take a setup of
then the binding example
(or
bind_parameters
) took my machine about 7m10s onmain
, and 1.24s after this PR (so a ~350x speedup). It's probably not quite as fast as the flattening in #10269 because we still have some overhead from dealing with the extra structure, but it's a big general step in a good direction, with (hopefully) no other user-facing implications.