Improve parameter-binding performance of large instructions #10284

jakelishman · 2023-06-14T16:40:45Z

Summary

Previously, the parameter-assignment methods of QuantumCircuit had poor performance when an instruction had a complex definition that involved many of the parameters being bound. The strategy of binding each parameter separately led to each definition being copied and rebound multiple times, with each rebinding being recursive all the way down.

This commit makes the definition rebinding happen only once per instruction, and updates the data model used to make it a complete recursion through QuantumCircuit.assign_parameters. This has the side effect of fixing an issue where internal global phases would not be updated.

The algorithmic change that enables this (just rebind the definition at the end) is rather simpler than the length of this patch suggests. This is just because the previous structure of separating out a single _assign_parameter method made it harder to restructure the logic without introducing unpleasant stateful coupling between the driver and helper methods. Instead, I inlined most of the helper functions into the driver body, so we can treat some components of the binding in a per-parameter way and some in a per-operation way, in whatever way is better.
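To make the cost difference concrete, here is a toy model of the two strategies. It is not Qiskit code: `ToyInstruction`, `bind`, `stats` and the helper names are hypothetical, invented for this sketch. It assumes only what the summary describes, namely that every binding call copies the definition and recurses all the way down, and it counts the copies each strategy makes.

```python
from dataclasses import dataclass, field

# Hypothetical counter for this sketch: how many instruction copies were made.
stats = {"copies": 0}


@dataclass
class ToyInstruction:
    """Toy stand-in for an instruction: parameters plus a nested definition."""

    params: list  # unbound names (str) or bound values (float)
    definition: list = field(default_factory=list)  # nested ToyInstruction objects


def bind(inst, values):
    """Copy `inst`, substituting every name found in `values`, recursing fully."""
    stats["copies"] += 1
    return ToyInstruction(
        params=[values.get(p, p) if isinstance(p, str) else p for p in inst.params],
        definition=[bind(sub, values) for sub in inst.definition],
    )


def bind_one_at_a_time(inst, values):
    """Per-parameter strategy: one full copy-and-recurse for each parameter."""
    for name, value in values.items():
        inst = bind(inst, {name: value})
    return inst


def bind_all_at_once(inst, values):
    """Per-instruction strategy: a single copy-and-recurse with the whole mapping."""
    return bind(inst, values)


def count_copies(strategy, inst, values):
    """Run `strategy` and return (result, number of copies it made)."""
    stats["copies"] = 0
    result = strategy(inst, values)
    return result, stats["copies"]


names = [f"t{i}" for i in range(50)]
outer = ToyInstruction(list(names), [ToyInstruction([n]) for n in names])
values = {n: 0.5 for n in names}

slow_result, slow_copies = count_copies(bind_one_at_a_time, outer, values)
fast_result, fast_copies = count_copies(bind_all_at_once, outer, values)
print(slow_copies, fast_copies)
```

In this toy, an instruction with 50 parameters and 50 sub-instructions is copied 2550 times when bound one parameter at a time and 51 times when bound in a single pass, with identical results. The real data model is more involved, so the numbers only illustrate the scaling.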

Details and comments

Fix #10282
Fix #10283

As an example, take a setup of

import math
from qiskit.circuit.library import EfficientSU2

qc = EfficientSU2(100, entanglement="linear", reps=100)
qc.measure_all()
ps = {x: math.pi / 2 for x in qc.parameters}

then the binding example

qc.assign_parameters(ps)

(or bind_parameters) took my machine about 7m10s on main, and 1.24s after this PR (so a ~350x speedup). It's probably not quite as fast as the flattening in #10269 because we still have some overhead from dealing with the extra structure, but it's a big general step in a good direction, with (hopefully) no other user-facing implications.
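As a quick check of the quoted ratio, the arithmetic can be reproduced from the two timings above (the variable names are just for this sketch):

```python
main_seconds = 7 * 60 + 10  # 7m10s on main
pr_seconds = 1.24           # after this PR

speedup = main_seconds / pr_seconds
print(f"{speedup:.0f}x")
```

This gives roughly 347x, consistent with the "~350x" figure.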

qiskit-bot · 2023-06-14T16:40:51Z

One or more of the following people are requested to review this:

@Cryoris
@Qiskit/terra-core
@ajavadia

jakelishman · 2023-06-14T16:43:08Z

test/python/qasm3/test_export.py

The changed tests in this file are indicative of a problem in the parameter binding of prior internal custom operations. The old tests were enforcing behaviour that shouldn't have been expected, given the current difficulties in looking at parametrised definitions after they've been bound - the p0 and p1 names shouldn't have been visible in the current data model (though in our preferred lazily bound definitions model, they would actually become visible).

jakelishman · 2023-06-14T16:46:47Z

qiskit/circuit/equivalence.py

 def _rebind_equiv(equiv, query_params):
    equiv_params, equiv_circuit = equiv
-    param_map = dict(zip(equiv_params, query_params))
-    equiv = equiv_circuit.assign_parameters(param_map, inplace=False)
+    param_map = dict((x, y) for x, y in zip(equiv_params, query_params) if isinstance(x, Parameter))
+    equiv = equiv_circuit.assign_parameters(param_map, inplace=False, flat_input=True)


This was a logical bug in the EquivalenceLibrary code that previously was being masked by a side-effect of QuantumCircuit._unroll_params_dict - it didn't raise errors on bad parameters in the map, it just silently dropped them. For equivalences that have a mixture of symbolic and concrete parameters (there's at least one explicitly tested), this resulted in passing a map that looked like {Parameter: float, float: float}, which should be a typing error - a float isn't a valid Parameter and can't be bound.
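The effect of the filter can be sketched without Qiskit. In the toy below, `Symbol` and `strict_assign` are hypothetical stand-ins (not `Parameter` or `assign_parameters`); the binder is assumed to reject non-symbolic keys rather than silently dropping them, which is the behaviour that exposed the bug.

```python
class Symbol:
    """Hypothetical stand-in for a symbolic parameter (not Qiskit's Parameter)."""

    def __init__(self, name):
        self.name = name

    def __repr__(self):
        return f"Symbol({self.name!r})"


def strict_assign(params, mapping):
    """Toy binder that rejects non-symbolic keys instead of silently dropping them."""
    bad = [key for key in mapping if not isinstance(key, Symbol)]
    if bad:
        raise TypeError(f"cannot bind non-symbolic keys: {bad}")
    return [mapping.get(p, p) if isinstance(p, Symbol) else p for p in params]


theta = Symbol("theta")
equiv_params = [theta, 0.5]  # one symbolic and one already-concrete parameter
query_params = [1.25, 0.5]

# Unfiltered map: contains the invalid entry 0.5 -> 0.5.
naive_map = dict(zip(equiv_params, query_params))
# Filtered map: only symbolic keys survive.
filtered_map = {x: y for x, y in zip(equiv_params, query_params) if isinstance(x, Symbol)}

try:
    strict_assign(equiv_params, naive_map)
    naive_error = None
except TypeError as exc:
    naive_error = exc

bound = strict_assign(equiv_params, filtered_map)
print(naive_error)
print(bound)
```

The unfiltered map raises in the strict binder, while the filtered map binds `theta` and leaves the concrete `0.5` untouched.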

coveralls · 2023-06-14T17:07:17Z

Pull Request Test Coverage Report for Build 5610825454

114 of 115 (99.13%) changed or added relevant lines in 8 files are covered.
15 unchanged lines in 4 files lost coverage.
Overall coverage increased (+0.01%) to 86.065%

| Changes Missing Coverage | Covered Lines | Changed/Added Lines | % |
|---|---|---|---|
| qiskit/circuit/quantumcircuit.py | 95 | 96 | 98.96% |

| Files with Coverage Reduction | New Missed Lines | % |
|---|---|---|
| crates/qasm2/src/lex.rs | 1 | 91.39% |
| qiskit/circuit/parametertable.py | 1 | 88.19% |
| qiskit/circuit/quantumcircuit.py | 1 | 94.2% |
| crates/qasm2/src/parse.rs | 12 | 97.11% |

Totals
Change from base Build 5610052878:	0.01%
Covered Lines:	72907
Relevant Lines:	84712

💛 - Coveralls

Cryoris

This is a massive speedup for nested circuits 👍🏻 But for already unrolled circuits, that need no rebinding, this change seems to be a bit slower than before. We could run some more benchmarks, but for PEC-style circuits I observed the following

Rebinding needed (circuit not manually decomposed)
----------------
main: 321.2s
this PR: 30.3s (10x faster)

Circuit unrolled
----------------
main: 10.4s +- 0.3
this PR: 12.9s +- 0.4 (~20% slower)

Do you have an idea where this slowdown might come from?

qiskit/circuit/quantumcircuit.py

jakelishman · 2023-06-28T09:29:23Z

For the PEC slowdown: thanks for reminding me. I had seen a small slowdown, but not of the same magnitude as you. The reason is likely because the PEC code passes a vector of parameters to bind rather than a dictionary, and there's a bit more overhead in the new code to normalise the input formats. I should be able to fix that.

This reduces several costs in the input normalisation, and makes the calculation of some properties lazy for the sequence-like inputs. For many close-to-hardware circuits such as those used in PEC, the sequence form of parameter input is more natural, and almost no instructions will have internally parametrised definitions, nor will there be a parametrised global phase or calibrations. In these cases, we can avoid the overhead of eagerly normalising the input into the forms that are easier for these less-common assignment operations to use.

As a side-effect of the abstraction, we can also avoid making several dictionary copies, and just use the mapping abstraction to filter the dictionary on the fly during iteration.

This also takes the opportunity to improve the performance of sorting large vectors of parameters. In practice, I don't think this had a huge impact on performance, but in principle it's rather more efficient now and results in many fewer Python function calls during a sort.

jakelishman · 2023-07-18T23:20:20Z

I've done a little bit of lazy-evaluation trickery to remove the overhead from input normalisation, so nothing gets calculated until it's actually used. Stemming from that abstraction, I was able to remove some overhead from the dictionary input form as well (no need to make a separate filtered dictionary; the filtering is just done as part of the iteration).
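The PR's actual helper is not shown in this thread, so the following is only a sketch of the general idea under stated assumptions: `LazyAssignment`, `known`, `by_name` and `builds` are hypothetical names. It shows a read-only mapping view that filters keys during iteration instead of building a filtered copy, and that computes a derived form only on first use.

```python
from collections.abc import Mapping
from functools import cached_property


class LazyAssignment(Mapping):
    """Hypothetical read-only view over user-supplied parameter values.

    Only keys present in `known` are exposed; nothing is copied up front.
    """

    def __init__(self, raw, known):
        self._raw = raw      # user-supplied mapping, kept by reference
        self._known = known  # parameters that actually exist in the circuit
        self.builds = 0      # how many times the derived form was computed

    def __getitem__(self, key):
        if key not in self._known:
            raise KeyError(key)
        return self._raw[key]

    def __iter__(self):
        # Filtering happens here, during iteration, not in a separate dict copy.
        return (key for key in self._raw if key in self._known)

    def __len__(self):
        return sum(1 for _ in self)

    @cached_property
    def by_name(self):
        """Name-keyed form, built only if a less-common code path asks for it."""
        self.builds += 1
        return {str(key): value for key, value in self.items()}


raw = {"a": 1.0, "b": 2.0, "stale": 3.0}
view = LazyAssignment(raw, known={"a", "b"})

keys = list(view)            # filtered on the fly
builds_before = view.builds  # derived form not computed yet
first = view.by_name
second = view.by_name        # served from the cache
builds_after = view.builds
print(keys, builds_before, builds_after)
```

Callers that never touch `by_name` pay nothing for it, which mirrors the point about close-to-hardware circuits rarely needing the less-common forms.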

I've improved the sort speed dramatically for large lists, though in theory the sort cost should only ever get paid on the first circuit assignment and cached. An oversight in the original form of this PR caused the cache to be defeated, which had a particularly harsh impact on the sequence form (this was probably the main effect that Julien was seeing as a slowdown).
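The change to the sort itself is not shown in this thread. One common way to reduce Python-level calls during a sort, sketched here purely as an assumption about the kind of change involved, is to replace a pairwise comparison function with a key computed once per element. All names below (`split_name`, `compare`, `sort_key`, `calls`) are hypothetical.

```python
import random
import re
from functools import cmp_to_key

calls = {"cmp": 0, "key": 0}
_ELEMENT = re.compile(r"(\w+)\[(\d+)\]")


def split_name(name):
    """Split 'theta[12]' into ('theta', 12) so that indices compare numerically."""
    match = _ELEMENT.fullmatch(name)
    return (match.group(1), int(match.group(2))) if match else (name, -1)


def compare(a, b):
    """Pairwise comparison: the sort calls this once per comparison."""
    calls["cmp"] += 1
    ka, kb = split_name(a), split_name(b)
    return (ka > kb) - (ka < kb)


def sort_key(name):
    """Key function: the sort calls this exactly once per element."""
    calls["key"] += 1
    return split_name(name)


names = [f"theta[{i}]" for i in range(200)]
shuffled = names[:]
random.Random(0).shuffle(shuffled)

by_cmp = sorted(shuffled, key=cmp_to_key(compare))
by_key = sorted(shuffled, key=sort_key)
print(calls)
```

Both sorts produce the same order, but the key-based version makes one Python call per element, whereas the comparison-based version makes one per comparison, which grows as O(n log n).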

I'm interested to see what Julien's measurements are now. From my side, given two circuits that look like:

import numpy as np
from qiskit.circuit import QuantumCircuit, ParameterVector
from qiskit.circuit.library import ECRGate

# Reduce some copy overhead.
ecr_singleton = ECRGate()

def twirled_circuit(num_qubits, depth):
    qc = QuantumCircuit(num_qubits)
    ps = iter(ParameterVector("theta", 3 * num_qubits * (depth + 1)))
    for _ in range(depth):
        for i in range(num_qubits):
            qc.rz(next(ps), i)
            qc.ry(next(ps), i)
            qc.rz(next(ps), i)
        for a in range(0, num_qubits - 1, 2):
            qc.append(ecr_singleton, [a, a+1], [])
        for a in range(1, num_qubits - 1, 2):
            qc.append(ecr_singleton, [a, a+1], [])
    for i in range(num_qubits):
        qc.rz(next(ps), i)
        qc.ry(next(ps), i)
        qc.rz(next(ps), i)
    return qc

qc = twirled_circuit(100, 1000)
wrapped = QuantumCircuit(qc.num_qubits)
wrapped.append(qc.to_gate(), wrapped.qubits, [])

params = np.random.rand(qc.num_parameters)

I measure these timings ("decomposed" is the flat qc above, "seq" means passing an array of parameter values, "dict" means passing a dictionary of {parameter: value}):

| | decomposed (seq) | decomposed (dict) | wrapped (seq) | wrapped (dict) |
|---|---|---|---|---|
| `main` | 4.98(3)s | 4.97(3)s | > 30 min | > 30 min |
| `70f2934` | 4.58(2)s | 4.72(2)s | 15.5(1)s | 15.5(1)s |

I didn't bother waiting for main to complete even a single iteration for the wrapped circuit of this size.

The improvements in the binding of the decomposed circuit are relatively slight (we're dominated by circuit copying and the actual ParameterExpression/symengine binding), but there's still a little bit of performance gain there because of improvements in the parameter verification.

Cryoris

LGTM, it's great that this also provides a speedup in the unrolled case now 🙂

Edit: It would be good to document the flat_input and strict arguments in the reno


jakelishman added the performance, Changelog: New Feature, and Changelog: Bugfix labels Jun 14, 2023

jakelishman added this to the 0.25.0 milestone Jun 14, 2023

jakelishman requested a review from a team as a code owner June 14, 2023 16:40

Fix lint

0853f1a

david-alber mentioned this pull request Jun 19, 2023

updated QNN tutorials to demonstrate usage of QNNCircuit class qiskit-community/qiskit-machine-learning#664

Merged

Cryoris reviewed Jun 28, 2023

jakelishman mentioned this pull request Jul 4, 2023

Primitives should unroll boxed circuits for performance #9653

Closed

Cryoris linked an issue Jul 7, 2023 that may be closed by this pull request

Primitives should unroll boxed circuits for performance #9653

Closed

mtreinish assigned Cryoris Jul 10, 2023

jakelishman added 2 commits July 19, 2023 00:09

Merge remote-tracking branch 'ibm/main' into faster-rebind

5404045

jakelishman requested review from woodsp-ibm and ikkoham as code owners July 18, 2023 23:16

jakelishman added 2 commits July 19, 2023 00:43

Address Ruff's generator concern

a776440

Merge remote-tracking branch 'ibm/main' into faster-rebind

28f542c

Cryoris previously approved these changes Jul 20, 2023

Add comment on new keyword arguments

002863d

jakelishman dismissed Cryoris’s stale review via 002863d July 20, 2023 12:13

Cryoris approved these changes Jul 20, 2023

Cryoris enabled auto-merge July 20, 2023 12:49

Cryoris added this pull request to the merge queue Jul 20, 2023

github-merge-queue bot removed this pull request from the merge queue due to failed status checks Jul 20, 2023

mtreinish added this pull request to the merge queue Jul 20, 2023

Merged via the queue into Qiskit:main with commit 7c1b8ee Jul 20, 2023
13 checks passed

jakelishman deleted the faster-rebind branch July 26, 2023 12:44

jakelishman mentioned this pull request Aug 2, 2023

Optimise QuantumCircuit.assign_parameters for single-parameter binding #10548

Merged
