Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve parameter-binding performance of large instructions #10284

Merged
merged 7 commits into from
Jul 20, 2023

Conversation

jakelishman
Copy link
Member

@jakelishman jakelishman commented Jun 14, 2023

Summary

Previously, the parameter-assignment methods of QuantumCircuit had poor performance when an instruction had a complex definition that involved many of the parameters being bound. The strategy of binding each parameter separately led to each definition being copied and rebound multiple times, with each rebinding being recursive all the way down.

This commit makes the definition rebinding happen only once per instruction, and updates the data model used to make it a complete recursion through QuantumCircuit.assign_parameters. This has the side effect of fixing an issue where internal global phases would not be updated.

The algorithmic change that enables this (just rebind the definition at the end) is rather simpler than the length of this patch suggests. This is just because the previous structure of separating out a single _assign_parameter method made it harder to restructure the logic without introducing unpleasant stateful coupling between the driver and helper methods. Instead, I inlined most of the helper functions into the driver body, so we can treat some components of the binding in a per-parameter way and some in a per-operation way, in whatever way is better.

Details and comments

Fix #10282
Fix #10283

As an example, take a setup of

import math
from qiskit.circuit.library import EfficientSU2

qc = EfficientSU2(100, entanglement="linear", reps=100)
qc.measure_all()
ps = {x: math.pi / 2 for x in qc.parameters}

then the binding example

qc.assign_parameters(ps)

(or bind_parameters) took my machine about 7m10s on main, and 1.24s after this PR (so a ~350x speedup). It's probably not quite as fast as the flattening in #10269 because we still have some overhead from dealing with the extra structure, but it's a big general step in a good direction, with (hopefully) no other user-facing implications.

Previously, the parameter-assignment methods of `QuantumCircuit` had
poor performance when an instruction had a complex definition that
involved many of the parameters being bound.  The strategy of binding
each parameter separately led to each definition being copied and
rebound multiple times, with each rebinding being recursive all the way
down.

This commit makes the definition rebinding happen only once per
instruction, and updates the data model used to make it a complete
recursion through `QuantumCircuit.assign_parameters`.  This has the side
effect of fixing an issue where internal global phases would not be
updated.

The algorithmic change that enables this (just rebind the definition at
the end) is rather simpler than the length of this patch suggests.  This
is just because the previous structure of separating out a single
`_assign_parameter` method made it harder to restructure the logic
without introducing unpleasant stateful coupling between the driver and
helper methods.  Instead, I inlined most of the helper functions into
the driver body, so we can treat some components of the binding in a
per-parameter way and some in a per-operation way, in whatever way is
better.
@jakelishman jakelishman added performance Changelog: New Feature Include in the "Added" section of the changelog Changelog: Bugfix Include in the "Fixed" section of the changelog labels Jun 14, 2023
@jakelishman jakelishman added this to the 0.25.0 milestone Jun 14, 2023
@jakelishman jakelishman requested a review from a team as a code owner June 14, 2023 16:40
@qiskit-bot
Copy link
Collaborator

One or more of the the following people are requested to review this:

  • @Cryoris
  • @Qiskit/terra-core
  • @ajavadia

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The changed tests in this file are indicative of a problem in the parameter binding of prior internal custom operations. The old tests were enforcing behaviour that shouldn't have been expected, given the current difficulties in looking at parametrised definitions after they've been bound - the p0 and p1 names shouldn't have been visible in the current data model (though in our preferred lazily bound definitions model, they would actually become visible).

Comment on lines 286 to 289
def _rebind_equiv(equiv, query_params):
equiv_params, equiv_circuit = equiv
param_map = dict(zip(equiv_params, query_params))
equiv = equiv_circuit.assign_parameters(param_map, inplace=False)
param_map = dict((x, y) for x, y in zip(equiv_params, query_params) if isinstance(x, Parameter))
equiv = equiv_circuit.assign_parameters(param_map, inplace=False, flat_input=True)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was a logical bug in the EquivalenceLibrary code that previously was being masked by a side-effect of QuantumCircuit._unroll_params_dict - it didn't raise errors on bad parameters in the map, it just silently dropped them. For equivalences that have a mixture of symbolic and concrete parameters (there's at least one explicitly tested), this resulted in passing a map that looked like {Parameter: float, float: float}, which should be a typing error - a float isn't a valid Parameter and can't be bound.

@coveralls
Copy link

coveralls commented Jun 14, 2023

Pull Request Test Coverage Report for Build 5610825454

  • 114 of 115 (99.13%) changed or added relevant lines in 8 files are covered.
  • 15 unchanged lines in 4 files lost coverage.
  • Overall coverage increased (+0.01%) to 86.065%

Changes Missing Coverage Covered Lines Changed/Added Lines %
qiskit/circuit/quantumcircuit.py 95 96 98.96%
Files with Coverage Reduction New Missed Lines %
crates/qasm2/src/lex.rs 1 91.39%
qiskit/circuit/parametertable.py 1 88.19%
qiskit/circuit/quantumcircuit.py 1 94.2%
crates/qasm2/src/parse.rs 12 97.11%
Totals Coverage Status
Change from base Build 5610052878: 0.01%
Covered Lines: 72907
Relevant Lines: 84712

💛 - Coveralls

Copy link
Contributor

@Cryoris Cryoris left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a massive speedup for nested circuits 👍🏻 But for already unrolled circuits, that need no rebinding, this change seems to be a bit slower than before. We could run some more benchmarks, but for PEC-style circuits I observed the following

Rebinding needed (circuit not manually decomposed)
----------------
main: 321.2s
this PR: 30.3s (10x faster)

Circuit unrolled
----------------
main: 10.4s +- 0.3
this PR: 12.9s +- 0.4 (~20% slower)

Do you have an idea where this slowdown might come from?

qiskit/circuit/quantumcircuit.py Show resolved Hide resolved
@jakelishman
Copy link
Member Author

For the PEC slowdown: thanks for reminding me. I had seen a small slowdown, but not of the same magnitude as you. The reason is likely because the PEC code passes a vector of parameters to bind rather than a dictionary, and there's a bit more overhead in the new code to normalise the input formats. I should be able to fix that.

This reduces several costs in the input normalisation, and makes the
calculation of some properties lazy for the sequence-like inputs; for
many close-to-hardware circuits such as those used in PEC, the sequence
form of parameter input is more natural, and almost no instructions will
have internally parametrised definitions, nor will there be a
parametrised global phase or calibrations.  In these cases, we can avoid
overhead from eagerly normalising the input into the forms that's easier
for these less-common assignment operations to use.

As a side-effect of the abstraction, we can also avoid making several
dictionary copies, and just use the mapping abstraction to filter the
dictionary during iteration on the fly.

This also takes the opportunity to improve the performance of sorting
large vectors of parameters.  In practice, I don't think this had a huge
impact on performance, but in principle it's rather more efficient now
and results in many fewer Python function calls during a sort.
@jakelishman
Copy link
Member Author

jakelishman commented Jul 18, 2023

I've done a little bit of lazy-evaluation trickery to remove the overhead from input normalisation, so nothing gets calculated until it's actually used. Stemming from that abstraction, I was able to remove some overhead from the dictionary input form as well (no need to make a separate filtered dictionary; the filtering is just done as part of the iteration).

I've improved the sort speed dramatically for large lists, though in theory the sort cost should only ever get paid on the first circuit assignment and cached. An oversight in the original form of this PR caused the cache to be defeated, which had a particularly harsh impact on the sequence form (this was probably the main effect that Julien was seeing as a slowdown).

I'm interested to see what Julien's measurements are now. From my side, given two circuits that look like:

import numpy as np
from qiskit.circuit import QuantumCircuit, ParameterVector
from qiskit.circuit.library import ECRGate

# Reduce some copy overhead.
ecr_singleton = ECRGate()

def twirled_circuit(num_qubits, depth):
    qc = QuantumCircuit(num_qubits)
    ps = iter(ParameterVector("theta", 3 * num_qubits * (depth + 1)))
    for _ in range(depth):
        for i in range(num_qubits):
            qc.rz(next(ps), i)
            qc.ry(next(ps), i)
            qc.rz(next(ps), i)
        for a in range(0, num_qubits - 1, 2):
            qc.append(ecr_singleton, [a, a+1], [])
        for a in range(1, num_qubits - 1, 2):
            qc.append(ecr_singleton, [a, a+1], [])
    for i in range(num_qubits):
        qc.rz(next(ps), i)
        qc.ry(next(ps), i)
        qc.rz(next(ps), i)
    return qc

qc = twirled_circuit(100, 1000)
wrapped = QuantumCircuit(qc.num_qubits)
wrapped.append(qc.to_gate(), wrapped.qubits, [])

params = np.random.rand(qc.num_parameters)

I measure these timings ("decomposed" is the flat qc above, "seq" means passing an array of parameter values, "dict" means passing a dictionary of {parameter: value}):

decomposed (seq) decomposed (dict) wrapped (seq) wrapped (dict)
main 4.98(3)s 4.97(3)s > 30 min > 30 min
70f2934 4.58(2)s 4.72(2)s 15.5(1)s 15.5(1)s

I didn't both waiting for main to complete even a single iteration for the wrapped circuit of this size.

The improvements in the binding of the decomposed circuit are relatively slight (we're dominated by circuit copying and the actual ParameterExpression/symengine binding), but there's still a little bit of performance gain there because of improvements in the parameter verification.

Cryoris
Cryoris previously approved these changes Jul 20, 2023
Copy link
Contributor

@Cryoris Cryoris left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, it's great that this also provides a speedup in the unrolled case now 🙂

Edit: It would be good to document the flat_input and strict arguments in the reno

@Cryoris Cryoris enabled auto-merge July 20, 2023 12:49
@Cryoris Cryoris added this pull request to the merge queue Jul 20, 2023
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Jul 20, 2023
@mtreinish mtreinish added this pull request to the merge queue Jul 20, 2023
Merged via the queue into Qiskit:main with commit 7c1b8ee Jul 20, 2023
13 checks passed
@jakelishman jakelishman deleted the faster-rebind branch July 26, 2023 12:44
to24toro pushed a commit to to24toro/qiskit-terra that referenced this pull request Aug 3, 2023
…0284)

* Improve parameter-binding performance of large instructions

Previously, the parameter-assignment methods of `QuantumCircuit` had
poor performance when an instruction had a complex definition that
involved many of the parameters being bound.  The strategy of binding
each parameter separately led to each definition being copied and
rebound multiple times, with each rebinding being recursive all the way
down.

This commit makes the definition rebinding happen only once per
instruction, and updates the data model used to make it a complete
recursion through `QuantumCircuit.assign_parameters`.  This has the side
effect of fixing an issue where internal global phases would not be
updated.

The algorithmic change that enables this (just rebind the definition at
the end) is rather simpler than the length of this patch suggests.  This
is just because the previous structure of separating out a single
`_assign_parameter` method made it harder to restructure the logic
without introducing unpleasant stateful coupling between the driver and
helper methods.  Instead, I inlined most of the helper functions into
the driver body, so we can treat some components of the binding in a
per-parameter way and some in a per-operation way, in whatever way is
better.

* Fix lint

* Reduce overhead from input normalisation

This reduces several costs in the input normalisation, and makes the
calculation of some properties lazy for the sequence-like inputs; for
many close-to-hardware circuits such as those used in PEC, the sequence
form of parameter input is more natural, and almost no instructions will
have internally parametrised definitions, nor will there be a
parametrised global phase or calibrations.  In these cases, we can avoid
overhead from eagerly normalising the input into the forms that's easier
for these less-common assignment operations to use.

As a side-effect of the abstraction, we can also avoid making several
dictionary copies, and just use the mapping abstraction to filter the
dictionary during iteration on the fly.

This also takes the opportunity to improve the performance of sorting
large vectors of parameters.  In practice, I don't think this had a huge
impact on performance, but in principle it's rather more efficient now
and results in many fewer Python function calls during a sort.

* Address Ruff's generator concern

* Add comment on new keyword arguments
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Changelog: Bugfix Include in the "Fixed" section of the changelog Changelog: New Feature Include in the "Added" section of the changelog performance
Projects
None yet
5 participants