Fully port Split2QUnitaries to rust #13025

Merged
merged 8 commits into from
Sep 9, 2024

Conversation

mtreinish
Member

@mtreinish mtreinish commented Aug 23, 2024

Summary

This commit builds off of #13013 and the other data model in Rust
infrastructure and migrates the Split2QUnitaries pass to
operate fully in Rust. The full path of the transpiler pass now never
leaves Rust until it has finished modifying the DAGCircuit. There is
still some Python interaction necessary to handle parts of the data
model that are still in Python, mainly for creating UnitaryGate
instances and ParameterExpression for global phase. But otherwise
the entirety of the pass operates in Rust now.

This is just a first pass at the migration; it moves the pass to
use loops in Rust. The next step is to look at operating
the pass in parallel. There is no data dependency between the
optimizations being done for different gates, so we should be able to
increase the throughput of the pass by leveraging multithreading to
handle each gate in parallel. This commit does not attempt
this, though, because of the Python dependency and because the data
structures around gates and the DAG aren't really set up for
multithreading yet; there will likely need to be some work to
support that.
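
As a rough illustration of the "no data dependency between gates" point above, here is a minimal sketch of fanning independent per-item work out over scoped threads using only the Rust standard library. It does not use Qiskit's data structures: `process_gate` and `process_parallel` are hypothetical names, and a `u64` stands in for a gate. The real pass would still have to deal with the Python dependency and the DAG ownership issues described above.

```rust
use std::thread;

// Hypothetical per-gate work; stands in for analysing one gate's matrix.
fn process_gate(gate: u64) -> u64 {
    gate * gate
}

// Split the input into chunks and process each chunk on its own scoped thread.
// Results come back in the original order because handles are joined in order.
fn process_parallel(gates: &[u64], num_threads: usize) -> Vec<u64> {
    let n = num_threads.max(1);
    // Ceiling division, clamped to 1 so `chunks` never receives 0.
    let chunk_size = ((gates.len() + n - 1) / n).max(1);
    thread::scope(|s| {
        let handles: Vec<_> = gates
            .chunks(chunk_size)
            .map(|chunk| {
                s.spawn(move || chunk.iter().map(|&g| process_gate(g)).collect::<Vec<u64>>())
            })
            .collect();
        handles
            .into_iter()
            .flat_map(|h| h.join().expect("worker thread panicked"))
            .collect()
    })
}

fn main() {
    let gates: Vec<u64> = (1..=8).collect();
    println!("{:?}", process_parallel(&gates, 4));
}
```

Each worker only reads its own chunk and returns a fresh `Vec`, which is what makes this safe without locks; anything that has to mutate a shared structure would need a separate, sequential step.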

Details and comments

This PR is based on top of #13013, and as such GitHub shows the entire contents of #13013 in addition to the contents of this PR. To see the contents of this PR you can look at HEAD on this branch, or just look at:
0689b73
Rebased

Part of #12208

@mtreinish mtreinish added performance Rust This PR or issue is related to Rust code in the repository mod: transpiler Issues and PRs related to Transpiler labels Aug 23, 2024
@mtreinish mtreinish added this to the 1.3 beta milestone Aug 23, 2024
@qiskit-bot
Collaborator

One or more of the following people are relevant to this code:

  • @enavarro51
  • @Qiskit/terra-core
  • @kevinhartman
  • @levbishop
  • @mtreinish
  • @nkanazawa1989

@coveralls

coveralls commented Aug 23, 2024

Pull Request Test Coverage Report for Build 10772500080

Details

  • 98 of 106 (92.45%) changed or added relevant lines in 6 files are covered.
  • 7 unchanged lines in 1 file lost coverage.
  • Overall coverage increased (+0.01%) to 89.163%

| Changes Missing Coverage | Covered Lines | Changed/Added Lines | % |
|---|---|---|---|
| crates/accelerate/src/split_2q_unitaries.rs | 42 | 44 | 95.45% |
| crates/circuit/src/dag_circuit.rs | 49 | 55 | 89.09% |

| Files with Coverage Reduction | New Missed Lines | % |
|---|---|---|
| crates/qasm2/src/lex.rs | 7 | 91.73% |

| Totals | Coverage Status |
|---|---|
| Change from base Build 10770445393 | 0.01% |
| Covered Lines | 73022 |
| Relevant Lines | 81897 |

💛 - Coveralls

@mtreinish mtreinish changed the title [WIP] Fully port Split2QUnitaries to rust Fully port Split2QUnitaries to rust Aug 30, 2024
Some of the logic inside the Split2QUnitaries pass was updated in a
recently merged PR. This commit makes those changes so the Rust
implementation matches the current state of the previous Python version.
Contributor

@sbrandhsn sbrandhsn left a comment

I left a couple of minor comments, apart from these, this LGTM. Do you have some benchmark data? :-)

let nodes: Vec<NodeIndex> = dag.topological_op_nodes()?.collect();
for node in nodes {
    if let NodeType::Operation(inst) = &dag.dag[node] {
        let qubits = dag.get_qargs(inst.qubits).to_vec();
Contributor

Do we need a to_vec here or would it be faster to omit it here?

Member Author

@mtreinish mtreinish Sep 9, 2024

The to_vec() here is needed to get an owned object. dag.get_qargs() returns a slice that refers back to qubits owned by dag's qubits list. Without it, the compiler would complain that I have an immutable reference to dag via qubits when I try to mutate the dag later in replace_on_incoming_qubits. I only added this because the compiler errored when I originally kept it as a slice. This is the same reason I had to collect the node indices into a Vec instead of directly iterating over the nodes.
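
To make the borrow-checker point concrete, here is a minimal, self-contained sketch of the same pattern. `ToyDag`, `op_indices`, `replace_op`, and `split_all` are hypothetical stand-ins and not the real `DAGCircuit` API; only the shape of the borrows is meant to match the explanation above.

```rust
// Hypothetical stand-in for DAGCircuit: it owns both the ops and their qubit lists.
struct ToyDag {
    qargs: Vec<Vec<u32>>,
    ops: Vec<String>,
}

impl ToyDag {
    // Returns a slice borrowed from `self`, as get_qargs is described as doing.
    fn get_qargs(&self, index: usize) -> &[u32] {
        &self.qargs[index]
    }

    // Iterator that keeps a shared borrow of `self` alive while it is in use.
    fn op_indices(&self) -> impl Iterator<Item = usize> + '_ {
        self.ops.iter().enumerate().map(|(i, _)| i)
    }

    // Requires `&mut self`, so no shared borrow may be live across this call.
    fn replace_op(&mut self, index: usize, name: &str) {
        self.ops[index] = name.to_string();
    }
}

fn split_all(dag: &mut ToyDag) -> Vec<Vec<u32>> {
    let mut seen = Vec::new();
    // Collect first: iterating `dag.op_indices()` directly would hold a shared
    // borrow of `dag` for the whole loop and block `replace_op`.
    let indices: Vec<usize> = dag.op_indices().collect();
    for index in indices {
        // `.to_vec()` copies the slice into an owned Vec, ending the borrow of `dag`.
        let qubits = dag.get_qargs(index).to_vec();
        // With a plain slice instead, this call is rejected with error E0502,
        // because `qubits` is still used on the next line.
        dag.replace_op(index, "split");
        seen.push(qubits);
    }
    seen
}

fn main() {
    let mut dag = ToyDag {
        qargs: vec![vec![0, 1], vec![1, 2]],
        ops: vec!["unitary".to_string(), "unitary".to_string()],
    };
    let seen = split_all(&mut dag);
    println!("{:?} {:?}", seen, dag.ops);
}
```

The copy costs one small allocation per node; the alternative would be restructuring the code so that all reads from the slice finish before the mutation starts.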

&mut self,
py: Python, // Unused if cache_pygates isn't enabled
node: NodeIndex,
mut insert: F,
Contributor

Does F need to be mutable?

Member Author

It does because I used FnMut for the type here. It wouldn't have to be mutable if we accepted an Fn for F instead. I just wasn't sure which kinds of callables we'd allow longer term and opted to be a bit more conservative in the typing; having FnMut means the callable is allowed to mutate its environment. I can change it to Fn in this case because the only callback we have so far doesn't need to be FnMut.
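
A small sketch of the difference being discussed, using hypothetical helper names (`visit_with_fn`, `visit_with_fn_mut`) rather than the actual `replace_on_incoming_qubits` signature: with an `Fn` bound the parameter needs no `mut`, while an `FnMut` bound requires a `mut` binding to call it but lets the caller pass a closure that updates captured state.

```rust
// `Fn` bound: the callback may only read its captured environment,
// so the parameter does not need to be declared `mut`.
fn visit_with_fn<F>(qubits: &[u32], insert: F) -> Vec<u32>
where
    F: Fn(u32) -> u32,
{
    qubits.iter().map(|&q| insert(q)).collect()
}

// `FnMut` bound: calling the callback needs mutable access to it,
// hence `mut insert` in the parameter list.
fn visit_with_fn_mut<F>(qubits: &[u32], mut insert: F) -> Vec<u32>
where
    F: FnMut(u32) -> u32,
{
    let mut out = Vec::with_capacity(qubits.len());
    for &q in qubits {
        out.push(insert(q));
    }
    out
}

fn main() {
    // A non-mutating closure satisfies both bounds.
    println!("{:?}", visit_with_fn(&[0, 1, 2], |q| q + 10));

    // A closure that counts its calls mutates captured state, so it is only `FnMut`.
    let mut calls = 0;
    let out = visit_with_fn_mut(&[0, 1, 2], |q| {
        calls += 1;
        q + 10
    });
    println!("{:?} after {} calls", out, calls);
}
```

Every `Fn` closure is also an `FnMut`, so an `FnMut` bound accepts strictly more callers, while an `Fn` bound promises more to the function being called.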

Contributor

In this case, FnMut is probably better as it allows the user more flexibility; in particular, it lets users easily track the changes performed by replace_on_incoming_qubits.

Member Author

Changed in: 2ff94f5

We don't need the callback to be mutable currently so relax the trait to
just be `Fn` instead of `FnMut`. If we have a need for a mutable
environment callback in the future we can change this easily enough
without any issues.
Contributor

@sbrandhsn sbrandhsn left a comment

LGTM, thanks!

@mtreinish
Member Author

For benchmarks I didn't do anything too extensive. But I ran this script:

import statistics
import time

from qiskit.circuit.random import random_circuit
from qiskit.transpiler.passes import ConsolidateBlocks, Split2QUnitaries

split_pass = Split2QUnitaries()

times = []
for _ in range(100):
    qc = random_circuit(100, 1000)
    blocked = ConsolidateBlocks()(qc)
    start = time.perf_counter()
    split_pass(blocked)
    stop = time.perf_counter()
    runtime = stop - start
    print(runtime)
    times.append(runtime)

mean_runtime = statistics.geometric_mean(times)
print(f"Mean runtime over 100x 100x1000 random circuits: {mean_runtime} sec.")

with both main and this PR. With main it returned:

Mean runtime over 100x 100x1000 random circuits: 0.22658202673859693 sec.

With this PR it yielded:

Mean runtime over 100x 100x1000 random circuits: 0.15633993219475464 sec.

It's not as much of a speedup as I was hoping for, but still pretty good. Also, this isn't a perfect comparison because it includes the circuit -> dag overhead, and I have no idea how many substitutions are actually getting made by this.
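
For reference, the two means quoted above work out to roughly a 1.45x speedup, or about a 31% reduction in runtime, subject to the caveats about what the benchmark actually measures. The sketch below reproduces that arithmetic and shows one way to compute a geometric mean; the function names are illustrative only.

```rust
// Geometric mean via the mean of logarithms; assumes all samples are positive.
fn geometric_mean(samples: &[f64]) -> f64 {
    let log_sum: f64 = samples.iter().map(|x| x.ln()).sum();
    (log_sum / samples.len() as f64).exp()
}

// Ratio of baseline runtime to candidate runtime; above 1.0 means the candidate is faster.
fn speedup(baseline: f64, candidate: f64) -> f64 {
    baseline / candidate
}

fn main() {
    // Means reported in the comment above.
    let main_mean = 0.22658202673859693_f64;
    let pr_mean = 0.15633993219475464_f64;

    println!("speedup: {:.2}x", speedup(main_mean, pr_mean));
    println!("runtime reduction: {:.0}%", 100.0 * (1.0 - pr_mean / main_mean));
    println!("example geometric mean: {}", geometric_mean(&[1.0, 4.0]));
}
```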

@sbrandhsn
Contributor

sbrandhsn commented Sep 9, 2024

> I have no idea how many substitutions are actually getting made by this.

Probably close to zero, which is the expected case for a reasonable circuit. :-) I think a lot of users would be willing to pay a lot of classical compute if they can save one two-qubit gate without introducing further errors.

@sbrandhsn sbrandhsn added this pull request to the merge queue Sep 9, 2024
@mtreinish
Member Author

Lol, it's actually zero because I forgot to run the unroll 3q or more pass and the collect blocks pass, so there are no collected unitaries in the circuit. I'll fix it and rerun the script.

Merged via the queue into Qiskit:main with commit 2ef371a Sep 9, 2024
15 checks passed
@mtreinish mtreinish deleted the use-rust-split2q-unitaries branch September 9, 2024 13:56