Avoid Python op creation in commutative cancellation #12701

Merged

mtreinish
Member

@mtreinish mtreinish commented Jul 1, 2024

Summary

This commit updates the commutative cancellation and commutation
analysis transpiler passes. It builds off of #12692 to adjust access
patterns in the Python transpiler path to avoid eagerly creating a
Python-space operation object. The goal of this PR is to mitigate the
performance regression on these passes introduced by the extra
conversion cost of #12459.
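
As a rough illustration of what "avoid eagerly creating a Python-space operation object" means, here is a minimal sketch. `PyGate`, `LazyNode`, and both helper functions are hypothetical names invented for this example; they are not Qiskit classes or the code changed in this PR. The sketch only assumes that a node can answer cheap questions (such as its name) from data it already holds, and that the full Python object is built the first time `.op` is accessed.

```python
from dataclasses import dataclass, field
from typing import Optional, Tuple


class PyGate:
    """Hypothetical stand-in for an expensive Python-space operation object."""

    created = 0  # counts constructions so the cost is visible

    def __init__(self, name: str, params: Tuple[float, ...]):
        PyGate.created += 1
        self.name = name
        self.params = params


@dataclass
class LazyNode:
    """Hypothetical node that keeps cheap data and builds the gate on demand."""

    name: str
    params: Tuple[float, ...] = ()
    _op: Optional[PyGate] = field(default=None, repr=False)

    @property
    def op(self) -> PyGate:
        # Pay the construction cost only the first time the object is requested.
        if self._op is None:
            self._op = PyGate(self.name, self.params)
        return self._op


def count_named_eager(nodes, name):
    # Access pattern that goes through node.op and forces object creation.
    return sum(1 for node in nodes if node.op.name == name)


def count_named_lazy(nodes, name):
    # Access pattern that reads the attribute stored on the node itself.
    return sum(1 for node in nodes if node.name == name)


nodes = [LazyNode("cx"), LazyNode("rz", (0.5,)), LazyNode("cx")]
lazy_count = count_named_lazy(nodes, "cx")
created_after_lazy = PyGate.created
eager_count = count_named_eager(nodes, "cx")
created_after_eager = PyGate.created
print(lazy_count, created_after_lazy, eager_count, created_after_eager)
```

Both helpers return the same count, but only the one that touches `.op` constructs any `PyGate` objects.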

Details and comments

This is based on top of #12692 and will need to be rebased after #12692 merges. To see the diff of just this PR you can look at the last commit: 5beaad4

Rebased on main now that #12692 has merged.

@mtreinish mtreinish added the on hold, performance, Changelog: None, Rust, mod: transpiler, and mod: circuit labels Jul 1, 2024
@mtreinish mtreinish added this to the 1.2.0 milestone Jul 1, 2024
@qiskit-bot
Collaborator

One or more of the following people are relevant to this code:

  • @Qiskit/terra-core
  • @kevinhartman
  • @mtreinish

@coveralls

coveralls commented Jul 1, 2024

Pull Request Test Coverage Report for Build 9748479901

Details

  • 223 of 319 (69.91%) changed or added relevant lines in 12 files are covered.
  • 100 unchanged lines in 3 files lost coverage.
  • Overall coverage decreased (-0.1%) to 89.723%

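| Changes Missing Coverage                  | Covered Lines | Changed/Added Lines | %      |
|-------------------------------------------|---------------|---------------------|--------|
| crates/circuit/src/operations.rs          | 57            | 63                  | 90.48% |
| crates/circuit/src/circuit_instruction.rs | 31            | 68                  | 45.59% |
| crates/circuit/src/dag_node.rs            | 68            | 121                 | 56.2%  |

| Files with Coverage Reduction    | New Missed Lines | %      |
|----------------------------------|------------------|--------|
| crates/qasm2/src/lex.rs          | 4                | 91.6%  |
| crates/qasm2/src/parse.rs        | 6                | 97.61% |
| crates/circuit/src/operations.rs | 90               | 78.8%  |

| Totals                            |       |
|-----------------------------------|-------|
| Change from base Build 9747324734 | -0.1% |
| Covered Lines                     | 64568 |
| Relevant Lines                    | 71964 |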

💛 - Coveralls

@mtreinish
Member Author

mtreinish commented Jul 1, 2024

With this PR we're getting closer to the performance of 1.1.0; most importantly, the QFT benchmarks are no longer timing out:

Benchmarks that have improved:

| Change   |   Before [9092ee78] <1.1.1^0> |   After [94abf30f]  |   Ratio | Benchmark (Parameter)                                       |
|----------|-------------------------------|---------------------|---------|-------------------------------------------------------------|
| -        |                          2582 |                1954 |    0.76 | utility_scale.UtilityScaleBenchmarks.track_qft_depth('cx')  |
| -        |                          2582 |                1954 |    0.76 | utility_scale.UtilityScaleBenchmarks.track_qft_depth('cz')  |
| -        |                          2582 |                1954 |    0.76 | utility_scale.UtilityScaleBenchmarks.track_qft_depth('ecr') |

Benchmarks that have stayed the same:

| Change   | Before [9092ee78] <1.1.1^0>   | After [94abf30f]    | Ratio   | Benchmark (Parameter)                                                     |
|----------|-------------------------------|---------------------|---------|---------------------------------------------------------------------------|
|          | 23.2±0.03s                    | 26.4±0.04s          | ~1.14   | utility_scale.UtilityScaleBenchmarks.time_qft('ecr')                      |
|          | 23.2±0.04s                    | 25.4±0.03s          | 1.10    | utility_scale.UtilityScaleBenchmarks.time_qft('cz')                       |
|          | 20.6±0.06s                    | 21.0±0.03s          | 1.02    | utility_scale.UtilityScaleBenchmarks.time_qft('cx')                       |
|          | 444                           | 435                 | 0.98    | utility_scale.UtilityScaleBenchmarks.track_square_heisenberg_depth('cx')  |
|          | 444                           | 435                 | 0.98    | utility_scale.UtilityScaleBenchmarks.track_square_heisenberg_depth('cz')  |
|          | 444                           | 435                 | 0.98    | utility_scale.UtilityScaleBenchmarks.track_square_heisenberg_depth('ecr') |
|          | 1607                          | 1483                | 0.92    | utility_scale.UtilityScaleBenchmarks.track_qaoa_depth('cx')               |
|          | 1622                          | 1488                | 0.92    | utility_scale.UtilityScaleBenchmarks.track_qaoa_depth('cz')               |
|          | 1622                          | 1488                | 0.92    | utility_scale.UtilityScaleBenchmarks.track_qaoa_depth('ecr')              |

Benchmarks that have got worse:

| Change   | Before [9092ee78] <1.1.1^0>   | After [94abf30f]    |   Ratio | Benchmark (Parameter)                                                         |
|----------|-------------------------------|---------------------|---------|-------------------------------------------------------------------------------|
| +        | 1.77±0.01s                    | 3.97±0.01s          |    2.24 | utility_scale.UtilityScaleBenchmarks.time_square_heisenberg('ecr')            |
| +        | 1.27±0s                       | 2.80±0.01s          |    2.21 | utility_scale.UtilityScaleBenchmarks.time_square_heisenberg('cx')             |
| +        | 1.93±0.01s                    | 4.07±0.01s          |    2.11 | utility_scale.UtilityScaleBenchmarks.time_square_heisenberg('cz')             |
| +        | 97.3±0.9ms                    | 176±4ms             |    1.81 | utility_scale.UtilityScaleBenchmarks.time_parse_qft_n100('cz')                |
| +        | 97.8±0.7ms                    | 176±3ms             |    1.8  | utility_scale.UtilityScaleBenchmarks.time_parse_qft_n100('cx')                |
| +        | 31.9±0.4ms                    | 57.2±0.9ms          |    1.79 | utility_scale.UtilityScaleBenchmarks.time_parse_square_heisenberg_n100('cz')  |
| +        | 2.45±0.02s                    | 4.38±0.02s          |    1.79 | utility_scale.UtilityScaleBenchmarks.time_qaoa('ecr')                         |
| +        | 97.9±0.4ms                    | 174±2ms             |    1.78 | utility_scale.UtilityScaleBenchmarks.time_parse_qft_n100('ecr')               |
| +        | 9.02±0.06ms                   | 15.9±0.3ms          |    1.76 | utility_scale.UtilityScaleBenchmarks.time_parse_qaoa_n100('cz')               |
| +        | 9.02±0.06ms                   | 15.9±0.2ms          |    1.76 | utility_scale.UtilityScaleBenchmarks.time_parse_qaoa_n100('ecr')              |
| +        | 32.2±0.4ms                    | 56.6±0.9ms          |    1.76 | utility_scale.UtilityScaleBenchmarks.time_parse_square_heisenberg_n100('cx')  |
| +        | 32.2±0.1ms                    | 56.7±1ms            |    1.76 | utility_scale.UtilityScaleBenchmarks.time_parse_square_heisenberg_n100('ecr') |
| +        | 9.18±0.2ms                    | 16.0±0.3ms          |    1.74 | utility_scale.UtilityScaleBenchmarks.time_parse_qaoa_n100('cx')               |
| +        | 1.12±0.01s                    | 1.91±0.01s          |    1.71 | utility_scale.UtilityScaleBenchmarks.time_qaoa('cx')                          |
| +        | 2.96±0.05s                    | 4.60±0.01s          |    1.56 | utility_scale.UtilityScaleBenchmarks.time_qaoa('cz')                          |

SOME BENCHMARKS HAVE CHANGED SIGNIFICANTLY.
PERFORMANCE DECREASED.

(The SHA-1 is different because I had it rebased on main locally for testing. The quality changes are unrelated and likely caused by #12453, which merged after 1.1.0.)
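
For reading the tables above, the Ratio column appears to be the "After" value divided by the "Before" value, so values below 1 are improvements. The small sketch below checks that reading against the depth rows; the `ratio` helper is a hypothetical name introduced here, and the after/before interpretation is inferred from the numbers rather than stated in the thread. The timing rows may differ in the last digit when recomputed this way, since the displayed values are already rounded.

```python
def ratio(before: float, after: float) -> float:
    """Return after/before rounded to two decimals, as in the Ratio column."""
    return round(after / before, 2)


# (before, after) depth values copied from the tables above.
depth_rows = {
    "track_qft_depth": (2582, 1954),
    "track_square_heisenberg_depth": (444, 435),
    "track_qaoa_depth('cx')": (1607, 1483),
}

ratios = {name: ratio(before, after) for name, (before, after) in depth_rows.items()}
for name, value in ratios.items():
    print(f"{name}: {value}")
```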

The next passes causing bottlenecks around gate object creation after this PR are:

  1. BasisTranslator
  2. ConsolidateBlocks
  3. Optimize1qGatesDecomposition (which is covered by "Use rust gates for Optimize1QGatesDecomposition", #12650)
  4. Collect2qBlocks

mtreinish added a commit to mtreinish/qiskit-core that referenced this pull request Jul 1, 2024
This commit moves to using Rust gates for the ConsolidateBlocks transpiler
pass. Instead of generating the unitary matrices for the gates in a 2q
block on the Python side and passing that list to a Rust function, this
commit switches to passing a list of DAGOpNodes to Rust and then generating
the matrices inside the Rust function directly. This is similar to what
was done in Qiskit#12650 for Optimize1qGatesDecomposition. Besides being faster
at getting the matrix for standard gates, it also reduces the eager
construction of Python gate objects, which was a significant source of
overhead after Qiskit#12459. To that end, this builds on the thread of work in
the two PRs Qiskit#12692 and Qiskit#12701, which changed the access patterns for
other passes to minimize eager gate object construction.
mtreinish added a commit to mtreinish/qiskit-core that referenced this pull request Jul 1, 2024
This commit updates the BasisTranslator transpiler pass. It builds off
of Qiskit#12692 and Qiskit#12701 to adjust access patterns in the Python transpiler
path to avoid eagerly creating a Python-space operation object. The goal
of this PR is to mitigate the performance regression introduced by the
extra conversion cost of Qiskit#12459 on the BasisTranslator.
@mtreinish mtreinish force-pushed the python-access-rust-data-commutation-passes branch from 5beaad4 to fa774b3 Compare July 2, 2024 13:12
@mtreinish mtreinish removed the on hold label Jul 2, 2024
@coveralls

coveralls commented Jul 2, 2024

Pull Request Test Coverage Report for Build 9761637478

Details

  • 61 of 136 (44.85%) changed or added relevant lines in 6 files are covered.
  • 27 unchanged lines in 3 files lost coverage.
  • Overall coverage decreased (-0.1%) to 89.7%

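| Changes Missing Coverage                  | Covered Lines | Changed/Added Lines | %      |
|-------------------------------------------|---------------|---------------------|--------|
| crates/circuit/src/operations.rs          | 15            | 30                  | 50.0%  |
| crates/circuit/src/circuit_instruction.rs | 5             | 35                  | 14.29% |
| crates/circuit/src/dag_node.rs            | 22            | 52                  | 42.31% |

| Files with Coverage Reduction | New Missed Lines | %      |
|-------------------------------|------------------|--------|
| crates/qasm2/src/expr.rs      | 1                | 94.02% |
| crates/qasm2/src/lex.rs       | 8                | 91.35% |
| crates/qasm2/src/parse.rs     | 18               | 96.69% |

| Totals                            |       |
|-----------------------------------|-------|
| Change from base Build 9757893531 | -0.1% |
| Covered Lines                     | 64549 |
| Relevant Lines                    | 71961 |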

💛 - Coveralls

Contributor

@sbrandhsn sbrandhsn left a comment

This generally looks good to me, thanks! :-) I had one question on handling error messages but apart from this I'd be happy to approve this PR!

crates/circuit/src/circuit_instruction.rs (outdated, resolved)
mtreinish added a commit to mtreinish/qiskit-core that referenced this pull request Jul 2, 2024
sbrandhsn previously approved these changes Jul 3, 2024
Contributor

@sbrandhsn sbrandhsn left a comment

LGTM, thanks!

@coveralls

coveralls commented Jul 3, 2024

Pull Request Test Coverage Report for Build 9776669375

Details

  • 64 of 82 (78.05%) changed or added relevant lines in 5 files are covered.
  • 16 unchanged lines in 3 files lost coverage.
  • Overall coverage decreased (-0.008%) to 89.822%

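| Changes Missing Coverage         | Covered Lines | Changed/Added Lines | %      |
|----------------------------------|---------------|---------------------|--------|
| crates/circuit/src/dag_node.rs   | 19            | 22                  | 86.36% |
| crates/circuit/src/operations.rs | 15            | 30                  | 50.0%  |

| Files with Coverage Reduction | New Missed Lines | %      |
|-------------------------------|------------------|--------|
| crates/qasm2/src/expr.rs      | 1                | 94.02% |
| crates/qasm2/src/lex.rs       | 3                | 92.11% |
| crates/qasm2/src/parse.rs     | 12               | 97.15% |

| Totals                            |         |
|-----------------------------------|---------|
| Change from base Build 9775174998 | -0.008% |
| Covered Lines                     | 65122   |
| Relevant Lines                    | 72501   |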

💛 - Coveralls

@sbrandhsn sbrandhsn added this pull request to the merge queue Jul 3, 2024
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Jul 3, 2024
@mtreinish mtreinish added this pull request to the merge queue Jul 3, 2024
Merged via the queue into Qiskit:main with commit 9571ea1 Jul 3, 2024
15 checks passed
mtreinish added a commit to mtreinish/qiskit-core that referenced this pull request Jul 3, 2024
mtreinish added a commit to mtreinish/qiskit-core that referenced this pull request Jul 3, 2024
jlapeyre added a commit to jlapeyre/qiskit-core that referenced this pull request Jul 8, 2024
jlapeyre added a commit to jlapeyre/qiskit-core that referenced this pull request Jul 8, 2024
github-merge-queue bot pushed a commit that referenced this pull request Jul 8, 2024
* Use rust gates for ConsolidateBlocks

This commit moves to using Rust gates for the ConsolidateBlocks transpiler
pass. Instead of generating the unitary matrices for the gates in a 2q
block on the Python side and passing that list to a Rust function, this
commit switches to passing a list of DAGOpNodes to Rust and then generating
the matrices inside the Rust function directly. This is similar to what
was done in #12650 for Optimize1qGatesDecomposition. Besides being faster
at getting the matrix for standard gates, it also reduces the eager
construction of Python gate objects, which was a significant source of
overhead after #12459. To that end, this builds on the thread of work in
the two PRs #12692 and #12701, which changed the access patterns for
other passes to minimize eager gate object construction.

* Add rust filter function for DAGCircuit.collect_2q_runs()

* Update crates/accelerate/src/convert_2q_block_matrix.rs

---------

Co-authored-by: John Lapeyre <jlapeyre@users.noreply.github.com>
github-merge-queue bot pushed a commit that referenced this pull request Jul 10, 2024
Procatv pushed a commit to Procatv/qiskit-terra-catherines that referenced this pull request Aug 1, 2024
* Avoid Python op creation in commutative cancellation

This commit updates the commutative cancellation and commutation
analysis transpiler passes. It builds off of Qiskit#12692 to adjust access
patterns in the Python transpiler path to avoid eagerly creating a
Python-space operation object. The goal of this PR is to mitigate the
performance regression on these passes introduced by the extra
conversion cost of Qiskit#12459.

* Remove stray print

* Don't add __array__ to DAGOpNode or CircuitInstruction
Procatv pushed a commit to Procatv/qiskit-terra-catherines that referenced this pull request Aug 1, 2024
Procatv pushed a commit to Procatv/qiskit-terra-catherines that referenced this pull request Aug 1, 2024
Labels
Changelog: None, mod: circuit, mod: transpiler, performance, Rust
4 participants