Avoid Python op creation in commutative cancellation #12701

mtreinish · 2024-07-01T17:22:25Z

Summary

This commit updates the commutative cancellation and commutation
analysis transpiler passes. It builds on #12692, adjusting access
patterns in the Python transpiler path to avoid eagerly creating a
Python-space operation object. The goal of this PR is to mitigate the
performance regression in these passes introduced by the extra
conversion cost of #12459.
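
To illustrate the access-pattern change described in the summary, here is a minimal self-contained sketch. It does not use Qiskit: `LazyNode`, `build_op`, and the two counting helpers are hypothetical names made up for this illustration, and the real node types are implemented in Rust. The sketch only assumes that a node carries cheap metadata (its name) alongside an operation object that is expensive to build, and it counts how many of those objects each access pattern creates.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Optional


@dataclass
class LazyNode:
    """Hypothetical stand-in for a DAG node; this is not Qiskit's DAGOpNode."""

    name: str
    # Factory that builds the (expensive) Python-space operation object.
    build_op: Callable[[], Any]
    _op: Optional[Any] = field(default=None, repr=False)
    builds: int = 0  # how many times the heavy object was created

    @property
    def op(self) -> Any:
        # Materialize the operation only on first access, then cache it.
        if self._op is None:
            self._op = self.build_op()
            self.builds += 1
        return self._op


def count_named_eager(nodes, wanted):
    # Eager pattern: reading node.op forces object creation for every node.
    return sum(1 for node in nodes if node.op["name"] in wanted)


def count_named_lazy(nodes, wanted):
    # Lazy pattern: the cheap `name` attribute answers the same question.
    return sum(1 for node in nodes if node.name in wanted)


def make_nodes():
    names = ["cx", "rz", "h", "cx", "measure"]
    # A dict stands in for the heavy operation object in this sketch.
    return [LazyNode(n, build_op=lambda n=n: {"name": n}) for n in names]


eager_nodes = make_nodes()
lazy_nodes = make_nodes()
eager_hits = count_named_eager(eager_nodes, {"cx", "h"})
lazy_hits = count_named_lazy(lazy_nodes, {"cx", "h"})
eager_builds = sum(node.builds for node in eager_nodes)
lazy_builds = sum(node.builds for node in lazy_nodes)
```

Both helpers return the same count, but only the eager one pays for building an operation object per node. That per-node cost is the kind of overhead the passes in this PR are meant to avoid.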

Details and comments

~~This is based on top of #12692 and will need to be rebased after #12692 merges. To see the diff of just this PR you can look at the last commit: 5beaad4~~ Rebased on main now that #12692 has merged

qiskit-bot · 2024-07-01T17:22:30Z

One or more of the following people are relevant to this code:

@Qiskit/terra-core
@kevinhartman
@mtreinish

coveralls · 2024-07-01T17:50:31Z

Pull Request Test Coverage Report for Build 9748479901

Details

223 of 319 (69.91%) changed or added relevant lines in 12 files are covered.
100 unchanged lines in 3 files lost coverage.
Overall coverage decreased (-0.1%) to 89.723%

| Changes Missing Coverage | Covered Lines | Changed/Added Lines | % |
|--------------------------|---------------|---------------------|---|
| crates/circuit/src/operations.rs | 57 | 63 | 90.48% |
| crates/circuit/src/circuit_instruction.rs | 31 | 68 | 45.59% |
| crates/circuit/src/dag_node.rs | 68 | 121 | 56.2% |

| Files with Coverage Reduction | New Missed Lines | % |
|-------------------------------|------------------|---|
| crates/qasm2/src/lex.rs | 4 | 91.6% |
| crates/qasm2/src/parse.rs | 6 | 97.61% |
| crates/circuit/src/operations.rs | 90 | 78.8% |

Totals
Change from base Build 9747324734: -0.1%
Covered Lines: 64568
Relevant Lines: 71964

💛 - Coveralls

mtreinish · 2024-07-01T19:58:11Z

With this PR we're getting closer to the performance of 1.1.0. Most importantly, the QFT benchmarks are no longer timing out:

Benchmarks that have improved:

| Change   |   Before [9092ee78] <1.1.1^0> |   After [94abf30f]  |   Ratio | Benchmark (Parameter)                                       |
|----------|-------------------------------|---------------------|---------|-------------------------------------------------------------|
| -        |                          2582 |                1954 |    0.76 | utility_scale.UtilityScaleBenchmarks.track_qft_depth('cx')  |
| -        |                          2582 |                1954 |    0.76 | utility_scale.UtilityScaleBenchmarks.track_qft_depth('cz')  |
| -        |                          2582 |                1954 |    0.76 | utility_scale.UtilityScaleBenchmarks.track_qft_depth('ecr') |

Benchmarks that have stayed the same:

| Change   | Before [9092ee78] <1.1.1^0>   | After [94abf30f]    | Ratio   | Benchmark (Parameter)                                                     |
|----------|-------------------------------|---------------------|---------|---------------------------------------------------------------------------|
|          | 23.2±0.03s                    | 26.4±0.04s          | ~1.14   | utility_scale.UtilityScaleBenchmarks.time_qft('ecr')                      |
|          | 23.2±0.04s                    | 25.4±0.03s          | 1.10    | utility_scale.UtilityScaleBenchmarks.time_qft('cz')                       |
|          | 20.6±0.06s                    | 21.0±0.03s          | 1.02    | utility_scale.UtilityScaleBenchmarks.time_qft('cx')                       |
|          | 444                           | 435                 | 0.98    | utility_scale.UtilityScaleBenchmarks.track_square_heisenberg_depth('cx')  |
|          | 444                           | 435                 | 0.98    | utility_scale.UtilityScaleBenchmarks.track_square_heisenberg_depth('cz')  |
|          | 444                           | 435                 | 0.98    | utility_scale.UtilityScaleBenchmarks.track_square_heisenberg_depth('ecr') |
|          | 1607                          | 1483                | 0.92    | utility_scale.UtilityScaleBenchmarks.track_qaoa_depth('cx')               |
|          | 1622                          | 1488                | 0.92    | utility_scale.UtilityScaleBenchmarks.track_qaoa_depth('cz')               |
|          | 1622                          | 1488                | 0.92    | utility_scale.UtilityScaleBenchmarks.track_qaoa_depth('ecr')              |

Benchmarks that have got worse:

| Change   | Before [9092ee78] <1.1.1^0>   | After [94abf30f]    |   Ratio | Benchmark (Parameter)                                                         |
|----------|-------------------------------|---------------------|---------|-------------------------------------------------------------------------------|
| +        | 1.77±0.01s                    | 3.97±0.01s          |    2.24 | utility_scale.UtilityScaleBenchmarks.time_square_heisenberg('ecr')            |
| +        | 1.27±0s                       | 2.80±0.01s          |    2.21 | utility_scale.UtilityScaleBenchmarks.time_square_heisenberg('cx')             |
| +        | 1.93±0.01s                    | 4.07±0.01s          |    2.11 | utility_scale.UtilityScaleBenchmarks.time_square_heisenberg('cz')             |
| +        | 97.3±0.9ms                    | 176±4ms             |    1.81 | utility_scale.UtilityScaleBenchmarks.time_parse_qft_n100('cz')                |
| +        | 97.8±0.7ms                    | 176±3ms             |    1.8  | utility_scale.UtilityScaleBenchmarks.time_parse_qft_n100('cx')                |
| +        | 31.9±0.4ms                    | 57.2±0.9ms          |    1.79 | utility_scale.UtilityScaleBenchmarks.time_parse_square_heisenberg_n100('cz')  |
| +        | 2.45±0.02s                    | 4.38±0.02s          |    1.79 | utility_scale.UtilityScaleBenchmarks.time_qaoa('ecr')                         |
| +        | 97.9±0.4ms                    | 174±2ms             |    1.78 | utility_scale.UtilityScaleBenchmarks.time_parse_qft_n100('ecr')               |
| +        | 9.02±0.06ms                   | 15.9±0.3ms          |    1.76 | utility_scale.UtilityScaleBenchmarks.time_parse_qaoa_n100('cz')               |
| +        | 9.02±0.06ms                   | 15.9±0.2ms          |    1.76 | utility_scale.UtilityScaleBenchmarks.time_parse_qaoa_n100('ecr')              |
| +        | 32.2±0.4ms                    | 56.6±0.9ms          |    1.76 | utility_scale.UtilityScaleBenchmarks.time_parse_square_heisenberg_n100('cx')  |
| +        | 32.2±0.1ms                    | 56.7±1ms            |    1.76 | utility_scale.UtilityScaleBenchmarks.time_parse_square_heisenberg_n100('ecr') |
| +        | 9.18±0.2ms                    | 16.0±0.3ms          |    1.74 | utility_scale.UtilityScaleBenchmarks.time_parse_qaoa_n100('cx')               |
| +        | 1.12±0.01s                    | 1.91±0.01s          |    1.71 | utility_scale.UtilityScaleBenchmarks.time_qaoa('cx')                          |
| +        | 2.96±0.05s                    | 4.60±0.01s          |    1.56 | utility_scale.UtilityScaleBenchmarks.time_qaoa('cz')                          |

SOME BENCHMARKS HAVE CHANGED SIGNIFICANTLY.
PERFORMANCE DECREASED.

(The sha1 is different because I had it rebased on main locally for testing. The quality changes are unrelated and likely caused by #12453, which merged after 1.1.0.)

The next passes causing bottlenecks around gate object creation after this PR are:

* BasisTranslator
* ConsolidateBlocks
* Optimize1qGatesDecomposition (which is covered by "Use rust gates for Optimize1QGatesDecomposition" #12650)
* Collect2qBlocks

This commit moves the ConsolidateBlocks transpiler pass to use Rust gates. Instead of generating the unitary matrices for the gates in a 2q block on the Python side and passing that list to a Rust function, this commit passes a list of DAGOpNodes to Rust and generates the matrices inside the Rust function directly. This is similar to what was done in Qiskit#12650 for Optimize1qGatesDecomposition. Besides being faster at getting the matrix for standard gates, it also reduces the eager construction of Python gate objects, which was a significant source of overhead after Qiskit#12459. To that end, this builds on the thread of work in Qiskit#12692 and Qiskit#12701, which changed the access patterns for other passes to minimize eager gate object construction.
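
As a rough illustration of the idea in the ConsolidateBlocks description (building a 2q block's unitary from lightweight node data rather than from constructed gate objects), here is a pure-Python sketch. It is not the Rust implementation. The `GATE_MATRICES` table, the `(name, qargs)` tuples, and `block_matrix` are hypothetical names for this example. It assumes little-endian qubit ordering (qubit 0 is the least significant bit) and only handles `cx` with control on qubit 0 and target on qubit 1.

```python
# Fixed matrices looked up by gate name; no gate objects are constructed.
I2 = [[1, 0], [0, 1]]
GATE_MATRICES = {
    "x": [[0, 1], [1, 0]],
    "z": [[1, 0], [0, -1]],
    # CX with control qubit 0 and target qubit 1, little-endian ordering.
    "cx": [[1, 0, 0, 0], [0, 0, 0, 1], [0, 0, 1, 0], [0, 1, 0, 0]],
}


def kron(a, b):
    # Kronecker product of two matrices given as nested lists.
    return [[x * y for x in row_a for y in row_b] for row_a in a for row_b in b]


def matmul(a, b):
    # Plain matrix product of two square matrices of equal size.
    size = len(a)
    return [
        [sum(a[i][k] * b[k][j] for k in range(size)) for j in range(size)]
        for i in range(size)
    ]


def block_matrix(gates):
    """Compose the 4x4 unitary for a list of (name, qargs) entries."""
    total = kron(I2, I2)  # 4x4 identity
    for name, qargs in gates:
        matrix = GATE_MATRICES[name]
        if len(qargs) == 1:
            # Little-endian: a gate on qubit 0 acts on the rightmost factor.
            full = kron(I2, matrix) if qargs[0] == 0 else kron(matrix, I2)
        else:
            # Sketch assumption: two-qubit gates always act on qargs (0, 1).
            full = matrix
        # Later gates multiply from the left.
        total = matmul(full, total)
    return total


# X on qubit 0 followed by CX(0, 1) maps |00> (index 0) to |11> (index 3).
unitary = block_matrix([("x", (0,)), ("cx", (0, 1))])
```

The only inputs are gate names and qubit indices, which is the kind of data that can be read from a node without materializing a Python gate object.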

This commit updates the BasisTranslator transpiler pass. It builds on Qiskit#12692 and Qiskit#12701, adjusting access patterns in the Python transpiler path to avoid eagerly creating a Python-space operation object. The goal of this PR is to mitigate the performance regression that the extra conversion cost of Qiskit#12459 introduced in the BasisTranslator.

coveralls · 2024-07-02T13:37:30Z

Pull Request Test Coverage Report for Build 9761637478

Details

61 of 136 (44.85%) changed or added relevant lines in 6 files are covered.
27 unchanged lines in 3 files lost coverage.
Overall coverage decreased (-0.1%) to 89.7%

| Changes Missing Coverage | Covered Lines | Changed/Added Lines | % |
|--------------------------|---------------|---------------------|---|
| crates/circuit/src/operations.rs | 15 | 30 | 50.0% |
| crates/circuit/src/circuit_instruction.rs | 5 | 35 | 14.29% |
| crates/circuit/src/dag_node.rs | 22 | 52 | 42.31% |

| Files with Coverage Reduction | New Missed Lines | % |
|-------------------------------|------------------|---|
| crates/qasm2/src/expr.rs | 1 | 94.02% |
| crates/qasm2/src/lex.rs | 8 | 91.35% |
| crates/qasm2/src/parse.rs | 18 | 96.69% |

Totals
Change from base Build 9757893531: -0.1%
Covered Lines: 64549
Relevant Lines: 71961

💛 - Coveralls

sbrandhsn

This generally looks good to me, thanks! :-) I had one question on handling error messages but apart from this I'd be happy to approve this PR!

crates/circuit/src/circuit_instruction.rs

sbrandhsn

LGTM, thanks!

coveralls · 2024-07-03T11:29:32Z

Pull Request Test Coverage Report for Build 9776669375

Details

64 of 82 (78.05%) changed or added relevant lines in 5 files are covered.
16 unchanged lines in 3 files lost coverage.
Overall coverage decreased (-0.008%) to 89.822%

| Changes Missing Coverage | Covered Lines | Changed/Added Lines | % |
|--------------------------|---------------|---------------------|---|
| crates/circuit/src/dag_node.rs | 19 | 22 | 86.36% |
| crates/circuit/src/operations.rs | 15 | 30 | 50.0% |

| Files with Coverage Reduction | New Missed Lines | % |
|-------------------------------|------------------|---|
| crates/qasm2/src/expr.rs | 1 | 94.02% |
| crates/qasm2/src/lex.rs | 3 | 92.11% |
| crates/qasm2/src/parse.rs | 12 | 97.15% |

Totals
Change from base Build 9775174998: -0.008%
Covered Lines: 65122
Relevant Lines: 72501

💛 - Coveralls

mtreinish added this to the 1.2.0 milestone Jul 1, 2024

mtreinish requested review from alexanderivrii, ShellyGarion and a team as code owners July 1, 2024 17:22

mtreinish mentioned this pull request Jul 1, 2024

Use rust gates for ConsolidateBlocks #12704

Merged

3 tasks

mtreinish mentioned this pull request Jul 1, 2024

Avoid Python op creation in BasisTranslator #12705

Merged

3 tasks

mtreinish force-pushed the python-access-rust-data-commutation-passes branch from 5beaad4 to fa774b3 Compare July 2, 2024 13:12

mtreinish removed the "on hold" (Can not fix yet) label Jul 2, 2024

sbrandhsn reviewed Jul 2, 2024

sbrandhsn previously approved these changes Jul 3, 2024

mtreinish added 3 commits July 3, 2024 06:24

Merge remote-tracking branch 'origin/main' into python-access-rust-da…

548a104

…ta-commutation-passes

Remove stray print

db72c30

Don't add __array__ to DAGOpNode or CircuitInstruction

5f94bf0

mtreinish dismissed sbrandhsn’s stale review via 5f94bf0 July 3, 2024 11:03

mtreinish requested a review from sbrandhsn July 3, 2024 11:04

sbrandhsn approved these changes Jul 3, 2024

sbrandhsn added this pull request to the merge queue Jul 3, 2024

github-merge-queue bot removed this pull request from the merge queue due to failed status checks Jul 3, 2024

mtreinish added this pull request to the merge queue Jul 3, 2024

Merged via the queue into Qiskit:main with commit 9571ea1 Jul 3, 2024
15 checks passed
