Add `DenseLayout` trial to `SabreLayout` #12453

mtreinish · 2024-05-22T13:30:57Z

Summary

Building on the work done in #10829, #10721, and #12104 this commit adds
a new trial to all runs of SabreLayout that runs the dense layout
pass. In general the sabre layout algorithm starts from a random layout
and then runs a routing algorithm to permute that layout virtually where
swaps would be inserted to select a layout that would result in fewer
swaps. As was discussed in #10721 and #10829 that random starting point
is often not ideal especially for larger targets where the distance
between qubits can be quite far. Especially when the circuit qubit count
is low relative to the target's qubit count this can result it poor
layouts as the distance between the qubits is too large. In qiskit we
have an existing pass, DenseLayout, which tries to find the most
densely connected n qubit subgraph of a connectivity graph. This
algorithm necessarily will select a starting layout where the qubits are
near each other and for those large backends where the random starting
layout doesn't work well this can improve the output quality.

As the overhead of DenseLayout is quite low and the core algorithm is
written in rust already this commit adds a default trial that uses
DenseLayout as a starting point on top of the random trials (and any
input starting points). For example if the user specifies to run
SabreLayout with 20 layout trials this will run 20 random trials and
one trial with DenseLayout as the starting point. This is all done
directly in the sabre layout rust code for efficiency. The other
difference between the standalone DenseLayout pass is that in the
standalone pass a sparse matrix is built and a reverse Cuthill-McKee
permutation is run on the densest subgraph qubits to pick the final
layout. This permutation is skipped because in the case of Sabre's
use of dense layout we're relying on the sabre algorithm to perform
the permutation.

Details and comments

Depends on: #12104

This PR is based on top of #12104 and will be manually rebased after that merges. You can look at: 06e208f for the details of just this PR.

qiskit-bot · 2024-05-22T13:31:02Z

One or more of the following people are relevant to this code:

@Eric-Arellano
@Qiskit/terra-core
@kevinhartman
@mtreinish

mtreinish · 2024-05-22T13:34:12Z

I ran some quick asv benchmarks with this:

Benchmarks that have improved:

| Change   |   Before [8e3218bc] <dense-layout-rust-interface~1^2> |   After [2bee114a] <dense-sabre-ftw> |   Ratio | Benchmark (Parameter)                                       |
|----------|-------------------------------------------------------|--------------------------------------|---------|-------------------------------------------------------------|
| -        |                                                  2582 |                                 1954 |    0.76 | utility_scale.UtilityScaleBenchmarks.track_qft_depth('cx')  |
| -        |                                                  2582 |                                 1954 |    0.76 | utility_scale.UtilityScaleBenchmarks.track_qft_depth('cz')  |
| -        |                                                  2582 |                                 1954 |    0.76 | utility_scale.UtilityScaleBenchmarks.track_qft_depth('ecr') |

Benchmarks that have stayed the same:

| Change   | Before [8e3218bc] <dense-layout-rust-interface~1^2>   | After [2bee114a] <dense-sabre-ftw>   |   Ratio | Benchmark (Parameter)                                                                                  |
|----------|-------------------------------------------------------|--------------------------------------|---------|--------------------------------------------------------------------------------------------------------|
|          | 82.9±0.6ms                                            | 86.1±1ms                             |    1.04 | transpiler_levels.TranspilerLevelBenchmarks.time_transpile_qv_14_x_14(1)                               |
|          | 83.1±0.3ms                                            | 86.0±0.9ms                           |    1.03 | transpiler_levels.TranspilerLevelBenchmarks.time_transpile_qv_14_x_14(0)                               |
|          | 1.94±0.01s                                            | 1.98±0.02s                           |    1.02 | transpiler_levels.TranspilerLevelBenchmarks.time_quantum_volume_transpile_50_x_20(3)                   |
|          | 71.3±0.3ms                                            | 72.6±1ms                             |    1.02 | transpiler_levels.TranspilerLevelBenchmarks.time_transpile_from_large_qasm(0)                          |
|          | 160±0.8ms                                             | 163±1ms                              |    1.02 | transpiler_levels.TranspilerLevelBenchmarks.time_transpile_from_large_qasm_backend_with_prop(1)        |
|          | 174±2ms                                               | 176±0.2ms                            |    1.02 | transpiler_levels.TranspilerLevelBenchmarks.time_transpile_from_large_qasm_backend_with_prop(3)        |
|          | 80.9±0.4ms                                            | 81.5±0.6ms                           |    1.01 | transpiler_levels.TranspilerLevelBenchmarks.time_schedule_qv_14_x_14(1)                                |
|          | 290±0.7ms                                             | 292±1ms                              |    1.01 | transpiler_levels.TranspilerLevelBenchmarks.time_transpile_qv_14_x_14(2)                               |
|          | 3.31±0s                                               | 3.34±0.01s                           |    1.01 | utility_scale.UtilityScaleBenchmarks.time_qaoa('cz')                                                   |
|          | 2.77±0.05s                                            | 2.79±0.02s                           |    1.01 | utility_scale.UtilityScaleBenchmarks.time_qaoa('ecr')                                                  |
|          | 936±6ms                                               | 937±3ms                              |    1    | transpiler_levels.TranspilerLevelBenchmarks.time_quantum_volume_transpile_50_x_20(0)                   |
|          | 1.71±0.01s                                            | 1.72±0.01s                           |    1    | transpiler_levels.TranspilerLevelBenchmarks.time_quantum_volume_transpile_50_x_20(2)                   |
|          | 122±0.5ms                                             | 122±1ms                              |    1    | transpiler_levels.TranspilerLevelBenchmarks.time_transpile_from_large_qasm(1)                          |
|          | 157±1ms                                               | 157±2ms                              |    1    | transpiler_levels.TranspilerLevelBenchmarks.time_transpile_from_large_qasm(2)                          |
|          | 79.9±0.1ms                                            | 79.9±0.2ms                           |    1    | transpiler_levels.TranspilerLevelBenchmarks.time_transpile_from_large_qasm_backend_with_prop(0)        |
|          | 2565                                                  | 2565                                 |    1    | transpiler_levels.TranspilerLevelBenchmarks.track_depth_quantum_volume_transpile_50_x_20(0)            |
|          | 1391                                                  | 1391                                 |    1    | transpiler_levels.TranspilerLevelBenchmarks.track_depth_quantum_volume_transpile_50_x_20(1)            |
|          | 1258                                                  | 1258                                 |    1    | transpiler_levels.TranspilerLevelBenchmarks.track_depth_quantum_volume_transpile_50_x_20(2)            |
|          | 1314                                                  | 1314                                 |    1    | transpiler_levels.TranspilerLevelBenchmarks.track_depth_quantum_volume_transpile_50_x_20(3)            |
|          | 2705                                                  | 2705                                 |    1    | transpiler_levels.TranspilerLevelBenchmarks.track_depth_transpile_from_large_qasm(0)                   |
|          | 2005                                                  | 2005                                 |    1    | transpiler_levels.TranspilerLevelBenchmarks.track_depth_transpile_from_large_qasm(1)                   |
|          | 7                                                     | 7                                    |    1    | transpiler_levels.TranspilerLevelBenchmarks.track_depth_transpile_from_large_qasm(2)                   |
|          | 7                                                     | 7                                    |    1    | transpiler_levels.TranspilerLevelBenchmarks.track_depth_transpile_from_large_qasm(3)                   |
|          | 2705                                                  | 2705                                 |    1    | transpiler_levels.TranspilerLevelBenchmarks.track_depth_transpile_from_large_qasm_backend_with_prop(0) |
|          | 2005                                                  | 2005                                 |    1    | transpiler_levels.TranspilerLevelBenchmarks.track_depth_transpile_from_large_qasm_backend_with_prop(1) |
|          | 7                                                     | 7                                    |    1    | transpiler_levels.TranspilerLevelBenchmarks.track_depth_transpile_from_large_qasm_backend_with_prop(2) |
|          | 7                                                     | 7                                    |    1    | transpiler_levels.TranspilerLevelBenchmarks.track_depth_transpile_from_large_qasm_backend_with_prop(3) |
|          | 323                                                   | 323                                  |    1    | transpiler_levels.TranspilerLevelBenchmarks.track_depth_transpile_qv_14_x_14(0)                        |
|          | 304                                                   | 304                                  |    1    | transpiler_levels.TranspilerLevelBenchmarks.track_depth_transpile_qv_14_x_14(1)                        |
|          | 307                                                   | 307                                  |    1    | transpiler_levels.TranspilerLevelBenchmarks.track_depth_transpile_qv_14_x_14(2)                        |
|          | 268                                                   | 268                                  |    1    | transpiler_levels.TranspilerLevelBenchmarks.track_depth_transpile_qv_14_x_14(3)                        |
|          | 9.38±0.1ms                                            | 9.41±0.02ms                          |    1    | utility_scale.UtilityScaleBenchmarks.time_parse_qaoa_n100('ecr')                                       |
|          | 33.3±0.2ms                                            | 33.2±0.2ms                           |    1    | utility_scale.UtilityScaleBenchmarks.time_parse_square_heisenberg_n100('cx')                           |
|          | 1.24±0.01s                                            | 1.25±0.01s                           |    1    | utility_scale.UtilityScaleBenchmarks.time_qaoa('cx')                                                   |
|          | 25.7±0.2s                                             | 25.6±0.1s                            |    1    | utility_scale.UtilityScaleBenchmarks.time_qft('cz')                                                    |
|          | 1.43±0.01s                                            | 1.43±0.01s                           |    1    | utility_scale.UtilityScaleBenchmarks.time_square_heisenberg('cx')                                      |
|          | 2.17±0.01s                                            | 2.17±0.02s                           |    1    | utility_scale.UtilityScaleBenchmarks.time_square_heisenberg('cz')                                      |
|          | 1483                                                  | 1483                                 |    1    | utility_scale.UtilityScaleBenchmarks.track_qaoa_depth('cx')                                            |
|          | 1488                                                  | 1488                                 |    1    | utility_scale.UtilityScaleBenchmarks.track_qaoa_depth('cz')                                            |
|          | 1488                                                  | 1488                                 |    1    | utility_scale.UtilityScaleBenchmarks.track_qaoa_depth('ecr')                                           |
|          | 435                                                   | 435                                  |    1    | utility_scale.UtilityScaleBenchmarks.track_square_heisenberg_depth('cx')                               |
|          | 435                                                   | 435                                  |    1    | utility_scale.UtilityScaleBenchmarks.track_square_heisenberg_depth('cz')                               |
|          | 435                                                   | 435                                  |    1    | utility_scale.UtilityScaleBenchmarks.track_square_heisenberg_depth('ecr')                              |
|          | 106±0.9ms                                             | 105±1ms                              |    0.99 | transpiler_levels.TranspilerLevelBenchmarks.time_schedule_qv_14_x_14(0)                                |
|          | 163±1ms                                               | 161±0.4ms                            |    0.99 | transpiler_levels.TranspilerLevelBenchmarks.time_transpile_from_large_qasm(3)                          |
|          | 317±3ms                                               | 314±1ms                              |    0.99 | transpiler_levels.TranspilerLevelBenchmarks.time_transpile_qv_14_x_14(3)                               |
|          | 9.49±0.09ms                                           | 9.38±0.09ms                          |    0.99 | utility_scale.UtilityScaleBenchmarks.time_parse_qaoa_n100('cx')                                        |
|          | 9.47±0.2ms                                            | 9.41±0.03ms                          |    0.99 | utility_scale.UtilityScaleBenchmarks.time_parse_qaoa_n100('cz')                                        |
|          | 104±1ms                                               | 102±0.5ms                            |    0.99 | utility_scale.UtilityScaleBenchmarks.time_parse_qft_n100('cz')                                         |
|          | 104±0.4ms                                             | 103±2ms                              |    0.99 | utility_scale.UtilityScaleBenchmarks.time_parse_qft_n100('ecr')                                        |
|          | 33.5±0.3ms                                            | 33.3±0.2ms                           |    0.99 | utility_scale.UtilityScaleBenchmarks.time_parse_square_heisenberg_n100('cz')                           |
|          | 26.1±0s                                               | 25.8±0s                              |    0.99 | utility_scale.UtilityScaleBenchmarks.time_qft('ecr')                                                   |
|          | 1.99±0s                                               | 1.98±0.01s                           |    0.99 | utility_scale.UtilityScaleBenchmarks.time_square_heisenberg('ecr')                                     |
|          | 360±3ms                                               | 355±2ms                              |    0.98 | transpiler_levels.TranspilerLevelBenchmarks.time_quantum_volume_transpile_50_x_20(1)                   |
|          | 171±3ms                                               | 168±3ms                              |    0.98 | transpiler_levels.TranspilerLevelBenchmarks.time_transpile_from_large_qasm_backend_with_prop(2)        |
|          | 104±0.9ms                                             | 102±0.3ms                            |    0.98 | utility_scale.UtilityScaleBenchmarks.time_parse_qft_n100('cx')                                         |
|          | 33.6±0.2ms                                            | 33.0±0.3ms                           |    0.98 | utility_scale.UtilityScaleBenchmarks.time_parse_square_heisenberg_n100('ecr')                          |
|          | 23.0±0.08s                                            | 22.5±0.02s                           |    0.98 | utility_scale.UtilityScaleBenchmarks.time_qft('cx')                                                    |

SOME BENCHMARKS HAVE CHANGED SIGNIFICANTLY.
PERFORMANCE INCREASED.

In this case it shows that at various scale the runtime overhead is neglible but in some cases the quality of output is greatly improved. The extra trial only takes effect in cases where it results in a better outcome.

coveralls · 2024-05-22T14:27:33Z

Pull Request Test Coverage Report for Build 9196491844

Details

26 of 28 (92.86%) changed or added relevant lines in 1 file are covered.
21 unchanged lines in 2 files lost coverage.
Overall coverage decreased (-0.02%) to 89.596%

Changes Missing Coverage	Covered Lines	Changed/Added Lines	%
crates/accelerate/src/sabre/layout.rs	26	28	92.86%

Files with Coverage Reduction	New Missed Lines	%
crates/qasm2/src/lex.rs	4	92.11%
crates/accelerate/src/sabre/layout.rs	17	90.21%

Totals
Change from base Build 9195596815:	-0.02%
Covered Lines:	62334
Relevant Lines:	69572

💛 - Coveralls

henryzou50

LGTM. It's great to see the positive impact of integrating the DenseLayout trial into SabreLayout. While we didn't see improvements in the other benchmarks for depth, the significant reduction in depth for QFT benchmarks is particularly impressive and promising. The negligible runtime overhead further reflects the efficiency of this approach. I'm excited to see how we can build on this improvement and explore other methods to further refine our layout selection strategy.

Building on the work done in Qiskit#10829, Qiskit#10721, and Qiskit#12104 this commit adds a new trial to all runs of `SabreLayout` that runs the dense layout pass. In general the sabre layout algorithm starts from a random layout and then runs a routing algorithm to permute that layout virtually where swaps would be inserted to select a layout that would result in fewer swaps. As was discussed in Qiskit#10721 and Qiskit#10829 that random starting point is often not ideal especially for larger targets where the distance between qubits can be quite far. Especially when the circuit qubit count is low relative to the target's qubit count this can result it poor layouts as the distance between the qubits is too large. In qiskit we have an existing pass, `DenseLayout`, which tries to find the most densely connected n qubit subgraph of a connectivity graph. This algorithm necessarily will select a starting layout where the qubits are near each other and for those large backends where the random starting layout doesn't work well this can improve the output quality. As the overhead of `DenseLayout` is quite low and the core algorithm is written in rust already this commit adds a default trial that uses DenseLayout as a starting point on top of the random trials (and any input starting points). For example if the user specifies to run SabreLayout with 20 layout trials this will run 20 random trials and one trial with `DenseLayout` as the starting point. This is all done directly in the sabre layout rust code for efficiency. The other difference between the standalone `DenseLayout` pass is that in the standalone pass a sparse matrix is built and a reverse Cuthill-McKee permutation is run on the densest subgraph qubits to pick the final layout. This permutation is skipped because in the case of Sabre's use of dense layout we're relying on the sabre algorithm to perform the permutation. Depends on: Qiskit#12104

mtreinish added on hold Can not fix yet performance Changelog: New Feature Include in the "Added" section of the changelog mod: transpiler Issues and PRs related to Transpiler labels May 22, 2024

mtreinish added this to the 1.2.0 milestone May 22, 2024

mtreinish requested a review from a team as a code owner May 22, 2024 13:30

mtreinish added the Rust This PR or issue is related to Rust code in the repository label May 22, 2024

henryzou50 approved these changes May 22, 2024

View reviewed changes

mtreinish force-pushed the dense-sabre-ftw branch from 06e208f to 921a53c Compare May 22, 2024 18:23

henryzou50 enabled auto-merge May 22, 2024 18:25

henryzou50 added this pull request to the merge queue May 22, 2024

Merged via the queue into Qiskit:main with commit 531f91c May 22, 2024
15 checks passed

mtreinish deleted the dense-sabre-ftw branch May 24, 2024 14:02

This was referenced Jul 1, 2024

Avoid Python op creation in commutative cancellation #12701

Merged

Update transpiler docs #12731

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add `DenseLayout` trial to `SabreLayout` #12453

Add `DenseLayout` trial to `SabreLayout` #12453

mtreinish commented May 22, 2024

qiskit-bot commented May 22, 2024

mtreinish commented May 22, 2024 •

edited

Loading

coveralls commented May 22, 2024 •

edited

Loading

henryzou50 left a comment

Add DenseLayout trial to SabreLayout #12453

Add DenseLayout trial to SabreLayout #12453

Conversation

mtreinish commented May 22, 2024

Summary

Details and comments

qiskit-bot commented May 22, 2024

mtreinish commented May 22, 2024 • edited Loading

coveralls commented May 22, 2024 • edited Loading

Pull Request Test Coverage Report for Build 9196491844

Details

💛 - Coveralls

henryzou50 left a comment

Choose a reason for hiding this comment

Add `DenseLayout` trial to `SabreLayout` #12453

Add `DenseLayout` trial to `SabreLayout` #12453

mtreinish commented May 22, 2024 •

edited

Loading

coveralls commented May 22, 2024 •

edited

Loading