ConsolidateBlocks does not have a good logic for heterogeneous gates #11659

Open
ajavadia opened this issue Jan 28, 2024 · 1 comment
Labels: mod: transpiler, performance, type: feature request

Comments

@ajavadia
Member

What should we add?

ConsolidateBlocks has some logic for choosing whether to collapse a block into a UnitaryGate, but it is pretty outdated by now. It basically checks whether the number of gates in the decomposition improves. First, the number of gates is not necessarily what matters; the error is. Second, it does not currently deal with multiple (heterogeneous) possible decompositions.
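To make the first point concrete, here is a small numerical sketch. The gate names and error rates are made up for illustration and do not come from any real target; fidelity is approximated as the product of per-gate success probabilities, which is an assumption of this sketch.

```python
from math import prod

# Hypothetical per-gate error rates on one qubit pair (illustrative only).
ERROR = {"cx": 0.02, "rzx_pi_4": 0.004}


def estimated_fidelity(gates):
    """Approximate fidelity as the product of (1 - error) over all gates."""
    return prod(1.0 - ERROR[name] for name in gates)


# Two hypothetical candidate decompositions of the same 2q block.
two_cx = ["cx", "cx"]
three_rzx = ["rzx_pi_4"] * 3

# A gate-count heuristic and an error heuristic can disagree.
fewer_gates = min([two_cx, three_rzx], key=len)
higher_fidelity = max([two_cx, three_rzx], key=estimated_fidelity)
```

With these numbers the count-based choice is the two-gate sequence, while the error-based choice is the three-gate sequence built from the less noisy gate.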

But all of this is implemented correctly in UnitarySynthesis (at least for 2q blocks). So ConsolidateBlocks should just defer to UnitarySynthesis for when and how to resynthesize a sequence of 2q gates. All of its decomposition considerations should come from UnitarySynthesis.

I think it is better to write a new pass PeepholeUnitaryResynthesis, which does all 3 of these actions: Collect2qBlocks, ConsolidateBlocks, UnitarySynthesis. The logic must be consistent, so there's no point splitting these 3 stages.
I believe this can replace the UnitarySynthesis pass because any Unitary can be considered a simple peephole unitary.

(** Note: currently, if the user knows that there's a good chance that UnitarySynthesis improves the circuit, they can force it to run by adding [Collect2qBlocks(target=target), ConsolidateBlocks(force_consolidate=True), UnitarySynthesis(target=target)] to the pass manager, so a user who knows how to use the pass manager can already customize this.)

@mtreinish
Member

I like this idea in general. I'm thinking about how it relates to #8774 (specifically the #12007 sub-task): we can sidestep the need to add a batch mode to the unitary synthesis plugin interface by doing this all at once in multithreaded Rust in a new pass.

The only question I have, though, is about evaluating the error for the original 2q block. I agree that we should use an estimated error heuristic to evaluate the potential decompositions and select one based on that instead of the number of gates (which is just being used as a proxy for the estimated error rate). But prior to synthesis there isn't a guarantee that the gates in a 2q block are target instructions that we can query error rates on. How were you thinking we'd evaluate the block in these cases? I was reading this as: we compare the error estimates for the original circuit against all the possible decompositions and pick the one which results in the lowest error. I guess the answer is that if the block isn't in target native instructions we always need to synthesize, so in those cases we pick the lowest error decomposition?
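The selection rule described in this comment could be sketched roughly as follows. This is not Qiskit code: `estimated_fidelity`, `choose_replacement`, the `(name, qubits)` gate encoding, and the error-rate dictionary are all hypothetical stand-ins for whatever the target actually exposes, and every candidate is assumed to be expressed in native instructions.

```python
def estimated_fidelity(gates, error_rates):
    """Product of (1 - error) over the gates.

    Returns None when a gate has no known error rate, which is how this
    sketch models a gate that is not a native target instruction.
    """
    fidelity = 1.0
    for gate in gates:
        if gate not in error_rates:
            return None
        fidelity *= 1.0 - error_rates[gate]
    return fidelity


def choose_replacement(original, candidates, error_rates):
    """Return the gate sequence to keep for one 2q block."""
    # Assumption: every candidate decomposition is already native.
    best = max(candidates, key=lambda c: estimated_fidelity(c, error_rates))
    original_fidelity = estimated_fidelity(original, error_rates)
    if original_fidelity is None:
        # Non-native block: synthesis is mandatory, take the best candidate.
        return best
    if estimated_fidelity(best, error_rates) > original_fidelity:
        return best
    # Ties keep the original so no substitution is made needlessly.
    return original


# Made-up error rates keyed by (gate name, qubits).
rates = {
    ("cx", (0, 1)): 0.01,
    ("ecr", (0, 1)): 0.006,
    ("sx", (0,)): 0.001,
    ("sx", (1,)): 0.001,
}
cx, ecr = ("cx", (0, 1)), ("ecr", (0, 1))
sx0, sx1 = ("sx", (0,)), ("sx", (1,))

candidates = [[cx, sx0, cx], [ecr, sx0, ecr, sx1]]
native_choice = choose_replacement([cx, cx, cx], candidates, rates)
non_native_choice = choose_replacement([("cz", (0, 1))], candidates, rates)
kept_original = choose_replacement([cx], [[cx, sx0, cx]], rates)
```

In this toy setup the `ecr`-based candidate wins in both the native and the non-native case, and a single `cx` is left untouched because no candidate improves on it.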

mtreinish added a commit to mtreinish/qiskit-core that referenced this issue Mar 21, 2024
mtreinish added a commit to mtreinish/qiskit-core that referenced this issue Mar 28, 2024
This commit adds a new transpiler pass for 2q peephole optimization that is
designed to replace the use of `Collect2qBlocks`, `ConsolidateBlocks`,
and `UnitarySynthesis` in the optimization loop of the transpiler with a
new optimized pass Optimize2qBlocks that performs the same basic
functionality. The goal of this new pass is to be more efficient in
runtime and also enable better quality output. The runtime improvements
are achieved by only crossing the python<->rust boundary once and doing
all the heavy lifting in rust and then just returning a list of circuit
sequences for all 2q blocks and then performing inline substitution for
all of those circuits. The actual computation is then potentially
executed in parallel using rust multithreading. The potential quality
improvement is caused by changing the decomposition selection to be
based on projected error rates instead of an estimated number of 2q
basis gates from the decomposition. In the previous triplet we skipped
synthesis if the estimated number of 2q gates from the default
decomposer was greater than or equal to the 2q gates in the block which
was an attempt to estimate the error rate. In this new pass we compare
the estimated fidelity of all the provided synthesis methods and select
the lowest noise decomposition.

Fixes: Qiskit#11659
Fixes: Qiskit#12007
@mtreinish mtreinish added performance mod: transpiler Issues and PRs related to Transpiler labels Apr 18, 2024
mtreinish added a commit to mtreinish/qiskit-core that referenced this issue Sep 18, 2024
This commit adds a new transpiler pass for physical optimization,
TwoQubitPeepholeOptimization. This replaces the use of Collect2qBlocks,
ConsolidateBlocks, and UnitarySynthesis in the optimization stage for
a default pass manager setup. The pass logically works the same way
where it analyzes the dag to get a list of 2q runs, calculates the matrix
of each run, and then synthesizes the matrix and substitutes it inplace.
The distinction this pass makes though is it does this all in a single
pass and also parallelizes the matrix calculation and synthesis steps
because there is no data dependency there.
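As a rough structural sketch of "no data dependency": each run's matrix depends only on that run's gates, so the runs can be mapped independently and the results applied afterwards in order. The helper names below are hypothetical, plain nested lists stand in for gate matrices, and 2x2 matrices are used only to keep the example short. The thread pool illustrates the shape of the computation; in CPython it will not speed up pure-Python arithmetic, whereas the commit describes doing this with Rust multithreading.

```python
from concurrent.futures import ThreadPoolExecutor


def matmul(a, b):
    """Multiply two square matrices given as nested lists."""
    n = len(a)
    return [
        [sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
        for i in range(n)
    ]


def run_matrix(run):
    """Matrix of one run; each later gate multiplies from the left."""
    n = len(run[0])
    result = [[1 if i == j else 0 for j in range(n)] for i in range(n)]
    for gate in run:
        result = matmul(gate, result)
    return result


def process_runs(runs):
    """Compute every run's matrix independently; output order matches input."""
    with ThreadPoolExecutor() as pool:
        return list(pool.map(run_matrix, runs))


X = [[0, 1], [1, 0]]
Z = [[1, 0], [0, -1]]
matrices = process_runs([[X, X], [X, Z]])
```

Because `pool.map` preserves input order, a later sequential substitution step can pair each result with the run it came from.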

This new pass is not meant to fully replace the Collect2qBlocks,
ConsolidateBlocks, or UnitarySynthesis passes as those also run in
contexts where we don't have a physical circuit. This is meant instead
to replace their usage in the optimization stage only. Accordingly this
new pass also changes the logic on how we select the synthesis to use
and when to make a substitution. Previously this logic was primarily done
via the ConsolidateBlocks pass by only consolidating to a UnitaryGate if
the number of basis gates needed based on the weyl chamber coordinates
was less than the number of 2q gates in the block (see Qiskit#11659 for
discussion on this). Since this new pass skips the explicit
consolidation stage we go ahead and try all the available synthesizers.

Right now this commit has a number of limitations, the largest are:

- Doesn't support builds with the py-cache feature (`OnceCell` for the
  cache can't be used across threads)
- Only supports the target
- It doesn't support any synthesizers besides the TwoQubitBasisDecomposer,
  because it's the only one in rust currently.

For plugin handling I left the logic as running the three pass series,
but I'm not sure this is the behavior we want. We could say keep the
synthesis plugins for `UnitarySynthesis` only and then rely on our
built-in methods for physical optimization only. But this also seems less
than ideal because the plugin mechanism is how we support synthesizing
to custom basis gates, and also more advanced approximate synthesis
methods. Both of those are things we need to do as part of the synthesis
here.

Additionally, this is currently missing tests and documentation and while
running it manually "works" as in it returns a circuit that looks valid,
I've not done any validation yet. This also likely will need several
rounds of performance optimization and tuning. At this point this is
just a rough proof of concept and will need a lot of refinement along with
larger changes to Qiskit's rust code before this is ready to merge.

Fixes Qiskit#12007
Fixes Qiskit#11659