Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix Optimize1qGatesDecomposition length heuristic #6553

Merged
merged 33 commits into from
Jul 7, 2021
Merged

Fix Optimize1qGatesDecomposition length heuristic #6553

merged 33 commits into from
Jul 7, 2021

Conversation

ecpeterson
Copy link
Contributor

Summary

The decision in Optimize1qGatesDecomposition whether to use the re-synthesized gate string or the original gate string is decided purely by length: we prefer whichever of the new string and the old string is shorter. This is a rough approximation to expected infidelity cost of the sequence when both strings are of native gates. An input string with non-native gates might be unnaturally short, hence persist into the output — a bug.

Details and comments

Before:

In [2]: from qiskit import QuantumCircuit
   ...: from qiskit.transpiler.passes import Optimize1qGatesDecomposition
   ...: import numpy as np
   ...: 
   ...: c = QuantumCircuit(1)
   ...: c.h(0)
   ...: c.ry(np.pi/2, 0)
   ...: print(Optimize1qGatesDecomposition(['sx', 'rz'])(c))
     ┌───┐┌─────────┐
q_0: ┤ H ├┤ RY(π/2) ├
     └───┘└─────────┘

After:

In [1]: from qiskit import QuantumCircuit
   ...: from qiskit.transpiler.passes import Optimize1qGatesDecomposition
   ...: import numpy as np
   ...: 
   ...: c = QuantumCircuit(1)
   ...: c.h(0)
   ...: c.ry(np.pi/2, 0)
   ...: print(Optimize1qGatesDecomposition(['sx', 'rz'])(c))
     ┌────┐┌────┐
q_0: ┤ √X ├┤ √X ├
     └────┘└────┘

I also removed some special casing on U3 and single gates that seemed to overlap with this oversight. Happy to reverse that.

@ecpeterson ecpeterson added bug Something isn't working Changelog: Bugfix Include in the "Fixed" section of the changelog labels Jun 9, 2021
@ecpeterson ecpeterson requested a review from mtreinish June 9, 2021 23:44
@ecpeterson ecpeterson requested a review from a team as a code owner June 9, 2021 23:44
Copy link
Member

@ajavadia ajavadia left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks good, two small comments

@levbishop
Copy link
Member

levbishop commented Jun 10, 2021

I wonder if we can't completely remove the check if the new run is shorter. With the changes in #5827 I'm hoping there should be no remaining cases where the resynthesized sequence is ever non-optimal (and if any do remain, I consider it a bug). Maybe for now get the check to raise an exception if the new sequence is longer?

@ecpeterson
Copy link
Contributor Author

ecpeterson commented Jun 10, 2021

I added a branch for a warning, rather than an error. It would be obnoxious for a synthesis flaw to generate suboptimal programs, but probably worse to then prevent a user from running at all.

I'll check out the test failures tomorrow.

Copy link
Member

@mtreinish mtreinish left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall this LGTM thanks for fixing this. I agree all that logic is out of date with #5827 now since the issues that single gate case were there for have been fixed. This will also probably improve performance a bit since we're not going to do an identity check on each standalone 1q gate. Just a few nits inline, but then I think this good to go.

Also can you update the release note to say you've also fixed #6473 because I believe this will fix that issue too. (basically just add a new bullet point to the release note yaml for the second fix).

@@ -102,7 +84,13 @@ def run(self, dag):
new_circs.append(decomposer._decompose(operator))
if new_circs:
new_circ = min(new_circs, key=len)
if len(run) > len(new_circ) or (single_u3 and new_circ.data[0][0].name != "u3"):
if all(g.name in self.basis for g in run) and len(run) < len(new_circ):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you have an idea on what the runtime cost of this additional check is? If it's noticeable I'd rather not add this to just raise a warning because this will be called quite a lot during a transpile call.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't have data, but I imagine it's negligible relative to the nonnegotiable cost of constructing new_circ: self.basis tends not to be super large, and decomposer has to read all of run anyhow. I'll do a small amount of with/without benchmarking.

@mtreinish mtreinish added this to the 0.18 milestone Jun 10, 2021
@mtreinish mtreinish self-assigned this Jun 10, 2021
@mtreinish
Copy link
Member

Oh I didn't see the test failures before reviewing and just assumed they all passed. We probably should get to the bottom of those. Most I think are fine and we just need to update the tests like test_short_string: https://dev.azure.com/qiskit-ci/qiskit-terra/_build/results?buildId=28674&view=logs&j=8eacdd59-c2e8-5617-75a5-8ed7c854a78f&t=5df005f4-d5f4-5d2d-0307-5b24b15488fc&l=12271 we can just update that to be X as the expected.

But test_euler_decomposition_worse might be an issue because it looks like the output is worse than what we had before:
https://dev.azure.com/qiskit-ci/qiskit-terra/_build/results?buildId=28674&view=logs&j=8eacdd59-c2e8-5617-75a5-8ed7c854a78f&t=5df005f4-d5f4-5d2d-0307-5b24b15488fc&l=12338

@levbishop
Copy link
Member

levbishop commented Jun 10, 2021

But test_euler_decomposition_worse might be an issue because it looks like the output is worse than what we had before

Yeah this is tricky. The Euler Z(phi)X(theta)Z(lambda) decomposition should find an optimal expansion for 0<=theta<=pi. The input circuit has theta=-pi/2 outside that range, so the Euler decomposer includes the extra Z(). It wouldn't be crazy to change the Euler decomposer to check in case the negative theta := -theta allows such a simplification and emit that if its fewer gates.

On the other hand, both of these solutions are optimal in terms of X() gates which is what actually cost something for many hardware implementations.

@ecpeterson
Copy link
Contributor Author

Some of the test problems were legitimate: the slot Optimize1qGatesDecomposition.basis apparently stores functions which map into the basis described by the basis init kwarg — and that kwarg itself gets tossed. So, ... in self.basis didn't do what I expected (or probably what people reading this PR expected).

Fixed that, updated some of the reference circuits in the test suite, but this still isn't quite good to go: test.python.compiler.test_transpiler.TestTranspile.test_transpile_calibrated_nonbasis_gate_on_diff_qubit checks that non-native gates do / don't get rewritten if they don't / do have calibration definitions, but this change makes rewriting more aggressive. Stay tuned.

@mtreinish
Copy link
Member

Some of the test problems were legitimate: the slot Optimize1qGatesDecomposition.basis apparently stores functions which map into the basis described by the basis init kwarg — and that kwarg itself gets tossed. So, ... in self.basis didn't do what I expected (or probably what people reading this PR expected).

Ah good catch, yeah I missed that in my earlier review, self.basis is a list of the 1q decomposer instances for each target euler basis that is usable by the target basis. Feel free to change the attribute name to something more descriptive to avoid confusion in the future, it shouldn't be part of any public api as it's only used internally by the pass and we can probably just change it without worrying about backwards compatibility.

@ecpeterson ecpeterson requested a review from mtreinish June 11, 2021 21:51
ajavadia
ajavadia previously approved these changes Jun 12, 2021
@mtreinish mtreinish added on hold Can not fix yet and removed automerge on hold Can not fix yet labels Jun 12, 2021
@mtreinish
Copy link
Member

mtreinish commented Jun 12, 2021

I ran the quantum volume benchmarks from the asv benchmark suite on this just now and it causes a roughly 30% run time performance regression:

Benchmarks that have got worse:

       before           after         ratio
     [5f6db11d]       [75d1f0b7]
     <main>       <ecpeterson-bugfix/nonnative-1q-heuristic>
+        239±10ms         334±30ms     1.39  quantum_volume.QuantumVolumeBenchmark.time_ibmq_backend_transpile(8, 'translator')
+       1.55±0.1s        2.05±0.2s     1.32  quantum_volume.QuantumVolumeBenchmark.time_ibmq_backend_transpile(20, 'translator')
+      14.3±0.1ms      18.9±0.07ms     1.32  quantum_volume.QuantumVolumeBenchmark.time_ibmq_backend_transpile(2, 'translator')
+        782±40ms         943±80ms     1.21  quantum_volume.QuantumVolumeBenchmark.time_ibmq_backend_transpile(14, 'translator')
+      17.8±0.3ms       19.8±0.1ms     1.11  quantum_volume.QuantumVolumeBenchmark.time_ibmq_backend_transpile(2, 'synthesis')

SOME BENCHMARKS HAVE CHANGED SIGNIFICANTLY.
PERFORMANCE DECREASED.

My assumption is that this is caused by all the times we're iterating over the gates in each run to check for calibrations now. I haven't profiled it yet but I can do that on monday. But until we get to the bottom of this I've removed the automerge label to prevent mergify from automerging this.

@levbishop
Copy link
Member

It would probably be a good idea to add a couple of extra lines to ANGEXP_ZYZ and ANGEXP_PSX for TestOneQubitEulerSpecial to exercise the new code paths for simplifying by flipping the sign of theta.

@ajavadia ajavadia added this to the 0.18 milestone Jul 6, 2021
ecpeterson and others added 3 commits July 7, 2021 05:08
Co-authored-by: Lev Bishop <18673315+levbishop@users.noreply.github.com>
Co-authored-by: Lev Bishop <18673315+levbishop@users.noreply.github.com>
@ecpeterson
Copy link
Contributor Author

... ANGEXP_ZYZ and ANGEXP_PSX ...

Good idea. Done.

ajavadia
ajavadia previously approved these changes Jul 6, 2021
@mtreinish
Copy link
Member

I reran benchmarks just now, it looks like the performance regression from ealier has now been resolved and the performance is equivalent to the current main branch:

Benchmarks that have stayed the same:

       before           after         ratio
     [e9f06e45]       [2fdd628f]
     <main>           <ecpeterson-bugfix/nonnative-1q-heuristic>
           failed           failed      n/a  quantum_volume.QuantumVolumeBenchmark.time_ibmq_backend_transpile(14, 'synthesis')
           failed           failed      n/a  quantum_volume.QuantumVolumeBenchmark.time_ibmq_backend_transpile(20, 'synthesis')
           failed           failed      n/a  quantum_volume.QuantumVolumeBenchmark.time_ibmq_backend_transpile(27, 'synthesis')
          5.83±1s          7.16±3s    ~1.23  quantum_volume.QuantumVolumeBenchmark.time_ibmq_backend_transpile(8, 'synthesis')
          1.90±0s       2.23±0.04s    ~1.18  quantum_volume.QuantumVolumeBenchmark.time_ibmq_backend_transpile(20, 'translator')
        3.60±0.2s       3.98±0.07s    ~1.10  quantum_volume.QuantumVolumeBenchmark.time_ibmq_backend_transpile(27, 'translator')
       5.71±0.5ms       6.24±0.6ms     1.09  quantum_volume.QuantumVolumeBenchmark.time_ibmq_backend_transpile(1, 'synthesis')
         33.7±3ms       35.1±0.5ms     1.04  quantum_volume.QuantumVolumeBenchmark.time_ibmq_backend_transpile(3, 'translator')
         98.6±4ms          101±5ms     1.03  quantum_volume.QuantumVolumeBenchmark.time_ibmq_backend_transpile(5, 'translator')
        720±100ms        739±500ms     1.03  quantum_volume.QuantumVolumeBenchmark.time_ibmq_backend_transpile(5, 'synthesis')
          1.03±0s       1.05±0.06s     1.02  quantum_volume.QuantumVolumeBenchmark.time_ibmq_backend_transpile(14, 'translator')
       23.5±0.2ms       23.8±0.2ms     1.02  quantum_volume.QuantumVolumeBenchmark.time_ibmq_backend_transpile(2, 'synthesis')
         20.3±1ms      20.4±0.04ms     1.01  quantum_volume.QuantumVolumeBenchmark.time_ibmq_backend_transpile(2, 'translator')
         344±20ms         319±20ms     0.93  quantum_volume.QuantumVolumeBenchmark.time_ibmq_backend_transpile(8, 'translator')
       1.95±0.2ms      1.76±0.02ms    ~0.90  quantum_volume.QuantumVolumeBenchmark.time_ibmq_backend_transpile(1, 'translator')
         130±80ms         106±60ms    ~0.82  quantum_volume.QuantumVolumeBenchmark.time_ibmq_backend_transpile(3, 'synthesis')

BENCHMARKS NOT SIGNIFICANTLY CHANGED.

I'm a bit concerned about that variance, especially as I ran this on my local benchmark system which is doesn't have anything else running on it. But it doesn't look outside the norm for the benchmark (my guess is that the seeding for the qv circuit generation isn't correct or something like that).

@ecpeterson
Copy link
Contributor Author

I also see really strong variance across benchmarking runs, and it sounds worth looking into. I don't think the variance is something I introduced, but who knows what it might be covering up.

mtreinish
mtreinish previously approved these changes Jul 6, 2021
Copy link
Member

@mtreinish mtreinish left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, one nit inline about docs formatting but it's not worth blocking over as it doesn't get published. I definitely appreciate all the comments added, it makes it easier to trace through things now.

Comment on lines 295 to 308
"""
Installs the angles phi, theta, and lam into a KAK-type decomposition of the form
K(phi) . A(theta) . K(lam) , where K and A are an orthogonal pair drawn from RZGate, RYGate,
and RXGate.

Behavior flags:
`simplify` indicates whether gates should be elided / coalesced where possible.
`allow_non_canonical` indicates whether we are permitted to reverse the sign of the
middle parameter, theta, in the output. When this and `simplify` are both enabled,
we take the opportunity to commute half-rotations in the outer gates past the middle
gate, which permits us to coalesce them at the cost of reversing the sign of theta.

NOTE: The input value of `theta` is expected to lie in [0, pi).
"""
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this is a private method and neither the docs builds or linters care about docstring formatting on this isn't a big issue. But I wonder if it would be better to restructure this in the normal docstrig format for the napoleon plugin just to be consistent with the rest of the docs. If this were a built and published doc it wouldn't actually pass CI.

@mtreinish mtreinish dismissed levbishop’s stale review July 6, 2021 20:17

Comments addressed

a by-hand attempt at reformatting the docstring
@ecpeterson ecpeterson dismissed stale reviews from mtreinish and ajavadia via c3e7d6c July 6, 2021 20:30
mtreinish
mtreinish previously approved these changes Jul 6, 2021
Copy link
Member

@levbishop levbishop left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was a marathon... thanks for sticking with it

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working Changelog: Bugfix Include in the "Fixed" section of the changelog
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants