Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Parameter instances can compare equal while having different hash values #11654

Closed
wshanks opened this issue Jan 26, 2024 · 14 comments · Fixed by #11652
Closed

Parameter instances can compare equal while having different hash values #11654

wshanks opened this issue Jan 26, 2024 · 14 comments · Fixed by #11652
Labels
bug Something isn't working

Comments

@wshanks
Copy link
Contributor

wshanks commented Jan 26, 2024

Environment

  • Qiskit Terra version: 0.45.2
  • Python version: 3.12
  • Operating system: Fedora Linux 39

What is happening?

Currently, two Parameter instances can compare equal while not having the same hash value. This condition violates the Python data model which is documented as requiring:

The only required property is that objects which compare equal have the same hash value

How can we reproduce the issue?

a = Parameter("a")
b = Parameter("b", uuid=a._uuid)

a == b  # True
hash(a) == hash(b)  # False

What should happen?

Here there is a decision to make. Parameters are essentially a tuple of a name string and a Python UUID. Currently, __eq__ uses only the UUID while __hash__ uses the name and the UUID. The two methods should use the same data.

In #11647 (comment), @jakelishman mentioned that the documentation states that the name is required for equality, so I created #11652 to add the name to the equality check (I think @jakelishman was referring to "Setting the uuid field when creating two parameters to the same thing (along with the same name) allows them to be equal." in the Parameter.__init__ docstring). However, in the pulse module, there is code that makes use of the fact that parameter names are not part of the equality test by redefining the name:

.. rubric:: Program Scoping
When you call a subroutine from another subroutine, or append a schedule block
to another schedule block, the management of references and parameters
can be a hard task. Schedule block offers a convenient feature to help with this
by automatically scoping the parameters and subroutines.
.. code-block::
from qiskit import pulse
from qiskit.circuit.parameter import Parameter
amp1 = Parameter("amp")
with pulse.build() as sched1:
pulse.play(pulse.Constant(100, amp1), pulse.DriveChannel(0))
print(sched1.scoped_parameters())
.. parsed-literal::
(Parameter(root::amp),)
The :meth:`~ScheduleBlock.scoped_parameters` method returns all :class:`~.Parameter`
objects defined in the schedule block. The parameter name is updated to reflect
its scope information, i.e. where it is defined.
The outer scope is called "root". Since the "amp" parameter is directly used
in the current builder context, it is prefixed with "root".
Note that the :class:`Parameter` object returned by :meth:`~ScheduleBlock.scoped_parameters`
preserves the hidden `UUID`_ key, and thus the scoped name doesn't break references
to the original :class:`Parameter`.

So we have to decide which way to go:

  1. Remove name from the hash calculation so that only the UUID matters for Parameter instances. In practice, this is probably fine since the UUID's should always be unique.
  2. Add name to the equality test and rework the pulse code. It is just using the renaming for convenience of labeling parameters. Perhaps the labeling could be done differently (scoped_parameters could return a dict with the keys being the scoped parameter names?). I don't see scoped_parameters used anywhere here or qiskit-experiments currently.

I lean toward option 2. I think a key question is how tied is Parameter to ParameterExpression (technically it is a direct subclass so very tied 🙂). If Parameter just needs to be a variable that floats around through code objects and can later be assigned a value, allowing different names seems fine. However, if it is important that a Parameter can always be used in an expression and then later substituted for a different value, allowing different names is a problem because ParameterExpression maps the names directly to sympy/symengine symbols and wants to substitute them using the sympy/symengine mechanism. If Parameters are being passed around and manipulated, you could end up calling assign with a Parameter instance with a different name from the one that was used to create the underlying symbolic expression of a ParameterExpression. Or you could say that both use cases are allowed -- when doing simple value substitution renaming Parameters is okay; when doing symbolic expressions, you have to be careful not to reuse UUID's with different names.

@nkanazawa1989 What do you think as the author of the scoped_parameters code?

Any suggestions?

No response

@ihincks
Copy link
Contributor

ihincks commented Jan 26, 2024

Doesn't something like 1. need to happen in any case because of the Python data model violation? I'm having trouble imagining a case where implementing 1. would be worse than the current situation. Of course, this doesn't imply that 1. alone is the optimal route here.

@wshanks
Copy link
Contributor Author

wshanks commented Jan 26, 2024

Sorry, I should have started 2 by saying "Add name to the equality test and rework the pulse code" (edited it to say that now). The choice is either make both eq and hash use only UUID or make them both use UUID and name.

@ihincks
Copy link
Contributor

ihincks commented Jan 26, 2024

Ah okay, that makes more sense, thanks for clarifying.

@jakelishman
Copy link
Member

Ah yeah, that's not ideal about the pulse code. I'd also lean in favour of 2, since it's the documented behaviour and imo the less surprising version of equality.

With the public uuid getter in place, if Pulse wants to continue using its current system, maybe it could mechanically change all the equality checks to a.uuid == b.uuid and it'd generally work?

@wshanks
Copy link
Contributor Author

wshanks commented Jan 27, 2024

One thing that is interesting is that the pulse code actually suffers from this bug. The ScheduleBlock docstring says that

Note that the Parameter object returned by scoped_parameters() preserves the hidden UUID key, and thus the scoped name doesn’t break references to the original Parameter.

However, an important use case of parameter referencing is broken as demonstrated by this code:

from qiskit import pulse
from qiskit.circuit import Parameter


f = Parameter("f")
with pulse.build() as sched:
    pulse.set_frequency(f, pulse.DriveChannel(0))


scoped_f = next(iter(sched.scoped_parameters()))

print(f"{f=}")
print(f"{scoped_f=}")

print(f"{sched=}")
print(f"assign f to 5: {sched.assign_parameters({f: 5}, inplace=False)}")
print(f"assign scoped_f to 5: {sched.assign_parameters({scoped_f: 5}, inplace=False)}")

which prints

f=Parameter(f)
scoped_f=Parameter(root::f)
sched=ScheduleBlock(SetFrequency(f, DriveChannel(0)), name="block0", transform=AlignLeft())
assign f to 5: ScheduleBlock(SetFrequency(5, DriveChannel(0)), name="block0", transform=AlignLeft())
assign scoped_f to 5: ScheduleBlock(SetFrequency(f, DriveChannel(0)), name="block0", transform=AlignLeft())

Because the parameter assignment code uses dicts and sets to manage the parameters, the fact that f and scoped_f do not have the same hash causes the assignment with scoped_f to fail because the test of whether scoped_f is in sched's parameters fails.

I'd also lean in favour of 2, since it's the documented behaviour and imo the less surprising version of equality.

To be fair to the pulse code, the documentation I referenced above was only added in 0.45.0 in #10875. I am not sure if there is anything else that says that a Parameter compares with UUID and name rather than just UUID? The behavior of hash using the name and UUID only started with that same change as well. Prior to that, only the UUID was used for the hash. The motivation for that change was to make Parameter and ParameterExpression instances that compare equal also hash to the same value. That motivation pushes towards treating Parameters as elements of expressions rather than simple values to be substituted (going back to the decision I mentioned before).

The problem with the pulse code is that its tests all create a schedule with parameters and then check the scoped parameters for equality with the original parameters. None of them tests try to assign a value to a schedule using the scoped parameter. Otherwise, this issue would have been caught when reviewing #10875.

Given the need to align Parameter with ParameterExpression, I think we need to add the name to the equality test of Parameter. That means that 0.45 broke the intended use case for ScheduleBlock.scoped_parameters. Maybe the best option is what you suggested just above of just living with this and having pulse code check parameters by UUID if they want to match them up. I think we could transition to a different format for the scoped parameters. Instead of returning tuple[Parameter(scoped_name), ...], it could return dict[scoped_name, Parameter(original_name)]. We could use a keyword argument to select the output and change the default format to the new one in a later release if we wanted to keep backwards compatibility. I think the scoped parameters is a niche enough use case and the current behavior is difficult to work with enough that I wouldn't rule out just switching the return format now and treating that as a bug fix, since I am not sure there is a good fix without breaking compatibility.

@wshanks
Copy link
Contributor Author

wshanks commented Jan 27, 2024

I thought of one way to fix the pulse code with equality testing name and UUID. We could add a new ScopedParameter subclass of Parameter that keeps the _name and _uuid from Parameter for __eq__ and __hash__ but also stores a _scope attribute which it uses to override the name property. I think this is overkill (I think just returning a dict fits the need for scoped_parameters without adding a new class) and is still vulnerable to code assuming that name matches the symbol in symbolic expressions, but it would fit the intended current API behavior.

@jakelishman
Copy link
Member

I'm always a bit nervous around introducing subclasses on objects that weren't intended to have them, because it introduces an extra layer that can get out of sync. For example here, it'd be easy for us to have the hash computer (accidentally or otherwise) from within Parameter using the _name attribute, but ofc that would be out of sync with a subclass - the point being that subclassing and overriding name breaks the assumption that Parameter is allowed to make within its own code for performance that _name directly backs name. That said, the size of the change is small, and Parameter is immutable, so the chances of a subclass breaking are fairly slim (and also, I won't stand in the way of how pulse wants to work internally).

For #10875 - yeah, I wasn't blaming Pulse, just trying to find the most consistent path. I hadn't realised the name/UUID equality doc got added in that PR, but I/we should have spotted the __eq__ problem at that point.

@nkanazawa1989
Copy link
Contributor

Indeed I was trying to introduce another parameter class (not subclass) designed for the pulse module when we investigated the design of the calibration module in Qiskit Experiments. Pulse, or calibration template, doesn't allow name conflict within the same context, and the name must be a unique property in this domain while circuit does allow the name conflict. This made me think the machinery with UUID was not necessary in the pulse domain. Unfortunately, we didn't adopt the idea, because of an issue in the parameter linking between a circuit and pulse program via the pulse gate framework.

I think the nature of pulse parameter comes from the conventional design of the calibration software -- we start from a configuration object where a plan text stores all system information. Since we have several conventional names, e.g. "amp" representing the hight of the waveform, we usually append the scope information to the parameter name, and the scope usually means the pulse kind and in which channel it is played.

We preferred the convenience of parameter link, and decided to push the complexity arising from the representation difference to the ScheduleBlock itself. I feel the current implementation is indeed too complicated to maintain, and I'd love to have another (more pulse oriented parameter) class and simplify the implementation of the schedule block. This could break the parameter link in the pulse gate because it might be difficult to automatically create mapping between different classes. On the other hand, recently I think the pulse gate is bit outdated. Considering what we are looking into, namely, writing a circuit with more than 100 qubit width, this representation will quickly become inefficient and we need more efficient mechanism.

With the refactoring in the pulse feature branch, I'd like to support the qe-compiler PulseDialect output in which they don't have the pulse gate representation. This means we introduce clear separation between circuit and pulse. Something like glueing the circuit transpiler and pulse compiler together with a IR-conversion pass which is effectively a scheduler. In this world, no pulse gate exist, and representation is much robust and scalable.

Anyways, I'm also fine with dropping the scope feature from the ScheduleBlock in 1.0 if this is really problematic (IIRC Qiskit Experiments doesn't use this feature and implements own parameter management mechanism).

@wshanks
Copy link
Contributor Author

wshanks commented Jan 30, 2024

Thanks, @nkanazawa1989. Yes, I agree with the direction you outline. To summarize, I think scoped_parameters is little used and is somewhat broken in the current context of Qiskit (it worked before 0.45). It is also not aligned with where we want to go strategically with pulse programming. What I want to resolve here is what to do about it, ideally for Qiskit 1.0.

Let's assume that we accept the code currently in #11652 (Parameter __eq__ and __hash__ use both name and UUID). Then some options are:

  1. Remove scoped_parameters. We could deprecate in 0.46 and remove in 1.0, right?
  2. Leave scoped_parameters unchanged. Change the tests to compare UUID instead of Parameter equality as @jakelishman suggested. This is easiest but scoped_parameters is almost useless because the parameters can not be used for assignment. The documentation should cover this.
  3. Change scoped_parameters to return a dictionary mapping scope to the original parameters. This would make it possible to look up parameters by scope but would be a breaking API change.
  4. Add a new subclass like I suggested. I share @jakelishman's concern that other code could use _name and name interchangeably and run into confusing bugs.

@nkanazawa1989 Do you prefer 1?

@nkanazawa1989
Copy link
Contributor

Yes. I can quickly approve a deprecation PR to 0.46 branch if you can prepare before the deadline.

wshanks added a commit to wshanks/qiskit-terra that referenced this issue Jan 31, 2024
As noted in Qiskit#11654, parameter
scoping does not work properly beginning with Qiskit 0.45.0. The problem
is that the `Parameter` name is now used for the hash value and so the
parameter produced with the scoped name is not equivalent to the
original parameter and does not substitute for it when using parameter
assignment. Since the scoped parameter mechanism was a convenience
feature and not widely used, it is simply deprecated rather than made to
work with parameter assignment.
@wshanks
Copy link
Contributor Author

wshanks commented Jan 31, 2024

I opened #11654 to deprecate the scoping code. Next I will open a PR to main to remove it along with its tests which are blocking #11652. Then #11652 can close this issue. I know there is a deadline of Feb. 1 for some things. Please let me know if that is a problem for this plan, where the main goal is getting the fix for the issue here into 1.0 (I think it's okay to miss the rc as long as it makes the final release?).

@jakelishman
Copy link
Member

Anything that misses rc1 should really trigger us to create an rc2 (though in practice I don't think it will unless it's critical), so it really should be a strategy of last resort.

I can briefly check a pulse PR before Naoki comes back online, but it'll be best if he does final review once he's back. After that, #11652 should be trivial and I'll throw it in the merge queue.

@nkanazawa1989
Copy link
Contributor

hehe I'm still online. Just approved but #11691 still needs lint fix. @jakelishman can you take over and approve it? The PR looks fine with me.

@jakelishman
Copy link
Member

np, thanks Naoki - yeah, I'll do it in a bit.

wshanks added a commit to wshanks/qiskit-terra that referenced this issue Jan 31, 2024
The `scoped_parameters` and `search_parameters` methods have been
removed from the `ScheduleBlock` class. These methods returned
`Parameter` objects that partially linked to the parameters in the
`ScheduleBlock` instance but assigning values using these objects did not
work correctly. Users should use `ScheduleBlock.parameters` instead and
iterate through `ScheduleBlock.references` and compare to the
`Schedule.parameters` attributes of the subreferences when needing to
distinguish which subroutine a parameter is used in. See
Qiskit#11654 for more information.
wshanks added a commit to wshanks/qiskit-terra that referenced this issue Jan 31, 2024
The `scoped_parameters` and `search_parameters` methods have been
removed from the `ScheduleBlock` class. These methods returned
`Parameter` objects that partially linked to the parameters in the
`ScheduleBlock` instance but assigning values using these objects did not
work correctly. Users should use `ScheduleBlock.parameters` instead and
iterate through `ScheduleBlock.references` and compare to the
`Schedule.parameters` attributes of the subreferences when needing to
distinguish which subroutine a parameter is used in. See
Qiskit#11654 for more information.
github-merge-queue bot pushed a commit that referenced this issue Feb 1, 2024
* Remove pulse scoped_parameters and search_parameters

The `scoped_parameters` and `search_parameters` methods have been
removed from the `ScheduleBlock` class. These methods returned
`Parameter` objects that partially linked to the parameters in the
`ScheduleBlock` instance but assigning values using these objects did not
work correctly. Users should use `ScheduleBlock.parameters` instead and
iterate through `ScheduleBlock.references` and compare to the
`Schedule.parameters` attributes of the subreferences when needing to
distinguish which subroutine a parameter is used in. See
#11654 for more information.

* Lint and sphinx clean up
github-merge-queue bot pushed a commit that referenced this issue Feb 1, 2024
* Deprecate pulse reference parameter scoping

As noted in #11654, parameter
scoping does not work properly beginning with Qiskit 0.45.0. The problem
is that the `Parameter` name is now used for the hash value and so the
parameter produced with the scoped name is not equivalent to the
original parameter and does not substitute for it when using parameter
assignment. Since the scoped parameter mechanism was a convenience
feature and not widely used, it is simply deprecated rather than made to
work with parameter assignment.

* black

* Update releasenotes/notes/deprecate-pulse-parameter-scoping-6b6b50a394b57937.yaml

Co-authored-by: Jake Lishman <jake@binhbar.com>

* Move warning to documentation

---------

Co-authored-by: Jake Lishman <jake@binhbar.com>
Co-authored-by: Jake Lishman <jake.lishman@ibm.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants