Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Shallow-copy mutable containers in slot values #826

Merged
merged 14 commits into from
Sep 15, 2023
Merged

Conversation

jbednar
Copy link
Member

@jbednar jbednar commented Aug 30, 2023

Fixes #793.
Fixes #746.

Adds shallow copying of mutable slot values when instantiating Parameter objects or inheriting values for slots. Shallow copying is only done when per_instance is True (the default); if you don't want such copying you can share both Parameter instances and mutable slot values by setting per_instance=False.

import param

class A(param.Parameterized):
    p = param.Selector(objects=[1, 2], check_on_set=False)

class B(A):
    p = param.Selector(default=2)


b = B()
b.p = 3

print(A.param.p.objects, B.param.p.objects, b.param.p.objects)
# [1, 2] [1, 2, 3] [1, 2, 3]

(previously A's objects list was also [1, 2, 3]; now fixed to be [1, 2])

There are various issues still, as listed below.

Tests

Needs a test capturing the example above, but I've got to run, so if anyone wants to add that I'd be grateful, or I can do that tomorrow.

Previous handling of watchers

PR #306 added this logic to Parameters.getitem() for shallow-copying watchers:

try:
    # Do not copy watchers on class parameter
    watchers = p.watchers
    p.watchers = {}
    p = copy.copy(p)
except:
    raise
finally:
    p.watchers = {k: list(v) for k, v in watchers.items()}

This PR generalizes such copying to apply to all the slot values besides default, but I was unable to determine why the existing code was so complex. First, why p.watchers = {k: list(v) for k, v in watchers.items()} instead of just p.watchers = copy.copy(watchers.items())? copy.copy is used just a couple of lines before, and when I use it here it seems to work just the same, as I'd expect, so I can't tell why it was a list comprehension that appears to achieve the same as copy.copy.

Second, what's with the try/except/finally? Deleting all that seems to work just fine, and I can't quite imagine a situation when the try block could fail but the watchers are still valid to reconstruct finally. Maybe something in Panel? For now I deleted it as it does not seem to achieve anything.

Third, I've kept the convoluted mechanism for zeroing-out the watchers in the original Parameter object when creating the new one. Simply shallow-copying the watchers after copying the Parameter object leaves an extra set of watchers that causes tests to fail, but I'm not at all sure why it's appropriate to move the watchers to the new Parameter object rather than where they were originally watching. For now I chose to be conservative, but I'm not confident why it was done this way, so if there's a reason, it should have a clear comment added.

What's mutable and when is it copied?

For now, mutable sequences, mappings, and sets are considered mutable, which covers the list and dict cases that I'm aware of us using in slots. Such values are shallow-copied for all slots except default, to ensure that any independent Parameter copies also have independent objects lists, etc. Parameterized objects are also mutable, but I don't know if they are ever used in slots. If they are, we should consider whether they should be shallow-copied, but my guess is that they should not, since it's largely containers that we want shallow-copied, not arbitrary instantiated objects. Maybe is_mutable should be renamed is_mutable_container to convey that?

The reason for not copying the default value is to allow a specific list or dict to be shared across Parameters, which has always been supported. E.g. one could have a few global lists like search_paths, debug_search_paths, etc., whose values are curated and maintained independently of which particular Parameters have been set to those values. We can revisit whether mutable containers should be copied for default too, but if so, it should be a separate PR with its own justification.

I currently check for default only when copying the Parameter object, not during inheritance. It might also be appropriate to skip shallow-copying for for inheritance as well, but I haven't found an example where that would come up. Probably worth doing?

_update_state

After merging this PR, I believe we should be able to remove _update_state as discussed in #807 and #817 . _update_state was needed because we had to wait for param inheritance before adding to the objects lists for Selectors, but now that values get shallow copied, it should be feasible to populate the objects lists (e.g. from the default value) immediately, during the constructor. Doing so should simplify the Selector logic, but I haven't yet tested that this PR's approach will achieve that goal.

My guess is that eliminating _update_state is required, for fixing this remaining bad behavior:

import param

class A(param.Parameterized):
    p = param.Selector(objects=[1, 2], check_on_set=False, per_instance=True)
    
class B(A):
    i = param.Integer(8)

b = B()
b.p = 3

print(A.param.p.objects, B.param.p.objects, b.param.p.objects)
#[1, 2, 3] [1, 2, 3] [1, 2, 3]

Here the Parameter object isn't copied into B, and the new value 3 ends up in A's object list, which it should not. I think that may be from _update_state being called on the class's copy of the objects list, not the instance's, since it's called in the metaclass, but I haven't tracked that down.

@jbednar jbednar added this to the 2.0 milestone Aug 30, 2023
@jbednar jbednar requested review from philippjfr and maximlt August 30, 2023 04:53
param/parameterized.py Outdated Show resolved Hide resolved
param/parameterized.py Outdated Show resolved Hide resolved
tests/testparameterizedobject.py Outdated Show resolved Hide resolved
param/parameterized.py Outdated Show resolved Hide resolved
param/parameterized.py Outdated Show resolved Hide resolved
@maximlt
Copy link
Member

maximlt commented Aug 30, 2023

  • on Previous handling of watchers, I agree we need to better understand what was going on there, hopefully Philipp can remember some of that otherwise we'll have to try to dig more into it
  • on _update_state, I wish we could get rid of it but I also know that the Selector(s) Parameters are quite dynamic (some slot default being callable that depend on the state of other slots) and are difficult to setup correctly when the slots aren't populated in the expected order, which happens in an inheritance context.

What do you think about the examples below? In the first snippet the code accesses the Parameter before setting p.s to 3, to run __getitem__ and shallow-copy the objects. In the second case this is commented out, leading to the objects on the class Parameter being updated as the Parameter doesn't yet exist on the instance. This is similar to the last example you shared @jbednar, without any inheritance though.

class P(param.Parameterized):
    s = param.Selector(objects=[1, 2], check_on_set=False)

p = P()
p.param.s.objects  # runs `__getitem__`
p.s = 3
assert P.param.s.objects == [1, 2]
class P(param.Parameterized):
    s = param.Selector(objects=[1, 2], check_on_set=False)

p = P()
# p.param.s.objects
p.s = 3
assert P.param.s.objects == [1, 2, 3]

@jbednar
Copy link
Member Author

jbednar commented Aug 30, 2023

@maximlt , that's a good point:

image

Maybe we need to populate the Parameter objects more proactively. Any suggestions for where to make that change?

@maximlt
Copy link
Member

maximlt commented Aug 30, 2023

Any suggestions for where to make that change?

Maybe at the start of Parameters._setup_params which is called from Parameterized.__init__?

@maximlt
Copy link
Member

maximlt commented Sep 1, 2023

Actually the behavior I reported in my previous comment is not tied to mutable containers.

image
Code
import param

class P(param.Parameterized):
    x = param.Number()

p = P()

P.param.x.bounds = (100, 200)

assert p.param.x.bounds == (100, 200)

class P(param.Parameterized):
    x = param.Number()

p = P()

p.param.x  # side-effects there

P.param.x.bounds = (100, 200)

assert p.param.x.bounds is None

Accessing a Parameter on the .param namespace creates a Parameter on the instance. I'm not sure it's really a bug, certainly it's confusing, but really the way to get a Parameter you can control at the class level should be with per_instance=False.

@philippjfr
Copy link
Member

My reading of the watchers code above is that it's not fully correct. The problem being that we can't distinguish between watchers that are meant to be on the class parameter and watchers that are meant to be on the instance parameter since there is no difference until the instance parameter is created, i.e.:

param.Parameterized.param.watch(print, 'name', what='allow_None')

looks exactly the same as:

p = param.Parameterized()
p.param.watch(print, 'name', what='allow_None')

In general it seems very rare to have watchers for attributes on class parameters so this hasn't come up, but it's definitely not correct. I think this is another issue we can't easily fix without making instance parameter creation non-lazy.

Second, what's with the try/except/finally? Deleting all that seems to work just fine, and I can't quite imagine a situation when the try block could fail but the watchers are still valid to reconstruct finally. Maybe something in Panel? For now I deleted it as it does not seem to achieve anything.

I do think it's important, at least if we decide not to fix the underlying issue yet. If there is somehow an error while creating the instance parameter (e.g. because some value is not safe to copy) then the watchers end up being lost entirely.

@jbednar
Copy link
Member Author

jbednar commented Sep 3, 2023

If there is somehow an error while creating the instance parameter (e.g. because some value is not safe to copy) then the watchers end up being lost entirely.

How would it help to have watchers copied correctly but not whatever slot value was about to be copied after the failing one? This try/except/finally seems like it would just cover up problems, not avoid them -- why would we want watchers copied correctly when other slots aren't? If we wanted to copy all the slots that we could, and put a warning for any that didn't work, wouldn't we do that in a loop where the try/except was per slot, not for the entire copy operation?

@philippjfr
Copy link
Member

It's not about the object being copied to but about restoring the class that's being copied from.

@jbednar
Copy link
Member Author

jbednar commented Sep 3, 2023

Hmm; that's weird. My understanding of that code is that if copy.copy fails, an exception will be raised, with p remaining the class Parameter object, except that p.watchers has been replaced with a copy of the original dict. Seems like a lot of hoops to be jumping back and forth through to achieve that outcome, i.e. emptying out the watchers and then replacing it with something just like the original.

And then if copy.copy succeeds, no exception will be raised, and p will become the instance's copy of the Parameter, where the instance p.watchers will have a copy of the original dict. But won't the original class Parameter now have an empty watchers dict? How is this working at all when there are multiple instances that need to have the same Parameter instantiated into it? Only the first one gets the watchers?

Unless I'm very confused, it seems like we have two defensible options here: (1) Leave watchers on class Parameter objects, and don't copy them to instances. Instances would need their own watchers, which would be specific to that instance. Watchers on class Parameter objects would be watching the class Parameter object, not instances. (2) Leave watchers on Class Parameter objects, and copy them to all instances. Thus anything that watches the class Parameter will also be watching all the instances.

I am not seeing any argument for what it seems like the current code is doing, i.e. emptying out the watchers from the class Parameter object after instantiation. When is that ever appropriate?

@jbednar
Copy link
Member Author

jbednar commented Sep 8, 2023

@maximlt , I've pushed a few more commits to address the issues above:

  • In 9ba4aa1 I've restored the watchers logic to its original state using try/except/finally; if it's to be addressed, that should be done in a separate PR. It should now be unchanged from before this PR, just refactored differently.
  • In 7fcf8d4 I refactored the new Parameter.__getitem__ implementation to pull out the logic for copying Parameter objects per_instance into a new function _instantiated_parameter(). This commit should not change anything about the behavior; it just makes that logic reusable elsewhere.
  • In 6ce1ef3 I make Parameter.__set__ use _instantiated_parameter() to ensure that the Parameter object has been copied per_instance before any setting is done, so that e.g. _ensure_value_is_in_objects will affect the per_instance Parameter object instead of the class's Parameter object. In this commit I had to update one warning message because it's being generated another level deep in the stack trace, plus I deleted one newly-added line from the tests that was essentially enforcing that the objects on a Selector were stored only in the class, which this commit deliberately changes.

6ce1ef3 is the key one for addressing the problems above. After that commit, I believe these behaviors are now correct:

import param

class A(param.Parameterized):
    p = param.Selector(objects=[1, 2], check_on_set=False, per_instance=True)

class B(A):
    i = param.Integer(8)

b = B()
b.p = 3

print(A.param.p.objects, B.param.p.objects, b.param.p.objects)
# [1, 2] [1, 2] [1, 2, 3]

(b now gets its own per_instance copy of the objects list)

class P(param.Parameterized):
    s = param.Selector(objects=[1, 2], check_on_set=False)

p = P()
p.param.s.objects
p.s = 3
print(P.param.s.objects, p.param.s.objects)
#[1, 2] [1, 2, 3]
class P(param.Parameterized):
    s = param.Selector(objects=[1, 2], check_on_set=False)

p = P()
p.s = 3
print(P.param.s.objects, p.param.s.objects)
#[1, 2] [1, 2, 3]

(invoking __getitem__ is no longer required for p to get its own per_instance copy of the s Parameter object)

I'm ok with keeping the behavior you show in 3. above, i.e. that accessing a Parameter on the .param namespace creates a Parameter on the instance. I agree that if you want to control a Parameter at the class level, you can either change the class values before creating any instances, or you can set per_instance=False. If you change the Parameter at the class level after having instantiated some classes, there's no guarantee that the class-level changes will propagate down. As you show, sometimes they do, but really if you want such changes you have to make them on the instance or you have to set per_instance to False!

So far, I haven't messed with _update_state, because changing that wasn't necessary for anything to do with instantiating slots.

So this PR should be ready to review and merge now; it now does what it says: adds shallow-copying for Parameter slot values that happen to be mutable containers, whenever a Parameter object is copied to an instance or inherits its slot values. Parameter objects are still copied only lazily, as before, but now there is one more case where it gets copied, namely when a parameter value is set on the instance (to handle _ensure_value_is_in_objects).

@jbednar jbednar changed the title Shallow-copy slot values Shallow-copy mutable containers in slot values Sep 8, 2023
Copy link
Member

@maximlt maximlt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I left a question in my review. In addition to that, I would like to get your opinion @jbednar on the following examples?

class A(param.Parameterized):
    p = param.Selector(objects=[1, 2])

class B(A):
    p = param.Selector(default=2)

B.param.p.objects.append(3)
print(A.param.p.objects)  # [1, 2]
class A(param.Parameterized):
    p = param.Selector(objects=[1, 2])

class B(A):
    pass

B.param.p.objects.append(3)
print(A.param.p.objects)  # [1, 2, 3]

param/parameterized.py Show resolved Hide resolved
param/parameterized.py Show resolved Hide resolved
@jbednar
Copy link
Member Author

jbednar commented Sep 13, 2023

I assume everyone is happy with the behavior in case 1 (two separate Parameter objects, and two separate objects lists)?

Case 2 is trickier. On the one hand, the current behavior in 2 seems like the only way to define a Parameter object whose settings can easily be inherited across a hierarchy. I.e., not copying the Parameter object between A and B is what lets people mess with the one in A and set things up as they like, while knowing that it will all propagate nicely. Plus, if they do want separate Parameter objects in subclasses, it's easy enough to just put them there, as in 1.

On the other, I do agree that there's some argument for duplicating the Parameter automatically to make case 2 act like case 1, so that they behave independently. On balance I think it's ok that we aren't doing that; seems like we lose more than we gain.

@philippjfr
Copy link
Member

I'll be addressing watcher related issues in a separate PR. I also think that on balance the behavior in the example above is defensible. So I'm going to consider this ready and merge.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Selector regression in an inheritance context? Selector objects shared across instances and class
3 participants