Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Structured configs do not respect dataclasses.field(init=False) #789

Closed
tmke8 opened this issue Sep 25, 2021 · 17 comments · Fixed by #879
Closed

Structured configs do not respect dataclasses.field(init=False) #789

tmke8 opened this issue Sep 25, 2021 · 17 comments · Fixed by #879
Labels
enhancement New feature or request structured config

Comments

@tmke8
Copy link
Contributor

tmke8 commented Sep 25, 2021

The documentation of dataclasses has this example for init=False:

@dataclass
class C:
    a: float
    b: float
    c: float = field(init=False)

    def __post_init__(self):
        self.c = self.a + self.b

It just means that c is not an argument to the __init__. I find this useful sometimes when I have a configuration value that can be computed from other configuration values.

Describe the solution you'd like

OmegaConf should just ignore fields that have init=False set.

Describe alternatives you've considered

One alternative is to not give a type annotation to the field and just set it to MISSING:

>>> @dataclass
... class C:
...     a: float
...     b: float
...     c = MISSING
...     def __post_init__(self):
...         self.c = self.a + self.b
...
>>> d = OmegaConf.structured(C)
>>> d.a = 3
>>> d.b = 4
>>> o = OmegaConf.to_object(d)
>>> o.c
7.0

This works, but it is a bit sad that we have to accept the loss of the type annotation.

Additional context

This is what currently happens when you use init=False.
>>> from dataclasses import dataclass, field
>>> @dataclass
... class C:
...     a: float
...     b: float
...     c: float = field(init=False)
...     def __post_init__(self):
...         self.c = self.a + self.b
...
>>> c = C(a=3, b=4)
>>> c.c
7
>>> from omegaconf import OmegaConf
>>> d = OmegaConf.structured(C)
>>> d.a = 3
>>> d.b = 4
>>> OmegaConf.to_object(d)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/tmk/.conda/envs/palbolts/lib/python3.9/site-packages/omegaconf/omegaconf.py", line 574, in to_object
    return OmegaConf.to_container(
  File "/home/tmk/.conda/envs/palbolts/lib/python3.9/site-packages/omegaconf/omegaconf.py", line 553, in to_container
    return BaseContainer._to_content(
  File "/home/tmk/.conda/envs/palbolts/lib/python3.9/site-packages/omegaconf/basecontainer.py", line 249, in _to_content
    return conf._to_object()
  File "/home/tmk/.conda/envs/palbolts/lib/python3.9/site-packages/omegaconf/dictconfig.py", line 735, in _to_object
    self._format_and_raise(
  File "/home/tmk/.conda/envs/palbolts/lib/python3.9/site-packages/omegaconf/base.py", line 190, in _format_and_raise
    format_and_raise(
  File "/home/tmk/.conda/envs/palbolts/lib/python3.9/site-packages/omegaconf/_utils.py", line 821, in format_and_raise
    _raise(ex, cause)
  File "/home/tmk/.conda/envs/palbolts/lib/python3.9/site-packages/omegaconf/_utils.py", line 719, in _raise
    raise ex.with_traceback(sys.exc_info()[2])  # set end OC_CAUSE=1 for full backtrace
omegaconf.errors.MissingMandatoryValue: Structured config of type `C` has missing mandatory value: c
    full_key: c
    object_type=C
>>> d.c = 0
>>> OmegaConf.to_object(d)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/tmk/.conda/envs/palbolts/lib/python3.9/site-packages/omegaconf/omegaconf.py", line 574, in to_object
    return OmegaConf.to_container(
  File "/home/tmk/.conda/envs/palbolts/lib/python3.9/site-packages/omegaconf/omegaconf.py", line 553, in to_container
    return BaseContainer._to_content(
  File "/home/tmk/.conda/envs/palbolts/lib/python3.9/site-packages/omegaconf/basecontainer.py", line 249, in _to_content
    return conf._to_object()
  File "/home/tmk/.conda/envs/palbolts/lib/python3.9/site-packages/omegaconf/dictconfig.py", line 752, in _to_object
    result = object_type(**field_items)
TypeError: __init__() got an unexpected keyword argument 'c'
@Jasha10
Copy link
Collaborator

Jasha10 commented Sep 27, 2021

Hi @thomkeh , thanks for the feature request!

OmegaConf should just ignore fields that have init=False set.

What is the motivation for ignoring fields that have init=False? Is this so that the dataclass's __post_init__ method will be able to initialize that field later (when it's time to initialize a dataclass instance using the DictConfig's data)?

EDIT: Here is a WORKAROUND:

@dataclass
class C:
    a: float
    b: float

    def __post_init__(self):
        self.c: float = self.a + self.b

obj = C(1, 2)
assert obj.c == 3
reveal_type(obj.c)
# "c" is not a field of C, but type checkers still reveal obj.c to be of type `float`

The obj.c attribute behaves "like an init=False field".

>>> from omegaconf import OmegaConf
>>> d = OmegaConf.create(C)
>>> d.a = 3
>>> d.b = 4
>>> o = OmegaConf.to_object(d)
>>> o.c
7.0

@Jasha10 Jasha10 added the enhancement New feature or request label Sep 27, 2021
@omry
Copy link
Owner

omry commented Sep 27, 2021

Structured configs do not respect dataclasses.field(init=False)

It does exactly what I am expecting it to do. c is missing, just like a and b.

In [8]: @dataclass
   ...: class C:
   ...:     a: float
   ...:     b: float
   ...:     c: float = field(init=False)
   ...:

In [9]: OmegaConf.structured(C)
Out[9]: {'a': '???', 'b': '???', 'c': '???'}

This issue looks like it's about to_object, not about a the handling of OmegaConf of fields with init=False.

@Jasha10
Copy link
Collaborator

Jasha10 commented Sep 27, 2021

This issue looks like it's about to_object, not about a the handling of OmegaConf of fields with init=False.

Good point.
Perhaps to_object should ignore fields with init=False.

@omry
Copy link
Owner

omry commented Sep 27, 2021

In general init=False does not make a whole lot of sense for OmegaConf configs because users are not typically instantiating them.

I think to_object() should not pass such fields when instantiating, but instead assign them letter.

@Jasha10
Copy link
Collaborator

Jasha10 commented Sep 27, 2021

I think to_object() should not pass such fields when instantiating

Makes sense to me.

but instead assign them letter.

Not sure what you mean here...

@omry
Copy link
Owner

omry commented Sep 27, 2021

If you have

from dataclasses import dataclass, field
from omegaconf import OmegaConf
@dataclass
class C:
    a: float
    b: float
    c: float = field(init=False)

cfg = OmegaConf.create(C(a=1, b=2))
cfg.c = 3
obj = OmegaConf.to_object(cfg)

I am expecting the resulting object to have the values a=1, b=2 and c=3.

@Jasha10
Copy link
Collaborator

Jasha10 commented Sep 28, 2021

I see.

So if the __post_init__ function does assign to the c field, to_object would overwrite that assignment after the obj instance is constructed:

from dataclasses import dataclass, field
from omegaconf import OmegaConf
@dataclass
class C:
    a: float
    b: float
    c: float = field(init=False)
    def __post_init__(self) -> None:
        self.c = self.a + self.b + 100

cfg = OmegaConf.create(C(a=1, b=2))  # C(a=1, b=2) has c==103
cfg.c = 3
obj = OmegaConf.to_object(cfg)  # obj has c==3, not c==103

The exception would be if cfg.c is MISSING. In that case, I think to_object should not overwrite the value c==103 that is set by __post_init__.

@omry
Copy link
Owner

omry commented Sep 28, 2021

yeah, this is a good point. not sure how to handle it.
In general, OmegaConf is not attempting to support the full spectrum of what dataclasses can do. In particular to_object is a later addition as you know.

@Jasha10
Copy link
Collaborator

Jasha10 commented Sep 28, 2021

For the record, here is the behavior if we follow the OP's suggestion to exclude init=False fields from the structured config:

from dataclasses import dataclass, field
from omegaconf import OmegaConf
@dataclass
class C:
    a: float
    b: float
    c: float = field(init=False)
    def __post_init__(self):
        self.c = self.a + self.b
        
# creating from a dataclass
cfg = OmegaConf.create(C)
assert cfg == {"a"="???", "b"="???"}
cfg.a = 1
cfg.b = 2
obj = OmegaConf.to_object(cfg)
assert obj.c == 3

# creating from an instance
cfg_inst = OmegaConf.create(C(a=1, b=2))
assert cfg_inst == {"a"=1, "b"=2}
cfg_inst.a = 10
cfg_inst.b = 20
obj2 = OmegaConf.to_object(cfg_inst)
assert obj2.c == 30  # this is expected

A drawback of this approach is that duck-typing does not work (i.e. treating cfg like an instance of C does not work).
An advantage is that we do not violate __post_init__'s guarantee that a + b == c.

@shapovalov
Copy link

Hi, I am also registering the interest in excluding init=False.
We are creating a framework within PyTorch3D where we try to unify the definitions of Pytorch modules and structured configs. Thus, the same class is passed to structured() and then instantiated in the code. It is useful to annotate types for dynamic fields, and init=False is generally fulfils that purpose.

@rsokl
Copy link

rsokl commented Mar 8, 2022

TL;DR:

  • Supporting init=False fields for dataclasses where __post_init__ is absent looks like it would be a straightforward enhancement to to_object.
    • It would not break any backwards compatibility.
    • I'd be happy to open a PR.
    • This would be nice for hydra-zen users!
  • The problems with __post_init__ discussed earlier in this thread actually stand regardless of init=False fields being present or not. I'll demonstrate this and will share my two cents about it.

Supporting init=False fields

hydra-zen's builds makes heavy use of fields where init=False to exclude OmegaConf-specific fields from its signature:

>>> from inspect import signature
>>> from hydra_zen import builds

>>> def f(x: int, y: str = "hi"): ...

>>> Conf = builds(f, populate_full_signature=True)  # creates a dataclass whose signature matches that of `f`

>>> signature(Conf)
<Signature (x: int, y: str = 'hi') -> None>

>>> Conf._target_  # _target_ is an `init=False` field
'__main__.f'

It should be noted that the dataclass types created by builds do not make any use of __post_init__.

Looking at the implementation of to_object, I believe it would be relatively straightforward to support init=False fields in the specific case where __post_init__ is not defined on the dataclass object. The current implementation would simply adjust object_type_field_names to only accumulate those fields that are present in the structured config's signature.

This would be a nice enhancement – to_object would work with all dataclasses created by builds! – that would not break backwards compatibility in OmegaConf.

Regarding __post_init__

Supporting __post_init__ via to_object is hairy, regardless of whether or not you have init=False fields. Here is a pathological example that already breaks roundtrips in OmegaConf:

@dataclass
class BadGuy:
    a: float

    def __post_init__(self) -> None:
        self.a = self.a ** 2
>>> orig_obj = BadGuy(a=4)
>>> orig_obj.a
16.0

>>> cfg = OmegaConf.create(orig_obj)
>>> cfg
{'a': 16.0}

>>> new_obj = OmegaConf.to_object(cfg)
>>> new_obj.a   # doesn't match `orig_obj.a`
256.0

My two cents is that the attributes of cfg (a DictConfig) should be matched by the attributes of to_object(cfg) in both name and value; e.g. cfg.<attr> and to_object(cfg).<attr> should always return the same value. After all, OmegConf's various representations of configs are attribute-based, not signature based.

Thus, I am suggesting that to_object should actively circumvent __post_init__. For example, in the case of BadGuy above, to_object would effectively perform:

# cfg.a is 16.0

obj = BadGuy(a=cfg.a)  # obj.a is 256.0 due to __post_init__
obj.a = cfg.a  # override obj.a with expected value

return obj  # obj.a is 16.0

Users could then manually call obj.__post_init__(), or to_object could potentially add a post_init: bool = False argument so that users can opt-in to this behavior.

As mentioned earlier, this __post_init__ problem is really independent from supporting init=False fields.

@tmke8
Copy link
Contributor Author

tmke8 commented Mar 8, 2022

FWIW, since opening this issue, I have realized that my use case was better handled by @property or even @cached_property (added in python 3.8). So, consider my suggestion retracted.

(And I don't understand @rsokl 's requirements well enough to comment on his proposal.)

@Jasha10
Copy link
Collaborator

Jasha10 commented Mar 9, 2022

@thomkeh I think using @property is an excellent alternative to using __post_init__+init=False, and it avoids the issues brought up by @rsokl.

By the way @thomkeh, I've discovered a workaround for your original issue:

@dataclass
class C:
    a: float
    b: float

    def __post_init__(self):
        self.c: float = self.a + self.b

obj = C(1, 2)
assert obj.c == 3
reveal_type(obj.c)
# "c" is not a field of C, but type checkers still reveal obj.c to be of type `float`

The obj.c attribute behaves "like an init=False field".


@rsokl:

Looking at the implementation of to_object, I believe it would be relatively straightforward to support init=False fields in the specific case where __post_init__ is not defined on the dataclass object. The current implementation would simply adjust object_type_field_names to only accumulate those fields that are present in the structured config's signature.

Your suggestion to call MyDataClass.__init__ with init_field_items, followed by calling setattr for each of the non_init_field_items, feels very natural, and I would like to move forward with it.

I feel that a provision should be made for the case where an init=False field has the special value omegaconf.MISSING, in which case it makes sense to skip calling setattr. Here is the motivating example:

@dataclass
class MyDataClass:
    foo: int = field(init=False)

obj1 = MyDataClass()
assert not hasattr(obj1, "foo")

cfg = OmegaConf.create(MyDataClass)
assert OmegaConf.is_missing(cfg, "foo")
obj2 = OmegaConf.to_object(cfg)
# It would be strange if in this case `obj2.foo == MISSING`.
# I think `hasattr(obj2, "foo")` should be False here,
# agreeing with the case of `obj1` above.

If you are inclinded to submit a PR implementing this, it would be most welcome!
Otherwise, I will put it on my docket.


The __post_init__ problem

As mentioned earlier, this __post_init__ problem is really independent from supporting init=False fields.

Agreed.

You've pointed out that strange things can occur if __post_init__ gets run twice, specifically if __post_init__ changes the value of init=True fields (e.g. as in your BadGuy example above). I think using __post_init__ to change the value of an init=True field is a strange thing to do, and I'm not sure what use-case would require such logic.

Let me point out that, to ensure __post_init__ is run only once, one can call OmegaConf.create on the dataclass itself (rather than on a dataclass instance):

cfg = OmegaConf.create(BadGuy)
assert cfg.a is MISSING
cfg.a = 4.
obj = OmegaConf.to_object(cfg)
assert obj.a == 16.

This is the pattern that should be used e.g. if __post_init__ has side-effects and must be invoked exactly once.

I understand your view that attributes of structured DictConfig instances should correspond closely to attributes of the dataclass instances returned from to_object. That being said, I fear that undoing the effects of a call to __post_init__ by invoking setattr could lead to confusion. In the absence of a motivating use-case, I think for now we should preserve the current behavior of OmegaConf.to_object in this case where __post_init__ is defined and where all fields have init=True.

The case where __post_init__ is defined and some fields have init=False:

I think we agree that, in this case, OmegaConf.to_object should raise an error.

@rsokl
Copy link

rsokl commented Mar 10, 2022

Your suggestion to call MyDataClass.init with init_field_items, followed by calling setattr for each of the non_init_field_items, feels very natural, and I would like to move forward with it.

I feel that a provision should be made for the case where an init=False field has the special value omegaconf.MISSING, in which case it makes sense to skip calling setattr.

Great, I will take a swing at this; hopefully relatively soon 😄

@Jasha10
Copy link
Collaborator

Jasha10 commented Mar 10, 2022

Great! FYR most of the to_object tests are here.

@Jasha10
Copy link
Collaborator

Jasha10 commented Mar 24, 2022

I ended up knocking this out myself in #879.

@rsokl
Copy link

rsokl commented Mar 26, 2022

Ah, thanks for tackling this!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request structured config
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants