Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

builds trouble with OmegaConf.to_object #242

Closed
Jasha10 opened this issue Mar 6, 2022 · 7 comments
Closed

builds trouble with OmegaConf.to_object #242

Jasha10 opened this issue Mar 6, 2022 · 7 comments
Labels
hydra/omegaconf Tied to behavior of hydra/omegaconf

Comments

@Jasha10
Copy link
Contributor

Jasha10 commented Mar 6, 2022

I'm having trouble when calling OmegaConf.to_object on DictConfig objects backed by builds-generated dataclasses. Here is an example showing the difference in behavior between a hand-coded dataclass and one generated using builds:

# repro.py
from dataclasses import dataclass

from hydra_zen import builds
from omegaconf import OmegaConf


def foo(x: int) -> int:
    return x


@dataclass
class FooConfHandCoded:
    _target_: str = "__main__.foo"
    x: int = 10


FooConfBuilds = builds(foo, x=10)

foo_conf_hand_coded = OmegaConf.structured(FooConfHandCoded)
foo_conf_builds = OmegaConf.structured(FooConfBuilds)

print("\nUsing hand-coded dataclass:")
print(OmegaConf.to_object(foo_conf_hand_coded))

print("\nUsing builds:")
print(OmegaConf.to_object(foo_conf_builds))
$ python repro.py

Using hand-coded dataclass:
FooConfHandCoded(_target_='__main__.foo', x=10)

Using builds:
Traceback (most recent call last):
  File "/home/jasha10/tmp/repro.py", line 27, in <module>
    print(OmegaConf.to_object(foo_conf_builds))
  File "/home/jasha10/miniconda3/envs/pysc/lib/python3.9/site-packages/omegaconf/omegaconf.py", line 574, in to_object
    return OmegaConf.to_container(
  File "/home/jasha10/miniconda3/envs/pysc/lib/python3.9/site-packages/omegaconf/omegaconf.py", line 553, in to_container
    return BaseContainer._to_content(
  File "/home/jasha10/miniconda3/envs/pysc/lib/python3.9/site-packages/omegaconf/basecontainer.py", line 249, in _to_content
    return conf._to_object()
  File "/home/jasha10/miniconda3/envs/pysc/lib/python3.9/site-packages/omegaconf/dictconfig.py", line 752, in _to_object
    result = object_type(**field_items)
TypeError: __init__() got an unexpected keyword argument '_target_'

This fundamentally comes down to a difference in type signatures between FooConfHandCoded.__init__ and FooConfBuilds.__init__:

>>> import inspect
>>> inspect.signature(FooConfHandCoded)
<Signature (_target_: str = '__main__.foo', x: int = 10) -> None>
>>> inspect.signature(FooConfBuilds)
<Signature (x: int = 10) -> None>
@rsokl rsokl added the hydra/omegaconf Tied to behavior of hydra/omegaconf label Mar 6, 2022
@rsokl
Copy link
Contributor

rsokl commented Mar 6, 2022

TL;DR

  • This is an issue with how OmegaConf.to_object instantiates structured configs. The issue arises for a config backed by any structured config that possesses a field that is excluded from its signature (for dataclasses, via dataclasses.field(init=False , ...)).
  • builds and make_config excludes all omegaconf/Hydra/zen-specific fields from the structured config's signature. I could add an option to toggle this behavior (e.g. builds(..., zen_init=True)). However, this particular issue is perhaps better addressed in omegaconf...
  • There is a straightforward fix that will let OmegaConf.to_object work with structured configs that have non-init fields. Such structured configs already work with instantiate (i.e. non-init fields can already be overridden without issue). I am happy to open a PR for this on omegaconf's end.

Actual Response

Thanks so much for looking into this @Jasha10 !

This is due to the fact that builds sets all hydra/omegaconf/zen-specific dataclass fields with init=False to exclude those options from the signature.

OmegaConf.to_object(...) will throw an error on any conf backed by a structured config that has fields which are excluded from its signature:

# repro.py
from dataclasses import dataclass, field

from omegaconf import OmegaConf

@dataclass
class InitTrue:
    x: int = field(default=1, init=True)

@dataclass
class InitFalse:
    x: int = field(default=1, init=False)
    
Conf = OmegaConf.structured(InitTrue)

print("\nAll dataclass fields included in sig:")
print(OmegaConf.to_object(Conf))

Conf = OmegaConf.structured(InitFalse)

print("\nSome dataclass fields excluded from sig:")
print(OmegaConf.to_object(Conf))
$ python repro.py 

All dataclass fields included in sig:
InitTrue(x=1)

Some dataclass fields excluded from sig:
Traceback (most recent call last):
  File "repro.py", line 22, in <module>
    print(OmegaConf.to_object(Conf))
  File "C:\Users\Ryan Soklaski\Anaconda3\envs\hydra-zen\lib\site-packages\omegaconf\omegaconf.py", line 574, in to_object
    return OmegaConf.to_container(
  File "C:\Users\Ryan Soklaski\Anaconda3\envs\hydra-zen\lib\site-packages\omegaconf\omegaconf.py", line 553, in to_container
    return BaseContainer._to_content(
  File "C:\Users\Ryan Soklaski\Anaconda3\envs\hydra-zen\lib\site-packages\omegaconf\basecontainer.py", line 249, in _to_content
    return conf._to_object()
  File "C:\Users\Ryan Soklaski\Anaconda3\envs\hydra-zen\lib\site-packages\omegaconf\dictconfig.py", line 752, in _to_object
    result = object_type(**field_items)
TypeError: __init__() got an unexpected keyword argument 'x'

Solutions

There are two solutions that I can think of here, which are not exclusive from one another.

Changing behavior / exposing new behavior in builds:

builds could expose a new argument: zen_init: bool = False, which could be toggles to include all fields in the dataclass' signature:

def f(x: int): ...

# Signature:
# InitFalse(x: int = 1)
InitFalse = builds(f, x = 1, zen_partial=True, zen_meta=dict(a=1), zen_init=False)  # default behavior

# Signature:
# InitTrue(
#     _target_: str = 'hydra_zen.funcs.zen_processing',
#     _zen_target: str = '__main__.f',
#     _zen_partial: bool = True,
#     _zen_exclude: Tuple[str, ...] = ('a',),
#     x: int = 1,
#     a: Any = 1,
# )
InitTrue = builds(f, x = 1, zen_partial=True, zen_meta=dict(a=1), zen_init=True)  

As you can see, it is generally much easier for a user to reason about the signature of the resulting config when it solely reflects elements of the target's signature. Otherwise, users need to know things about the implementation details of structured/targeted configs. This is why our current behavior is the way it is.

There are other reasons why set(signature(builds(<target>))) <= set(signature(<target>)) is a very nice property to have, but this post is already too long as it stands.

I would be happy to support zen_init=True if it is a generally useful feature. It would be a substantial amount of work, but it is doable. That being said, as far as this particular issue is concerned, I think a solution in omegaconf would be better...

Accommodating init=Falsefields in omegaconf

I believe that OmegaConf.to_object can accommodate structured config fields for which init=False quite easily (for both dataclasses and attr classes). I would be happy to open a PR that adds support for this.

I should point out that non-init fields can already be overridden during instantiation as you'd expect:

@dataclass
class Conf:
    _target_: str = field(init=False, default="builtins.int")
>>> instantiate(Conf)
0
>>> instantiate(Conf())
0

>>> instantiate(Conf, _target_="builtins.dict")
{}
>>> instantiate(Conf(), _target_="builtins.dict")
{}

and they seem to otherwise behave as-expected in omegaconf/Hydra. This issue that you found is the first/only problem I have seen with non-init fields. Thus I think that supporting non-init fields in OmegaConf.to_object would be a nice (and simple) enhancement!

What a solution would look like

In to_object, field_items is used to call object_type(**field_items), thus it should consist of fields that only exist in the structured config's signature -- [field.name for field in dataclasses.fields(obj) if field.init] for the case of dataclasses -- and all other fields should be gathered by nonfield_items. (perhaps they should actually be named init_field_items and non_init_field_items, respectively).

For dataclasses, field_items is currently populated by omegaconf._utils.get_dataclass_field_names, which accesses the names of all fields via: [field.name for field in dataclasses.fields(obj)].

Let me know what you think, and thanks again for taking the time to look into this and report this issue!

@Jasha10
Copy link
Contributor Author

Jasha10 commented Mar 7, 2022

Got it! This idea of distinguishing between init_field_items and non_init_field_items makes sense. This question of how to handle init=False has come up once before, in omegaconf#789.

My only reservation here is that fields with init=False are often initialized via the dataclass' __post_init__ method. It is not clear to me how to handle the case where there is disagreement between the value of a given init=False field on a structured DictConfig instance and the value that will be set by __post_init__ when to_object gets called.
The current behavior of to_object is to call setattr on the dataclass instance, once for each item in nonfield_items. Here is an example showing how extension of this behavior to init=False fields might lead to violation of an invariant enforced by __post_init__.

@rsokl
Copy link
Contributor

rsokl commented Mar 7, 2022

Thanks for pointing me to omegaconf#789. I'm happy to move this conversation to that thread if it would be useful.

(For the record: I think that the OP's recommendation – that OmegaConf should only care about fields that appear in the dataclass' signature – is backwards. As I advocate for above: OmegaConf should only care about the fields present on the object, regardless of its signature.)

Handling init=False when __post_init__ is not present

The current behavior of to_object is to call setattr on the dataclass instance, once for each item in nonfield_items

Fortunately, this would work without any issue/ambiguity in the case where there are init=False fields and no __post_init__ method is present:

@dataclass
class NoPostInit:
    a: float
    b: float = field(init=False, default=2)

@dataclass
class YesPostInit:
    a: float
    b: float = field(init=False, default=2)
        
    def __post_init__(self):
        self.b = self.b + self.a
>>> "__post_init__" in NoPostInit.__dict__
False

>>> "__post_init__" in YesPostInit.__dict__
True

Thus, an incremental enhancement to to_object could be:

  • Support NoPostInit by using the init_field_items and non_init_field_items that we discussed above.
  • Explicitly raise a descriptive error message for YesPostInit and document this behavior.

Given that to_object currently raises on both of these dataclasses, this would not break any backwards compatibility. And, can you believe it, this just happens to cover the builds case. How convenient 😅 !

Handling __post_init__

It is not clear to me how to handle the case where there is disagreement between the value of a given init=False field on a structured DictConfig instance and the value that will be set by __post_init__ when to_object gets called.

This is definitely tricky.. I was thinking that you can look for an field(init=False) where no default is included. In this case, the field is very likely set via __post_init__... but that doesn't really clear things up that much.

I personally think that people should think about to_object as a direct translation of a config back to a dataclass-instance, with no expectation that __post_init__ will be run. E.g. Whatever fields cfg has should reflect the fields that obj ends up with, as you demonstrate here. Users relying on __post_init__ to do post-processing should be responsible for re-initializing the dataclass to ensure that they get their desired post-init behavior.

Lastly, to_object could be given the option: to_object(cfg, finalize_post_init: bool). This would let users opt-into the behavior they want. The downside is that it can still be pretty hard for people to reason about.

@rsokl
Copy link
Contributor

rsokl commented Mar 7, 2022

Actually... I think __post_init__ poses more insidious problems than I original thought, regardless of init=False fields.

Consider the following behavior that currently occurs in OmegaConf:

from dataclasses import dataclass, field

@dataclass
class A:
    a: float
    b: float
    def __post_init__(self) -> None:
        self.a = self.a ** 2
        self.b = self.b ** 2
>>> cfg = OmegaConf.create(A(a=2, b=4))

>>> cfg
{'a': 4.0, 'b': 16.0}

>>> OmegaConf.to_object(cfg)
A(a=16.0, b=256.0)

The fact that this doesn't roundtrip makes me think think that, in all cases, __post_init__ should be circumvented by to_object. That is, in the case that __post_init__ is present, the dataclass should be initialized and then all of its fields – including those present in init – should be set via setattr!

@rsokl
Copy link
Contributor

rsokl commented Mar 7, 2022

So, basically I think that to_object should do the following in the case where __post_init__ is present:

# to_object({"a": 2, "b": 4.0}) would do:
obj = A(a=2.0, b=4.0)
obj.a = 2.0
obj.b = 4.0

If a user wants to run __post_init__, they can just do it!

>>> obj
A(a=2.0, b=4.0)

>>> obj.__post_init__()  # or perhaps this can be called via `to_object(cfg, post_init=True)` 
A(a=4.0, b=16.0)

@rsokl
Copy link
Contributor

rsokl commented Mar 7, 2022

@Jasha10 if it sounds good to you, I'll close this and move the discussion to omegaconf#789, where I'll summarize the relevant parts of this discussion (but much more concisely).

@Jasha10
Copy link
Contributor Author

Jasha10 commented Mar 7, 2022

That would be great. Thanks!

@rsokl rsokl closed this as completed Mar 7, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
hydra/omegaconf Tied to behavior of hydra/omegaconf
Projects
None yet
Development

No branches or pull requests

2 participants