Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Subgroups destination cannot be a substring of other destinations #191

Closed
Tomiinek opened this issue Jan 10, 2023 · 15 comments · Fixed by #194
Closed

Subgroups destination cannot be a substring of other destinations #191

Tomiinek opened this issue Jan 10, 2023 · 15 comments · Fixed by #194

Comments

@Tomiinek
Copy link

Tomiinek commented Jan 10, 2023

Describe the bug
I would like to have two subgroups with a different destinations, let's say config and config2. However, if one of them is a substring of another, an exception is raised saying:

File ~/meaning3/env/lib/python3.10/site-packages/simple_parsing/parsing.py:713, in ArgumentParser._set_instances_in_namespace(self, parsed_args)
    710                 setattr_recursive(existing, path_from_parent_to_child, new_config)
    712         else:
--> 713             raise RuntimeError(
    714                 "Something went wrong while trying to update the value of a \n"
    715                 "subgroup in the args! Please report this bug by opening an issue!"
    716             )
RuntimeError: Something went wrong while trying to update the value of a 
subgroup in the args! Please report this bug by opening an issue!

To Reproduce

from __future__ import annotations
from dataclasses import dataclass
from simple_parsing import ArgumentParser, choice, subgroups
from pathlib import Path


@dataclass
class ModelConfig:
    ...


@dataclass
class ModelAConfig(ModelConfig):
    lr: float = 3e-4
    optimizer: str = "Adam"
    betas: tuple[float, float] = 0.9, 0.999


@dataclass
class ModelBConfig(ModelConfig):
    lr: float = 1e-3
    optimizer: str = "SGD"
    momentum: float = 1.234

@dataclass
class Config:

    # Which model to use
    model: ModelConfig = subgroups(
        {"model_a": ModelAConfig, "model_b": ModelBConfig},
        default=ModelAConfig(),
    )

parser = ArgumentParser()
parser.add_arguments(Config, dest="config")
parser.add_arguments(Config, dest="config2") # this produces and exception
# parser.add_arguments(Config, dest="something") # this works as expected 
args = parser.parse_args("")

config: Config = args.config

print(config)

Expected behavior
Not to rise the exception 🙂

I am not sure, but maybe this condition

if wrapper.dest not in subgroup_dest:
# current dataclass wrapper doesn't have anything to do with this
# subgroup. Go to the next subgroup.
continue

should be changed to:

if not subgroup_dest.startswith(wrapper.dest + "."):
    continue
@zhiruiluo
Copy link
Contributor

zhiruiluo commented Jan 11, 2023

Hey @Tomiinek, I've checked the master code and the subgroups has been refactored in #185. It seems this issue had been addressed. Your example passed in the current master.

@Tomiinek
Copy link
Author

Tomiinek commented Jan 11, 2023

Hello, thank you for your prompt reply.

I am confused 😕 Does it really work for you?

It actually does not work for me from the current master, the example above gives me:

File ".../simple_parsing/helpers/fields.py", line 416, in subgroups
    if default is not MISSING and default not in subgroups:
TypeError: unhashable type: 'ModelAConfig'

If I make the dataclasses hashable:

@dataclass(unsafe_hash=True)
class ModelAConfig(ModelConfig):
    lr: float = 3e-4
    optimizer: str = "Adam"
    betas: tuple[float, float] = 0.9, 0.999


@dataclass(unsafe_hash=True)
class ModelBConfig(ModelConfig):
    lr: float = 1e-3
    optimizer: str = "SGD"
    momentum: float = 1.234

I get

File ".../simple_parsing/helpers/fields.py", line 417, in subgroups
    raise ValueError("default must be a key in the subgroups dict!")
ValueError: default must be a key in the subgroups dict!

so I have to change the default to a string key:

@dataclass
class Config:

    # Which model to use
    model: ModelConfig = subgroups(
        {"model_a": ModelAConfig, "model_b": ModelBConfig},
        default="model_a",
    )

But I lost the ability to set some parameters in the default, for example ModelAConfig(lr=0.1) (which was previously possible, I do not want to set it directly in ModelAConfig because it is used on multiple places). Is this your intention? If so, what is the recommended way to set the defaults? Some joggling with parser.set_defaults?

@zhiruiluo
Copy link
Contributor

zhiruiluo commented Jan 11, 2023

Please refere to #186 and field [Python Docs] for how to initialize a field. The subgroups returns a customized Field for you with some additional checkings and specialized metadata.

You can use the default_factory as indicated by field [Python Docs]: "If provided, it must be a zero-argument callable that will be called when a default value is needed for this field. "

Here are some examples to achieve your goal:

@dataclass
class Config:
    # Which model to use
    model: ModelConfig = subgroups(
        {"model_a": ModelAConfig, "model_b": ModelBConfig},
        default_factory=ModelAConfig,
    )

@dataclass
class Config:
    # Which model to use
    model: ModelConfig = subgroups(
        {"model_a": ModelAConfig, "model_b": ModelBConfig},
        default_factory=lambda : ModelAConfig(lr=0.1),
    )

@Tomiinek
Copy link
Author

Tomiinek commented Jan 11, 2023

Very nice 🙂

Is it supposed to work out of the box? Asking because there seems to be this check (so the factory itself has to be a value in the dict so I do not see the way to set different arguments here):

if default_factory is not MISSING and default_factory not in subgroups.values():
# TODO: This might a little bit too strict. We don't want to encourage people creating lots
# of classes just to change the default arguments.
raise ValueError("default_factory must be a value in the subgroups dict!")

So

@dataclass
class Config:
    # Which model to use
    model: ModelConfig = subgroups(
        {"model_a": ModelAConfig, "model_b": ModelBConfig},
        default_factory=lambda : ModelAConfig(lr=0.1),
    )

actually results in an ValueError.

Maybe something like this could work?

if default_factory is not MISSING and type(default_factory()) not in subgroups.values(): 
   ...

@zhiruiluo
Copy link
Contributor

Very nice 🙂

Is it supposed to work out of the box? Asking because there seems to be this check (so the factory itself has to be a value in the dict so I do not see the way to set different arguments here):

if default_factory is not MISSING and default_factory not in subgroups.values():
# TODO: This might a little bit too strict. We don't want to encourage people creating lots
# of classes just to change the default arguments.
raise ValueError("default_factory must be a value in the subgroups dict!")

So

@dataclass
class Config:
    # Which model to use
    model: ModelConfig = subgroups(
        {"model_a": ModelAConfig, "model_b": ModelBConfig},
        default_factory=lambda : ModelAConfig(lr=0.1),
    )

actually results in an ValueError.

Maybe something like this could work?

if default_factory is not MISSING and type(default_factory()) not in subgroups.values(): 
   ...

What version of simple-parsing are you working on? The default_factory is not implemented in 0.0.21.post1 as per #187, but it is now implemented in master branch by #185.

@Tomiinek
Copy link
Author

I hope it is the current master (5042cb4) aka simple-parsing===0.0.21.post1.post4-g5042cb4 with python 3.10.8

@zhiruiluo
Copy link
Contributor

zhiruiluo commented Jan 11, 2023

Thanks for pointing this out. That is very interesting I overlooked this part. Please ignore my above reply. The question is why do we need to set default values which are different from corresponding dataclasses defaults?

@Tomiinek
Copy link
Author

Tomiinek commented Jan 11, 2023

Okay, my situation is the following 🙂

My dataclasses are auto-generated from "serializable" classes, or better said from arguments of their constructors. For example:

class AdamW(TorchAdamW, SerializableObject, ignore_args=["self", "params"]):
    """Wrapper around AdamW to name its parameters for serialization."""

    def __init__(
        self,
        params,
        lr: float = 1e-3,
        eps: float = 1e-6,
        weight_decay: float = 0.01,
        amsgrad: bool = False,
    ):
        super().__init__(
            params,
            lr=lr,
            eps=eps,
            weight_decay=weight_decay,
            amsgrad=amsgrad,
        )

produces a dataclass like this one:

@dataclass
class AdamWConfig:
    lr: float = 0.001
    eps: float = 1e-06
    weight_decay: float = 0.01
    amsgrad: bool = False

This config dataclass is used from different different projects, so I do not want to change the defaults here.
At the same time, each project has its own config related to optimizers, something like:

class OptimizerConfig(Config):
    opt: Union[AdamW.get_config_class(), FusedLAMB.get_config_class()] = subgroups(
        {
            "adamw": AdamW.get_config_class(),
            "fusedlamb": FusedLAMB.get_config_class(),
        },
        # default_factory=lambda: AdamW.get_config_class()(lr=0.005),
    )

and I believe that here, in this optimizer config, is a suitable place for overriding the project-related default setting.

Does it make sense or is this setup weird?
I don't mind using parser.set_defaults or something like that for setting the defaults, but I do not see how to do that for subgroups.

@lebrice
Copy link
Owner

lebrice commented Jan 11, 2023

Hi @Tomiinek , thanks for posting this.

Yes you're right, I've temporarily restricted what can be passed to the subgroups function to only dataclass types. I'm working on this today.

By the way, your dynamic generated config classes look a lot like what I've been doing here: #156

Just hang tight for now 😅 I'll figure out a solution to this soon, and get back to you.

@lebrice
Copy link
Owner

lebrice commented Jan 11, 2023

In the case of my Partial feature, it would look like this:

# Dynamically create a dataclass that will be used for the above type:
# NOTE: We could use Partial[Adam] or Partial[Optimizer], however this would treat `params` as a
# required argument.
# AdamConfig = Partial[Adam]  # would treat 'params' as a required argument.
# SGDConfig = Partial[SGD]    # same here
AdamConfig: type[Partial[Adam]] = config_dataclass_for(Adam, ignore_args="params")
SGDConfig: type[Partial[SGD]] = config_dataclass_for(SGD, ignore_args="params")

@dataclass
class Config:

    # Which optimizer to use.
    optimizer: Partial[Optimizer] = subgroups(
        {
            "sgd": SGDConfig,
            "adam": AdamConfig,
        },
        default_factory=lambda: AdamConfig(lr=3e-4),
    )

https://github.com/lebrice/SimpleParsing/pull/156/files#diff-90a53367e3aa7a79a1ea7d7d9ab8ab940ccb3fb63fa59e37084c47a1384156e3R56-R70

@lebrice
Copy link
Owner

lebrice commented Jan 11, 2023

I'm going to close this issue, since I believe it was fixed by #185 , but I'll create a new one for re-allowing subgroups to have more flexible options. Thanks again for posting @Tomiinek !

@lebrice lebrice closed this as completed Jan 11, 2023
@lebrice
Copy link
Owner

lebrice commented Jan 11, 2023

Well I'll actually first add a test to make sure that this issue is fixed. THEN, I'll close this issue :)

@lebrice lebrice reopened this Jan 11, 2023
@Tomiinek
Copy link
Author

Great! Thank you both guys.

lebrice added a commit that referenced this issue Jan 11, 2023
Fixes #191: Uses the same example from the issue, and the problem
doesn't occur. I consider that convincing enough to close the
issue.

Signed-off-by: Fabrice Normandin <fabrice.normandin@gmail.com>
lebrice added a commit that referenced this issue Jan 11, 2023
Fixes #191: Uses the same example from the issue, and the problem
doesn't occur. I consider that convincing enough to close the
issue.

Signed-off-by: Fabrice Normandin <fabrice.normandin@gmail.com>

Signed-off-by: Fabrice Normandin <fabrice.normandin@gmail.com>
@lebrice
Copy link
Owner

lebrice commented Jan 12, 2023

Hey @Tomiinek , I've now made this issue #195 and PR here #196 that adds the ability to use functools.partial objects.
This isn't exactly what you're looking for, however, the PR should also have re-allowed dynamically created dataclasses to be used in the subgroups dict.

Let me know what you think! :)

@Tomiinek
Copy link
Author

Hey, that's a really nice solution! 🙂 Pls merge ASAP 😅

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants