feat: add deep merging for lists #1082

Merged
merged 26 commits into from
Jul 2, 2023

Conversation

MatteoVoges
Contributor

Fixes #1080

Proposed changes

  • new flags in the OmegaConf.merge() function
    • extend_lists: enables (deep) merge support for ListConfigs (default: False, to ensure backwards compatibility)
    • allow_duplicates: controls whether two equal items can appear in a merged list (default: False)

Examples

If you have a config:

list:
   - a

and

list:
  - b

then merging them with the extend_lists flag enabled gives this result:

list:
  - a
  - b

The functionality of allow_duplicates is self-explanatory, I guess.
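
To make the proposed semantics concrete, here is a minimal plain-Python sketch of what the two flags are described as doing. It works on ordinary dicts and lists rather than on OmegaConf containers; the helper name `deep_merge` is hypothetical and this is not the PR's implementation.

```python
from copy import deepcopy


def deep_merge(target, source, extend_lists=False, allow_duplicates=False):
    """Hypothetical illustration of the proposed flags on plain dicts/lists."""
    result = deepcopy(target)
    for key, value in source.items():
        if isinstance(value, dict) and isinstance(result.get(key), dict):
            # Dicts are merged recursively, key by key.
            result[key] = deep_merge(result[key], value, extend_lists, allow_duplicates)
        elif extend_lists and isinstance(value, list) and isinstance(result.get(key), list):
            # Lists are extended instead of replaced.
            for item in value:
                if allow_duplicates or item not in result[key]:
                    result[key].append(deepcopy(item))
        else:
            # Default behavior: the newer value wins.
            result[key] = deepcopy(value)
    return result


cfg1 = {"list": ["a"]}
cfg2 = {"list": ["b"]}
merged = deep_merge(cfg1, cfg2, extend_lists=True)  # lists are combined
replaced = deep_merge(cfg1, cfg2)                   # default: list is replaced
```

Without the flag, the second list simply replaces the first, which matches the backwards-compatible default described above.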

Tests and Documentation

  • Added some basic tests and tests for edge cases (nested objects)
  • @omry Where should I add the documentation for this? In docs/source/usage.rst or in docs/notebook/Tutorial.ipynb or somewhere else?

Contributed by

@MatteoVoges
Contributor Author

MatteoVoges commented May 24, 2023

The tests show the following error in the lint job:

tests/test_basic_ops_dict.py:276: error: Keywords must be strings  [misc]
Installing missing stub packages:
/home/circleci/project/.nox/lint-3-10/bin/python -m pip install types-Pygments types-colorama types-setuptools


Found 1 error in 1 file (checked 81 source files)

I didn't edit this file, so I guess that this error is due to some wrong configuration or setup in CircleCI. Is there anything I have to look into?

@omry
Owner

omry commented May 25, 2023

Thanks @MatteoVoges.

@Jasha10, @odelalleau, @shagunsodhani:
What do you think about this functionality? We can discuss here or on #1080.

At a glance it looks pretty nice, but I didn't review properly.

  1. Add a news fragment (similar to what is described here https://hydra.cc/docs/development/documentation/).
  2. Update the docs showing usage of OmegaConf merge. Also update the notebook with an example.
  3. Try to fix the lint error even if you didn't create it (If it's involved it can be in a separate PR).

@odelalleau
Collaborator

@Jasha10, @odelalleau, @shagunsodhani: What do you think about this functionality? We can discuss here or on #1080.

I'm good with the proposed new feature. I can't commit to reviewing the code now though.

@MatteoVoges
Contributor Author

@omry I updated the docs and added some examples. The linting error is fixed in #1084. Is there anything left to do?

Contributor

@shagunsodhani shagunsodhani left a comment

This mostly looks good to me! @omry is there a specific piece that I should look deeper into ?

docs/source/usage.rst Outdated Show resolved Hide resolved
news/1082.feature Outdated Show resolved Hide resolved
omegaconf/omegaconf.py Outdated Show resolved Hide resolved
Collaborator

@Jasha10 Jasha10 left a comment

I'm happy with the proposed feature, and the implementation looks good to me.

One nit regarding the tests: the BaseContainer.merge_with method is not documented and is not really intended as a public API. It would be better to test the public-facing OmegaConf.merge facade instead of calling BaseContainer.merge_with in the tests:

tests/test_merge.py Outdated Show resolved Hide resolved
tests/test_merge.py Outdated Show resolved Hide resolved
docs/source/usage.rst Outdated Show resolved Hide resolved
news/1082.feature Outdated Show resolved Hide resolved
omegaconf/basecontainer.py Outdated Show resolved Hide resolved
MatteoVoges and others added 6 commits May 31, 2023 09:48
Co-authored-by: Omry Yadan <omry@yadan.net>
Co-authored-by: Shagun Sodhani <1321193+shagunsodhani@users.noreply.github.com>
Co-authored-by: Jasha Sommer-Simpson <8935917+Jasha10@users.noreply.github.com>
@Jasha10
Collaborator

Jasha10 commented Jun 1, 2023

The remove_duplicates flag only makes sense if extend_lists is True.

>>> OmegaConf.merge([1,1,2], [2,2,3], remove_duplicates=True)
[2, 2, 3]

Should we issue an error or warning in the case where remove_duplicates is passed and extend_lists is False?
Or should we adopt a similar Enum-based approach as is used for the structured_config_mode argument to OmegaConf.to_container?

Something like this:

class OmegaConf:
    ...
    def merge(
        ...,
        list_merge_mode: ListMergeMode = ListMergeMode.OVERWRITE,
    )

...

class ListMergeMode(Enum):
    """How `OmegaConf.merge` handles lists"""
    OVERWRITE = 1  # overwrite left-hand list with right-hand list
    EXTEND = 2  # extend left-hand list with right-hand list
    EXTEND_NO_DUPLICATES = 3  # ...

@Jasha10
Collaborator

Jasha10 commented Jun 1, 2023

Also, I think the naming of remove_duplicates=True is a little bit confusing.
The duplicates from the right-hand merge argument are removed but the duplicates from the left-hand merge argument are not removed.

>>> list1 = [1, 2, 3, 3]
>>> list2 = [3, 4, 5]
>>> OmegaConf.merge(list1, list2, extend_lists=True, remove_duplicates=True)  # duplicates from list1 not removed
[1, 2, 3, 3, 4, 5]

@MatteoVoges
Contributor Author

MatteoVoges commented Jun 2, 2023

I thought about this before as well and my first implementation was indeed using a set. But I decided to go for the currently implemented approach, because the set was scrambling the order of the lists and it was hard to test.

The duplicates from the right-hand merge argument are removed but the duplicates from the left-hand merge argument are not removed.

I actually like the current behavior a lot, because I think if you merge something the original object should not be affected, but only extended. And if you have some duplicate values in your list1, then it is for a reason, and they should appear in the merged object.
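
Both points above (a stable order, and leaving the left-hand list's own duplicates alone) can be seen in plain Python. A `set` gives no guarantee about iteration order, whereas a membership check against the growing list keeps first-seen order. The helper name `extend_unique` is made up for illustration and is not taken from the PR:

```python
def extend_unique(target, new):
    """Return target followed by the items of new that are not yet present."""
    result = list(target)  # copy, so the left-hand list is left untouched
    for item in new:
        if item not in result:  # linear lookup, but works for unhashable items too
            result.append(item)
    return result


list1 = [1, 2, 3, 3]
list2 = [3, 4, 5]

ordered = extend_unique(list1, list2)  # keeps order and list1's own duplicates
via_set = set(list1) | set(list2)      # same values, but order is not guaranteed
```

The list-based version is also easier to test, because its result can be compared against an exact expected list.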

Should we issue an error or warning in the case where remove_duplicates is passed and extend_lists is False?

I think a warning would be good here.
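
Such a warning could look roughly like the following. This is a hypothetical sketch on plain lists that reuses the flag names from this discussion; the function `merge_lists` is invented for illustration and is not the code in the PR.

```python
import warnings


def merge_lists(left, right, extend_lists=False, remove_duplicates=False):
    """Hypothetical stand-in for the list branch of a merge."""
    if remove_duplicates and not extend_lists:
        # The flag has no effect in this combination, so tell the caller.
        warnings.warn(
            "remove_duplicates has no effect unless extend_lists=True",
            UserWarning,
            stacklevel=2,
        )
    if not extend_lists:
        return list(right)  # default: right-hand list replaces left-hand list
    result = list(left)
    for item in right:
        if not remove_duplicates or item not in result:
            result.append(item)
    return result


# Capture the warning instead of printing it, to show that it is raised.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    out = merge_lists([1, 1, 2], [2, 2, 3], remove_duplicates=True)
```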

Or should we adopt a similar Enum-based approach?

I don't think that this is necessary, because we don't have many merge options yet and I think two merge modes do not require an enum.

@MatteoVoges MatteoVoges requested a review from Jasha10 June 15, 2023 09:29
@omry
Owner

omry commented Jun 15, 2023

We then have the problem that you have to import the enum if you want to use this feature. So maybe there could be a little hint in the merge function's docstring. Or we enable both options, Enum and string, as @Jasha10 proposed?

I think using a string as an alternative is a bit unusual and I am not a fan of this duality.

Where would be a good place for the enum? _utils module?

Given that the enum is becoming a part of the public API, it should live in omegaconf/omegaconf.py

Usage can be something like:

from omegaconf import OmegaConf, ListMergeMode

...
OmegaConf.merge(cfg1, cfg2, list_merge_mode=ListMergeMode.EXTEND_IGNORE_DUPLICATES)

A docstring update to OmegaConf.merge is fine. Don't forget to update docs to reflect this.

@Jasha10
Collaborator

Jasha10 commented Jun 16, 2023

Given that the enum is becoming a part of the public API, it should live in omegaconf/omegaconf.py

@omry note that the SCMode enum currently lives in omegaconf/base.py (and is exported by omegaconf/__init__.py).

@odelalleau
Collaborator

I can’t check right now but I believe there’s a way to make an enum that can be compared to a string so that just passing the string would work too.

@Jasha10
Collaborator

Jasha10 commented Jun 16, 2023

make an enum that can be compared to a string

I think you're referring to enum.StrEnum:

from enum import StrEnum
class ListMergeModeString(StrEnum):
    OVERWRITE = "OVERWRITE"
    EXTEND = "EXTEND"
    EXTEND_NO_DUPLICATES = "EXTEND_NO_DUPLICATES"

assert ListMergeModeString.OVERWRITE == "OVERWRITE"

One issue with StrEnum is that it's only available in python 3.11+.

@omry
Owner

omry commented Jun 16, 2023

Given that the enum is becoming a part of the public API, it should live in omegaconf/omegaconf.py

@omry note that the SCMode enum currently lives in omegaconf/base.py (and is exported by omegaconf/__init__.py).

I think this works too. Maybe it's better this way to prevent circular imports.

@MatteoVoges
Contributor Author

@omry note that the SCMode enum currently lives in omegaconf/base.py (and is exported by omegaconf/__init__.py).

I actually followed SCMode here: ListMergeMode is implemented in the same way.

One issue with StrEnum is that it's only available in python 3.11+.

Unfortunately, this is then unusable for us.

@odelalleau
Collaborator

One issue with StrEnum is that it's only available in python 3.11+.

Yeah but I’m pretty sure I already implemented something similar in older versions… unfortunately I don’t have access to a computer until next week so I can’t really be more specific.
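
One pattern that works on Python versions before 3.11 is mixing `str` into a regular `Enum`. Whether this is what was meant above is an assumption; the class name below is only illustrative and does not appear in the PR.

```python
from enum import Enum


class ListMergeModeStr(str, Enum):
    """Illustrative only: members are also str instances."""

    OVERWRITE = "OVERWRITE"
    EXTEND = "EXTEND"
    EXTEND_NO_DUPLICATES = "EXTEND_NO_DUPLICATES"


# Members compare equal to their string values ...
same = ListMergeModeStr.EXTEND == "EXTEND"
# ... and a plain string can be converted back to a member by value.
member = ListMergeModeStr("EXTEND_NO_DUPLICATES")
```

A function accepting such a parameter could normalize its argument with `ListMergeModeStr(value)`, which accepts either a member or its string value.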

Owner

@omry omry left a comment

Starting to look good.
Looking at the code, I have a suggestion for slightly better enum keys. Let me know what you think (sorry for the excessive bikeshedding).

docs/source/usage.rst Outdated Show resolved Hide resolved
docs/source/usage.rst Outdated Show resolved Hide resolved
news/1082.feature Outdated Show resolved Hide resolved
Comment on lines 798 to 801
class ListMergeMode(Enum):
    OVERRIDE = 1  # content from newer list gets taken (default)
    EXTEND = 2  # lists get extended
    EXTEND_IGNORE_DUPLICATES = 3  # only new (unique) elements get extended
Owner

I know we have gone through several iterations of bikeshedding here already, but any thoughts on using append instead of extend?
Also I think for the default REPLACE is more appropriate than override.

Suggested change
class ListMergeMode(Enum):
    OVERRIDE = 1  # content from newer list gets taken (default)
    EXTEND = 2  # lists get extended
    EXTEND_IGNORE_DUPLICATES = 3  # only new (unique) elements get extended

class ListMergeMode(Enum):
    REPLACE = 1  # Replaces the target list with the new one (default)
    APPEND = 2  # Appends the new list to the target list
    APPEND_UNIQUE_VALUE = 3  # Appends items that do not already exist in the target list

Collaborator

@Jasha10 Jasha10 Jun 18, 2023

I think APPEND is confusing due to the way that python's builtin list.append method works:

>>> lst = [1,2,3]; lst.extend([4,5,6]); print(lst)
[1, 2, 3, 4, 5, 6]
>>> lst = [1,2,3]; lst.append([4,5,6]); print(lst)
[1, 2, 3, [4, 5, 6]]

Collaborator

Also I think for the default REPLACE is more appropriate than override.

👍🏼

Contributor Author

I like REPLACE and APPEND_UNIQUE_VALUE a lot. But I agree with Jasha that we should name the modes after the Python list methods, so that EXTEND extends the list. For APPEND_UNIQUE_VALUE, I like that we no longer talk about extending lists but instead look at the actual values and whether they are unique in the target list. @omry, what do you think?

Owner

Okay, from the perspective of the target list:

# replace target list
REPLACE
# Extend target list with new list
EXTEND
# Extend target list with items not already present in it.
EXTEND_UNIQUE 
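
Put together, the three agreed names can be sketched as an enum plus a small dispatch on plain lists. The member names follow the list above; the numeric values and the `merge_list` helper are assumptions made for illustration, not the merged implementation.

```python
from enum import Enum


class ListMergeMode(Enum):
    REPLACE = 1        # replace target list (default)
    EXTEND = 2         # extend target list with new list
    EXTEND_UNIQUE = 3  # extend target list with items not already present in it


def merge_list(target, new, mode=ListMergeMode.REPLACE):
    """Hypothetical helper showing what each mode means for plain lists."""
    if mode is ListMergeMode.REPLACE:
        return list(new)
    if mode is ListMergeMode.EXTEND:
        return list(target) + list(new)
    if mode is ListMergeMode.EXTEND_UNIQUE:
        result = list(target)
        for item in new:
            if item not in result:
                result.append(item)
        return result
    raise ValueError(f"unsupported list merge mode: {mode!r}")


target, new = [1, 2, 3], [3, 4]
by_mode = {mode.name: merge_list(target, new, mode) for mode in ListMergeMode}
```

Following the usage shown earlier in the thread, the corresponding OmegaConf call would presumably pass the renamed member, for example `list_merge_mode=ListMergeMode.EXTEND_UNIQUE`.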

@MatteoVoges
Contributor Author

MatteoVoges commented Jun 21, 2023

The linting error is fixed in #1090 (and here as well)

@MatteoVoges MatteoVoges requested a review from omry June 28, 2023 07:59
Collaborator

@Jasha10 Jasha10 left a comment

Looks good!

news/1082.feature Outdated Show resolved Hide resolved
Owner

@omry omry left a comment

Looks good, thanks!

@omry omry merged commit 7dae67e into omry:master Jul 2, 2023
2 checks passed
@MatteoVoges
Contributor Author

Thanks, are there any plans for a (pre-)release in the near future?

@omry
Owner

omry commented Jul 3, 2023

Maybe @shagunsodhani or @Jasha10 can do that.
In the meantime, you can depend on a GitHub repo revision from your build.

@MatteoVoges MatteoVoges deleted the 1080-add-list-deep-merging branch July 13, 2023 10:29
@Jasha10
Collaborator

Jasha10 commented Aug 9, 2023

@MatteoVoges fyi I've just uploaded v2.4.0.dev0 to pypi.org:
https://pypi.org/project/omegaconf/2.4.0.dev0/

@MatteoVoges
Contributor Author

Thanks for that! ❤️

Development

Successfully merging this pull request may close these issues.

Support deep merging of configs
5 participants