Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

performance fix for simplified marker simplifications #547

Merged
merged 1 commit into from
Jan 22, 2023

Conversation

radoering
Copy link
Member

While working on full support for overlapping markers, I encountered a combinatorial explosion when building a union of markers (see test case).

The issue is that we introduced cnf without introducing a counterpart of union_simplify, which leads to unnecessary big markers. Therefore, I added intersect_simplify analoguous to union_simplify.

Further, the simplifications in #530 went a bit too far in removing the common_markers/unique_markers simplification in union_simplify. I reverted this removal and added analogue functionality to intersect_simplify.

Comparing the number of items in itertools.product in dnf of union (after cnf) shows the benefit of the simplifications:

union (see test case) number and length of markers in conjunction after cnf number of items in itertools.product in dnf
without PR 31 (1-4) > 2**44
with simple intersect_simplifiy 9 (1-2) 256
with revival of common_markers/unique_markers simplification 3 (1-2) 4

unique_intersection = MarkerUnion(*unique_markers).intersect(
MarkerUnion(*other_unique_markers)
)
if isinstance(unique_intersection, SingleMarker):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's a possible missed opportunity here - though I suspect not an important one.

unique_intersection could be EmptyMarker - actually that feels relatively likely and, having got this far, we ought to take the simplification in that case.

ie suggest if isinstance(unique_intersection, (SingleMarker, EmptyMarker)):

You could restore symmetry by testing if isinstance(unique_union, (SingleMarker, AnyMarker)): in the other place

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You could restore symmetry by testing if isinstance(unique_union, (SingleMarker, AnyMarker)): in the other place

In the other place we can do better by testing if not isinstance(unique_union, MarkerUnion):. Doing the same with MultiMarker in this place is not possible because we always prefer dnf:

def intersection(*markers: BaseMarker) -> BaseMarker:
return dnf(MultiMarker(*markers))
def union(*markers: BaseMarker) -> BaseMarker:
conjunction = cnf(MarkerUnion(*markers))
if not isinstance(conjunction, MultiMarker):
return conjunction
return dnf(conjunction)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the other place we can do better by testing if not isinstance(unique_union, MarkerUnion)

I don't think this is doing better, though, because it's never possible to achieve the other types. ie I think that the two different formulations are equivalent, one of them just restores symmetry.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By the way I am not absolutely 100% mathematically certain of the claim that unique_union can't be a MultiMarker - but it is at any rate difficult enough to make it happen that I can't figure out how to do it (and also rare enough that the unit tests don't achieve it either).

Comparing the value of the possible micro-optimisation in case a MultiMarker does occur here against the value of slightly prettier code - I'd find it hard to make a strong case either way. Personally I am drawn to the symmetry but no big deal.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

unique_union should be a MultiMarker for something like

unique_markers = {'python_version >= "3.7"', 'python_version < "3.11"'}
other_unique_markers = {'python_version >= "3.8"', 'python_version < "3.12"'}

However, due to the nature of dnf, I believe there will always be another marker with only one of the unique markers so the markers will be merged later anyway. As you said it's probably just a micro-optimization.

I just realized that allowing MultiMarker would require to use lists instead of sets for unique_markers and other_unique_markers to keep deterministic results. Maybe, that's what tips the scales. So let's choose symmetry for now until there is a good reason to choose the opposite.

src/poetry/core/version/markers.py Outdated Show resolved Hide resolved
@dimbleby
Copy link
Contributor

dimbleby commented Jan 4, 2023

This saddens me a little, I was quite pleased with the removal of all the special cases! But the testcase is rather convincing, all the more so if it was encountered in the real world.

Left a couple of more or less pedantic comments

@radoering radoering force-pushed the performance-cnf-dnf branch from 845a5be to 258bb38 Compare January 6, 2023 16:09
@radoering
Copy link
Member Author

Great suggestions. 👍

@neersighted neersighted enabled auto-merge (rebase) January 22, 2023 17:26
* intersect_simplify to reduce the number of items of itertools.product in cnf (analoguous to union_simplify which primarily affects dnf)
* revival of common_markers/unique_markers simplification, which has been removed in poetry-core#530 but helps massively in reducing the cost of cnf/dnf

Co-authored-by: David Hotham <david.hotham@blueyonder.co.uk>
@sonarqubecloud
Copy link

Kudos, SonarCloud Quality Gate passed!    Quality Gate passed

Bug A 0 Bugs
Vulnerability A 0 Vulnerabilities
Security Hotspot A 0 Security Hotspots
Code Smell A 0 Code Smells

No Coverage information No Coverage information
1.0% 1.0% Duplication

@neersighted neersighted merged commit e6ab533 into python-poetry:main Jan 22, 2023
@radoering radoering mentioned this pull request Jan 24, 2023
mwalbeck pushed a commit to mwalbeck/docker-python-poetry that referenced this pull request Feb 28, 2023
This PR contains the following updates:

| Package | Update | Change |
|---|---|---|
| [poetry](https://python-poetry.org/) ([source](https://github.com/python-poetry/poetry), [changelog](https://python-poetry.org/history/)) | minor | `1.3.2` -> `1.4.0` |

---

### Release Notes

<details>
<summary>python-poetry/poetry</summary>

### [`v1.4.0`](https://github.com/python-poetry/poetry/blob/HEAD/CHANGELOG.md#&#8203;140---2023-02-27)

[Compare Source](python-poetry/poetry@1.3.2...1.4.0)

##### Added

-   **Add a modern installer (`installer.modern-installation`) for faster installation of packages and independence from pip** ([#&#8203;6205](python-poetry/poetry#6205)).
-   Add support for `Private ::` trove classifiers ([#&#8203;7271](python-poetry/poetry#7271)).
-   Add the version of poetry in the `@generated` comment at the beginning of the lock file ([#&#8203;7339](python-poetry/poetry#7339)).
-   Add support for `virtualenvs.prefer-active-python` when running `poetry new` and `poetry init` ([#&#8203;7100](python-poetry/poetry#7100)).

##### Changed

-   **Deprecate the old installer, i.e. setting `experimental.new-installer` to `false`** ([#&#8203;7358](python-poetry/poetry#7358)).
-   Remove unused `platform` field from cached package info and bump the cache version ([#&#8203;7304](python-poetry/poetry#7304)).
-   Extra dependencies of the root project are now sorted in the lock file ([#&#8203;7375](python-poetry/poetry#7375)).
-   Remove upper boundary for `importlib-metadata` dependency ([#&#8203;7434](python-poetry/poetry#7434)).
-   Validate path dependencies during use instead of during construction ([#&#8203;6844](python-poetry/poetry#6844)).
-   Remove the deprecated `repository` modules ([#&#8203;7468](python-poetry/poetry#7468)).

##### Fixed

-   Fix an issue where an unconditional dependency of an extra was not installed in specific environments ([#&#8203;7175](python-poetry/poetry#7175)).
-   Fix an issue where a pre-release of a dependency was chosen even if a stable release fulfilled the constraint ([#&#8203;7225](python-poetry/poetry#7225), [#&#8203;7236](python-poetry/poetry#7236)).
-   Fix an issue where HTTP redirects were not handled correctly during publishing ([#&#8203;7160](python-poetry/poetry#7160)).
-   Fix an issue where `poetry check` did not handle the `-C, --directory` option correctly ([#&#8203;7241](python-poetry/poetry#7241)).
-   Fix an issue where the subdirectory information of a git dependency was not written to the lock file ([#&#8203;7367](python-poetry/poetry#7367)).
-   Fix an issue where the wrong Python version was selected when creating an virtual environment ([#&#8203;7221](python-poetry/poetry#7221)).
-   Fix an issue where packages that should be kept were uninstalled when calling `poetry install --sync` ([#&#8203;7389](python-poetry/poetry#7389)).
-   Fix an issue where an incorrect value was set for `sys.argv[0]` when running installed scripts ([#&#8203;6737](python-poetry/poetry#6737)).
-   Fix an issue where hashes in `direct_url.json` files were not written according to the specification ([#&#8203;7475](python-poetry/poetry#7475)).
-   Fix an issue where poetry commands failed due to special characters in the path of the project or virtual environment ([#&#8203;7471](python-poetry/poetry#7471)).
-   Fix an issue where poetry crashed with a `JSONDecodeError` when running a Python script that produced certain warnings ([#&#8203;6665](python-poetry/poetry#6665)).

##### Docs

-   Add advice on how to maintain a poetry plugin ([#&#8203;6977](python-poetry/poetry#6977)).
-   Update tox examples to comply with the latest tox release ([#&#8203;7341](python-poetry/poetry#7341)).
-   Mention that the `poetry export` can export `constraints.txt` files ([#&#8203;7383](python-poetry/poetry#7383)).
-   Add clarifications for moving configuration files ([#&#8203;6864](python-poetry/poetry#6864)).
-   Mention the different types of exact version specifications ([#&#8203;7503](python-poetry/poetry#7503)).

##### poetry-core ([`1.5.1`](https://github.com/python-poetry/poetry-core/releases/tag/1.5.1))

-   Improve marker handling ([#&#8203;528](python-poetry/poetry-core#528),
    [#&#8203;534](python-poetry/poetry-core#534),
    [#&#8203;530](python-poetry/poetry-core#530),
    [#&#8203;546](python-poetry/poetry-core#546),
    [#&#8203;547](python-poetry/poetry-core#547)).
-   Validate whether dependencies referenced in `extras` are defined in the main dependency group ([#&#8203;542](python-poetry/poetry-core#542)).
-   Poetry no longer generates a `setup.py` file in sdists by default ([#&#8203;318](python-poetry/poetry-core#318)).
-   Fix an issue where trailing newlines were allowed in `tool.poetry.description` ([#&#8203;505](python-poetry/poetry-core#505)).
-   Fix an issue where the name of the data folder in wheels was not normalized ([#&#8203;532](python-poetry/poetry-core#532)).
-   Fix an issue where the order of entries in the RECORD file was not deterministic ([#&#8203;545](python-poetry/poetry-core#545)).
-   Fix an issue where zero padding was not correctly handled in version comparisons ([#&#8203;540](python-poetry/poetry-core#540)).
-   Fix an issue where sdist builds did not support multiple READMEs ([#&#8203;486](python-poetry/poetry-core#486)).

##### poetry-plugin-export ([`^1.3.0`](https://github.com/python-poetry/poetry-plugin-export/releases/tag/1.3.0))

-   Fix an issue where the export failed if there was a circular dependency on the root package ([#&#8203;118](python-poetry/poetry-plugin-export#118)).

</details>

---

### Configuration

📅 **Schedule**: Branch creation - At any time (no schedule defined), Automerge - At any time (no schedule defined).

🚦 **Automerge**: Disabled by config. Please merge this manually once you are satisfied.

♻ **Rebasing**: Whenever PR becomes conflicted, or you tick the rebase/retry checkbox.

🔕 **Ignore**: Close this PR and you won't be reminded about this update again.

---

 - [ ] <!-- rebase-check -->If you want to rebase/retry this PR, check this box

---

This PR has been generated by [Renovate Bot](https://github.com/renovatebot/renovate).
<!--renovate-debug:eyJjcmVhdGVkSW5WZXIiOiIzNC4xNTIuNSIsInVwZGF0ZWRJblZlciI6IjM0LjE1Mi41In0=-->

Reviewed-on: https://git.walbeck.it/walbeck-it/docker-python-poetry/pulls/655
Co-authored-by: renovate-bot <bot@walbeck.it>
Co-committed-by: renovate-bot <bot@walbeck.it>
@radoering radoering deleted the performance-cnf-dnf branch November 24, 2024 12:40
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants