Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add __eq__ for VCSDependency #314

Closed

Conversation

maksbotan
Copy link

@maksbotan maksbotan commented Mar 28, 2022

VCSDependency already defines __hash__, listing only relevant fields for
hashing. However, when doing dictionary lookup, Python will compare not
only hashes, but also equality by __eq__. Since __eq__ was not defined
for VCSDependency, cache for deferred VCS dependencies in Poetry itself
did not work.

See this line in poetry.puzzle.provider:

https://github.com/python-poetry/poetry/blob/5900f3761cd0811c3e717b356b1b877b971c4ff0/src/poetry/puzzle/provider.py#L174

When project has several VCS dependencies AND poetry tries different
solution, the repo will be cloned and parsed for each solution
independently, while this is obviously not what the user wanted.

For our project this fix reduces 'poetry lock --no-update' time from 4
minutes to 1m20s.

This may be related to problems reported in Poetry issues python-poetry#2094, python-poetry#5194
and python-poetry#4924.

Resolves:

  • Added tests for changed code.
  • Updated documentation for changed code.

@dswalter
Copy link

Your markdown obscures that you're defining __eq__ for VCSDependency in this PR. Clear thinking!

@maksbotan
Copy link
Author

Oops, sorry! This comes from a commit message and I did not think about how Github would render it.

@@ -176,3 +176,12 @@ def __str__(self) -> str:

def __hash__(self) -> int:
return hash((self._name, self._vcs, self._branch, self._tag, self._rev))

def __eq__(self, other: "VCSDependency") -> int:

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note: I'm not affiliated with this project in any capacity, just a bystander, but mypy checks failed with the following error text:

src/poetry/core/packages/vcs_dependency.py:180: error: Return type "int" of "__eq__" incompatible with return type "bool" in supertype "Dependency"
src/poetry/core/packages/vcs_dependency.py:180: error: Argument 1 of "__eq__" is incompatible with supertype "object"; supertype defines the argument type as "object"
src/poetry/core/packages/vcs_dependency.py:180: note: This violates the Liskov substitution principle
src/poetry/core/packages/vcs_dependency.py:180: note: See https://mypy.readthedocs.io/en/stable/common_issues.html#incompatible-overrides
src/poetry/core/packages/vcs_dependency.py:180: note: It is recommended for "__eq__" to work with arbitrary objects, for example:
src/poetry/core/packages/vcs_dependency.py:180: note:     def __eq__(self, other: object) -> bool:
src/poetry/core/packages/vcs_dependency.py:180: note:         if not isinstance(other, VCSDependency):
src/poetry/core/packages/vcs_dependency.py:180: note:             return NotImplemented
src/poetry/core/packages/vcs_dependency.py:180: note:         return <logic to compare two VCSDependency instances>
src/poetry/core/packages/vcs_dependency.py:180: error: Return type "int" of "__eq__" incompatible with return type "bool" in supertype "object"

I would recommend trying a bool for the return type of your newly-defined __eq__.

VCSDependency already defines __hash__, listing only relevant fields for
hashing. However, when doing dictionary lookup, Python will compare not
only hashes, but also equality by __eq__. Since __eq__ was not defined
for VCSDependency, cache for deferred VCS dependencies in Poetry itself
did not work.

See this line in poetry.puzzle.provider:

https://github.com/python-poetry/poetry/blob/5900f3761cd0811c3e717b356b1b877b971c4ff0/src/poetry/puzzle/provider.py#L174

When project has several VCS dependencies AND poetry tries different
solution, the repo will be cloned and parsed for each solution
independently, while this is obviously not what the user wanted.

For our project this fix reduces 'poetry lock --no-update' time from 4
minutes to 1m20s.

This may be related to problems reported in Poetry issues #2094, #5194
and #4924.
@maksbotan maksbotan force-pushed the maksbotan/fix-vcs-dep-caching branch from e99bc8c to 5cb0fc3 Compare March 28, 2022 16:33
@sonarqubecloud
Copy link

Kudos, SonarCloud Quality Gate passed!    Quality Gate passed

Bug A 0 Bugs
Vulnerability A 0 Vulnerabilities
Security Hotspot A 0 Security Hotspots
Code Smell A 0 Code Smells

No Coverage information No Coverage information
0.0% 0.0% Duplication

@dimbleby
Copy link
Contributor

This change overrides the existing __eq__ implementation inherited from Dependency.

That feels a bit scary to me, but hopefully someone who actually understands this code will be able to give a more helpful review - I just wanted to be sure this was spotted.

@radoering
Copy link
Member

I have the impression that there is more to it and the fix falls short. Probably, the __hash__ method has to be changed, too.

I cannot fully assess the consequences but I'd suggest the following changes:

__hash__ should call super().__hash__() and xor the hashes of all (unchangeable) relevant attributes not defined in the parent class.

__eq__ should call super().__eq__() and compare at least the attributes included in __hash__. I have to admit that I have no idea which attributes have to be included. In general I'd say "all", but that's probably too naive.

@maksbotan
Copy link
Author

Yeah, I was actually able to find an issue with my solution during local testing. I will update the PR once I get to it. Thanks!

@maksbotan maksbotan marked this pull request as draft April 2, 2022 10:54
@radoering
Copy link
Member

Superseded by #370

@radoering radoering closed this May 29, 2022
@maksbotan
Copy link
Author

maksbotan commented May 29, 2022

Thanks for the fix!
Do I understand correctly that now poetry will reuse fetched git data during solving? I.e. git cache will work correctly

@radoering
Copy link
Member

radoering commented May 29, 2022

I assume it should. However, a lot has changed downstream considering git due to python-poetry/poetry#5428. If you want to test:

pipx install --force --suffix=@git git+https://github.com/python-poetry/poetry.git
pipx inject --force poetry@git git+https://github.com/python-poetry/poetry-core.git

Then, you can use poetry@git (instead of poetry).

@maksbotan
Copy link
Author

Wow, thanks @radoering!

Does this also mean that when pyproject.toml mentions exact commit hash, poetry will not refetch repository every time dependencies are recalculated? For me, this is a major source of slowness when doing poetry lock --no-update routinely.

Also, it would be great if poetry could cache wheels built from git dependencies and reuse them when git commit hash does not change. Is this possible?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants