Use hash of dependency instead of object in a provider's deferred cache #5903
Thanks for your contribution and the detailed information. I'm not sure whether the … Maybe we should try if replacing … in … by … does the trick, too. Do you want to try and create an alternative PR if it works, so we can decide which path to take?
@radoering Yes, this addresses a different one of the downloads. I think we have (at least!):
Ideally, Poetry should also arrange that the …
@radoering Thank you for the suggestion. I shall create a second PR today. I am unsure how … What also comes to mind now is that we only update the constraint field in the dependency if it is not found in the cache; when we do have a cache hit, we immediately return the package (at least for the URLDependency). Should we change that?
I had checked the box for having added tests because I thought it was not applicable in this case, but I am starting to wonder whether we should test that packages are not re-downloaded. However, I am unsure how to write such a test.
Additional tests are usually a good idea. 😄 A simple unit test in …
You might also write a test for …
Using the hash as key probably makes …
Due to the shortcomings mentioned in my previous comment, I'm closing this one in favor of python-poetry/poetry-core#405. A test checking that a dependency is only downloaded once is still appreciated. Please create a separate PR if you want to write such a test, and note that the test should fail until a new poetry-core version is released and used in master of poetry.
This pull request has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.
Pull Request Check List
Resolves: #5902
This is my first pull request so apologies if I did something wrong, I am open to feedback!
As described in the referenced issue, Poetry often re-downloads packages repeatedly when resolving dependencies. This is due to a bug that makes Poetry think cached packages are not cached yet. The provider's cache is a dictionary that uses a Dependency object as the key (and the package as the value). During a dictionary lookup, Python checks both the hash and the equality (`__eq__`) of the keys, but in some cases the Dependency object changes after it has been cached, which breaks the equality check.
This happens, for example, with the URL dependency, where the package version is only known after downloading, which leads to the following:
This is shown in the code snippet below:
When searching for this dependency again, we look up the cache dictionary with a new URLDependency as the key, which has not been updated with the version number yet, so the lookup misses the cache.
Adding the package to the cache before updating the dependency object is not an elegant solution, in my opinion, because a package search with the updated dependency object would then miss the cache.
This pull request solves the problem by using the hashes of the Dependency objects as keys in the cache dictionary. The currently implemented hash methods already uniquely describe the package they download: for example, the URLDependency hash is made up of the URL and the package name, and the VCS dependency hash is made up of the package name, repository, branch, tag, and revision.
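A sketch of the approach proposed here, again using a hypothetical, simplified `URLDependency` stand-in (not poetry-core's class) and illustrative names. The cache is keyed by `hash(dependency)`, an `int`, instead of the object itself, so both a fresh, still-unversioned dependency and the updated, versioned one map to the same entry. One trade-off worth keeping in mind is that Python hashes are not guaranteed to be unique, so two unequal dependencies whose hashes collide would share a cache entry.

```python
class URLDependency:
    """Hypothetical, simplified stand-in for poetry-core's URLDependency."""

    def __init__(self, name: str, url: str) -> None:
        self.name = name
        self.url = url
        self.constraint = "*"  # version is unknown until the archive is downloaded

    def __eq__(self, other: object) -> bool:
        if not isinstance(other, URLDependency):
            return NotImplemented
        return (self.name, self.url, self.constraint) == (
            other.name, other.url, other.constraint
        )

    def __hash__(self) -> int:
        # Name and URL only: unaffected by later constraint updates.
        return hash((self.name, self.url))


download_count = 0
# Keyed by the dependency's hash rather than by the mutable object.
cache: dict[int, str] = {}


def search_for(dependency: URLDependency) -> str:
    global download_count
    key = hash(dependency)
    if key in cache:
        return cache[key]
    download_count += 1  # stands in for an expensive download
    package = f"{dependency.name}-1.0"
    dependency.constraint = "1.0"  # mutating the object no longer affects the key
    cache[key] = package
    return package


URL = "https://example.com/demo-1.0.tar.gz"
first = URLDependency("demo", URL)
search_for(first)
search_for(URLDependency("demo", URL))  # fresh, unversioned object: cache hit
search_for(first)  # the updated, versioned object: also a cache hit
print(download_count)  # 1
```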
Another approach would have been to re-implement the eq method, but I think this is also the wrong approach, because a URL dependency with an unbounded version is different from a URL dependency with a set version.