Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Feature] Pure python package detection #628

Merged
merged 5 commits into from
Jul 16, 2019
Merged

[Feature] Pure python package detection #628

merged 5 commits into from
Jul 16, 2019

Conversation

lambdaclan
Copy link
Contributor

@lambdaclan lambdaclan commented May 17, 2019

Description

[Depends on PR #602, #654 #656]

Update this PR has been split into 3 different PRs

Since all 3 PRs are based on #656, it might make sense to resolve that one first.

  1. This PR [Feature] Pure python package detection #628 for pure python package detection

  2. [Feature] Pip QOL commands -- EXPERIMENTAL #657 for the experimental pip QOL commands

  3. [Fix] Package schema and request errors #658 for the schema and package request error fixes

Pure python package detection

As we embrace modern Python and try to improve the way rez interfaces with pip by enforcing
Wheel package installation rather than the aging and legacy source and Egg distributions we can further improve rez-pip's functionality by taking advantage of the rich metadata that comes with Wheel distributions.

This PR introduces a purity detector for Python packages. As defined on the Wheel Binary Package Format PEP491

Wheel preserves the "purelib" vs. "platlib" distinction, which is significant on some platforms. For example, Fedora installs pure Python packages to '/usr/lib/pythonX.Y/site-packages' and platform dependent packages to '/usr/lib64/pythonX.Y/site-packages'.

A wheel with "Root-Is-Purelib: false" with all its files in {name}-{version}.data/purelib is equivalent to a wheel with "Root-Is-Purelib: true" with those same files in the root, and it is legal to have files in both the "purelib" and "platlib" categories.

In practice a wheel should have only one of "purelib" or "platlib" depending on whether it is pure Python or not and those files should be at the root with the appropriate setting given for "Root-is-purelib".

This addition enables us to streamline variant requirements:

TODO detect if platform/arch/os necessary, no if pure python

To further ensure compatibility with current platform this PR also adds some experimental features
such as package version listing, querying and Python version constraint based search. These features
use the most "reliable" source of package data being the PyPi via the new JSON API.

The reason these features are marked experimental is because currently there is no standard of how the metadata should be defined and sometimes it varies widely between packages. According to my tests it seems to be working for the most part but still a lot of things to consider.

Relates # (issue)
rez #610, #575 maybe more...

Breakdown

  • pure/impure package detection

  • pure packages now only use major Python version (for example python-2)

  • drop os version dependency all together platform / architecture / (os) / python for all packages (pure and impure)

Process

  • Python package purity detector
    • this is using the Wheel metadata file named WHEEL. WHEEL is the wheel metadata specific to a build of the package. The Wheel metadata contents are defined here

Type of change

  • New feature (non-breaking change which adds functionality)

Remarks

  • Requires pip >=19.0
  • Requires distlib >= 0.28

How Has This Been Tested?

I have installed a plethora of packages from known to obscure in order to check the
purity detector with good success rate.
Some issues still occur (unable to build a wheel - very rare) depending on the platform but the same problems would also occur with the current implementation.

Test Configuration:

  • Python version: 2.7.15, 2.7.16
  • OS: Linux, Windows 10
  • Toolchain: rez 2.33, pip 18.1, 19.1.1, distlib 0.2.9.post0

Todos

  • Tests

Demo

  • Pure / Impure package detector

purity

@mottosso
Copy link
Contributor

Nice one, how about we merge our efforts on pip into one great PR? There's some slight duplication of effort at the moment, and with the number of PRs growing and nothing seeming to get merged, at least anyone interested in improving their pip experience would know which branch to merge locally.

How about I merge the above into this one, and then we continue from there?

@lambdaclan
Copy link
Contributor Author

Helle Marcus, sure thing this is the beauty of working on open source projects, the ability to collaborate. What is important is to get the improvements merged into master so that everyone can benefit from them. Saying that I am sure there will be some conflicts if you try to merge your changes with mine because of work duplication as you mentioned.

We will need to get Allan involved to get feedback on the best approach to follow and then we can work from there to get the changes merged.

In terms of work duplication, this is something I was discussing with Thorsten, maybe we need some sort of task board where everyone can specify what they are working on to avoid doing the same things twice. This will also promote collaborations between contributors which can only be a good thing.

I would love to hear the community's opinion on this.

src/rez/pip.py Outdated
@@ -97,7 +100,17 @@ def run_pip_command(command_args, pip_version=None, python_version=None):
pip_exe, context = find_pip(pip_version, python_version)
command = [pip_exe] + list(command_args)

if context is None:
if process_output:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is this meant for?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If it fails because the version doesn't exist, it will return the available versions. Not entirely sure if it's really needed here.

Copy link
Contributor Author

@lambdaclan lambdaclan May 27, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This has to do with the ability to list all available versions of a package based on PyPi for the current Python interpreter. Although a bit of a hack as outlined here there is currently no way of retrieving this list. Causing the pip install to fail is intentional in a way to retrieve the available packages.

Edit: Potentially this could be done using the PyPI json api by retrieving the releases list in a way similar to the other experimental features but that does not differentiate the Python versions required and will need further parsing provided the metadata is correct. Causing the installer to fail seems to be the most reliable way for now.

@mottosso
Copy link
Contributor

mottosso commented May 25, 2019

I've been contemplating how to make the implementation of pip.py a little simpler.

  • Currently, rez pip creates a context internally (or uses system-python)

But what if it didn't need to do that? What if instead could call on whichever Python and pip was available at the time of being called? If it did, we could strip those parts where it's establishing a context and checking for a system Python, and get called like this.

# Use system Python, if any
$ rez pip --install six
ERROR: Could not find Python or pip
       Usage: rez env python-2 pip-19 -- pip install six

# Use explicit Python, and pip
$ rez env python-2.7 pip-19 -- rez pip --install six

Now rez pip is able to collaborate with rez env as expected, and as an added bonus makes the syntax explicit, and able to handle requests like normal. Then, for users who prefer pip be installed as part of python (e.g. me), they could simply omit pip-19 altogether.

$ rez env python-3.7 -- rez pip --install six

Clear, familiar and consistent with a potential integration of rez conda, rez choco and rez vcpkg etc.

Would you be interested in this for this PR, @lambdaclan?

@JeanChristopheMorinPerso
Copy link
Member

Clear, familiar and consistent with a potential integration of rez conda, rez choco and rez vcpkg etc.

I'm not sure we would really want to go the route of implementing conda, choco and others package managers IMHO. If you ever have the chance to talk to a conda developer, there is a high number of chances he will tell you it's not that fun to re-package everything.

drop os version dependency all together platform / architecture / (os) / python for all packages (pure and impure)

@lambdaclan What's the reasoning behind this? And could you elaborate more on this?

ability to list all available versions of a package based on PyPi

Does this do anything different from pip search?

I'll come back to do a real code review later for sure.

@bpabel
Copy link
Contributor

bpabel commented May 26, 2019

What's the reasoning behind this? And could you elaborate more on this?

Because pure python packages are cross platform. There's no need to variant them by os/platform/arch. And many python packages are py2/3 compliant, so you don't have to variant them by python version either.

@JeanChristopheMorinPerso
Copy link
Member

But why drop the OS/platform/arch for non pure packages? That was more my question (I should have been more specific in the first place).

@mottosso
Copy link
Contributor

for all packages (pure and impure)

Oh, also noticed this. Impure packages, like PySide, would only run on a given platform, arch and OS so those probably need to remain.

Copy link
Member

@JeanChristopheMorinPerso JeanChristopheMorinPerso left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did a first code review. Overall it looks good. Though there seems to be some duplication with other PRs (at least it's the impression I have).

Also, I'd really like to know why the os/platform/arch is dropped for any pure and non pure packages.

Also, it would be great to add tests for this change.

Thanks

return

if opts.query:
URL = "https://pypi.python.org/pypi/{0}/{1}/json"

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this should be a setting. Because as I've said a couple of times, not everybody have access to a direct internet connection at work (any studios working with Disney, Marvel, etc).

Also, some studios have an internal PyPI index (Nexus, JFrog, devpi, fileystem based (though filesystem won't work with JSON)) and they will want to go through it (maybe because they have a security team verifying the source code of each package, or have tools that does it and so they only give access to packages considered secure, or whatever other reasons).

Ideally, we should take into consideration the user configs for pypi (see https://pip.pypa.io/en/stable/user_guide/#configuration). Maybe by using pip config list/get.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hello Jc good point. I was thinking that anyone using rez-pip would have an active internet connection but did not consider internal PyPI indexes. Although the query option help clearly states that is for use with PyPi.

If we add the URL as an option how will the query work? I am assuming we will need to add implementations for each method? A bit unfamiliar with internal index since I never used one.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A bit unfamiliar with internal index since I never used one.

That's fine. I can help with that. Though all proxies provide support for pip search, not all provide the JSON API that PyPI provides. I'll have to check with the major proxies (devpi, artifactory, nexus). But I think this can be done later and I don't think it's necessary for this current PR.

python_requires = filter(None, python_requires)

if not python_requires:
print("Package version is valid but PyPi does not contain a specific Python version. Too old?")

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Too old?

Not necessarily too old. It could be too old, or that the maintainer doesn't know he should add a python_requires in his package or set classifiers for python versions.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed, the reason I said too old is because it normally happens with old packages newer ones seem to have that metadata always defined. This was mostly for debugging purposes so will be happy to have more descriptive messages.

try:
data = json.load(urllib2.urlopen(URL.format(opts.PACKAGE, opts.query)))

classifiers = data["info"]["classifiers"]

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should also look at the requires_python attribute, just in case both differs or if only one was set.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought about using the requires_python metadata as well but did not want it to conflict with the classifier as sometimes they do and then will be hard to decide which to pick. Also requires python only states a single version whereas classifiers are more descriptive...

Maybe having a check to see if they at least match or something.

The query option is using classifiers whereas the constrain one is using python_version and requires_python. There is definitely some overlap here that can be improved like sticking to one validation method for example always use requires_python or come up with a an order of importance check ala:

  1. requires_python
  2. python_version
  3. classifier

The problem of course lies with the the non standardization of the whole metadata so it is hard to decide which one to use and why..

I will take it into consideration thanks for the feedback.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also requires python only states a single version whereas classifiers are more descriptive...

requires_python can describe quite a lot actually: requires_python=">=2.7, !=3.0.*, !=3.1.*, !=3.2.*, !=3.3.*" would be a valid value. (See https://packaging.python.org/guides/distributing-packages-using-setuptools/#python-requires)

The thing is that I don't know if the classifiers are automatically added when requires_python is used and the classifiers were not manually specified. Maybe you know (i basically didn't have time to try).

There is definitely some overlap here that can be improved like sticking to one validation method for example always use requires_python or come up with a an order of importance check ala:

  1. requires_python
  2. python_version
  3. classifier

That would probably do it yes, but would only be necessary if the classifiers are not automatically added by requires_python

@@ -128,6 +128,10 @@ def get_package(self):
def _get_data(self):
data = self._data.copy()

if "requires" in data:
if None in data["requires"]:

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can do:

data["requires"] = list(filter(None, data["requires"]))

Which is slightly more pythonic. Basically it will do one for loop instead of two.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Explicit is better than implicit.
Simple is better than complex.

Just joking, as a Haskeller I definitely approve but sometimes I find it hard to decide which way to go :)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's worth noting that filter(None, ...) will filter out anything that evaluates to False, not just None values.

src/rez/pip.py Outdated
@@ -97,7 +100,17 @@ def run_pip_command(command_args, pip_version=None, python_version=None):
pip_exe, context = find_pip(pip_version, python_version)
command = [pip_exe] + list(command_args)

if context is None:
if process_output:

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If it fails because the version doesn't exist, it will return the available versions. Not entirely sure if it's really needed here.

packages = pkg_resources.find_distributions(destpath)
dist = next((package for package in packages if package.key == installed_dist.key), None)
wheel_data = dist.get_metadata('WHEEL')
wheel_data = Parser().parsestr(wheel_data)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A little comment here that points to the documentation about that would be great (like https://www.python.org/dev/peps/pep-0566/#json-compatible-metadata).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense I will add it.

wheel_data = dist.get_metadata('WHEEL')
wheel_data = Parser().parsestr(wheel_data)

return true_table[wheel_data["Root-Is-Purelib"]]

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe add a little link to https://www.python.org/dev/peps/pep-0427/#what-s-the-deal-with-purelib-vs-platlib or something else that explains a little bit more what's the deal with Root-Is-Purelib?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense I will add it.

@lambdaclan
Copy link
Contributor Author

lambdaclan commented May 27, 2019

Hello Jc, Marcus and Brendan thank you all very much for the feedback regarding the PR. @JeanChristopheMorinPerso thank for the review! I will be answering your questions so that we can decide the best way to proceed.

Edit: As this PR largely depends on the wheel based installation PR #602 that PR will need to get merged before continuing with this one. Now that Allan had an initial look of said PR we will hopefully have a conclusion fairly soon.

@lambdaclan
Copy link
Contributor Author

lambdaclan commented May 27, 2019

drop os version dependency all together platform / architecture / (os) / python for all packages (pure and impure)

@lambdaclan What's the reasoning behind this? And could you elaborate more on this?

Hello Jc, apologies for my formatting which seems to have made things a bit confusing. Only pure packages drop OS, platform and arch. Not pure packages only drop os version dependency the one
in the bracket.

[platform / architecture / (os) / python - major.minor] -- impure
[python major] -- pure

ability to list all available versions of a package based on PyPi

Does this do anything different from pip search?

Yes, pip search only returns the latest version of the package queried along with any other package that might share the same word. The current implementation only returns the versions of the queried package that are available for install.

I added this feature in case you want to install an older version and work well with the rest of the experimental features added in the PR.

@lambdaclan
Copy link
Contributor Author

I've been contemplating how to make the implementation of pip.py a little simpler.

* Currently, `rez pip` creates a context internally (or uses system-python)

But what if it didn't need to do that? What if instead could call on whichever Python and pip was available at the time of being called? If it did, we could strip those parts where it's establishing a context and checking for a system Python, and get called like this.

# Use system Python, if any
$ rez pip --install six
ERROR: Could not find Python or pip
       Usage: rez env python-2 pip-19 -- pip install six

# Use explicit Python, and pip
$ rez env python-2.7 pip-19 -- rez pip --install six

Now rez pip is able to collaborate with rez env as expected, and as an added bonus makes the syntax explicit, and able to handle requests like normal. Then, for users who prefer pip be installed as part of python (e.g. me), they could simply omit pip-19 altogether.

$ rez env python-3.7 -- rez pip --install six

Clear, familiar and consistent with a potential integration of rez conda, rez choco and rez vcpkg etc.

Would you be interested in this for this PR, @lambdaclan?

Hello Marcus, I completely agree that rez-pip is due for an overall revamp and I am in favour of simplifying the process.

I like the idea of having rez env collaborate with rez-pip but we will also need to decide how to deal with the standalone rez-pip command (ie not using env). If we decide to get rid of context all together and use system python pip then this might work. I know that on Windows I cannot use the pip package because of some symlink issue so I always delete the package after rez-bind in order to fallback to system version.

Maybe first of all we need to decide how rez will be installed in the future through pip or through a python install? I can see things getting overly confusing with which pip is going to be used when having so many options. Is the same pip that installed rez? Is it the pip that is bundled with the python that installed rez? Is it something else etc...

I am in favour of setting the system pip as the default and allowing rez-env to interface with rez-pip ( and happy to collaborate to make that happen 👍 ) if the user chooses to do so but I have to agree with Jc about being sceptical with regards to adding integration with other packagers such as conda, choco vcpkg etc. This will add unnecessary complexity for some specialized scenarios. conda is used for scientific python, choco and vcpkg are Windows specific and not widely used so it is worth it overall?

I like having options but too many options can make things intimidating for users so maybe a whole new tool would make more sense rez-conda, rez-choco etc. Rez plugins here we go...

@lambdaclan
Copy link
Contributor Author

Also, it would be great to add tests for this change.

Once the wheel based installation PR is merged I will add tests and adjust the code as requested. I am a bit hesitant in doing so right away because if the wheel install PR changes or is rejected for whatever reason I will need to re-work a lot of this anyway.

@mottosso mottosso mentioned this pull request May 27, 2019
11 tasks
@nerdvegas
Copy link
Contributor

Hi @lambdaclan, I'm ready to merge #602, just letting you know that that will happen tomorrow. I'll think about backwards compatibility and may make minimal changes to keep that in place, I'll let you know.

Thx
A

@nerdvegas
Copy link
Contributor

Just want to mention about this approach:

rez env python-3.7 -- rez pip --install six

Ie to use the current rez-env as a basis for determining which python interpreter to use for the installation. We should not take this approach, because in all other cases, the rez cli tools are intended to work independently of the current environment, rez-based or not. It also closely couples rez-pip with rez's own installation procedure.

Options to control the python/pip packages used for installation should just be arguments to the rez-pip command itself; falling back to the system installation of anything should be the last choice (or enabled explicitly via another option).

@nerdvegas
Copy link
Contributor

Also letting you know - there's another developer here at Method who deals with all our software installation into rez packages. We have our own toolset here for doing that. We would really rather not be maintaining yet another tool for this, so we're keenly interested in this development.

Copy link
Member

@JeanChristopheMorinPerso JeanChristopheMorinPerso left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here is another pass of code review. Let me know what you thinkk.

try:
data = json.load(urllib2.urlopen(URL.format(opts.PACKAGE, opts.query)))

classifiers = data["info"]["classifiers"]

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also requires python only states a single version whereas classifiers are more descriptive...

requires_python can describe quite a lot actually: requires_python=">=2.7, !=3.0.*, !=3.1.*, !=3.2.*, !=3.3.*" would be a valid value. (See https://packaging.python.org/guides/distributing-packages-using-setuptools/#python-requires)

The thing is that I don't know if the classifiers are automatically added when requires_python is used and the classifiers were not manually specified. Maybe you know (i basically didn't have time to try).

There is definitely some overlap here that can be improved like sticking to one validation method for example always use requires_python or come up with a an order of importance check ala:

  1. requires_python
  2. python_version
  3. classifier

That would probably do it yes, but would only be necessary if the classifiers are not automatically added by requires_python

classifiers = data["info"]["classifiers"]
python_requires = [classifier for classifier in classifiers if "Programming Language :: Python ::" in classifier]
python_requires = ''.join(python_requires)
python_requires = python_requires.split("Programming Language :: Python :: ")

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should also take care of Programming Language :: Python :: 3 :: Only, see https://pypi.org/classifiers/.

python_requires = filter(None, python_requires)

if not python_requires:
print("Package version is valid but PyPi does not contain a specific Python version. Too old?")

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Extra minor: It's PyPI, not PyPi

return

if opts.query:
URL = "https://pypi.python.org/pypi/{0}/{1}/json"

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A bit unfamiliar with internal index since I never used one.

That's fine. I can help with that. Though all proxies provide support for pip search, not all provide the JSON API that PyPI provides. I'll have to check with the major proxies (devpi, artifactory, nexus). But I think this can be done later and I don't think it's necessary for this current PR.

break
elif python_version == "py3":
if requires_python:
package_range = VersionRange(requires_python)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

VersionRange('>=2.6, !=3.0.*, !=3.1.*, !=3.2.*, <4') will unfortunately not work. The package pytest is a good example that contains such a requires_python. You might want to take a look at the packaging package that might help us with this (see https://packaging.pypa.io/en/latest/).

if k not in valid:
valid.append(k)
break
print("The following versions of {0} are compatible with Python {1}".format(opts.PACKAGE, opts.constrain))

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor: I would move this log into the else at line 133.

@@ -59,7 +180,8 @@ def print_variant(v):
print()
if installed_variants:
print("%d packages were installed:" % len(installed_variants))
for variant in installed_variants:
for variant, purity in zip(installed_variants, variant_types):
print("[{}]".format(purity))

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Was this log for debugging purposes?

src/rez/pip.py Outdated
for installed_file in distribution.list_installed_files(allow_fail=True):
source_file = os.path.normpath(os.path.join(destpath, installed_file[0]))
for installed_file in distribution.list_installed_files():
if 'bin' + os.sep in installed_file[0]:

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm wondering if something like:

if installed_file[0].startswith(os.path.join(destpath, 'bin')):
    installed = os.path.relpath(installed_file[0], os.path.join('..','..'))

would be more robust (in the case where someone would have a subpackage called bin...). What do you think?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or maybe even just if os.path.isabs(installed_file[0]):.

@lambdaclan lambdaclan mentioned this pull request Jun 25, 2019
1 task
@lambdaclan lambdaclan changed the title Feature/pure python package detection and experimental quality of life improvements [Feature] Pure python package detection and experimental quality of life improvements Jul 4, 2019
@lambdaclan lambdaclan mentioned this pull request Jul 4, 2019
6 tasks
@lambdaclan
Copy link
Contributor Author

lambdaclan commented Jul 4, 2019

Also requires python only states a single version whereas classifiers are more descriptive...

requires_python can describe quite a lot actually: requires_python=">=2.7, !=3.0.*, !=3.1.*, !=3.2.*, !=3.3.*" would be a valid value. (See https://packaging.python.org/guides/distributing-packages-using-setuptools/#python-requires)

The thing is that I don't know if the classifiers are automatically added when requires_python is used and the classifiers were not manually specified. Maybe you know (i basically didn't have time to try).

There is definitely some overlap here that can be improved like sticking to one validation method for example always use requires_python or come up with a an order of importance check ala:

  1. requires_python
  2. python_version
  3. classifier

That would probably do it yes, but would only be necessary if the classifiers are not automatically added by requires_python

Originally posted by @JeanChristopheMorinPerso in #628

After thinking a bit more about this, it might make sense to restructure the commands in order to make their intent more apparent, easier to understand.

As discussed above we have the following fields that we can use to obtain some sort of an idea
about Python requirements of a library:

  1. requires_python
  2. python_version
  3. classifiers

Now it goes without saying that since nothing is formalized, which of the values above each package actually defines can vary widely. According to my initial impressions after playing with this for a while, I realized that normally the definition occurrence is as follows:

classifiers > python_version > requires_python

overview (global) --> 2 vs 3 (per version) --> specific requirements (per version)

funnily the one of most value is the least defined one...

If we put that aside for now, what I have in mind is to have two commands.

  1. A general query command that will give an overview of all the versions of Python compatible
    with queried package.
rez-pip -q (--query) package_name

There will be no version argument anymore. This is just to check if a package (of any version) is compatible with my current Python. For this I am thinking of using the classifiers. I could use requires_python as well but although like you mention above, can be used to describe a lot of info it hardly ever does... because is not normally defined.

Example django 2.1.1

classifiers:

6 | "Programming Language :: Python"
7 | "Programming Language :: Python :: 3"
8 | "Programming Language :: Python :: 3.5"
9 | "Programming Language :: Python :: 3.6"
10 | "Programming Language :: Python :: 3.7"
11 | "Programming Language :: Python :: 3 :: Only"

requires_python | ">=3.5"

We can see that they match but the first classifier might give the meaning that any Python > 3 will
work with the package. Also this is one of the rare occasions where requires was defined and had to look for a really popular package before I found one, not even requests defines them...

  1. A specific version query so obtain the package version/s compatible with desired Python version range
rez-pip -c (--constrain) package_name python_version_range

Here the classifiers are pretty much useless and we really need requires_python as python_version really only specifies py2 or py3 but no other details. The problem again is that
requires_python is hardly ever correctly defined....

I am starting to feel that this might be too immature to be implemented as a helper tool. I have good intentions but unfortunately, without formalization and uniformity the data is just not there to backup the claims that might be made from such a tool.

A bit sceptical on how to continue (hence experimental in title). It might be worth to extract the purity detection part only as that immediately adds value and work on the experimental stuff on a different PR.

Thoughts?

@lambdaclan lambdaclan changed the title [Feature] Pure python package detection and experimental quality of life improvements [Feature] Pure python package detection Jul 8, 2019
@nerdvegas nerdvegas mentioned this pull request Jul 10, 2019
1 task
@lambdaclan
Copy link
Contributor Author

Hello @nerdvegas

Thank you for merging #656! I have now merged the changes from master so only the relevant diffs are now visible, hope it is better now.

src/rez/pip.py Outdated
@@ -335,6 +335,7 @@ def pip_install_package(source_name, pip_version=None, python_version=None,
"""
installed_variants = []
skipped_variants = []
variant_types = []

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can't find any references to this variable in the function. I think it's no more needed.

@JeanChristopheMorinPerso
Copy link
Member

JeanChristopheMorinPerso commented Jul 13, 2019

This PR looks good to me. Tested it and it works as expected.

Add package purity detection using Wheel format metadata as specified
here https://www.python.org/dev/peps/pep-0491/#what-s-the-deal-with-purelib-vs-platlib.
Add process output flag to run pip command in order to parse output
of the pip install result before displaying it to the user.
Add variant type to log pure / impure state.
Drop os version dependency all together
platform / architecture / (os) / python for all packages
pure and impure.
Only for pure packages - drop specific Python version  and use only
major part.

Relates #602, #61
py_ver = python_variant.version.trim(2)

if pure:
py_ver = python_variant.version.trim(1)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm thinking that we shouldn't do this; or, that we do it but only if we know it's safe to do so.

"Safe" depends on whether there are markers in the requirements. For eg, if the package requires "foo-1.0 ; python-3.1.*", then the only way we can properly express the package's requirements in rez is if we variant on pthon-MAJOR.MINOR; otherwise we cannot specify foo as a requirement for just that case.

I also suspect that the complexity in determining this from the markers isn't worth it, and that we should probably just stick to varianting on python-MAJOR.MINOR.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hello Allan, after discussing the issue with Thorsten we have reached an agreement that we can live without this particular change, trimming the minor version when dealing with pure packages. What is most important for us is the removal of the os as that adds a significant number of rebuilds when dealing with Windows versions (python x plat x arch x os) removing just one of these segments will make our work much much easier.

Also. as you mention above it makes sense to keep the minor version since it might be a hard requirement for some packages that will cause other issues. I would like to explore the markers solution down the line so I will take a look and see if there is a way to achieve it, getting rid of the minor version will still be a big improvement if it is reliable.

For now I will remove the minor version trimming so that we can continue with the review before the merge.

@nerdvegas
Copy link
Contributor

nerdvegas commented Jul 16, 2019 via email

@nerdvegas
Copy link
Contributor

I'm going to merge this now, been reviewing for a while and it sounds good. Will make that py ver trim change if it hasn't been done already. I also have a PR coming on the tails of this one. Cheers all!

@nerdvegas nerdvegas merged commit 186694b into AcademySoftwareFoundation:master Jul 16, 2019
@lambdaclan lambdaclan deleted the feature/pure-python-detection branch April 6, 2020 06:37
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants