Lack of attribution and licence information for code derived from CPython #33

Closed
emilazy opened this issue Jul 10, 2024 · 7 comments

Comments

@emilazy

emilazy commented Jul 10, 2024

The following code was introduced to nose/importer.py in b524756:

pynose/nose/importer.py

Lines 21 to 125 in cc86546

SEARCH_ERROR = 0
PY_SOURCE = 1
PY_COMPILED = 2
C_EXTENSION = 3
PY_RESOURCE = 4
PKG_DIRECTORY = 5
C_BUILTIN = 6
PY_FROZEN = 7
PY_CODERESOURCE = 8
IMP_HOOK = 9


def get_suffixes():
    extensions = [
        (s, 'rb', C_EXTENSION) for s in importlib.machinery.EXTENSION_SUFFIXES
    ]
    source = [
        (s, 'r', PY_SOURCE) for s in importlib.machinery.SOURCE_SUFFIXES
    ]
    bytecode = [
        (s, 'rb', PY_COMPILED) for s in importlib.machinery.BYTECODE_SUFFIXES
    ]
    return extensions + source + bytecode


def init_builtin(name):
    try:
        return _builtin_from_name(name)
    except ImportError:
        return None


def load_package(name, path):
    if os.path.isdir(path):
        extensions = (
            importlib.machinery.SOURCE_SUFFIXES[:]
            + importlib.machinery.BYTECODE_SUFFIXES[:]
        )
        for extension in extensions:
            init_path = os.path.join(path, '__init__' + extension)
            if os.path.exists(init_path):
                path = init_path
                break
        else:
            raise ValueError('{!r} is not a package'.format(path))
    spec = importlib.util.spec_from_file_location(
        name, path, submodule_search_locations=[]
    )
    sys.modules[name] = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(sys.modules[name])
    return sys.modules[name]


def find_module(name, path=None):
    """Search for a module.
    If path is omitted or None, search for a built-in, frozen or special
    module and continue search in sys.path. The module name cannot
    contain '.'; to search for a submodule of a package, pass the
    submodule name and the package's __path__."""
    if is_builtin(name):
        return None, None, ('', '', C_BUILTIN)
    elif is_frozen(name):
        return None, None, ('', '', PY_FROZEN)
    # find_spec(fullname, path=None, target=None)
    spec = importlib.machinery.PathFinder().find_spec(
        fullname=name, path=path
    )
    if spec is None:
        raise ImportError(_ERR_MSG.format(name), name=name)
    # RETURN (file, file_path, desc=(suffix, mode, type_))
    if os.path.splitext(os.path.basename(spec.origin))[0] == '__init__':
        return None, os.path.dirname(spec.origin), ('', '', PKG_DIRECTORY)
    for suffix, mode, type_ in get_suffixes():
        if spec.origin.endswith(suffix):
            break
    else:
        suffix = '.py'
        mode = 'r'
        type_ = PY_SOURCE
    encoding = None
    if 'b' not in mode:
        with open(spec.origin, 'rb') as file:
            encoding = tokenize.detect_encoding(file.readline)[0]
    file = open(spec.origin, mode, encoding=encoding)
    return file, spec.origin, (suffix, mode, type_)


def load_module(name, file, filename, details):
    """Load a module, given information returned by find_module().
    The module name must include the full package name, if any."""
    suffix, mode, type_ = details
    if type_ == PKG_DIRECTORY:
        return load_package(name, filename)
    elif type_ == C_BUILTIN:
        return init_builtin(name)
    elif type_ == PY_FROZEN:
        return init_frozen(name)
    spec = importlib.util.spec_from_file_location(name, filename)
    mod = importlib.util.module_from_spec(spec)
    sys.modules[name] = mod
    spec.loader.exec_module(mod)
    return mod

This code is clearly a derivative work of the since‐removed CPython Lib/imp.py file, with most functions and documentation being clearly based on the CPython code, some with no changes at all.
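For readers who have not used the removed imp module: the last four lines of load_module() above follow the standard importlib recipe for loading a module from a file path. A minimal sketch of that recipe on its own, where the helper name load_from_path, the module name demo_mod, and the temporary file are all hypothetical and exist only for illustration:

```python
import importlib.util
import os
import sys
import tempfile


def load_from_path(name, filename):
    """Load a module from a file path using importlib.

    Hypothetical helper that mirrors the final lines of load_module().
    """
    spec = importlib.util.spec_from_file_location(name, filename)
    mod = importlib.util.module_from_spec(spec)
    # Register the module before executing it, as the quoted code does.
    sys.modules[name] = mod
    spec.loader.exec_module(mod)
    return mod


# Hypothetical demo: write a tiny module to a temporary directory and load it.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo_mod.py")
    with open(path, "w", encoding="utf-8") as f:
        f.write("VALUE = 42\n")
    demo = load_from_path("demo_mod", path)

print(demo.VALUE)
```

This only illustrates the importlib calls that both versions share; it says nothing about which parts of the pynose file are original.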

The original code is copyrighted by the Python Software Foundation, and released under the terms of the Python Software Foundation License Version 2. Derivative works are permitted, and there is no obstacle to including such a derivative work in a larger work licensed under the LGPL, but there are conditions; here is a relevant excerpt:

2. Subject to the terms and conditions of this License Agreement, PSF hereby
grants Licensee a nonexclusive, royalty-free, world-wide license to reproduce,
analyze, test, perform and/or display publicly, prepare derivative works,
distribute, and otherwise use Python alone or in any derivative version,
provided, however, that PSF's License Agreement and PSF's notice of copyright,
i.e., "Copyright (c) 2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008, 2009, 2010,
2011, 2012, 2013, 2014, 2015, 2016, 2017, 2018, 2019, 2020, 2021, 2022, 2023 Python Software Foundation;
All Rights Reserved" are retained in Python alone or in any derivative version
prepared by Licensee.

3. In the event Licensee prepares a derivative work that is based on
or incorporates Python or any part thereof, and wants to make
the derivative work available to others as provided herein, then
Licensee hereby agrees to include in any such work a brief summary of
the changes made to Python.

By my reading, the following requirements to distribute a derivative work of this CPython code were not met:

  1. inclusion of the LICENSE text, either directly in the relevant file or elsewhere in the source repository;

  2. inclusion of the PSF’s notice of copyright, i.e. Copyright (c) 2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008, 2009, 2010, 2011, 2012, 2013, 2014, 2015, 2016, 2017, 2018, 2019, 2020, 2021, 2022, 2023 Python Software Foundation; All Rights Reserved, with the code that is a derivative work of code the PSF owns;

  3. inclusion of a brief summary of the changes made to the original code, as it is based on/incorporating a part of CPython.
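The three points above can be read as a checklist against the header of the derived file. A rough sketch of such a check, assuming a hypothetical "Changes:" marker for the summary (the licence does not prescribe any particular wording or format) and a caller-supplied flag for whether the licence text ships with the code:

```python
def missing_requirements(header, license_file_present):
    """Return which of the three listed items appear to be missing.

    `header` is the comment block at the top of the derived file.
    `license_file_present` says whether the PSF licence text is distributed
    with the code. The "Changes:" marker is an assumed convention, not part
    of the licence.
    """
    missing = []
    if not license_file_present:
        missing.append("licence text")
    if ("Python Software Foundation" not in header
            or "All Rights Reserved" not in header):
        missing.append("PSF copyright notice")
    if "Changes:" not in header:
        missing.append("summary of changes")
    return missing


# Hypothetical headers; the year list is abbreviated here for width only.
bare = "# nose/importer.py\n"
annotated = (
    "# Adapted from CPython Lib/imp.py.\n"
    "# Copyright (c) 2001, ..., 2023 Python Software Foundation; "
    "All Rights Reserved\n"
    "# Changes: <brief summary of what was modified>\n"
)

print(missing_requirements(bare, license_file_present=False))
print(missing_requirements(annotated, license_file_present=True))
```

A substring check like this can only flag obvious omissions; whether a given notice actually satisfies the licence is a legal question, not something a script can decide.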

These need to be corrected for legal compliance with the licence granted by the copyright holders of the code this derivative work was based on. However, there is unfortunately an additional potential complication:

6. This License Agreement will automatically terminate upon a material
breach of its terms and conditions.

Unlike most modern licences, this does not include any grace period or clause to restore the licence if the breach is corrected. If your licence to prepare and distribute derivative works has been terminated due to non‐compliance with the terms, then the code in nose/importer.py would constitute an illegal infringement of copyright.

In this case, however, I am sure that if the breach was accidental and you come into compliance with the licence requirements, the PSF would have no reason to stop you using the code. I would recommend addressing these compliance issues promptly and then contacting the Python Software Foundation at psf@python.org to tell them about the issue and that you addressed it once notified. Ask them either to confirm that they do not consider this a material breach of the licence terms or, if they do, to license you to use the CPython code again under the same terms.

@mdmintz
Owner

mdmintz commented Jul 10, 2024

Are you saying that I just need to add something like this snippet below? (@emilazy, @jchv)

Adapted from the CPython 3.11 imp.py code.
Copyright (c) 2001-2023 Python Software Foundation; All Rights Reserved
Originally licensed under the PSLv2 and incorporated under the LGPL 2.1.

Based on the popular https://github.com/pdbpp/pdbpp package, an example of a repo that took CPython code and modified it (compare https://github.com/pdbpp/pdbpp/blob/master/src/pdbpp.py to https://github.com/python/cpython/blob/f481a02e6c7c981d1316267bad5fb94fee912ad6/Lib/pdb.py)
Eg, more specifically: (compare https://github.com/pdbpp/pdbpp/blob/master/src/pdbpp.py#L1085 to https://github.com/python/cpython/blob/f481a02e6c7c981d1316267bad5fb94fee912ad6/Lib/pdb.py#L446) it shows clearly that they modified CPython code... and they have a BSD 3-Clause License, which sounds different from CPython's LGPL 2.1 License.

So from my pdbpp example above, here are the big questions I have:

  • How is it that pdbpp is OK, whereas pynose isn't OK with respect to how things were handled?
  • If what pdbpp is OK, then what does pynose need to do to in order to modify CPython code without the License issue?

pdbpp is quite a bit more popular, and used by a lot of major companies. (https://github.com/pdbpp/pdbpp/network/dependents)
I just want to be sure that pynose isn't being singled out unfairly, as the data I've been gathering this morning seems to make it appear that modifying CPython code is very widespread, and in many ways handled similarly to how https://github.com/pdbpp/pdbpp handled it.

Also separate from this issue is my pdbp fork to fix pdbpp, which was broken on Windows (pdbpp/pdbpp#498) and also broken for pytest (pdbpp/pdbpp#519). I told them about my fork, and people are quite happy that I stepped in to fix things (as I do with a lot of things in the Python ecosystem):

pdbp comes to save the day

Back to the original topic, it appears that my pynose code has already been widely used in places such as Alpine Linux:

That also means that my code can be found in Azure, AWS, Google Cloud, and Docker:

I'm happy to see that I made a difference in the Python ecosystem, and that lots of people are gaining value from my fixes.


I'll be waiting for a response to the two questions I posted earlier in this message in regards to pdbpp and pynose.

@jchv

jchv commented Jul 10, 2024

> Are you saying that I just need to add something like this snippet below? (@emilazy, @jchv)
>
> Adapted from the CPython 3.11 imp.py code.
> Copyright (c) 2001-2023 Python Software Foundation; All Rights Reserved
> Originally licensed under the PSLv2 and incorporated under the LGPL 2.1.

Basically, yes.

> Based on the popular https://github.com/pdbpp/pdbpp package, an example of a repo that took CPython code and modified it (compare https://github.com/pdbpp/pdbpp/blob/master/src/pdbpp.py to https://github.com/python/cpython/blob/f481a02e6c7c981d1316267bad5fb94fee912ad6/Lib/pdb.py) Eg, more specifically: (compare https://github.com/pdbpp/pdbpp/blob/master/src/pdbpp.py#L1085 to https://github.com/python/cpython/blob/f481a02e6c7c981d1316267bad5fb94fee912ad6/Lib/pdb.py#L446) it shows clearly that they modified CPython code... and they have a BSD 3-Clause License, which sounds different from CPython's LGPL 2.1 License.

Nit: CPython is mostly PSLv2 licensed, and the pdb.py library doesn't appear to have a separate license unless I missed it.

> So from my pdbpp example above, here are the big questions I have:
>
>   • How is it that pdbpp is OK, whereas pynose isn't OK with respect to how things were handled?

We don't package pdbpp in NixOS, so it's a bit out of scope for us. However it appears pdbpp is out of compliance with copyright licenses it is beholden to. This should be reported upstream. If the maintainers are working in good faith then I'd hope they would be willing to fix this. It shouldn't impact much since the PSL2 is a permissive license anyways.

>   • If what pdbpp is OK, then what does pynose need to do to in order to modify CPython code without the License issue?

I'm not sure what this means, but what pdbpp is doing does not appear to be OK.

> pdbpp is quite a bit more popular, and used by a lot of major companies. (https://github.com/pdbpp/pdbpp/network/dependents) I just want to be sure that pynose isn't being singled out unfairly, as the data I've been gathering this morning seems to make it appear that modifying CPython code is very widespread, and in many ways handled similarly to how https://github.com/pdbpp/pdbpp handled it.
>
> Also separate from this issue is my pdbp fork to fix pdbpp, which was broken on Windows (pdbpp/pdbpp#498) and also broken for pytest (pdbpp/pdbpp#519). I told them about my fork, and people are quite happy that I stepped in to fix things (as I do with a lot of things in the Python ecosystem):

For what it's worth, just because something is popular does not mean it does not have licensing issues, or that the licensing issues can be ignored. There was a huge explosion with Ruby on Rails not that long ago due to licensing issues.

Unfortunately it's not possible to always catch these issues. A lot of smaller community projects are not fully-compliant with copyright licenses in small ways, e.g. Apache 2 technically requires that every file individually has a copyright disclaimer IIRC, but many projects don't do this. There's definitely differing levels of severity though, and "fails to disclose copyright holders and licensing obligations" is higher than most of the clerical errors.

> Back to the original topic, it appears that my pynose code has already been widely used in places such as Alpine Linux:

I hope you understand though, that this in and of itself does not actually provide any meaningful assurance that your project is actually complying with its legal obligations. Even with larger organizations who have much greater auditing standards, they mostly rely on automated scanning to detect licensing issues, but they can only do this if the projects are actually annotated properly; the tooling can't detect if code is copied from an undisclosed author with an undisclosed license.
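To illustrate that point about annotation: a scanner that relies only on declared tags sees nothing when copied code carries no tag. The sketch below assumes the SPDX-License-Identifier comment convention purely for illustration; nothing in this thread says which tools or conventions the downstream distributions actually use, and both sample strings are hypothetical.

```python
import re

# Assumed convention for illustration: SPDX-License-Identifier comment tags.
SPDX_RE = re.compile(r"SPDX-License-Identifier:\s*(\S+)")


def declared_licences(source_text):
    """Return the licence identifiers declared via SPDX tags in a file."""
    return SPDX_RE.findall(source_text)


# Hypothetical file contents: one tagged, one copied without any annotation.
tagged = "# SPDX-License-Identifier: LGPL-2.1-only\nimport os\n"
untagged_copy = "def find_module(name, path=None):\n    ...\n"

print(declared_licences(tagged))
print(declared_licences(untagged_copy))
```

The second call returns an empty list: from the tag scanner's point of view the copied function has no licence information at all, which is the gap described above.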

If pynose is not updated to comply with its legal obligations, all of these downstream users will need to be informed and will probably have to find another contingency plan.

> That also means that my code can be found in Azure, AWS, Google Cloud, and Docker:
>
> I'm happy to see that I made a difference in the Python ecosystem, and that lots of people are gaining value from my fixes.

Congratulations, but I'm not really sure what this has to do with the issue other than it means there's a whole lot of people who are going to have additional copyright license auditing work.


Anyway, I hope you realize that we're not just here to be part of some weird schadenfreude hate brigade, but rather just flagging the issue because we noticed it. Here's the timeline of what happened:

  • I began working on my own patches to make nose work on Python 3.12. I copied code from imp.py from Python 3.11 and adapted it into nose/importer.py. I added a note that I did this in the code (though later on it was improved, since initially it still was not fully compliant with the PSL2 terms.)

  • At some point I figured out via searching GitHub that Alpine had already done this, and decided it'd be easier to just pull from Alpine Linux.

  • It was pointed out to me that the licensing situation with Alpine Aports was unclear. We were not sure exactly what to do in this moment.

  • We decided that most of the changes were trivial enough that they may not, on their own, be eligible for copyright protection, but the blob in nose/importer.py was. Of course, I realized immediately that it was just the PSL2-licensed code from CPython's standard library, and suggested that it would be okay.

  • It was finally noticed that it was actually your patch, and this issue was created shortly after.

That's the full story.

To be honest, I would've preferred to just use pynose because it saves me/us the effort of trying to find each way that nose breaks on Python 3.12, so I find this whole thing very unfortunate.

The legal obligations that you have are pretty clearly outlined in the LICENSE files of the projects that you copied from, and while there are some gray area bits (e.g. the use of Git to carry authorship information is somewhat common practice even though it possibly makes GitHub tarball distributions a violation of some license terms) this is not one: if you copy code from somewhere it needs proper attribution and licensing. Obviously nobody can force you to adhere to it, but no amount of counterexamples of other projects violating license terms or companies that inherit that license term violation will change the underlying facts.


Also, in case it is not evident, I am not a lawyer and do not mean to construe any of this text as legal advice. It is just my understanding of the situation as a layperson.

@emilazy
Author

emilazy commented Jul 10, 2024

It depends on whether you intend to include the full licence text. If this notice on its own was written independently before this issue was raised and the licence text was not included, it would at least be much less likely to amount to a material breach, as it makes an effort to credit the copyright holder of the code and refers to the licence. However, now that you are aware of the exact requirements of the licence you would certainly be expected to come into full compliance with it.

So I would say that it is likely acceptable as long as you retain the full licence text in your source repository and any packaged distributions. The PSLv2 wording “provided, however, that PSF's License Agreement and […] are retained […] in any derivative version” is less vague about mechanism than the LGPL’s “give any other recipients of the Program a copy of this License along with the Program”; since you took code from CPython to make a derivative work from, you would be expected to ensure that the full licence text from CPython is also kept. Copying CPython’s LICENSE file into your repository (you could rename it to LICENSE.cpython to make it clear that it’s not the licence of the bulk of the code) and referencing it in the notice attached to the derived code would satisfy this obligation.

I would recommend keeping the copyright notice as the verbatim Copyright (c) 2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008, 2009, 2010, 2011, 2012, 2013, 2014, 2015, 2016, 2017, 2018, 2019, 2020, 2021, 2022, 2023 Python Software Foundation; All Rights Reserved specified by the licence text in order to match its requirements. However, the collapse of the year ranges is a trivial difference, so it may be considered acceptable by the Python Software Foundation’s lawyers. I would recommend getting in touch with them as it is possible that you will need them to grant you a replacement licence anyway due to the clause I mentioned in my original report.
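If typing out the year list by hand is the obstacle, the verbatim form can be generated rather than collapsed. A small sketch, where the function name psf_notice is hypothetical and the default years are taken from the notice quoted in this issue:

```python
def psf_notice(first_year=2001, last_year=2023):
    """Build the copyright notice with every year listed individually."""
    years = ", ".join(str(y) for y in range(first_year, last_year + 1))
    return (
        f"Copyright (c) {years} Python Software Foundation; "
        "All Rights Reserved"
    )


notice = psf_notice()
print(notice)
```

The output is a single line, so it would need wrapping to fit a source file header; the wording and year list themselves are unchanged from the licence excerpt.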

About pdb++: to clarify, CPython is under the PSLv2 licence, not LGPL 2.1. As PSLv2 is a permissive licence it is acceptable to include derivative works of portions of it in LGPL 2.1 works, but the licences are quite different. In particular, it is also okay to include derivative works of CPython code in BSD‐licensed projects.

The specific function you linked may be too simple to fall under copyright protection, but if there is more copying along those lines, given that I can’t find any visible attribution of the PSF’s copyright or a copy of the PSLv2 licence, my answer is that it wouldn’t be okay for pdb++ either.

The only reason I’m reporting these issues with pynose and not pdb++ is because pynose came up in my work on NixOS. I haven’t looked at pdb++ because we don’t ship that code in any form, so its licence compliance is of no concern to us; there’s no singling out here beyond what concerns are applicable to us as a downstream distribution and raised to our attention. I would recommend you contact the pdb++ upstream and/or the Python Software Foundation if you have worries about their use of CPython code.

I expect that, like NixOS, Alpine was not aware of the use of CPython code when incorporating this patch. I’ve let Alpine know about this report so they can follow along and keep updated on the progress:

@mdmintz
Owner

mdmintz commented Jul 10, 2024

A pull request is now available: #34

@mdmintz
Owner

mdmintz commented Jul 10, 2024

The PR has been merged! Thank you @emilazy and @jchv for assisting!

@jchv

jchv commented Jul 10, 2024

Great, I'm glad this could be resolved in a quick and amicable manner. I think downstreams can now have pretty good confidence there are no serious license/copyright issues here.

Thanks!

@Kangie

Kangie commented Jul 10, 2024

Good work everyone. This ticket can probably be closed off. I've raised the upstream compliance issue with PSF and the upstream. I don't have much hope of an upstream resolution given the 3+ years since commits but we tried.

@mdmintz to allay any fears of non-compliance you may want to independently reach out to the PSF legal guys and get their opinion. I don't think that anybody would argue that you haven't made a good-faith attempt to adhere to the T&Cs.


4 participants