PEP 639: Implement License-Expression and License-File #828

Open
wants to merge 27 commits into
base: main
27 commits
de8b239
PEP 639: Implement License-Expression and License-File
ewdurbin Sep 3, 2024
44beda7
wire in vendoring library to manage vendored dependencies
ewdurbin Sep 3, 2024
2b1a9f7
vendor license_expression and boolean.py
ewdurbin Sep 3, 2024
9bfa714
harmonize with documentation
ewdurbin Sep 3, 2024
1aea212
exclude vendored spdx data from sdist/whl. build/bring our own
ewdurbin Sep 4, 2024
cec33f1
migrate to parser based on hatchling's
ewdurbin Sep 4, 2024
cfb3af1
string -> re
ewdurbin Sep 4, 2024
d6b47d5
License-File: disallow unresolved globs and non-relative paths
ewdurbin Sep 5, 2024
a55f422
Merge branch 'main' into pep_639
brettcannon Sep 11, 2024
afa5d4c
Apply suggestions from code review
ewdurbin Sep 13, 2024
396e4ef
update typing for licenses.spdx
ewdurbin Sep 13, 2024
21a2821
Extend typing improvements in licenses.spdx to include Exception
ewdurbin Sep 13, 2024
4ac18f0
fixup names, Exception is not a good one.
ewdurbin Sep 13, 2024
46a7491
better enforcement of license-file paths
ewdurbin Sep 13, 2024
cd7105f
subclass ValueError for invalid license expressions
ewdurbin Sep 13, 2024
e469b7e
and empty license expression is invalid
ewdurbin Sep 13, 2024
e699391
create a "NormalizedLicenseExpression" type
ewdurbin Sep 13, 2024
22fa9cd
rename normalize -> canonicalize
ewdurbin Sep 13, 2024
f952ab9
add tests to ensure license and exception ids conform
ewdurbin Sep 13, 2024
30e34f1
update name of var
ewdurbin Sep 13, 2024
8906b16
match formatting standards
ewdurbin Sep 13, 2024
a361294
reorganize the licenses module a bit
ewdurbin Sep 15, 2024
9cee38e
apply code-review suggestions for update_licenses task
ewdurbin Sep 15, 2024
81efbda
fix tests after spdx module was made private
ewdurbin Sep 15, 2024
701217b
add docs
ewdurbin Sep 15, 2024
4539543
Merge branch 'main' into pep_639
brettcannon Sep 16, 2024
64d3647
add additional test cases and handle LicenseRef- identifiers not alre…
ewdurbin Sep 19, 2024
1 change: 1 addition & 0 deletions docs/index.rst
Original file line number Diff line number Diff line change
@@ -26,6 +26,7 @@ The ``packaging`` library uses calendar-based versioning (``YY.N``).
version
specifiers
markers
licenses
requirements
metadata
tags
53 changes: 53 additions & 0 deletions docs/licenses.rst
@@ -0,0 +1,53 @@
Licenses
=========

.. currentmodule:: packaging.licenses


Helper for canonicalizing SPDX
`License-Expression metadata <https://peps.python.org/pep-0639/#term-license-expression>`__
as `defined in PEP 639 <https://peps.python.org/pep-0639/#spdx>`__.


Reference
---------

.. class:: NormalizedLicenseExpression

    A :class:`typing.NewType` of :class:`str`, representing a normalized
    License-Expression.


.. exception:: InvalidLicenseExpression

    Raised when a License-Expression is invalid.


.. function:: canonicalize_license_expression(raw_license_expression)

    This function takes a valid License-Expression, and returns the
    normalized form of it.

    The return type is typed as :class:`NormalizedLicenseExpression`. This allows type
    checkers to help require that a string has passed through this function
    before use.

    :param str raw_license_expression: The License-Expression to canonicalize.
    :raises InvalidLicenseExpression: If the License-Expression is invalid due to an
        invalid/unknown license identifier or invalid syntax.

    .. doctest::

        >>> from packaging.licenses import canonicalize_license_expression
        >>> canonicalize_license_expression("mit")
        'MIT'
        >>> canonicalize_license_expression("mit and (apache-2.0 or bsd-2-clause)")
        'MIT AND (Apache-2.0 OR BSD-2-Clause)'
        >>> canonicalize_license_expression("(mit")
        Traceback (most recent call last):
            ...
        InvalidLicenseExpression: Invalid license expression: '(mit'
        >>> canonicalize_license_expression("Use-it-after-midnight")
        Traceback (most recent call last):
            ...
        InvalidLicenseExpression: Unknown license: 'use-it-after-midnight'
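The `NewType` remark in the docs above can be illustrated with a small, self-contained sketch. Note that `fake_canonicalize` and `write_metadata` are hypothetical names invented for this example; the stand-in only upper-cases its input and does none of the SPDX validation performed by the function in this PR.

```python
from typing import NewType

NormalizedLicenseExpression = NewType("NormalizedLicenseExpression", str)


def fake_canonicalize(raw: str) -> NormalizedLicenseExpression:
    # Hypothetical stand-in for canonicalize_license_expression: it only
    # upper-cases the input and does NOT validate SPDX identifiers.
    return NormalizedLicenseExpression(raw.upper())


def write_metadata(expression: NormalizedLicenseExpression) -> str:
    # Hypothetical consumer. A static type checker would flag a call that
    # passes a plain str here; at runtime the value is an ordinary str.
    return f"License-Expression: {expression}"


line = write_metadata(fake_canonicalize("mit"))
```

At runtime a `NewType` value is just a `str`, so the benefit is purely static: a checker such as mypy should reject `write_metadata("mit")` while accepting the call above.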
6 changes: 6 additions & 0 deletions noxfile.py
@@ -181,6 +181,12 @@ def release(session):
    webbrowser.open("https://github.com/pypa/packaging/releases")


@nox.session
def update_licenses(session: nox.Session) -> None:
    session.install("httpx")
    session.run("python", "tasks/licenses.py")


# -----------------------------------------------------------------------------
# Helpers
# -----------------------------------------------------------------------------
4 changes: 3 additions & 1 deletion pyproject.toml
@@ -57,9 +57,11 @@ warn_unused_ignores = true
module = ["_manylinux"]
ignore_missing_imports = true


[tool.ruff]
src = ["src"]
extend-exclude = [
    "src/packaging/licenses/_spdx.py"
]

[tool.ruff.lint]
extend-select = [
142 changes: 142 additions & 0 deletions src/packaging/licenses/__init__.py
@@ -0,0 +1,142 @@
#######################################################################################
#
# Adapted from:
# https://github.com/pypa/hatch/blob/5352e44/backend/src/hatchling/licenses/parse.py
#
# MIT License
#
# Copyright (c) 2017-present Ofek Lev <oss@ofek.dev>
#
# Permission is hereby granted, free of charge, to any person obtaining a copy of this
# software and associated documentation files (the "Software"), to deal in the Software
# without restriction, including without limitation the rights to use, copy, modify,
# merge, publish, distribute, sublicense, and/or sell copies of the Software, and to
# permit persons to whom the Software is furnished to do so, subject to the following
# conditions:
#
# The above copyright notice and this permission notice shall be included in all copies
# or substantial portions of the Software.
#
# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED,
# INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A
# PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT
# HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF
# CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE
# OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
#
#
# With additional allowance of arbitrary `LicenseRef-` identifiers, not just
# `LicenseRef-Public-Domain` and `LicenseRef-Proprietary`.
#
#######################################################################################
from __future__ import annotations

import re
from typing import NewType

from packaging.licenses._spdx import EXCEPTIONS, LICENSES

__all__ = [
    "NormalizedLicenseExpression",
    "InvalidLicenseExpression",
    "canonicalize_license_expression",
]

license_ref_allowed = re.compile("^[A-Za-z0-9.-]*$")

NormalizedLicenseExpression = NewType("NormalizedLicenseExpression", str)


class InvalidLicenseExpression(ValueError):
    """Raised when a license-expression string is invalid

    >>> canonicalize_license_expression("invalid")
    Traceback (most recent call last):
        ...
    packaging.licenses.InvalidLicenseExpression: Unknown license: 'invalid'
    """


def canonicalize_license_expression(
    raw_license_expression: str,
) -> str | NormalizedLicenseExpression:
    if raw_license_expression == "":
        message = f"Invalid license expression: {raw_license_expression!r}"
        raise InvalidLicenseExpression(message)

    # Pad any parentheses so tokenization can be achieved by merely splitting on
    # white space.
    license_expression = raw_license_expression.replace("(", " ( ").replace(")", " ) ")

    license_refs = {
        ref.lower(): "LicenseRef-" + ref[11:]
        for ref in license_expression.split()
        if ref.lower().startswith("licenseref-")
    }

    # Normalize to lower case so we can look up licenses/exceptions
    # and so boolean operators are Python-compatible.
    license_expression = license_expression.lower()

    tokens = license_expression.split()

    # Rather than implementing boolean logic, we create an expression that Python can
    # parse. Everything that is not involved with the grammar itself is treated as
    # `False` and the expression should evaluate as such.
    python_tokens = []
    for token in tokens:
        if token not in {"or", "and", "with", "(", ")"}:
            python_tokens.append("False")
        elif token == "with":
            python_tokens.append("or")
        # An opening parenthesis may only follow an operator or another opening
        # parenthesis; allowing "(" here keeps nested groups such as "((mit))" valid.
        elif (
            token == "("
            and python_tokens
            and python_tokens[-1] not in {"or", "and", "("}
        ):
            message = f"Invalid license expression: {raw_license_expression!r}"
            raise InvalidLicenseExpression(message)
        else:
            python_tokens.append(token)

    python_expression = " ".join(python_tokens)
    try:
        result = eval(python_expression)
Contributor

Could globals and locals be passed in here? That feels safer, regardless if it is or not.
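One way the suggestion above could look is sketched below. The helper name `evaluates_to_false` and the `RESTRICTED_GLOBALS` constant are assumptions made for this illustration, not code from the PR. Passing explicit namespaces narrows what the expression can see, but it should not be read as a security sandbox.

```python
# Explicit namespaces for eval: with an empty "__builtins__" mapping, names
# such as __import__ or open are not reachable from the evaluated expression.
RESTRICTED_GLOBALS = {"__builtins__": {}}


def evaluates_to_false(python_expression: str) -> bool:
    """Return True only if the expression parses and evaluates to False."""
    try:
        result = eval(python_expression, RESTRICTED_GLOBALS, {})
    except Exception:
        # Syntax errors, unknown names, and similar failures all count as invalid.
        return False
    return result is False
```

`False`, `and`, `or`, and parentheses are part of Python's grammar rather than builtins, so the translated license expression still evaluates with both namespaces empty.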

    except Exception:
        result = True

    if result is not False:
        message = f"Invalid license expression: {raw_license_expression!r}"
        raise InvalidLicenseExpression(message) from None

    # Take a final pass to check for unknown licenses/exceptions.
    normalized_tokens = []
    for token in tokens:
        if token in {"or", "and", "with", "(", ")"}:
            normalized_tokens.append(token.upper())
            continue

        if normalized_tokens and normalized_tokens[-1] == "WITH":
Comment on lines +113 to +115
Member

Suggested change
-            continue
-        if normalized_tokens and normalized_tokens[-1] == "WITH":
+            continue
+        elif normalized_tokens and normalized_tokens[-1] == "WITH":

Sponsor Contributor

I would recommend against this one, this repository doesn't appear to have many stylistic rules enabled but in Hatch (where this came from) I enable the one where control flow statements require no elif. In this example, the continue is the statement so the following branch should start anew.

Member

I know it isn't necessary, but I have been bit by people not noticing it wasn't an elif when changing something like this to not short-circuit the overall if block.

Anyway, I'm guessing you care because you want to copy the code back into Hatch?

Sponsor Contributor

I'm going to use whatever lands here and get rid of my custom code now that it's upstreamed. I just wanted maintainers to be aware that when you start using Ruff more it will catch this line, and I think for good reason. Feel free to do whatever you think is best!

Contributor

I'm a fan of the simpler structure; the `el` isn't necessary, and it adds extra complexity (especially when nested). Changing a control flow statement into a non-control flow statement should change control flow; trying to proactively protect against it isn't helpful, IMO.

And this is called a "guard pattern" - it's even part of the syntax in Ruby. (It would be `continue if normalized_tokens and normalized_tokens[-1] == "WITH"` if Python had Ruby's guard, btw).
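For readers following the thread, the two styles under discussion can be compared side by side. This is a simplified sketch, not the PR's loop: `classify_guard` and `classify_elif` are hypothetical helpers that only label tokens, and they track the previous token instead of the `normalized_tokens` list.

```python
OPERATORS = {"or", "and", "with", "(", ")"}


def classify_guard(tokens):
    """Guard style: `continue` ends the branch, so the next check is a plain `if`."""
    kinds = []
    previous = None
    for token in tokens:
        if token in OPERATORS:
            kinds.append("operator")
            previous = token
            continue

        if previous == "with":
            kinds.append("exception")
        else:
            kinds.append("license")
        previous = token
    return kinds


def classify_elif(tokens):
    """Chained style: one if/elif/else block, no early `continue`."""
    kinds = []
    previous = None
    for token in tokens:
        if token in OPERATORS:
            kinds.append("operator")
        elif previous == "with":
            kinds.append("exception")
        else:
            kinds.append("license")
        previous = token
    return kinds
```

Both functions produce the same labels for any input, so the choice is about readability and about what happens if someone later removes the `continue`.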

            if token not in EXCEPTIONS:
                message = f"Unknown license exception: {token!r}"
                raise InvalidLicenseExpression(message)

            normalized_tokens.append(EXCEPTIONS[token]["id"])
        else:
            if token.endswith("+"):
                final_token = token[:-1]
                suffix = "+"
            else:
                final_token = token
                suffix = ""

            if final_token.startswith("licenseref-"):
                if not license_ref_allowed.match(final_token):
                    message = f"Invalid licenseref: {final_token!r}"
                    raise InvalidLicenseExpression(message)
                normalized_tokens.append(license_refs[final_token] + suffix)
            else:
                if final_token not in LICENSES:
                    message = f"Unknown license: {final_token!r}"
                    raise InvalidLicenseExpression(message)
                normalized_tokens.append(LICENSES[final_token]["id"] + suffix)

    normalized_expression = " ".join(normalized_tokens)

    return normalized_expression.replace("( ", "(").replace(" )", ")")
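The tokenization and `LicenseRef-` handling in the function above can be tried in isolation with the sketch below. The helper names `tokenize`, `license_ref_map`, and `untokenize` are made up for this example; they mirror the string operations in the diff but skip all validation against the SPDX data.

```python
from typing import Dict, List


def tokenize(raw: str) -> List[str]:
    # Pad parentheses so a plain whitespace split yields one token per symbol.
    return raw.replace("(", " ( ").replace(")", " ) ").split()


def license_ref_map(raw: str) -> Dict[str, str]:
    # Key: lower-cased token, used for lookups after the expression is lowered.
    # Value: canonical "LicenseRef-" prefix plus the user's original casing.
    prefix = "LicenseRef-"
    return {
        token.lower(): prefix + token[len(prefix):]
        for token in tokenize(raw)
        if token.lower().startswith(prefix.lower())
    }


def untokenize(tokens: List[str]) -> str:
    # Join with single spaces, then remove the padding inside parentheses.
    return " ".join(tokens).replace("( ", "(").replace(" )", ")")
```

Because the map is built before the expression is lower-cased, a custom identifier such as `licenseref-My-Terms` keeps its casing after the prefix while still being matched case-insensitively.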