Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bump to Python 3.9, Big typing updates #2963

Merged
merged 16 commits into from
Nov 4, 2024
Merged

Bump to Python 3.9, Big typing updates #2963

merged 16 commits into from
Nov 4, 2024

Conversation

ashleysommer
Copy link
Contributor

@ashleysommer ashleysommer commented Oct 31, 2024

This is a very large PR.
This represents the first steps in the path to RDFLib 8.

There is a bunch to unpack in this one, this is the summary:

  1. Bump minimum Python version to v3.9, because Python 3.8 is no longer supported.
  2. Added Python 3.12 to supported list, and added 3.12 to tox and github actions test suite.
  3. Update any/all dependencies that were stuck on older versions due to our support of Python 3.8. This allowed some dependencies and subdependencies to update their minimum versions to their post-3.8 versions.
  4. Updated MyPy to the latest version (v1.13), to get latest Type Checking advancements, and changed its rules to Python 3.9-compatible rules.
  5. Updated Ruff to latest version for new Linting (with automatic fixing!) and changed its rules to Python 3.9-compatible rules.
  6. Bumped Black to latest version (and pinned it, because we always keep Black pinned).
  7. Updated typing annotations in the codebase from pre-3.9 to post-3.9 style
  8. Re-run Ruff to fix new linting errors, re-run Black to fix new formatting infractions.

It is step number 7 that makes this PR so huge.

As an example, pre-3.9 typing annotations look like this:

from typing import Dict, Tuple, Union

my_var: Dict[str, Tuple[Union[str, bytes]]] = {}

and preferred type annotation style used in Python 3.9+ is like this:

my_var: dict[str, tuple[str|bytes]]

The main differences being that importing basic type classes from the typing stdlib module is deprecated in Python 3.9+, for things like List, Set, Dict, Tuple, Python 3.9+ supports using the native classes directly as type annotations list, set, dict, tuple, etc. The other obvious change is the move away fro the Union helper from typing module, because you can simply union types with the bar | operator in Python 3.9+.

Applying these annotation-pattern updates across the whole codebase caused hundreds of files to be modified in this PR.

Applying all these changes along with the bump to latest MyPy and asking MyPy type checker to operate in Python 3.9+ compatibly mode caused some fallout. There were over 250 failing type checks after applying the above changes, that each needed to be investigated and fixed.

The main culprit was RDFLib's existing excessive use of typing: ignore[] comments, that were either no longer needed (eg, the type was fixed, and the ignore was redundant which itself is a typing error), or causing their own typing problems. Most typing: ignore[] comments were removed, and where there were still type errors the underlying cause was fixed instead of commented over.
Another big contributor to the failures were some incompatibilities with RDFLib's fundamental base types, that were defined like this:

_SubjectType: TypeAlias = rdflib.term.Node
_PredicateType: TypeAlias = rdflib.term.Node
_ObjectType: TypeAlias = rdflib.term.Node
_LiteralType: TypeAlias = Tuple[_SubjectType, _PredicateType, _ObjectType]

The problem is Node is the base classes for many different objects in RDFLib, some of which should not be used as subject or object in a Triple, and most of which cannot be used as the predicate in a triple.

So this PR tightens up the typing of _SubjectType, _PredicateType, and _ObjectType to enforce correct usage of various python objects in triples, at the typing level. In various locations in the code baseinstead of simply using Node or Identifier annotations for objects in a graph (or serializer or parser), we now use the correct _SubjectType, _PredicateType, _ObjectType where appropriate. This change caused a further 200+ type errors to be emitted from MyPy, all of which needed to be carefully checked and fixed. This exposed many instances where incorrect object types were being used, or edge-cases not considered and dealt with.

The remaining portions of this PR are the fixes for those errors. The test suite was used continually throughout the process to ensure the actual behaviour of the executing RDFLib code did not change, and the test suite still passes.

All files were reformatted with latest Black, and Ruff linter with auto-fix used to ensure everything is compliant.

Update dependencies minimum package versions to post-3.9 versions.
Upgrade latest pyparsing
Update to latest MyPy
Update to latest Black
Re-issue new Poetry lockfile
@coveralls
Copy link

coveralls commented Nov 1, 2024

Coverage Status

coverage: 90.248% (-0.03%) from 90.28%
when pulling 73400f3 on py39_typing
into 4be4216 on main.

@ashleysommer
Copy link
Contributor Author

Finally after a bunch of extra tweaks to satisfy tests, linters, and doc generation, this is ready to go!

@edmondchuc
Copy link
Contributor

Nice one! I'm definitely looking forward to the type hint improvements around Graph.

Just out of curiosity, I always thought the union type operator | was introduced in python 3.10. How are they interpreted in 3.9 and not raising a syntax error? My guess is they're not interpreted at runtime or something.

@ashleysommer
Copy link
Contributor Author

ashleysommer commented Nov 2, 2024

Hi @edmondchuc
I just double checked, and you're absolutely right. I was basing my changes on the "whats new in Python 3.9" release notes, to see which new language features I was able to take advantage in this change. I saw the "typing using generics" and "union operator on dicts", and I suppose I confused the two and thought it mentioned | operator on types.

I noticed when making these changes, Python 3.9 only allows the | operator in type annotations, which are not actually read by the Python interpreter. I think the annotations are only read by MyPy, and it will accept that operator even when running MyPy with the "--python-version 3.9" parameter.

Eg, this works:

my_str_or_bytes: str | bytes = b"hello"

But this does not:

StrOrBytesType: TypeAlias = str | bytes  # <-- Python interpreter error in v3.9
my_str_or_bytes: StrOrBytesType = b"hello"

Because the former is only ready by MyPy, but the latter is evaluated by the interpreter that fails in Python 3.9. I believe that is what has changed in Python 3.10.

That is why you will still see code like this in this PR:

_SubjectType: te.TypeAlias = Union[IdentifiedNode, Literal, Variable]

@ashleysommer
Copy link
Contributor Author

@edmondchuc Do you think we should merge this as-is, or move back to Union[] typing?

@edmondchuc
Copy link
Contributor

Hi Ashley, thanks for the explanation. I think you're right. The union operator works for type annotations but not for when they are interpreted as values.

In addition to what you've stated, the reason the | works in type annotations is because of the presence of the following import.

from __future__ import annotations

Just tested this on python 3.9. Without the above import, it raises a TypeError.

(p39-test) edmondchuc@192-168-1-7 p39-test % python main.py
Traceback (most recent call last):
  File "/Users/edmondchuc/test/p39-test/main.py", line 4, in <module>
    def foo(value: str | None = None):
TypeError: unsupported operand type(s) for |: 'type' and 'NoneType'

I think it's fine to merge as-is.

I also think that given this PR is moving towards newer typing syntax, it makes sense to replace the use of Optional with the newer union syntax too.

- from typing import Optional

- def foo(value: Optional[str] = None): ...
+ def foo(value: str | None = None): ...

What do you think?

@ashleysommer
Copy link
Contributor Author

ashleysommer commented Nov 4, 2024

@edmondchuc
Regarding Optional vs |None, the reason that change wasn't made in this PR was because it was introduced for Python 3.10+ (and remember I thought replacing Union[] was for 3.9+ and I thought I was making only 3.9+ compatible changes). I agree that if we are replacing Union[] and they are both actually 3.10+, and both actually work in Python 3.9 in Annotations, then we can certainly make that change in a subsequent PR.

@edmondchuc
Copy link
Contributor

Great, a subsequent PR sounds good.

@ashleysommer ashleysommer merged commit 086a4f7 into main Nov 4, 2024
22 checks passed
@ashleysommer ashleysommer deleted the py39_typing branch November 4, 2024 04:13
@edmondchuc edmondchuc mentioned this pull request Nov 5, 2024
8 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants