Array namespace #685

Draft
wants to merge 3 commits into base: main

Conversation

nstarman
Contributor

@nstarman nstarman commented Sep 12, 2023

Proof of concept.

Benefits:

  1. Proper type hint of return type of __array_namespace__.
  2. arange is now a statically understood signature, e.g. func: arange is meaningful.
  3. can be easily extended to encompass all functions in this repo.
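
As a rough illustration of benefit 1, the sketch below types the return value of `__array_namespace__` with a protocol. The names `ArrayNamespace`, `Array`, `ToyNamespace`, `ToyArray` and `first_n` are hypothetical, the `arange` signature is simplified, and the toy namespace returns plain lists rather than arrays; none of this is taken from the PR's diff.

```python
from typing import List, Optional, Protocol


class ArrayNamespace(Protocol):
    """Hypothetical protocol for the object returned by __array_namespace__."""

    def arange(self, start: int, /, stop: Optional[int] = None, step: int = 1) -> List[int]:
        ...


class Array(Protocol):
    """Hypothetical array protocol whose namespace is statically known."""

    def __array_namespace__(self, /, *, api_version: Optional[str] = None) -> ArrayNamespace:
        ...


class ToyNamespace:
    """Toy stand-in for a real array library namespace (returns plain lists)."""

    def arange(self, start: int, /, stop: Optional[int] = None, step: int = 1) -> List[int]:
        if stop is None:
            start, stop = 0, start
        return list(range(start, stop, step))


class ToyArray:
    """Toy array type that only knows how to report its namespace."""

    def __array_namespace__(self, /, *, api_version: Optional[str] = None) -> ToyNamespace:
        return ToyNamespace()


def first_n(x: Array, n: int) -> List[int]:
    # A type checker can resolve xp.arange and its signature from the protocol.
    xp = x.__array_namespace__()
    return xp.arange(n)
```

With this in place, `first_n(ToyArray(), 3)` type-checks structurally: `ToyArray` never inherits from `Array`, it only has a matching method.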

Fixes #267

Signed-off-by: nstarman <nstarman@users.noreply.github.com>
Member

@rgommers rgommers left a comment

Thanks @nstarman. The protocol for the __array_namespace__ return value sounds good.

Can you explain the arange change? Is it only because the return type is otherwise difficult for static type checkers to understand, since there's no input array? Documentation-wise, it doesn't look good to me to change a function to a class - we should avoid changing how it's rendered in the HTML docs.

from .creation_functions import arange as ArangeCallable


class ArrayAPINamespace(Protocol):
Member

The camelcase works out slightly unfortunately here, but I think it's fine. ArrayApiNamespace or ArrayAPI_Namespace are more readable but not consistent with the standard rule.

Contributor Author

@nstarman nstarman Sep 12, 2023

Agreed! Alternatively, we could drop the API bit in the middle?
Then this looks good: __array_namespace__() -> ArrayNamespace: ...

Member

Alternatively, we could drop the API bit in the middle?

That sounds like the best idea here to me.

@nstarman
Contributor Author

nstarman commented Sep 12, 2023

The change in arange is because of callback protocols (https://peps.python.org/pep-0544/#callback-protocols), allowing for static comparison of the signature. As this relates to __array_namespace__, this means we can type hint ArrayAPINamespace.arange without having to copy the whole function signature, which is liable to become out of sync with the actual definition, e.g.

class ArrayAPINamespace(Protocol):
    arange: ArangeCallable

Vs

class ArrayAPINamespace(Protocol):

    @staticmethod
    def arange(start: Union[int, float], /, stop: Optional[Union[int, float]] = None, step: Union[int, float] = 1, *, dtype: Optional[dtype] = None, device: Optional[device] = None) -> array:
        ...

As for the docs rendering, class types can be manipulated. At https://github.com/cosmology-api/cosmology.api/tree/3f7ae746a166201298ebd8a786249b58aefdfe3d/docs/_ext we have some deep foo doing signature manipulations (@ntessore wrote this). So it's doable to have arange be a Protocol but be rendered as a function.

Also, callback protocols mean the following is possible:

>>> from numpy import arange
>>> from data_apis.array import arange as ArangeCallable

>>> isinstance(arange, ArangeCallable)
True

>>> def mycustomarange(...<correct signature>): ...
>>> isinstance(mycustomarange, ArangeCallable)
True
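
A self-contained sketch of the runtime check above, without NumPy. It assumes a hypothetical `ArangeCallable` protocol with a simplified signature and toy functions `my_arange` and `not_arange`. Two caveats worth stating: `isinstance` against a protocol only works when the protocol is decorated with `typing.runtime_checkable`, and the runtime check only confirms that a `__call__` member exists. It does not compare signatures; that part is left to the static type checker.

```python
from typing import List, Optional, Protocol, runtime_checkable


@runtime_checkable
class ArangeCallable(Protocol):
    """Hypothetical callback protocol with a simplified arange signature."""

    def __call__(self, start: int, /, stop: Optional[int] = None, step: int = 1) -> List[int]:
        ...


def my_arange(start: int, /, stop: Optional[int] = None, step: int = 1) -> List[int]:
    """Toy arange with the matching signature (returns a plain list)."""
    if stop is None:
        start, stop = 0, start
    return list(range(start, stop, step))


def not_arange() -> None:
    """Callable, but with a signature that does not match the protocol."""
    return None


# Runtime isinstance only looks for a __call__ member, so both functions pass.
matches = isinstance(my_arange, ArangeCallable)
also_matches = isinstance(not_arange, ArangeCallable)

# An object without __call__ fails the check.
not_callable = isinstance(42, ArangeCallable)

# Only a static checker would reject this assignment:
# bad: ArangeCallable = not_arange
```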

@rgommers
Member

Hmm, I am not knowledgeable enough about the latest in static typing to understand the callback rationale here. I do see that in NumPy arange is typed as a regular function with overloads.

@BvB93 do you have an opinion about typing arange this way?

@BvB93
Contributor

BvB93 commented Sep 13, 2023

It's (unfortunately) more contrived than ideal, but the typing itself is perfectly solid.

Just to give a bit more background:
It's long been possible to create an object (well... a declaration thereof) using a type, e.g. foo: Callable[..., int] = blablabla. Unfortunately it's not possible to go the other way around and create a type from an existing object. Without something like a Python equivalent of TypeScript's typeof operator (or some other way of reusing def statements for annotating namespaces), we're stuck with the __call__-based protocol approach as used in this PR.
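
The two directions can be sketched as follows. The names `count_items`, `full`, `FullCallable` and `typed_full` are illustrative only, and `full` here is a simplified toy that returns a list.

```python
from typing import Callable, List, Optional, Protocol

# Type -> object: annotate a name with an existing type, then assign to it.
count_items: Callable[..., int] = len


def full(shape: int, fill_value: float, *, dtype: Optional[type] = None) -> List[float]:
    """Toy creation function with a keyword-only parameter."""
    return [fill_value] * shape


# Object -> type is not expressible: there is no `typeof(full)` to write in an
# annotation. Callable[[int, float], List[float]] would also drop the
# keyword-only `dtype`, so the signature is restated once as a __call__ protocol.
class FullCallable(Protocol):
    def __call__(self, shape: int, fill_value: float, *, dtype: Optional[type] = None) -> List[float]:
        ...


# A static checker compares the full signature of `full` against the protocol.
typed_full: FullCallable = full
```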

@rgommers
Member

Thanks! And to confirm if I understood it right: we're going to need this method for all functions that don't take an array as an input (mostly array creation functions), and not for anything else, right? Or would we have to change all functions to classes with __call__ methods?

@BvB93
Contributor

BvB93 commented Sep 13, 2023

Or would we have to change all functions to classes with __call__ methods?

The latter I'm afraid, or at least for all functions that you'd like to include in the array namespace protocol. At best you might be able to reuse a single protocol multiple times for representing different functions with identical signatures, but I imagine that this could cause issues with docstrings and such.
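
A sketch of that reuse, assuming hypothetical names (`ShapeFillCallable`, `CreationNamespace`, `ToyNamespace`) and a simplified one-dimensional signature. One protocol annotates both `zeros` and `ones`; the docstring concern shows up in that the single class can carry only one `__doc__` for both.

```python
from typing import List, Optional, Protocol


class ShapeFillCallable(Protocol):
    """One hypothetical protocol shared by functions with identical signatures."""

    def __call__(self, shape: int, *, dtype: Optional[type] = None) -> List[int]:
        ...


class CreationNamespace(Protocol):
    # Both attributes reuse the same callback protocol.
    zeros: ShapeFillCallable
    ones: ShapeFillCallable


class ToyNamespace:
    """Toy namespace returning plain lists instead of arrays."""

    @staticmethod
    def zeros(shape: int, *, dtype: Optional[type] = None) -> List[int]:
        return [0] * shape

    @staticmethod
    def ones(shape: int, *, dtype: Optional[type] = None) -> List[int]:
        return [1] * shape


def zeros_then_ones(xp: CreationNamespace, n: int) -> List[int]:
    # Both calls are checked against the one shared signature.
    return xp.zeros(n) + xp.ones(n)
```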

@rgommers
Member

Hmm. I wonder if it would be feasible to codegen type stubs here? If for every function def func(...) in a .py file we'd generate a corresponding entry class func: def __call__(self, ...) in a .pyi file, we could keep things normal in .py files while still making Mypy & co happy.
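
A rough sketch of such a codegen step using the standard-library `ast` module (Python 3.9+ for `ast.unparse`). This is an illustration under simple assumptions, not any script from this thread: the function name `functions_to_call_protocols` and the `SOURCE` snippet are made up, only top-level plain `def` statements are handled, and the generated text assumes `Protocol` is imported wherever it ends up.

```python
import ast
from typing import Protocol

SOURCE = '''
def arange(start, /, stop=None, step=1, *, dtype=None, device=None):
    """Return evenly spaced values."""
'''


def functions_to_call_protocols(source: str) -> str:
    """Rewrite each top-level `def f(...)` as `class f(Protocol)` with __call__."""
    tree = ast.parse(source)
    for index, node in enumerate(tree.body):
        if not isinstance(node, ast.FunctionDef):
            continue
        # Prepend `self`, keeping it positional-only if the function uses `/`.
        self_arg = ast.arg(arg="self", annotation=None)
        target = node.args.posonlyargs if node.args.posonlyargs else node.args.args
        target.insert(0, self_arg)
        # Keep the docstring (if any) and replace the rest of the body with `...`.
        has_doc = ast.get_docstring(node) is not None
        node.body = (node.body[:1] if has_doc else []) + ast.parse("...").body
        # Build the class from a parsed template, then move the function inside.
        cls = ast.parse(f"class {node.name}(Protocol): ...").body[0]
        node.name = "__call__"
        cls.body = [node]
        tree.body[index] = cls
    return ast.unparse(tree)


generated = functions_to_call_protocols(SOURCE)

# The generated stub text is valid Python once Protocol is in scope.
namespace = {"Protocol": Protocol}
exec(generated, namespace)
```

Writing `generated` to a `.pyi` file next to the `.py` source would be the remaining step; formatting and import handling are left out here.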

@nstarman
Contributor Author

nstarman commented Sep 13, 2023

I would suggest to do the reverse, to code gen the function from the class and use the code-gen'ed function in the docs. The call-back protocols are useful for type-checking: both statically and at runtime (see #685 (comment) for examples). My hope is to have this library be installable and useful for type-checking purposes and as ABCs (protocols are ABCs when used as a base class). Avoiding magic and meta-coding in the actual library will help in achieving that goal. For communication purposes in the docs I agree that a function is the best representation, hence code-gen the function for the docs, and the __call__-protocol for the code.
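
One way the reverse direction could look, as a sketch only: derive a function-style signature line for the docs from the protocol's `__call__` via `inspect.signature`. The helper `render_as_function` and the simplified `arange` protocol are hypothetical, and the exact rendering of annotations such as `Optional[int]` can differ between Python versions.

```python
import inspect
from typing import Optional, Protocol


class arange(Protocol):
    """Hypothetical callback protocol authored as a class."""

    def __call__(self, start: int, /, stop: Optional[int] = None, step: int = 1) -> list:
        """Return evenly spaced values."""
        ...


def render_as_function(proto: type) -> str:
    """Render a protocol's __call__ as a `def name(...)` line without `self`."""
    signature = inspect.signature(proto.__call__)
    # Drop the leading `self` parameter; keep everything else as declared.
    params = list(signature.parameters.values())[1:]
    signature = signature.replace(parameters=params)
    return f"def {proto.__name__}{signature}"


rendered = render_as_function(arange)
```

A docs extension could emit `rendered` (plus the `__call__` docstring) in place of the class entry.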

@BvB93
Contributor

BvB93 commented Sep 13, 2023

My hope is to have this library be installable and useful for type-checking purposes and as ABCs (protocols are ABCs when used as a base class).

Once we have some parser for either automatically going from def to class or vice versa then it shouldn't really matter which direction we go (outside of whatever is considered more aesthetically pleasing during development), no? When actually building the package we could automatically perform this conversion either way.

@nstarman
Contributor Author

I think it ends up mattering in a few cases:

  • installing a dev branch by installing the repo by pip install -e .
  • type checking with CI, e.g. mypy in pre-commit
  • using an IDE when developing the library

Composing that list, I guess it's the same for anyone using a compiled wheel, but for developers close to the code having to code-gen the type-correct classes from functions will be a pain and make the code effectively equivalent to being written in a compiled language a la https://xkcd.com/303.

@rgommers
Member

Composing that list, I guess it's the same for anyone using a compiled wheel,

I'm not too worried about the users of the installed package, I'm thinking more from the point of view of working on the standard itself. In the end that is the primary purpose here: authoring a well-documented API standard. These things are functions in the standard after all, so having it all converted to classes with __call__ methods solely because static typing in Python is so limited makes it harder to work on the standard. I don't expect many contributors are experts in static typing rules, so it'll raise a few eyebrows I think when anyone sees these class definitions for the first time.

Also from a code reusability point of view: the current functions are idiomatic. You can copy them and fill in the body of each function in order to get a standard-compliant implementation.

installing a dev branch by installing the repo by pip install -e .

This one is easy to fix, as is an in-place build when using an IDE (those are the same effectively) - you can run the codegen as part of the install.

type checking with CI, e.g. mypy in pre-commit

I think this one may be the only issue, because Mypy is bad at running against an installed package. It'd be an extra one line to install the .pyi files in-tree though before running Mypy.

@kgryte
Contributor

kgryte commented Sep 14, 2023

I second @rgommers's opinion that we should continue authoring as functions, rather than as classes. Authoring as classes increases authoring complexity and just raises contribution barriers. I'd prefer to hide this complexity behind automated function-to-class conversion.

@BvB93
Contributor

BvB93 commented Sep 14, 2023

I did some trial and error yesterday and managed to cobble a script together for automatically carrying out this def ... -> class ... conversion. Turns out it's not all that complicated, though I did let black and isort handle the final formatting of the file, as doing it in a more manual fashion with ast sounds like a nightmare: https://gist.github.com/BvB93/b659c9145cde08eb338053d7533306fb.

I think this one may be the only issue, because Mypy is bad at running against an installed package. It'd be an extra one line to install the .pyi files in-tree though before running Mypy.

The biggest obstacle would probably be a setuptools >=64 regression that breaks the type checking of editable installations. There are workarounds for this though: python/mypy#13392 (comment).

@rgommers
Member

@BvB93 thanks for pointing out that editable install issue. I'll note that that will also affect numpy and any other users of meson-python (and scikit-build-core too). Import hooks are a must-have for out of tree builds. I'm a little surprised that no one noticed this on SciPy or NumPy yet - IDE and static analysis tools seem to work okay there with editable installs which employ import hooks (unless I missed the bug reports).

@BvB93
Contributor

BvB93 commented Sep 21, 2023

I'm a little surprised that no one noticed this on SciPy or NumPy yet - IDE and static analysis tools seem to work okay there with editable installs which employ import hooks (unless I missed the bug reports).

Right, I did a little bit more reading and some trial & error; I think we might be in the clear here: the relevant issue seems to only apply when (a) package A is installed in editable mode and (b) package B imports from package A, with B now unable to see A's annotations. So this could be a potential downstream annoyance for packages that want access to the array api's annotations, but it shouldn't affect the array api repo itself.

Labels
topic: Static Typing

Successfully merging this pull request may close these issues.

__array_namespace__ type Hint
4 participants