Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feat: Added type stub generation for dynamic functions #478

Merged
merged 8 commits into from
Oct 18, 2023

Conversation

mbway
Copy link
Contributor

@mbway mbway commented Oct 10, 2023

Description

From the discussion in Fixes: #476, I have added a script to automatically generate type stubs for geoalchemy2.functions


I made quite a few false starts but now the functions are type hinted as being instances of private classes (one class for each function) that inherits from GenericFunction and provides a __call__ method according to the function signature described in _FUNCTIONS. This allows type-safe access to the attributes like name as well as being able to call the functions and infer the return types:

from geoalchemy2 import functions
reveal_type(functions.ST_Polygon)
reveal_type(functions.ST_Polygon())
reveal_type(functions.ST_Polygon(123))
reveal_type(functions.ST_Polygon.name)

gives

testmodule.py:2: note: Revealed type is "geoalchemy2.functions._ST_Polygon"
testmodule.py:3: note: Revealed type is "geoalchemy2.types.Geometry"
testmodule.py:4: note: Revealed type is "geoalchemy2.types.Geometry"
testmodule.py:5: note: Revealed type is "builtins.str"

In future, it would be simple to add function overloads like so:

class _ST_Union(functions.GenericFunction):
    ...

    @overload
    def __call__(self, geom1: geoalchemy2.types.Geometry, geom2: geoalchemy2.types.Geometry) -> geoalchemy2.types.Geometry: ...

    @overload
    def __call__(self, geoms: Iterable[geoalchemy2.types.Geometry]) -> geoalchemy2.types.Geometry: ...

    def __call__(self, *args: Any, **kwargs: Any) -> geoalchemy2.types.Geometry: ...

ST_Union: _ST_Union
The Journey I took to get the final result

Initially I generated regular function signatures for the dynamic ST_* functions, but then realised that there are additional attributes attached to these functions such as name.

I then took a look at typeshed because the stubs for zip are a class with many overloaded __new__ methods so I gave that a go. However in the case of zip the return values from __new__ are instances of the zip class so that technique is not applicable for the case of the ST_* functions.

I then looked at this issue which describes a workaround for providing attributes to a callable. The issue was using Protocol but luckily it also seems to work with other base classes such as sqlalchemy.sql.functions.GenericFunction. I tried this approach first that does not require ParamSpec (which was only added in python 3.10) and it does work, but if in future anyone wants to add specific overloads, for example
ST_3DUnion

@overload
def ST_Union(geom1: Geometry, geom2: Geometry) -> geoalchemy2.types.Geometry:
    ...

@overload
def ST_Union(geoms: Iterable[Geometry]) -> geoalchemy2.types.Geometry:
    ...

@_generic_function
def ST_Union(*args: Any, **kwargs: Any) -> geoalchemy2.types.Geometry:
    ...

then PyCharm and mypy are able to 'see through' the _generic_function decorator (when using the ParamSpec approach) and provide feedback on both the function arguments and the attributes provided by sqlalchemy.sql.functions.GenericFunction, although PyCharm support is patchy. MyPy seems to understand everything though at least.


from geoalchemy2 import functions
reveal_type(functions.ST_Polygon)
reveal_type(functions.ST_Polygon())
reveal_type(functions.ST_Polygon.name)

gives:

testmodule.py: note: Revealed type is "geoalchemy2._functions_helpers._GenericFunction[[*args: Any, **kwargs: Any], geoalchemy2.types.Geometry]"
testmodule.py: note: Revealed type is "geoalchemy2.types.Geometry"
testmodule.py: note: Revealed type is "builtins.str"

The issue with doing the seemingly simpler approach of

class ST_Polygon(sqlalchemy.sql.functions.GenericFunction):
    def __call__(self, *args: Any, **kwargs: Any) -> geoalchemy2.types.Geometry:
        ...

is that for the above, mypy prints:

testmodule.py: note: Revealed type is "def (*args: Any, **kwargs: Any) -> geoalchemy2.functions.ST_Polygon"
testmodule.py: note: Revealed type is "geoalchemy2.functions.ST_Polygon"
testmodule.py: note: Revealed type is "builtins.str"

because it thinks that ST_Polygon is a class and __call__ applies to instances of the class instead of the class itself (which is why I tried using __new__ initially). Decorating __call__ with @staticmethod or @classmethod does not seem to work.

edit: I realised that the difference with @_generic_function is that it was casting the function to be an instance of the protocol class instead of the class itself, and that can be done without a decorator (which doesn't matter in this case as the code is auto-generated anyway so extra boilerplate is not an issue).
With this change the code is simpler although slightly more verbose and should work well with overloading if there is a desire to add it


Checklist

This pull request is:

  • A documentation / typographical error fix
    • Good to go, no issue or tests are needed
  • A short code fix
    • Please include: Fixes: #<issue number> in the description if it solves an existing issue
      (which must include a complete example of the issue).
    • Please include tests that fail with the main branch and pass with the provided fix.
  • A new feature implementation or update an existing feature
    • Please include: Fixes: #<issue number> in the description if it solves an existing issue
      (which must include a complete example of the feature).
    • Please include tests that cover every lines of the new/updated feature.
    • Please update the documentation to describe the new/updated feature.

@mbway
Copy link
Contributor Author

mbway commented Oct 10, 2023

I saw that GeometryType returns a string and ST_Dimension returns an integer but the return types in _FUNCTIONS are None. is this an oversight or is there a reason it shouldn't be str or int?
These should either be updated to the correct types or the stubs should return Any instead of None if type=None in the _FUNCTIONS array since mypy would complain about x = GeometryType(...) for example because the type says GeometryType does not return anything.

@mbway mbway force-pushed the type_stubs branch 3 times, most recently from bdaf3b6 to a6d70bc Compare October 10, 2023 21:37
@adrien-berchet
Copy link
Member

Great work @mbway , thank you very much !
I'm afraid it will not work with Py37 but let's see what the CI says.

I saw that GeometryType returns a string and ST_Dimension returns an integer but the return types in _FUNCTIONS are None. is this an oversight or is there a reason it shouldn't be str or int? These should either be updated to the correct types or the stubs should return Any instead of None if type=None in the _FUNCTIONS array since mypy would complain about x = GeometryType(...) for example because the type says GeometryType does not return anything.

The return types in _FUNCTIONS should be None when we want to let SQLAlchemy infer the type (e.g. str or int), but maybe we can also force the return types for builtin types, I didn't try. In that case we could update all the return types (but be aware that the return type may vary depending on the parameters given to the function, e.g. ST_Transform can return a geometry or a raster depending on what kind of object it is applied to). If we cannot I guess we should just use Any in the stubs.

@mbway
Copy link
Contributor Author

mbway commented Oct 11, 2023

I'm afraid it will not work with Py37

It should do, I can't see anything I've used that wouldn't be in 3.7. Except that I did miss one instance where I used list[...] in an annotation instead of List[...]. That might have been what caused the tests to fail

the return type may vary depending on the parameters given to the function

if that information were accessible in a structured way that could be encoded using @overload like my example in the initial post, or the return type could just be Union[A, B, C, ...]

but maybe we can also force the return types for builtin types, I didn't try

I used str as the return type for ST_AsGeoJSON since I was adding that special case to the functions list manually anyway (so it doesn't affect the runtime construction of the function, just the type hint). I don't know how to check if adding a builtins return type to the _FUNCTIONS array works so I'll leave it for now and use None => Any for the type hints.

@adrien-berchet
Copy link
Member

I'm afraid it will not work with Py37

It should do, I can't see anything I've used that wouldn't be in 3.7. Except that I did miss one instance where I used list[...] in an annotation instead of List[...]. That might have been what caused the tests to fail

Ok, great!

the return type may vary depending on the parameters given to the function

if that information were accessible in a structured way that could be encoded using @overload like my example in the initial post, or the return type could just be Union[A, B, C, ...]

but maybe we can also force the return types for builtin types, I didn't try

I used str as the return type for ST_AsGeoJSON since I was adding that special case to the functions list manually anyway (so it doesn't affect the runtime construction of the function, just the type hint). I don't know how to check if adding a builtin return type to the _FUNCTIONS array works so I'll leave it for now and use None => Any for the type hints.

I tried to change the return type of ST_SRID from None to sqlalchemy.Integer and it seems to work. So I guess we can change all these None to sqlalchemy types and then map them to builtin types. It's even possible to instantiate a type and get the corresponding python type, e.g.:

from sqlalchemy import Integer
Integer().python_type

unfortunately we have to instantiate the type to get the python_type because it's a property. So I'm not sure we can use it in our case.
But if you don't have time for this we can keep the None -> Any for now and improve it later.

@mbway
Copy link
Contributor Author

mbway commented Oct 12, 2023

Any idea why the tests are failing?

@adrien-berchet
Copy link
Member

Because of

43: error: Incompatible types in assignment (expression has type "type[SummaryStatsCustomType]", base class "Function" defined the type as "TypeEngine[Any]")  [assignment]

but I don't know how to fix this :)

Maybe because mypy is not able to see that SummaryStatsCustomType derives from sqlalchemy.types.UserDefinedType which derives from sqlalchemy.types.TypeEngine?

@adrien-berchet
Copy link
Member

Or maybe because of this? (in geoalchemy2/types/__init__.py)

try:
    # SQLAlchemy >= 2
    from sqlalchemy.sql._typing import _TypeEngineArgument
except ImportError:
    # SQLAlchemy < 2
    _TypeEngineArgument = Any  # type: ignore

@mbway
Copy link
Contributor Author

mbway commented Oct 12, 2023

So are the failures mypy errors unrelated to my changes? I can take a look. I thought it type checked the whole package OK for me when I ran it locally.

@adrien-berchet
Copy link
Member

When I run it locally I get the same error as in the CI and I don't get any error using the master branch, so it seems related to this PR. It's probably something about custom types but I don't have time to look deeper into this for now.

@mbway mbway mentioned this pull request Oct 15, 2023
8 tasks
@mbway
Copy link
Contributor Author

mbway commented Oct 15, 2023

the test failures seem to be unrelated typing issues, possibly uncovered by introducing py.typed

@mbway
Copy link
Contributor Author

mbway commented Oct 16, 2023

I noticed that in my test container pypy took a long time to complete the tests and it seems that is also the case on the CI. It's strange that the reason for using pypy is faster speeds but it runs these tests 6 times slower

@adrien-berchet
Copy link
Member

the test failures seem to be unrelated typing issues, possibly uncovered by introducing py.typed

I still suspect we missed something in the custom type annotations. But as far as I'm concerned we can merge, it's already a very nice improvement. Just tell me if you to investigate further or if you are ok to merge.

I noticed that in my test container pypy took a long time to complete the tests and it seems that is also the case on the CI. It's strange that the reason for using pypy is faster speeds but it runs these tests 6 times slower

Yeah, I noticed that too. And sometimes the Pypy jobs in CI are failing because of a segmentation fault, which I was never able to reproduce. That's why I tried to use Conda to setup Pypy but it didn't solve this issue, so I don't know what to do...

@mbway
Copy link
Contributor Author

mbway commented Oct 17, 2023

I still suspect we missed something in the custom type annotations

Like what? The tests are passing now

Copy link
Member

@adrien-berchet adrien-berchet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks very good to me!
I just had 2 ideas related to this work but it can go in another PR if you prefer, WDYT?

geoalchemy2/_functions_helpers.py Outdated Show resolved Hide resolved
geoalchemy2/_functions_helpers.py Outdated Show resolved Hide resolved
Comment on lines +23 to +30
def _replace_indent(text: str, indent: str) -> str:
lines = []
for i, line in enumerate(text.splitlines()):
if i == 0 or not line.strip():
lines.append(line)
else:
lines.append(f"{indent}{line}")
return "\n".join(lines)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about also formatting the docstrings using textwrap? So the docstrings of these functions are formatted like the others (i.e. line length = 100).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have made the change to wrap the text. The wrapping is done before the indenting so it's wrapped to 104 columns but this can be fixed if it matters.

@mbway
Copy link
Contributor Author

mbway commented Oct 18, 2023

sure. I'll make these changes now

@@ -4,16 +4,22 @@
from typing import Union


def _wrap_docstring(docstring: str) -> str:
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this handles wrapping plus the explicit newlines in the docstrings

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice! 🤩

@adrien-berchet adrien-berchet merged commit eeee458 into geoalchemy:master Oct 18, 2023
8 checks passed
@adrien-berchet
Copy link
Member

adrien-berchet commented Oct 18, 2023

Thank you very much @mbway for this brilliant PR!!!

@adrien-berchet
Copy link
Member

Oh I had a last question, do you think we could solve #396 using @overload?

@mbway
Copy link
Contributor Author

mbway commented Oct 19, 2023

I'm not 100% sure what #396 is referring to. If it is only a typing issue where you want the type checker to understand that

ST_Transform(..., type_=Raster)

returns a raster and without type_=Raster it returns a geometry then that can be done with @overload

And if you want

ST_Intersection(geom, geom)

to be allowed (type checks) but not

ST_Intersection(geom, geom, geom)

(for example), then that can also be done with @overload.

but in both cases this is only information for the type checker and not at runtime. It may be possible to inspect overloaded signatures at runtime but it would be unusual to do so.
So if the issue is just about documenting to the user and the type checker what should be allowed then @overload should work for you, but if something different needs to be done at runtime for the different overloads then that needs some other mechanism (although in that case @overload will still be useful to document what is allowed to the type checker in addition to what is done at runtime)

@adrien-berchet
Copy link
Member

I just realize something: shouldn't we also add py.typed in the MANIFEST.in to make it work?

@adrien-berchet
Copy link
Member

I'm not 100% sure what #396 is referring to. If it is only a typing issue where you want the type checker to understand that

ST_Transform(..., type_=Raster)

returns a raster and without type_=Raster it returns a geometry then that can be done with @overload

And if you want

ST_Intersection(geom, geom)

to be allowed (type checks) but not

ST_Intersection(geom, geom, geom)

(for example), then that can also be done with @overload.

but in both cases this is only information for the type checker and not at runtime. It may be possible to inspect overloaded signatures at runtime but it would be unusual to do so. So if the issue is just about documenting to the user and the type checker what should be allowed then @overload should work for you, but if something different needs to be done at runtime for the different overloads then that needs some other mechanism (although in that case @overload will still be useful to document what is allowed to the type checker in addition to what is done at runtime)

The point of #396 is also to make SQLAlchemy return the proper type, so if we write ST_SetSRIDgeom, int) it will return a Geometry while if we write ST_SetSRID(rast, int) it will return a Raster (while currently we have to write )ST_SetSRID(rast, int, type_=Raster) to get the proper result in the last case. So as far as I understand this is not possible with @overload, right?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
2 participants