Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

QuerySet.annotate improvements #398

Merged
merged 26 commits into from
Jul 23, 2021
Merged

Conversation

syastrov
Copy link
Contributor

@syastrov syastrov commented Jun 12, 2020

  • QuerySet.annotate now returns correct self-type rather than QuerySet[Any]. Attribute access falls back to Any.
  • QuerySets that have an annotated model do not report errors during .filter() when called with invalid fields.
  • QuerySets that have an annotated model return ordinary dict rather than TypedDict for .values()
  • QuerySets that have an annotated model return Any rather than typed Tuple for .values_list()

Calling QuerySet.annotate introduces an "annotated" model type which is a subtype of the original model dynamically (named "MyModel (annotated)"), but which allows arbitrary getting/setting of attributes.

This is originally to fix this issue:

a = MyModel.objects.filter()
a = a.annotate() 

error: Incompatible types in assignment (expression has type “QuerySet[Any]“, variable has type “Manager[Any]“)

The problem is because of not using self-type

@syastrov syastrov force-pushed the queryset-annotate branch 3 times, most recently from e7f0544 to 41de338 Compare June 12, 2020 21:55
sobolevn
sobolevn previously approved these changes Jun 12, 2020
Copy link
Member

@sobolevn sobolevn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great work!

The only question I have is: how one can represent annotated User statically? For example, we have this code:

def allowed(param: User):
     return param.foo  # E: "User" has no attribute "foo" 

But, when one will pass there an instance with .annotate(foo=...) it will work in runtime, but fail in typechecking. Do you have any solution in mind?

@syastrov
Copy link
Contributor Author

Thanks.

At the moment, I don't have a solution for representing annotated User as a type annotation in the actual code. I wanted to address the issue with calling .annotate turning the Manager[SomeModel] into QuerySet[Any], which is annoying and loses information.
So currently, this PR fixes that and also allows access to any attr on the model (within the same function). As you noted, there is no way of statically annotating an annotated model as the type is generated dynamically.

One idea is to make all Models generic to mark them as annotated, like:

_Annotations = TypeVar('_Annotations')
class Model(Generic[_Annotations]):
  pass

class MyModel(Model): ...

class MyAnnotations(TypedDict):
  foo: str
  bar: str

def func(param: MyModel[MyAnnotations]): pass

And handle this in the plugin.

But this might be too intrusive? We also have to see if we can default the generic param in a way so that by default the model is not annotated.

Another idea would be to make use of Annotated introduced in https://www.python.org/dev/peps/pep-0593/ (ironically named very similarly to what we might want to call it)
This way, the type would be the same as the original type at runtime, but we should be able to introduce behavior in the plugin to make it behave differently.
It could look like this:

from typing import Type
from typing_extensions import Annotated
from django.db.models.base import Model

class Annotations:
    def __init__(self, **kwargs: Type): ...

# User code:
# If no arguments, assume any
def func(user: Annotated[Model, Annotations()]) -> str:
  return user.foo + user.bar

# If we later want to support specifying which fields are annotated, it could look like this:
def func2(user: Annotated[Model, Annotations(foo=str, bar=str)]) -> str:
  return user.foo + user.bar

At first, I thought it's only available in Python 3.9 (unreleased), but it looks like it's in typing_extensions so we actually could try using it in older versions.
(Downside is that it seems like PyCharm doesn't support it yet though -- it thinks it's just Any, so might be annoying for those trying to autocomplete).

Another downside of using Annotated is that the type-checker won't check for type compatibility between different annotated models, e.g. assuming that one is annotated with a TypedDict that is a subset of another TypedDict.

@syastrov
Copy link
Contributor Author

Looks like it will be some work to fix django typechecking

@sobolevn
Copy link
Member

@syastrov I really like your Annotated[] idea. Let's merge this PR as is and open a new issue to track Annotated[] there!

Thanks a lot for the great work! 👍

@syastrov
Copy link
Contributor Author

The tests are passing now, but I have some reservations about this, and would like to do some additional testing. So please, hold off on merging.

@syastrov
Copy link
Contributor Author

By the way, looks like PyCharm is working on supporting Annotated: https://youtrack.jetbrains.com/issue/PY-41847
And that version is planned for release in July, so maybe there's a chance it's coming soon.

On second thought, I don't have any reservations as long as this PR is just fixing the problem within functions and not ACROSS functions (due to not being able to add type annotations using an "annotated" model or QuerySet).

I still need some more tests of .values() / .values_list() with no params and I think it should be good.

@sobolevn
Copy link
Member

I still need some more tests of .values() / .values_list() with no params and I think it should be good.

Ok, let's wait for it and then merge!

@syastrov I have opened a new issue to track Annotated feature: #399

@syastrov
Copy link
Contributor Author

syastrov commented Aug 1, 2020

@sobolevn I finally found some time to finish this :)

Copy link
Member

@kszmigiel kszmigiel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks totally good for me
@syastrov thank you for you work!
@sobolevn How should we deal with all the changes in mypy plugin created since the refactor branch was created? Should I apply all these changes to the refactor branch by hand? What's your opinion?

@sobolevn
Copy link
Member

sobolevn commented Aug 5, 2020

Let's merge our own refactored code first. Then we can rebase these changed and apply them as well.
In case nothing is broken.

syastrov added 10 commits May 8, 2021 09:08
- QuerySets that have an annotated model do not report errors during .filter() when called with invalid fields.
- QuerySets that have an annotated model return ordinary dict rather than TypedDict for .values()
- QuerySets that have an annotated model return Any rather than typed Tuple for .values_list()
…ming QuerySet.first() won't return None)

Fix mypy self-check.
@syastrov
Copy link
Contributor Author

syastrov commented Jul 6, 2021

Hmm, this is getting tricky. I believe the type ignores in _ValuesQuerySet are causing certain things to not work.

I had an issue with the test code:

qs1 = Blog.objects.values('text').annotate(foo=F('id'))
reveal_type(qs1) # N: Revealed type is "django.db.models.query._ValuesQuerySet[myapp.models.Blog (annotated), builtins.dict[builtins.str, Any]]"

The actual value is django.db.models.query._ValuesQuerySet[myapp.models.Blog, TypedDict({'text': builtins.str})]", which is not expected. I believe the problem is that annotate lacks the self-type _QS. I tried adding that, but mypy says Invalid self argument "_ValuesQuerySet[Blog, TypedDict({'text': str})]" to attribute function "annotate" with type "Callable[[_QS, VarArg(Any), KwArg(Any)], _QS]".
I believe this is due to the type ignore on the class _ValuesQuerySet(...) line, which causes mypy to seemingly not apply the inheritance and thus messes up the self-typing logic.

To try to fix the inheritance by removing that type ignore, I tried changing the class definition, but this causes a failure in values_list_flat_true_with_ids.

The definition I am trying:

class _ValuesQuerySet(QuerySet[_T], Collection[_Row], Reversible[_Row], Sized):
    def __iter__(self) -> Iterator[_Row]: ... # type: ignore 
    @overload
    def __getitem__(self, i: int) -> _Row: ...
    @overload
    def __getitem__(self: _QS, s: slice) -> _QS: ...
    def iterator(self, chunk_size: int = ...) -> Iterator[_Row]: ...  # type: ignore
    def get(self, *args: Any, **kwargs: Any) -> _Row: ...  # type: ignore
    def earliest(self, *fields: Any, field_name: Optional[Any] = ...) -> _Row: ...  # type: ignore
    def latest(self, *fields: Any, field_name: Optional[Any] = ...) -> _Row: ...  # type: ignore
    def first(self) -> Optional[_Row]: ...  # type: ignore
    def last(self) -> Optional[_Row]: ...  # type: ignore

I believe the problem is the type ignore on __iter__. It means that mypy doesn't really apply this function annotation, but uses the super-class's (QuerySet's) definition.

So there is not really any way to resolve this, I think, except re-instating _BaseQuerySet, I believe.

@syastrov
Copy link
Contributor Author

syastrov commented Jul 7, 2021

I managed to make it possible to annotate functions as taking "annotated models" or QuerySets using PEP 593 Annotated.

The "repr" also looks good due to some hacks :)

I quite like this way of doing it, as it's more self-explainable when you see the error messages, and it makes sense that users can also annotate their own functions using these types.

So, finally, if the issues above regarding inheritance of _ValuesQuerySet are fixed, then this PR should be ready (hopefully) :)

Copy link
Member

@sobolevn sobolevn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great work. Like really great work! 👍

username = models.CharField(max_length=100)


def func(m: WithAnnotations[MyModel]) -> str:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will it be possible for us to later add something like WithAnnotations[Model, TypedDict] and not break things for users? Double checking 🙂

mypy_django_plugin/django/context.py Outdated Show resolved Hide resolved
mypy_django_plugin/lib/helpers.py Outdated Show resolved Hide resolved


def get_or_create_annotated_type(
api: Union[SemanticAnalyzer, CheckerPluginInterface], model_type: Instance
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

api: SemanticAnalyzer?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or even TypeAnalyzer

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's unfortunately used both in MethodContext (CheckerPluginInterface) and AnalyzeTypeContext (where I can grab SemanticAnalyzer from the api.api attribute).

@sobolevn
Copy link
Member

sobolevn commented Jul 7, 2021

Looks like we still have a broken test:

=================================== FAILURES ===================================
________________________ values_list_flat_true_with_ids ________________________
/home/runner/work/django-stubs/django-stubs/tests/typecheck/managers/querysets/test_values_list.yml:220: 
E   pytest_mypy_plugins.utils.TypecheckAssertionError: Invalid output: 
E   Expected:
E     main:2: note: Revealed type is "django.db.models.query._ValuesQuerySet[myapp.models.Blog, builtins.i...
E     main:3: note: Revealed type is "django.db.models.query._ValuesQuerySet[myapp.models.Blog, builtins.i...
E     main:5: note: Revealed type is "builtins.list[builtins.int*]" (diff)
E   Actual:
E     main:2: note: Revealed type is "django.db.models.query._ValuesQuerySet[myapp.models.Blog, builtins.i...
E     main:3: note: Revealed type is "django.db.models.query._ValuesQuerySet[myapp.models.Blog, builtins.i...
E     main:5: note: Revealed type is "builtins.list[myapp.models.Blog*]" (diff)
E   
E   Alignment of first line difference:
E     E: ...ed type is "builtins.list[builtins.int*]"
E     A: ...ed type is "builtins.list[myapp.models.Blog*]"
E                                     ^
=========================== short test summary info ============================
FAILED tests/typecheck/managers/querysets/test_values_list.yml::values_list_flat_true_with_ids

Is it the one you have mnetioned in #398 (comment)
Is it related to _BaseQuerySet?

This currently has the drawback that error messages display the internal type _QuerySet, with both type arguments.

See also discussion on typeddjango#661 and typeddjango#608.

Fixes typeddjango#635: QuerySet methods on Managers (like .all()) now return QuerySets rather than Managers.

Address code review by @sobolevn.
Copy link
Member

@sobolevn sobolevn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is pure gold 🏅 Thank you!

Merge when you fill confident!

@syastrov
Copy link
Contributor Author

Thanks! I tried to explore this route with annotating using TypedDicts. I think it's a good idea, and motivated me to change the implementation from using Annotated(...) to Annotated[...], since you cannot use "class construction" in function argument type annotations, at least at runtime.

I updated the README as well, so you can see the current limitations.

Let's see if I didn't break a lot of things :)

README.md Show resolved Hide resolved
Copy link
Member

@sobolevn sobolevn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💯

@syastrov
Copy link
Contributor Author

I would like to test it against an internal code-base before merging.

However, there is still one problem which I don't think we can solve at the moment, at least with the current implementation approach, but which I think would be useful to solve.

I would like to define a function which can add annotations to an existing QuerySet.
It might look something like this (although WithAnnotations[_QS, FooDict] would probably be cleaner):

class FooDict(TypedDict):
  foo: int

_Model = TypeVar("_Model", bound=MyModel)
def add_annotation_with_typevar(qs: QuerySet[_Model]) -> QuerySet[WithAnnotations[_Model, FooDict]]:
    return qs.annotate(foo=F('id'))

I'd like to be able to have multiple functions like these and chain them together so I can build up a QuerySet with multiple annotations.

This won't work correctly in the mypy plugin since get_type_analyze_hook receives the _Model arg for WithAnnotations as a TypeVarType, which doesn't really allow us to generate this "intersection type" -- the concrete subclass of _Model and a protocol containing the fields in the TypedDict. As a hack, I think we'd have to return some intermediate class that then somehow allows attribute access to the given fields from the TypedDict (using get_attribute_hook). But I'm not sure how that would play in terms of subtyping (could we really get it to be considered a subtype of _Model then?). I think this is just limited by the fact that mypy doesn't have an intersection type.

Anyway, I don't think this problem should prevent this PR from being merged.

@syastrov
Copy link
Contributor Author

By the way, I found this issue python/mypy#6501 regarding getting a better repr for types.

…or example).

Fix some edge case with from_queryset after QuerySet changed to be an
alias to _QuerySet. Can't make a minimal test case as this only occurred
on a large internal codebase.
@syastrov
Copy link
Contributor Author

Alright, I got this to work against the internal codebase after a small fix.

I also noticed another bug, which I was able to workaround, but I was not able to make a minimal test case of.
Most likely, it was somehow possible to trigger it before these changes, but somehow it occurs for me now.

It involves having a custom manager which has a method with an argument annotated with User (from contrib.auth).
The error is KeyError: "models" in semantic analyzer, which is while it's trying to lookup myapp.modules.User (which is really just the imported contrib.auth.User). This doesn't happen if I change the annotation to be some other model than User, though.

Perhaps there is some defer call missing in the code that generates the manager class based on the custom manager class. But I am not sure.

If you agree, let's merge this!

@syastrov
Copy link
Contributor Author

Hmm, I did notice that when using from_queryset and when doing MyModel.objects.filter(...).my_method() mypy says that my_method doesn't exist on _QuerySet[MyModel, MyModel] . I think this is after my change where BaseManager no longer derives from QuerySet. There isn't any existing test for this it seems.

@sobolevn
Copy link
Member

sobolevn commented Jul 22, 2021

@syastrov let's open two separate issues for things you described and merge this! 👍

…ent with a type annotation on the QuerySet.

The mypy docstring on anal_type says not to call defer() after it.
@syastrov
Copy link
Contributor Author

I fixed the first issue (KeyError: "models") while investigating the other issue. I unfortunately can't make a reproducible test case. But, from the mypy docstring on anal_type, it says not to call defer after it and the plugin is doing that. So I've removed that, and removed allow_placeholders=True as well, since that seemed redundant.

@syastrov
Copy link
Contributor Author

@sobolevn You are welcome to merge if the tests pass :)

Then I'll open an issue on the second thing

@sobolevn sobolevn merged commit cfd69c0 into typeddjango:master Jul 23, 2021
@sobolevn
Copy link
Member

@syastrov thanks a lot for your amazing work! 🎉

@noelleleigh
Copy link

Wow, this has been a long time coming! When can we expect a release with these changes?

@syastrov
Copy link
Contributor Author

You are welcome! :) Glad it's merged. I opened an issue for the other thing

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

4 participants