Add signature for dataclasses.replace #14849

Merged
merged 49 commits into from
Jun 17, 2023
Changes from 15 commits
6dbaf9c
Add signature for dataclasses.replace
ikonst Mar 7, 2023
4dcbe44
add ClassVar
ikonst Mar 7, 2023
7ed3741
prevent misleading note
ikonst Mar 7, 2023
89257b5
stash in a secret class-private
ikonst Mar 8, 2023
32b1d47
fix typing
ikonst Mar 8, 2023
1f08816
docs and naming
ikonst Mar 8, 2023
c456a5f
inst -> obj
ikonst Mar 8, 2023
8118d29
add the secret symbol to deps.test
ikonst Mar 8, 2023
789cb2b
language
ikonst Mar 8, 2023
9f0974c
nit
ikonst Mar 8, 2023
a37e406
nit
ikonst Mar 8, 2023
367c0e9
Merge branch 'master' into 2023-03-06-dataclasses-replace
ikonst Mar 9, 2023
0e84c4f
make obj positional-only
ikonst Mar 14, 2023
9cfc081
Merge branch '2023-03-06-dataclasses-replace' of https://github.com/i…
ikonst Mar 14, 2023
7b907cf
add pythoneval test
ikonst Mar 14, 2023
985db60
use py3.7 syntax
ikonst Mar 14, 2023
15dbb7b
Merge branch 'master' into 2023-03-06-dataclasses-replace
ikonst Mar 17, 2023
3227fde
Fix lint
ikonst Mar 17, 2023
2dbf249
use syntactically invalid name for symbol
ikonst Mar 30, 2023
b32881c
Merge branch '2023-03-06-dataclasses-replace' of https://github.com/i…
ikonst Mar 30, 2023
26056a4
Merge remote-tracking branch 'origin/master' into 2023-03-06-dataclas…
ikonst Mar 30, 2023
40315b7
mypy-replace must use name mangling prefix
ikonst Mar 30, 2023
c005895
Generic dataclass support
ikonst Mar 30, 2023
d71bc21
add fine-grained test
ikonst Mar 30, 2023
d914b94
disable for arbitrary transforms
ikonst Mar 31, 2023
2cb6dee
Merge branch 'master' into 2023-03-06-dataclasses-replace
ikonst Apr 6, 2023
5735726
fix testDataclassTransformReplace
ikonst Apr 7, 2023
d38897e
better error message for generics
ikonst Apr 7, 2023
04f0ee3
add support for typevars
ikonst Apr 8, 2023
f402b86
streamline
ikonst Apr 8, 2023
306c3f3
self-check
ikonst Apr 8, 2023
9b491f5
Merge remote-tracking branch 'origin/master' into 2023-03-06-dataclas…
ikonst Apr 21, 2023
283fe3d
Add improved union support from #15050
ikonst Apr 21, 2023
29780e9
Merge branch 'master' into 2023-03-06-dataclasses-replace
ikonst May 1, 2023
9c43ab6
Merge branch 'master' into 2023-03-06-dataclasses-replace
ikonst May 3, 2023
94024ba
Merge branch 'master' into 2023-03-06-dataclasses-replace
ikonst May 9, 2023
70240c4
Merge branch 'master' into 2023-03-06-dataclasses-replace
ikonst May 16, 2023
99bf973
Merge branch 'master' into 2023-03-06-dataclasses-replace
ikonst May 17, 2023
8fe75a7
add to testReplaceUnion
ikonst May 22, 2023
bea50e8
testReplaceUnion: fix B.y to be int
ikonst May 22, 2023
cd35951
rename testcase replace to testReplace
ikonst May 22, 2023
d71a7c0
Merge remote-tracking branch 'origin/master' into 2023-03-06-dataclas…
ikonst May 22, 2023
957744e
Merge branch 'master' into 2023-03-06-dataclasses-replace
ikonst Jun 2, 2023
6abf6ff
Merge remote-tracking branch 'origin/master' into pr/ikonst/14849
ikonst Jun 12, 2023
4c4fc94
remove get_expression_type hack
ikonst Jun 12, 2023
be4a290
assert isinstance(replace_sig, ProperType)
ikonst Jun 17, 2023
ee0ae21
add a TODO for _meet_replace_sigs
ikonst Jun 17, 2023
3f656f8
Merge remote-tracking branch 'origin/master' into 2023-03-06-dataclas…
ikonst Jun 17, 2023
65d6e89
Merge remote-tracking branch 'origin/master' into 2023-03-06-dataclas…
ikonst Jun 17, 2023
75 changes: 74 additions & 1 deletion mypy/plugins/dataclasses.py
Expand Up @@ -7,6 +7,7 @@

from mypy import errorcodes, message_registry
from mypy.expandtype import expand_type
from mypy.messages import format_type_bare
from mypy.nodes import (
ARG_NAMED,
ARG_NAMED_OPT,
Expand All @@ -23,6 +24,7 @@
Context,
DataclassTransformSpec,
Expression,
FuncDef,
IfStmt,
JsonDict,
NameExpr,
Expand All @@ -37,7 +39,7 @@
TypeVarExpr,
Var,
)
from mypy.plugin import ClassDefContext, SemanticAnalyzerPluginInterface
from mypy.plugin import ClassDefContext, FunctionSigContext, SemanticAnalyzerPluginInterface
from mypy.plugins.common import (
_get_decorator_bool_argument,
add_attribute_to_class,
Expand Down Expand Up @@ -74,6 +76,7 @@
frozen_default=False,
field_specifiers=("dataclasses.Field", "dataclasses.field"),
)
_INTERNAL_REPLACE_SYM_NAME = "__mypy_replace"
Member

I would propose using a syntactically invalid name for internal things, like mypy-replace. We do this in several other places; see e.g. get_unique_redefinition_name().
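
To illustrate why a hyphenated name is safe for plugin-internal symbols, here is a small standalone sketch in plain Python (it is not mypy code). The helper name `can_appear_as_attribute` is made up for this illustration. It only shows that `mypy-replace` can never be written as an attribute in user source, whereas `__mypy_replace` can.

```python
import ast
import keyword


def can_appear_as_attribute(name: str) -> bool:
    """Hypothetical helper: True if `obj.<name>` is valid Python source."""
    return name.isidentifier() and not keyword.iskeyword(name)


# The name used in this revision of the diff is a valid identifier,
# so user code could (in principle) define or reference it.
print(can_appear_as_attribute("__mypy_replace"))  # True

# A hyphenated name is not an identifier, so it cannot clash with user code.
print(can_appear_as_attribute("mypy-replace"))  # False

# Written in source, the hyphen is parsed as subtraction, not as part of a name.
tree = ast.parse("obj.mypy-replace", mode="eval")
parsed_as_subtraction = isinstance(tree.body, ast.BinOp) and isinstance(tree.body.op, ast.Sub)
print(parsed_as_subtraction)  # True
```

The commit list above ("use syntactically invalid name for symbol", "mypy-replace must use name mangling prefix") suggests the name was later changed along these lines.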

Contributor Author

Nice. Good idea using an invalid name.

Contributor Author

What do you think about adding an ephemeral_data dict to TypeInfo, like metadata but ephemeral and intended for passing data between phases rather than between invocations, thus it doesn't have to be serializable?

I was thinking something like ephemeral_data: Dict[Plugin, object] so that the data could remain compartmentalized by plugins and plugins wouldn't need to make up their sentinels.
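
A minimal sketch of what such a field might look like, assuming nothing about mypy's real classes: `TypeInfoSketch` and `PluginSketch` are hypothetical stand-ins written for this illustration, and `ephemeral_data` is only the proposal from the comment above, not an existing mypy attribute. The point is that `metadata` must survive a JSON round trip while the ephemeral dict can hold native objects and is compartmentalized per plugin.

```python
import json
from dataclasses import dataclass, field
from typing import Any


class PluginSketch:
    """Hypothetical stand-in for a plugin instance; hashable by identity."""

    def __init__(self, name: str) -> None:
        self.name = name


@dataclass
class TypeInfoSketch:
    """Hypothetical stand-in for TypeInfo with the proposed extra field."""

    name: str
    # Persisted in the cache between runs, so it must be JSON-serializable.
    metadata: dict[str, Any] = field(default_factory=dict)
    # Proposed: in-memory only, keyed by plugin, never written to the cache.
    ephemeral_data: dict[PluginSketch, object] = field(default_factory=dict)

    def serialize(self) -> str:
        # Only `metadata` is written out; `ephemeral_data` is dropped on purpose.
        return json.dumps({"name": self.name, "metadata": self.metadata})


dataclasses_plugin = PluginSketch("dataclasses")
other_plugin = PluginSketch("other")

info = TypeInfoSketch("A")
info.metadata["dataclass"] = {"attributes": ["x", "q"]}
# Stash a native (non-JSON) object, e.g. a signature built during analysis.
native_signature = object()
info.ephemeral_data[dataclasses_plugin] = native_signature

restored = json.loads(info.serialize())
print(sorted(restored))  # ['metadata', 'name']
print(info.ephemeral_data.get(other_plugin))  # None
```

Keying by plugin instance means one plugin cannot accidentally read or overwrite another plugin's stash, which is the "no made-up sentinels" benefit mentioned in the comment.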

Member

I think it may not be worth it. But if this keeps coming up, yes, we can add something like this.

Contributor Author

I had a similar need in my pynamodb plugin.

Contributor Author

See deserialize_type call:
https://github.com/pynamodb/pynamodb-mypy/blob/a9684e20b413c39b447e54238003c2d99a5c0c73/pynamodb_mypy/plugin.py#L50-L60

... even though... I can't quite remember, maybe that's a good thing? i.e. maybe due to caching semanal is skipped and metadata is what I'm supposed to consult?

But, as you can see, in the checker phase I don't have a public API to really "deserialize" what's in the metadata back into a type.

Contributor Author

@ilevkivskyi Can you advise whether get_class_decorator_hook_2 will be called when there's already a cache?



class DataclassAttribute:
Expand Down Expand Up @@ -326,6 +329,7 @@ def transform(self) -> bool:
add_attribute_to_class(self._api, self._cls, "__match_args__", match_args_type)

self._add_dataclass_fields_magic_attribute()
self._add_internal_replace_method(attributes)

info.metadata["dataclass"] = {
"attributes": [attr.serialize() for attr in attributes],
Expand All @@ -334,6 +338,35 @@ def transform(self) -> bool:

return True

    def _add_internal_replace_method(self, attributes: list[DataclassAttribute]) -> None:
        """
        Stashes the signature of 'dataclasses.replace(...)' for this specific dataclass
        to be used later whenever 'dataclasses.replace' is called for this dataclass.
        """
        arg_types: list[Type] = [Instance(self._cls.info, [])]
Member

Btw this (and also the return type below, etc.) does not look very careful about generic dataclasses.

Contributor Author

You're right. I'd be better off not adding the first arg and ret_type in this phase, and instead add them in the hook:
c005895

        arg_kinds = [ARG_POS]
        arg_names: list[str | None] = [None]
        for attr in attributes:
            assert attr.type is not None
            arg_types.append(attr.type)
            arg_kinds.append(
                ARG_NAMED if attr.is_init_var and not attr.has_default else ARG_NAMED_OPT
            )
            arg_names.append(attr.name)

        signature = CallableType(
            arg_types=arg_types,
            arg_kinds=arg_kinds,
            arg_names=arg_names,
            ret_type=Instance(self._cls.info, []),
            fallback=self._api.named_type("builtins.function"),
            name=f"replace of {self._cls.info.name}",
        )

        self._cls.info.names[_INTERNAL_REPLACE_SYM_NAME] = SymbolTableNode(
            kind=MDEF, node=FuncDef(typ=signature), plugin_generated=True
        )

    def add_slots(
        self, info: TypeInfo, attributes: list[DataclassAttribute], *, correct_version: bool
    ) -> None:
Expand Down Expand Up @@ -787,3 +820,43 @@ def _is_dataclasses_decorator(node: Node) -> bool:
    if isinstance(node, RefExpr):
        return node.fullname in dataclass_makers
    return False


def replace_function_sig_callback(ctx: FunctionSigContext) -> CallableType:
    """
    Returns a signature for the 'dataclasses.replace' function that's dependent on the type
    of the first positional argument.
    """
    if len(ctx.args) != 2:
        # Ideally the name and context should be callee's, but we don't have it in FunctionSigContext.
        ctx.api.fail(f'"{ctx.default_signature.name}" has unexpected type annotation', ctx.context)
        return ctx.default_signature

    if len(ctx.args[0]) != 1:
        return ctx.default_signature  # leave it to the type checker to complain

    obj_arg = ctx.args[0][0]

    # <hack>
    from mypy.checker import TypeChecker

    assert isinstance(ctx.api, TypeChecker)
    obj_type = ctx.api.expr_checker.accept(obj_arg)
    # </hack>
Member

FWIW this kind of hack is quite common (I myself used it several times, for old sqlalchemy plugin, and internally). I would propose to instead add another plugin hook, e.g. get_function_signature_hook_2 (similar to what we have for classes), that would be called after infer_arg_types_in_context() with a context that includes inferred argument types.

cc @JukkaL what do you think?

Member

This proposal still stands. Do we have an issue for this?

Contributor Author

There's #14845 and #10216, though not one spelling out the "get_function_signature_hook_2" solution.


    obj_type = get_proper_type(obj_type)
    if not isinstance(obj_type, Instance):
        return ctx.default_signature

    replace_func = obj_type.type.get_method(_INTERNAL_REPLACE_SYM_NAME)
Member

> I don't like this hack at all :(
>
> I wish I could've deserialized metadata, or even accessed a list of DataclassAttributes that's already been deserialized...

So, what's the problem? Looks like we have it in the metadata anyway:

info.metadata["dataclass"] = {
            "attributes": [attr.serialize() for attr in attributes],

Contributor Author

@ikonst ikonst Mar 21, 2023

Metadata should contain serialized JSON, not ad hoc type objects that I don't mean to serialize. That's why I'm saying I'd like there to be a way to stash arbitrary runtime data belonging to a plugin on a TypeInfo.

I can't easily deserialize it again from the metadata at the type checker stage, and anyway the analysis stage works much better to construct the signature, but then I have to put it somewhere.

Member

You can deserialize attributes with

def deserialize(
in replace_function_sig_callback and do not store anything extra.

Or am I missing something?

Contributor Author

@ikonst ikonst Mar 21, 2023

I can look it up again, but I believe I couldn't do it with the API provided by that hook, but only with the semantic analysis API.

Also

  • it makes sense to build the signature once and then cache it, and
  • (more importantly IMO) the other signature, for __init__, is built ahead of time, so it'd be more consistent to build the replace signature at the same time and place. One way to look at it is that in a different API design choice, replace could've been an instance method and then we'd add it together with __init__. It being a static method is "coincidental".

BTW, I considered implementing it differently, by replacing replace with a synthetic overloaded func, then have the plugin append overloads, but I think it'd result in worse performance and worse error messages (imagine having an error listing of all replace "overloads" for all dataclasses in your codebase).
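
As a rough runtime illustration of the point that the `replace` signature parallels `__init__`, the sketch below derives which keywords would be required from a dataclass's `__init__`. It assumes only the standard library; `replace_keyword_kinds` is a hypothetical helper written for this example and is not part of mypy or of this PR. It mirrors the `ARG_NAMED` / `ARG_NAMED_OPT` choice in the diff: every field is optional except an `InitVar` without a default.

```python
import dataclasses
import inspect
from dataclasses import InitVar, dataclass


@dataclass
class A:
    x: int
    q: InitVar[int]
    q2: InitVar[int] = 0


def replace_keyword_kinds(cls: type) -> dict[str, str]:
    """Hypothetical helper mirroring the ARG_NAMED / ARG_NAMED_OPT choice in the diff."""
    kinds = {}
    params = list(inspect.signature(cls.__init__).parameters.values())[1:]  # skip self
    for param in params:
        is_init_var = isinstance(cls.__annotations__.get(param.name), InitVar)
        has_default = param.default is not inspect.Parameter.empty
        kinds[param.name] = "required" if is_init_var and not has_default else "optional"
    return kinds


kinds = replace_keyword_kinds(A)
print(kinds)

a = A(x=42, q=7)
try:
    dataclasses.replace(a)  # q is an InitVar without a default, so it must be passed again
except ValueError as exc:
    missing_initvar_error = str(exc)
    print(missing_initvar_error)

a2 = dataclasses.replace(a, q=1)  # x is carried over from `a`
print(a2)
```

The runtime only reports the missing `InitVar` when the call executes; the stashed signature lets mypy report it statically as a missing named argument.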

Member

> it makes sense to build the signature once and then cache it,

Yes, sounds reasonable. But, other plugins do not do this. I don't think that this use-case should be the first one to do something like this.

> I considered implementing it differently, by replacing replace with a synthetic overloaded func

This can quickly get out of control, if you have a lot of arguments including optional ones.

Plus, I agree with your point that overloads can generate unreadable error messages.

> I can look it up again, but I believe I couldn't do it with the API provided by that hook, but only with the semantic analysis API.

I cannot see anything specific about it. So, the algorithm would be:

def replace_sig_hook(ctx):
    attributes = deserialize_metadata()
    signature = compute_from_dataclass_attributes(attributes)
    return signature

Please, in case you have any problems - let me know, I would be happy to help!

In case this would be very slow (and users would complain) we can find ways to optimize it in the future 👍

Contributor Author

@ikonst ikonst Mar 21, 2023

👋 @sobolevn

> Please, in case you have any problems - let me know, I would be happy to help!

I think this code from pynamodb-mypy illustrates the problem:
https://github.com/pynamodb/pynamodb-mypy/blob/a9684e20b413c39b447e54238003c2d99a5c0c73/pynamodb_mypy/plugin.py#L50-L60

As you see, I need to use "private API" to deserialize the type at the type-checker phase. I remember wishing there was a "plugin_ephemeral_storage" when I was working on that plugin, since I'm literally deserializing something that I just serialized in a previous phase.

> In case this would be very slow (and users would complain) we can find ways to optimize it in the future

My main concern is indeed not the slowness (cannot imagine replace being a bottleneck), but design.

  1. The metadata exists to pass data between multiple runs of mypy (via the cache). Using it to pass information between phases feels hacky when we could just as easily have another field to keep such data in native form (ephemerally).
  2. I think the code flows better when all signatures (whether instance methods or a global function that's really specialized per first argument, which is in many ways the same thing) are synthesized in one place, and the semanal phase is probably the right place.

    if replace_func is None:
        obj_type_str = format_type_bare(obj_type)
        ctx.api.fail(
            f'Argument 1 to "replace" has incompatible type "{obj_type_str}"; expected a dataclass',
            ctx.context,
        )
        return ctx.default_signature

    signature = get_proper_type(replace_func.type)
    assert isinstance(signature, CallableType)
    return signature
4 changes: 3 additions & 1 deletion mypy/plugins/default.py
Expand Up @@ -50,10 +50,12 @@ def get_function_hook(self, fullname: str) -> Callable[[FunctionContext], Type]
    def get_function_signature_hook(
        self, fullname: str
    ) -> Callable[[FunctionSigContext], FunctionLike] | None:
        from mypy.plugins import attrs
        from mypy.plugins import attrs, dataclasses

        if fullname in ("attr.evolve", "attrs.evolve", "attr.assoc", "attrs.assoc"):
            return attrs.evolve_function_sig_callback
        elif fullname == "dataclasses.replace":
            return dataclasses.replace_function_sig_callback
        return None

    def get_method_signature_hook(
35 changes: 34 additions & 1 deletion test-data/unit/check-dataclasses.test
Expand Up @@ -2002,7 +2002,6 @@ e: Element[Bar]
reveal_type(e.elements) # N: Revealed type is "typing.Sequence[__main__.Element[__main__.Bar]]"
[builtins fixtures/dataclasses.pyi]


[case testIfConditionsInDefinition]
# flags: --python-version 3.11 --always-true TRUTH
from dataclasses import dataclass
Expand Down Expand Up @@ -2036,4 +2035,38 @@ Foo(
present_4=4,
present_5=5,
)

[builtins fixtures/dataclasses.pyi]

[case testReplace]
from dataclasses import dataclass, replace, InitVar
from typing import ClassVar

@dataclass
class A:
    x: int
    q: InitVar[int]
    q2: InitVar[int] = 0
    c: ClassVar[int]


a = A(x=42, q=7)
a2 = replace(a) # E: Missing named argument "q" for "replace" of "A"
a2 = replace(a, q=42)
a2 = replace(a, x=42, q=42)
a2 = replace(a, x=42, q=42, c=7) # E: Unexpected keyword argument "c" for "replace" of "A"
a2 = replace(a, x='42', q=42) # E: Argument "x" to "replace" of "A" has incompatible type "str"; expected "int"
a2 = replace(a, q='42') # E: Argument "q" to "replace" of "A" has incompatible type "str"; expected "int"
reveal_type(a2) # N: Revealed type is "__main__.A"

[builtins fixtures/dataclasses.pyi]

[case testReplaceNotDataclass]
from dataclasses import replace

replace(5) # E: Argument 1 to "replace" has incompatible type "int"; expected a dataclass

class C:
    pass

replace(C()) # E: Argument 1 to "replace" has incompatible type "C"; expected a dataclass
Member

I would propose to add tests that cover:

  • Generic dataclasses
  • Fine-grained mode (to check a call elsewhere will be re-checked on field type change)

Member

To clarify on the fine grained (daemon) mode, I want to double-check that if there is only a replace call in a different module (and nothing else), it will be correctly rechecked (e.g. a new error will appear) if you change one of the dataclass field types.

Contributor Author

Good idea, did I do this right? d71bc21

2 changes: 2 additions & 0 deletions test-data/unit/deps.test
Expand Up @@ -1388,6 +1388,7 @@ class B(A):
<m.A.(abstract)> -> <m.B.__init__>, m
<m.A.__dataclass_fields__> -> <m.B.__dataclass_fields__>
<m.A.__init__> -> <m.B.__init__>, m.B.__init__
<m.A.__mypy_replace> -> <m.B.__mypy_replace>
<m.A.__new__> -> <m.B.__new__>
<m.A.x> -> <m.B.x>
<m.A.y> -> <m.B.y>
Expand Down Expand Up @@ -1419,6 +1420,7 @@ class B(A):
<m.A.__dataclass_fields__> -> <m.B.__dataclass_fields__>
<m.A.__init__> -> <m.B.__init__>, m.B.__init__
<m.A.__match_args__> -> <m.B.__match_args__>
<m.A.__mypy_replace> -> <m.B.__mypy_replace>
<m.A.__new__> -> <m.B.__new__>
<m.A.x> -> <m.B.x>
<m.A.y> -> <m.B.y>
2 changes: 2 additions & 0 deletions test-data/unit/lib-stub/dataclasses.pyi
Expand Up @@ -32,3 +32,5 @@ def field(*,


class Field(Generic[_T]): pass

def replace(obj: _T, /, **changes: Any) -> _T: ...
ikonst marked this conversation as resolved.
20 changes: 20 additions & 0 deletions test-data/unit/pythoneval.test
Expand Up @@ -1984,3 +1984,23 @@ def good9(foo1: Foo[Concatenate[int, P]], foo2: Foo[[int, str, bytes]], *args: P

[out]
_testStrictEqualitywithParamSpec.py:11: error: Non-overlapping equality check (left operand type: "Foo[[int]]", right operand type: "Bar[[int]]")

[case testDataclassReplace]
from dataclasses import dataclass, replace

@dataclass
class A:
    x: int


a = A(x=42)
a2 = replace(a, x=42)
reveal_type(a2)
a2 = replace()
a2 = replace(a, x='spam')
a2 = replace(a, x=42, q=42)
[out]
_testDataclassReplace.py:10: note: Revealed type is "_testDataclassReplace.A"
_testDataclassReplace.py:11: error: Too few arguments for "replace"
_testDataclassReplace.py:12: error: Argument "x" to "replace" of "A" has incompatible type "str"; expected "int"
_testDataclassReplace.py:13: error: Unexpected keyword argument "q" for "replace" of "A"
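
For contrast with the expected mypy output above, the following runtime-only sketch shows how two of the same calls behave without static checking. It uses only standard-library `dataclasses` behavior and is not part of the PR. The variable name `unknown_field_error` is introduced here purely for illustration.

```python
from dataclasses import dataclass, replace


@dataclass
class A:
    x: int


a = A(x=42)

# Dataclasses do not validate field types at runtime, so a str is accepted
# silently; mypy reports this call as an incompatible argument type.
a2 = replace(a, x="spam")
print(type(a2.x).__name__)

# An unknown field is only detected when the call runs, because replace()
# forwards the keywords to A.__init__; mypy reports it before execution.
try:
    replace(a, x=42, q=42)
except TypeError as exc:
    unknown_field_error = type(exc).__name__
    print(unknown_field_error)
```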