Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Flesh out more of jsonschema stubs #7950

Merged
merged 2 commits into from
May 27, 2022

Conversation

sirosen
Copy link
Contributor

@sirosen sirosen commented May 25, 2022

Apply more detailed annotations to the format module and most of the exceptions module.


Some background:

jsonschema is starting to add internal annotations and in the last year added mypy to its own linting/testing. Hopefully at some point we can publish the package with annotations for general use, but we're not there today.

So there's going to be an ongoing project of adding internal annotations, and I'm going to try to port that stuff back into typeshed where appropriate.
There are also some updates here which aren't part of the jsonschema annotations, and I would like to at some point port them back in the other direction -- into jsonschema.

If anyone has ideas about how to get annotations gradually added and eventually published as part of a widely used package which is already in typeshed, I'm open to learning a better way of doing this.

@github-actions

This comment has been minimized.

1 similar comment
@github-actions

This comment has been minimized.

Apply more detailed annotations to the format module and most of the
exceptions module.
@github-actions

This comment has been minimized.

2 similar comments
@github-actions

This comment has been minimized.

@github-actions

This comment has been minimized.

@sirosen
Copy link
Contributor Author

sirosen commented May 27, 2022

I'm not sure what the mypy_primer error means.

I tried running mypy_primer myself (for the first time) with slightly different options from the github workflow, and here's what I got:

$ mypy_primer --custom-typeshed-repo ./typeshed --output concise
tornado (https://github.com/tornadoweb/tornado)
+ tornado/web.py:1687: error: Function does not return a value

aiohttp (https://github.com/aio-libs/aiohttp)
+ aiohttp/web_app.py:381: error: Function does not return a value  [func-returns-value]

Neither of which are, as far as I know, related to this change.

So possibly this schemathesis line is a false negative?

Copy link
Member

@AlexWaygood AlexWaygood left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! I'm not too familiar with jsonschema, but here's a few thoughts from looking at the source code.

def is_email(instance) -> bool: ...
def is_ipv4(instance) -> bool: ...
def is_ipv6(instance) -> bool: ...
def is_email(instance: Any) -> bool: ...
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I'm reading the source code correctly, I think this (and several other of the is_foo functions here) can accept literally any type. In general, for functions like these, we prefer using object instead of Any for parameter annotations. Any is sort of an "escape hatch" to be used as a last resort, or for when the true type of a parameter isn't really expressible in the current type system.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll switch this.

Aside/tangent: using object in similar functions (e.g. ensure_is_foo(x: object) -> Foo) can be less nice than Any because of how it responds to type narrowing. After you write enough "assert-is-X" utilities, it becomes a habit to use Any in basically any case in which one could use object. This is all off-topic, but I wonder if changing the way that object narrows would make it more comfortable/normal for typing users like myself to use it. When I first ran into this, I found it very surprising that isinstance(x, str) does not narrow x: object to str.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When I first ran into this, I found it very surprising that isinstance(x, str) does not narrow x: object to str.

That's not true:

https://mypy-play.net/?mypy=latest&python=3.10&gist=5104b5bedec0b6dd05bd8b6cf23fc1ef

Copy link
Contributor Author

@sirosen sirosen May 27, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh. Hmm... I know I've run into unexpected issues in which object doesn't narrow but Any does. But clearly it wasn't such a simple case. If I see this again, I'll make sure to pay closer attention.

Sorry for the inaccuracy and distraction.

def is_ipv4(instance) -> bool: ...
def is_ipv6(instance) -> bool: ...
def is_email(instance: Any) -> bool: ...
def is_ipv4(instance: Any) -> bool: ...
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like this function often returns an IPv4Address instance. From the look of the function, though, I guess it only really matters whether it's truthy or falsey (it doesn't actually have to be True or False). So I guess I'm okay pretending that this returns a boolean :)

Looks like the same goes for several other of the is_foo functions.

https://github.com/python-jsonschema/jsonschema/blob/65f591e5152e37867cdd5db1baf6ae175229713e/jsonschema/_format.py#L222

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh wow, I completely missed that! I'd like to fix the runtime logic to bool(...) so that these all have the same object -> bool signature. If you're okay with this "white lie" in the types, I'd like to keep this as the annotation and then go fix the library later.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you're okay with this "white lie" in the types, I'd like to keep this as the annotation and then go fix the library later.

I'm okay with that 😄 I guessed that that was probably the case!

@property
def absolute_path(self): ...
def absolute_path(self) -> Sequence[str]: ...
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We generally prefer to be as precise as possible with return types in typeshed, so that would imply using deque[str] here instead of Sequence[str]. If you think it's an implementation detail which precise type is returned here, however, then you may want to ignore me here :)

The same point applies to absolute_schema_path below.

Suggested change
def absolute_path(self) -> Sequence[str]: ...
def absolute_path(self) -> deque[str]: ...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you think it's an implementation detail which precise type is returned here, however, then you may want to ignore me here :)

Yeah, I considered annotating it as deque -- that was in my first draft -- but then decided exactly this. Callers should not be relying on that value being a deque IMO.
In a way, your comment helps validate my decision. It means I was thinking the right kinds of things!

@property
def json_path(self): ...
def json_path(self) -> str: ...
def _contents(self) -> dict[str, Any]: ...
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be a bit involved, but you could consider returning a TypedDict here for more precise typing (each key will have a specific type associated with it for the corresponding value, if I understand the source code correctly.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm going to punt on this for now, and just add a TODO comment to make the type more precise with TypedDict, if that's okay. Right now, I want to get this changeset "over the finish line" and then come back when I have some more time to work on jsonschema and on the typing.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure

STRONG_MATCHES: Any
from jsonschema import _utils, protocols

_RelevanceFuncType = Callable[[ValidationError], Any]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Functions passed to the key argument of builtins.max must return objects that satisfy the _typeshed.SupportsRichComparison interface:

def max(__arg1: _T, __arg2: _T, *_args: _T, key: Callable[[_T], SupportsRichComparison]) -> _T: ...

Suggested change
_RelevanceFuncType = Callable[[ValidationError], Any]
_RelevanceFuncType = Callable[[ValidationError], SupportsRichComparison]

(You'll have to import it from _typeshed at the top of the file.)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, that's why Ctrl+F: "Comparable" didn't get me anything on the typing docs!

I wanted this but decided against adding

class _Comparable(Protocol):
    def __lt__(self, other: Any) -> bool: ...

to the stubs.

I'll make the change, with great enthusiasm.

(Is SupportsRichComparison on a path towards inclusion in typing-extensions and typing?)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is SupportsRichComparison on a path towards inclusion in typing-extensions and typing?

I'm rooting for it 🤞 but I think it's still too new to add it to typing_extensions just yet — we only added it a few months ago in #6583, so I don't think it can really be considered stable yet

@AlexWaygood
Copy link
Member

I'm not sure what the mypy_primer error means.

¯\_(ツ)_/¯

- is_X funcs are now `object -> bool`, not `Any -> bool`
- _Error.contents has a TODO comment to use TypedDict
- _RelevanceFuncType uses _typeshed.SupportsRichComparison
@github-actions
Copy link
Contributor

Diff from mypy_primer, showing the effect of this PR on open source code:

schemathesis (https://github.com/schemathesis/schemathesis)
- src/schemathesis/specs/openapi/negative/__init__.py:6: note: ... from here:

Copy link
Member

@AlexWaygood AlexWaygood left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, but I'll wait a bit to see if any other maintainers have any thoughts before merging

@JelleZijlstra
Copy link
Member

LGTM, but I'll wait a bit to see if any other maintainers have any thoughts before merging

I haven't done a full review but nothing stands out to me on skimming the stubs. If you're OK with the change, feel free to merge.

@AlexWaygood AlexWaygood merged commit f52da1e into python:master May 27, 2022
@AlexWaygood
Copy link
Member

LGTM, but I'll wait a bit to see if any other maintainers have any thoughts before merging

I haven't done a full review but nothing stands out to me on skimming the stubs. If you're OK with the change, feel free to merge.

Merged ;)

@sirosen sirosen deleted the update-jsonschema-stubs branch May 27, 2022 18:22
@hauntsaninja
Copy link
Collaborator

I tried running mypy_primer myself (for the first time) with slightly different options from the github workflow

The output you get from the command is the effect of diffing mypy master vs mypy 0.960 (while using a custom typeshed repo). I'm not sure what the schemathesis diff is about, but we do some output munging / mypy is sensitive to various things that cause it to change file processing order, and sometimes we get weird diffs.

@property
def absolute_path(self): ...
def absolute_path(self) -> Sequence[str]: ...
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change is incorrect (at least with jsonschema 4.4.0) because path entries may be indexes into a JSON array. For example:

import jsonschema

SCHEMA = {
    "$schema": "https://json-schema.org/draft/2020-12/schema",
    "type": "array",
    "items": {"type": "string"},
}


document = ["a", 12345, "c", "d"]

try:
    jsonschema.validate(document, SCHEMA)
except jsonschema.ValidationError as e:
    print(repr(e))
    for path_entry in e.absolute_path:
        print(repr(path_entry), type(path_entry))
$ pip show jsonschema | \grep Version
Version: 4.4.0

$ python temp.py
<ValidationError: "12345 is not of type 'string'">
1 <class 'int'>

I think Sequence[str | int] is the correct annotation? I'll make a PR with that

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes! Apologies for the oversight. These can be ints in precisely the case you mention. I'll double-check for other cases of this. (I'm not a typeshed maintainer, but I'll also go 👍 your PR.)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not to worry, glad it was straightforward to sort out!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants