Tests specifically for tricky, white-lie signatures #1339

Closed
gnprice opened this issue May 24, 2017 · 20 comments
Labels
project: infrastructure (typeshed build, test, documentation, or distribution related)

Comments

@gnprice
Contributor

gnprice commented May 24, 2017

We've had some useful, long discussions about writing tests for stubs (#754, #917). Most of the discussion has been around having tests for the bulk of the functionality in stubs and the libraries they describe. I think we're all broadly in agreement that

  • such comprehensive tests would be valuable; but
  • just writing out a test for each function and class stubbed isn't helpful, because it amounts to repeating what we're already saying in the stubs; and
  • a higher-tech approach to testing the bulk of the stubs' content won't be trivial to get to work with a good signal-to-noise ratio. (It'll be great if it can be done; some folks are working on it, and others are skeptical it can.)

This issue is for a narrower proposal: sometimes when a library's real signature is something complex or hard to express in the type system, our signature ends up pretty tricky or we end up telling sort of a white lie to the type-checkers. For these, even when looking right at the signature in one window and the library's docs (or an experiment in the REPL) in the other, it can be hard to tell whether they match. In these cases, I think a hand-written test would be useful -- probably looking a lot like a mypy unit test, with some code to be type-checked and an indication that certain lines should have errors and the rest shouldn't, and then we could e.g. run mypy and pytype against it.

@JukkaL suggested just this in one bullet point of a comment on #754, and gave some examples:

Add a test that exercises particularly complex signatures. Some recent examples where this could have been helpful are dict.get (a recent change to it broke it on mypy) and requests.api (there was a bad signature).

A new example today is #1331: auto doesn't really inherit from IntFlag, and we may find a new way of refining how we describe that interaction in the stubs. When we do, it'll be useful to have a test for the functionality that that little lie in the stubs is intended to provide.

@gnprice
Contributor Author

gnprice commented May 24, 2017

/cc @JukkaL @gvanrossum @vlasovskikh @matthiaskramm @JelleZijlstra @ambv from previous discussions

@JelleZijlstra
Member

I like this idea. This would interact nicely with python/typing#436: the same test runner proposed there could run tests in typeshed.

@matthiaskramm
Contributor

I'm ±0 on this.

Pro: Recently, we've had a number of pull requests with commit messages along the lines of
"Make [some example code] work with typeshed". Clearly, it would have been good to put [some example code] somewhere more meaningful.

Contra: It's a slippery slope. Testing leads to tests, which lead to more tests, which lead to complete test coverage, and look! We've just duplicated https://github.com/python/cpython/tree/master/Lib/test.

If we go down this road, we would need very clear wording that unit tests for .pyi files are reserved for special circumstances and are not mandatory. That's a message that's really hard to convey in a language culture as addicted to testing as Python's.

@gvanrossum
Member

I agree with what Matthias said (and it's why previously I was against such tests). But there are only a few committers on the project and together we can set the bar high for when tests are accepted in PRs. It will help if there are clear words in README.md and CONTRIBUTING.md about this (reviewers can then refer contributors to those words).

@vlasovskikh
Member

I like the general idea of having tests for stubs. So having them at least for tricky signatures looks like a reasonable compromise to me.

@hauntsaninja
Collaborator

An easy and hopefully unobjectionable way to do this is to create a repo with whatever code you want to test and add the repo to mypy_primer.

@JukkaL
Contributor

JukkaL commented Feb 10, 2021

I think that now that we have the new directory structure, we can reopen this discussion.

One way to do this would be to support an optional @test directory under each stub directory. The @test directory could contain any number of .py files. The simplest approach would be to type-check these files and fail the test if there are any errors. We can also support specifying that certain lines must generate errors, for example by using a magic comment such as # ERROR.

Here's an example of how this could work. To add tests for foobar, we can add the file stubs/foobar/@test/test.py:

from foobar import complex_function

complex_function(1, arg="x")  # On this line there must not be an error

complex_function(1, arg=["x"])  # ERROR  # On this line there must be an error

We'd run the tests in Python 2 and/or Python 3 modes, depending on what's supported by the stubs. It should be possible to use Python version checks to have different tests for Python 2 and Python 3.
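Since type checkers understand sys.version_info checks, a single test file could cover both modes. A minimal sketch, reusing the hypothetical foobar names from above:

import sys

from foobar import complex_function

if sys.version_info >= (3, 0):
    complex_function(1, arg="x")  # checked only in Python 3 mode
else:
    complex_function(1, arg=u"x")  # checked only in Python 2 mode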

In addition to optionally having tests for tricky signatures, I believe that there are a few other valid use cases for tests:

  1. For each package, we can have a small number of tests that verify some of the most common use cases. These can act as sanity checks, and they can protect against accidentally making the stubs completely unusable (after major refactorings or after changes to semantics of stubs, for example).
  2. Any PR that fixes some signature can add a test to validate the fix. This can prevent regressions and make it easier to review the fix. It can also make the contributor more confident that their fix works as expected.
  3. Non-trivial re-exported definitions can be useful to test, since these are easy to break.

As discussed above, we don't want to have exhaustive tests that test every single signature in stubs. Also, PRs that add dozens of annotations probably shouldn't add tests for all of the new signatures (but it may be fine to test a few of them). We'd need to make the conventions explicit and enforce them in code reviews. I believe that after the conventions have been established for some time, contributors will figure this out by looking at existing tests as examples. It will be important that we don't let poor-quality or overly broad tests through code review, as it's possible (even likely) that contributors will start emulating those examples.

@hauntsaninja
Collaborator

Now that we have a package version in METADATA.toml, we could also run stubtest. This would take care of use case 1 above entirely.
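For reference, stubtest ships with mypy and checks a stub against the package's runtime behavior; a minimal invocation (a sketch -- the exact setup typeshed's CI would use is a separate question) looks something like:

python -m pip install mypy requests types-requests
python -m mypy.stubtest requests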

@srittau
Collaborator

srittau commented Apr 9, 2021

Both @JukkaL's and @hauntsaninja's suggestions sound good to me.

@AlexWaygood
Member

In December 2020, @hauntsaninja wrote:

An easy and hopefully unobjectionable way to do this is to create a repo with whatever code you want to test and add the repo to mypy_primer.

I would very much like to do something like this to add some regression tests for pow, which is exceedingly difficult to type. However, as @JelleZijlstra noted, it would probably be better for these mypy_primer-based regression tests to be put inside typeshed rather than maintained as an external repo.

Some questions, therefore:

  1. Do we agree that regression tests for the thorniest cases, such as pow, would help typeshed? I agree with the consensus that regression tests on the level of the cpython repo are wholly unnecessary, but it seems to me that the corner cases of pow just don't appear often enough in open-source code for mypy_primer/stubtest to be sufficient with this function, wonderful tools though they are.
  2. Would something along these lines be an acceptable starting point for some mypy_primer-based regression tests for pow?
  3. Do we agree that it would be better to have these tests inside typeshed, rather than in a different repository?

If the answer to all of these is "yes", I would be happy to submit a PR, leading to...

  4. Where should these tests go in the typeshed repo? What should the folder be called? Etc., etc.

@JelleZijlstra
Member

JelleZijlstra commented Nov 19, 2021

Thank you! I agree: "yes" to all of these questions.

I don't feel strongly about the structure or naming, but here's a proposal:

  • Create a new top-level directory test_cases/
  • Within this directory, the directory structure mirrors the top level
  • Every stub file is tested in a file prefixed with test_
  • So, for example, the pow() tests would live in test_cases/stdlib/test_builtins.py.

@AlexWaygood
Member

One complication is that a regression test for pow (for example) won't work unless we have the absolute strictest mypy settings applied. That includes --disallow-any-expr. For some builtins tests, however, that might not be practical -- we might want the revealed type to be Any, or it might be impossible in some cases to write a test without a type being inferred by mypy as Any. I can think of two solutions to this problem:

  1. Have two sets of tests (in two separate directories?), with different levels of strictness applied.
  2. Have one set of tests, and apply the strictest mypy settings to all tests. If we want an inferred type to be Any, we add a # type: ignore comment accordingly. Then, if we see any mypy_primer output telling us about an unused # type: ignore comment in the tests directory, we "know" internally that this signifies a potential failing test case.

Option (2) probably won't work until #6307 is fixed, however.
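As a concrete sketch of option (2), using json.loads (whose return type in the stubs is Any) and assuming mypy runs with both --disallow-any-expr and --warn-unused-ignores:

import json

# Under --disallow-any-expr, an expression typed as Any is an error,
# so where Any is the intended result we acknowledge it explicitly:
parsed = json.loads("1")  # type: ignore

# If a stub change made json.loads return something more precise, the
# ignore above would become unused, and --warn-unused-ignores would
# flag it -- our signal that a test expectation no longer holds.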

@Akuli
Collaborator

Akuli commented Nov 19, 2021

Another problem is how to ensure the type matches. This doesn't really work:

def expects_float(arg: float) -> None: ...
expects_float(pow(5, -7))  # no error even if pow(5, -7) is int: int is implicitly accepted where float is expected

If pow(5, -7) returns int, it would implicitly convert to float and cause no errors. This doesn't work either, because a list literal like [2] can be list[float] or list[int], depending on how it is used:

def expects_float(arg: list[float]) -> None: ...
expects_float([pow(5, -7)])  # no error either way: the literal's element type is inferred from the parameter

Maybe we should use mypy's test suite instead? It makes it possible to assert the output of reveal_type().

@AlexWaygood
Member

This doesn't really work:

I would say it "partially works", since we are at least verifying that the inferred type is not Any or complex. But yes, I agree that this is far from an ideal way of testing these things. Being able to assert the output of reveal_type sounds much more promising, though I don't really know anything about how mypy's test suite works.

@JelleZijlstra
Member

though I don't really know anything about how mypy's test suite works

It mostly works like this:

reveal_type(1)  # E: Revealed type is 'int'

And the framework complains if the revealed type is something different. I don't think it's easy to use that syntax outside of mypy itself, though there are apparently a few tools like pytest plugins doing that.

You can work around the float/int thing by taking advantage of list invariance. This errors for mypy:

from typing import List

def f(x: List[float]):
    pass

y = [3]  # inferred as List[int], with no context to widen it
f(y)  # error: List[int] is not compatible with List[float], since List is invariant
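Applying the same trick to pow, a sketch (hypothetical helper names; this assumes the stubs intend pow(5, -7) to be inferred as exactly float):

from typing import List

def expects_floats(x: List[float]) -> None: ...
def expects_ints(x: List[int]) -> None: ...

y = [pow(5, -7)]
expects_floats(y)  # passes only if pow(5, -7) is inferred as float
expects_ints(y)  # must error; if both calls pass, the inferred type was Any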

@AlexWaygood
Member

I think we shouldn't let the perfect be the enemy of the good. If nobody objects, I'll go ahead and draft a PR with some mypy_primer-based regression tests.

If somebody else drafts a PR proposing superior tests based on mypy's testing framework, I'll happily abandon my PR (/revert my PR if it's already merged) 🙂


You can work around the float/int thing by taking advantage of list invariance.

Very nice!

@sobolevn
Member

If somebody else drafts a PR proposing superior tests based on mypy's testing framework

I cannot refrain from a shameless plug: https://github.com/typeddjango/pytest-mypy-plugins
It is way more advanced: it supports parametrization, regex matchers (to avoid problems with numeric typevar ids), and more.

@AlexWaygood
Member

I cannot refrain from a shameless plug: https://github.com/typeddjango/pytest-mypy-plugins

This looks great! If others also agree that this would be a better way to go, I'll hold off making a PR and let @sobolevn propose a PR using this framework.

@sobolevn
Member

I'll hold off making a PR and let @sobolevn propose a PR using this framework.

I was not really paying attention to this thread 😆
So, I just saw a keyword combination: tests + types. And here I am!

@srittau removed the project: policy label Jan 17, 2022
@srittau added the project: infrastructure label Jan 17, 2022
@JelleZijlstra
Member

We now have such tests: #7663.
