Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix re.split(...) return type #5269

Merged
merged 2 commits into from
Apr 30, 2021
Merged

Fix re.split(...) return type #5269

merged 2 commits into from
Apr 30, 2021

Conversation

tusharsadhwani
Copy link
Contributor

Example:

re.split(r' (?=(x)*.)', 'a b') # ['a', None, 'b']

@github-actions
Copy link
Contributor

According to mypy_primer, this change has no effect on the checked open source code. 🤖🎉

@Akuli
Copy link
Collaborator

Akuli commented Apr 30, 2021

This change breaks a lot of code that assumes the list doesn't contain Nones. In most cases, it can't contain Nones, because the regex used doesn't contain optional groups.

We have some options:

  • Change the return type to List[Optional[str]] as in this PR. This breaks a lot of code, and fixing it requires annoying assert thing is not None checks or similar.
  • Change the return type to List[Any]. Might hide bugs that type checkers could otherwise find.
  • Do nothing. This seems like the best option to me.

What do you think?

@JelleZijlstra
Copy link
Member

This can also be implemented more precisely in a type checker plugin that could look at the regex to see if there are optional groups.

@tusharsadhwani
Copy link
Contributor Author

I guess so.

I was just startled when mypy assured me that split returns list[str] but my code gave me Nones. However annoying it might be in the former case, the latter case is misleading... so I don't know what's right and what's not.

@Akuli
Copy link
Collaborator

Akuli commented Apr 30, 2021

There's not a good way to express this in stubs. If you want, you can create a feature request to mypy.

@srittau
Copy link
Collaborator

srittau commented Apr 30, 2021

See #3902 for discussion about match.group(). I think the best solution would be to write at least a mypy plugin that overrides the stubs and then fix the stubs to match the implementation.

@JukkaL
Copy link
Contributor

JukkaL commented Apr 30, 2021

Another option is to use List[Union[AnyStr, Any]]. This may be a bit non-intuitive, but it means that the value can be str/bytes or something else. This provides most of the type checking precision we had before, but it's also more correct, as the Any type indicates that other values are possible. It also wouldn't break existing code, and it would be better for mypyc and other tools that do strict runtime checking based on types.

Type checkers can still optionally special case this based on the regular expression string and provide more precise types when possible.

@tusharsadhwani
Copy link
Contributor Author

Yeah that sounds nice.

@github-actions
Copy link
Contributor

According to mypy_primer, this change has no effect on the checked open source code. 🤖🎉

@tusharsadhwani tusharsadhwani changed the title re.split returns List[Optional[AnyStr]] Fix re.split(...) return type Apr 30, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants