df.columns.str.match actually gives npt.NDArray[np.bool_], but mypy thinks it is pd.Index[str] #983

Closed
cmp0xff opened this issue Aug 21, 2024 · 1 comment · Fixed by #990
Labels
Strings String extension data type and string data

Comments

Contributor

cmp0xff commented Aug 21, 2024

Describe the bug

df.columns.str.match(reg_ex_pattern) returns npt.NDArray[np.bool_] at runtime, but mypy infers pd.Index[str] from the stubs.

To Reproduce

Provide a minimal runnable pandas example that is not properly checked by the stubs.

from typing import TYPE_CHECKING, cast

import numpy as np
import pandas as pd

if TYPE_CHECKING:
    from numpy import typing as npt


df = pd.DataFrame({"1": [2, 3], "2": 3, "4": 5, "a": 1})
mask = df.columns.str.match(r"\d")
print(mask)  # [ True  True  True False]
print(type(mask))  # <class 'numpy.ndarray'>
df.loc[:, mask]  # mypy: error: Invalid index type "tuple[slice, Index[str]]" for "_LocIndexerFrame"; expected type "slice | ndarray[Any, dtype[integer[Any]]] | Index[Any] | list[int] | Series[int] | <6 more items>"
df.loc[:, cast("npt.NDArray[np.bool_]", mask)]  # mypy: fine

Indicate which type checker you are using (mypy or pyright).

I am using mypy.

Show the error message received from that type checker while checking your example.

error: Invalid index type "tuple[slice, Index[str]]" for "_LocIndexerFrame"; expected type "slice | ndarray[Any, dtype[integer[Any]]] | Index[Any] | list[int] | Series[int] | <6 more items>"

Please complete the following information:

  • OS: Windows
  • OS Version:
    cmd /c ver  # Microsoft Windows [Version 10.0.19045.4651]
  • python version
    poetry run python --version  # Python 3.12.5
  • version of type checker
    poetry run mypy --version  # mypy 1.11.1 (compiled: yes)
  • version of installed pandas-stubs
    poetry run pip freeze | findstr pandas-stubs  # pandas-stubs==2.2.2.240603

Additional context

Nothing

Collaborator

Dr-Irv commented Aug 21, 2024

Thanks for the report. The declaration of match() in core/strings.pyi is incorrect. Fixing it requires giving the StringMethods class an additional type parameter that carries the expected result type of match(), similar to what is done for str.split().

PR with tests welcome.
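
For readers unfamiliar with the pattern, here is a rough, stdlib-only sketch of what "an additional type parameter for the result of match()" could look like. It is not the pandas-stubs code: StringMethodsSketch, MatchT, BoolArray, and BoolSeries are hypothetical names made up for illustration, and the two list subclasses merely stand in for npt.NDArray[np.bool_] and a boolean Series.

```python
import re
from typing import Callable, Generic, TypeVar

T = TypeVar("T")            # the owning container type (Index-like or Series-like)
MatchT = TypeVar("MatchT")  # hypothetical extra parameter: result type of match()


class BoolArray(list):
    """Hypothetical stand-in for npt.NDArray[np.bool_]."""


class BoolSeries(list):
    """Hypothetical stand-in for a boolean pandas Series."""


class StringMethodsSketch(Generic[T, MatchT]):
    """Toy string accessor; NOT the real pandas-stubs StringMethods."""

    def __init__(
        self, values: list[str], wrap: Callable[[list[bool]], MatchT]
    ) -> None:
        self._values = values
        self._wrap = wrap

    def match(self, pat: str) -> MatchT:
        # The declared return type follows the extra type parameter instead
        # of being hard-coded to a single container type.
        return self._wrap([re.match(pat, v) is not None for v in self._values])


# An Index-like owner binds MatchT to the array stand-in ...
index_str: StringMethodsSketch[list[str], BoolArray] = StringMethodsSketch(
    ["1", "2", "4", "a"], BoolArray
)
# ... while a Series-like owner binds it to the Series stand-in.
series_str: StringMethodsSketch[list[str], BoolSeries] = StringMethodsSketch(
    ["1", "a"], BoolSeries
)

index_mask = index_str.match(r"\d")    # checker sees BoolArray
series_mask = series_str.match(r"\d")  # checker sees BoolSeries
```

With this shape, a type checker resolves match() to a different return type depending on how the accessor was parameterized, which is the behaviour the report asks for on df.columns.str.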

Dr-Irv added the "Strings" label (String extension data type and string data) on Aug 21, 2024
cmp0xff added a commit to cmp0xff/pandas-stubs that referenced this issue Sep 1, 2024
cmp0xff added a commit to cmp0xff/pandas-stubs that referenced this issue Sep 1, 2024
cmp0xff added a commit to cmp0xff/pandas-stubs that referenced this issue Sep 4, 2024
cmp0xff added a commit to cmp0xff/pandas-stubs that referenced this issue Sep 4, 2024
Dr-Irv pushed a commit that referenced this issue Sep 4, 2024
* fix(typing): #983 return type of StringMethods.match

* feat(string): #983 tests for the fix

* feat: #983 new TypeVar, following #990 (comment)

* fix(comment): #983 use `np_ndarray_bool` from `pandas._typing` in stubs @Dr-Irv #990 (review)