Make IndexOpsMixin (and Index) generic #760

twoertwein · 2023-08-03T17:55:55Z

Closes Create __iter__() for DatetimeIndex #723, closes Index.intersection does not return correct sub type #744, closes Bug with iterating df.columns #502, and closes Generic type for Indexes #340
Tests added: Please use assert_type() to assert the type of any return value

Mostly working, but still a few failing tests and I haven't added new tests.


tests/test_scalars.py

twoertwein · 2023-08-03T18:13:26Z

Most of the remaining errors are caused by Index[S1] where S1 is Any/Unknown, e.g., df.columns. Even though S1 is bound, the type checkers widen the type to Any/Unknown.
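A minimal sketch of the widening described above, using a toy `Index` class rather than the real pandas-stubs definitions. The bound on `S1` and the helper `get_columns` are assumptions made for illustration. When the type argument is not known statically, a checker treats the element type as `Any` and does not fall back to the bound:

```python
from typing import Any, Generic, List, TypeVar, Union

# Hypothetical bound; the real S1 in pandas-stubs covers more types.
S1 = TypeVar("S1", bound=Union[str, int, float])


class Index(Generic[S1]):
    """Toy stand-in for a generic Index."""

    def __init__(self, data: List[S1]) -> None:
        self._data = list(data)

    def __getitem__(self, pos: int) -> S1:
        return self._data[pos]


def get_columns() -> "Index[Any]":
    # Mimics df.columns: the element type is not known statically.
    return Index(["a", "b"])


known: Index[str] = Index(["a", "b"])
first_known = known[0]            # a checker infers str
first_unknown = get_columns()[0]  # a checker infers Any, not the bound of S1
```

Both lookups return the same value at runtime; only the statically inferred type differs.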

Dr-Irv · 2023-08-03T19:16:55Z

I think this may close #502 and #340. Can you check if that should be added to the list?

Dr-Irv

Noticed some things to consider.

Am wondering how MultiIndex will work as a subclass of Index[S1]. Is everything good because we just say class MultiIndex(Index), so the S1 is ignored? But will some of the methods of Index that refer to S1 still work right if the index is a MultiIndex?

pandas-stubs/core/frame.pyi

pandas-stubs/core/indexes/base.pyi

tests/test_indexes.py

twoertwein · 2023-08-03T19:44:00Z

> Am wondering how MultiIndex will work as a subclass of Index[S1]. Is everything good because we just say class MultiIndex(Index), so the S1 is ignored? But will some of the methods of Index that refer to S1 still work right if the index is a MultiIndex?

I'm not sure; the same applies to CategoricalIndex (pd.Categorical is not in S1, but probably should be).

edit: Leaving CategoricalIndex as-is might not be too bad, as CategoricalIndex([...])[0] returns the underlying type and not pd.Categorical.


Dr-Irv · 2023-08-03T22:43:38Z

@twoertwein I'm headed out of town for 3 days, so won't look at this until Monday.


pandas-stubs/core/accessor.pyi

pandas-stubs/core/computation/ops.pyi

pandas-stubs/core/indexes/base.pyi

Dr-Irv

This is looking pretty good, IMHO

pandas-stubs/core/indexes/base.pyi

pandas-stubs/core/indexes/multi.pyi

pandas-stubs/core/indexes/timedeltas.pyi

Dr-Irv · 2023-08-08T11:58:07Z

tests/test_frame.py

@@ -1761,7 +1761,8 @@ def test_getmultiindex_columns() -> None:
        [(i, s) for i in [1] for s in df.columns.get_level_values(1)]
    ]
    res4: pd.DataFrame = df[[df.columns[0]]]
-    check(assert_type(df[df.columns[0]], pd.Series), pd.Series)
+    column: Scalar = df.columns[0]
+    check(assert_type(df[column], pd.Series), pd.Series)


This would be annoying if one can't write df[df.columns[0]] without doing what you have done here.

I'm not sure if this can be fixed. Maybe we can do some gymnastics with the order of overloads. df.columns is Index[Any] and would therefore probably match the first overload.
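The situation can be sketched with toy classes (not the real pandas-stubs signatures). `Frame`, `Series`, the simplified `Scalar` alias, and `first_column_label` are all hypothetical names. With a key of type `Any`, a checker may not resolve the call to the `Series` overload, whereas an annotated temporary variable pins the key type, as in the test change above:

```python
from typing import Any, List, Union, overload

Scalar = Union[str, int, float]  # simplified stand-in for the stubs' Scalar alias


class Series:
    """Toy stand-in for pd.Series."""


class Frame:
    """Toy stand-in for pd.DataFrame with an overloaded __getitem__."""

    @overload
    def __getitem__(self, key: Scalar) -> Series: ...
    @overload
    def __getitem__(self, key: List[Scalar]) -> "Frame": ...

    def __getitem__(self, key: Any) -> Any:
        # Runtime dispatch mirrors the overloads above.
        return Frame() if isinstance(key, list) else Series()


def first_column_label() -> Any:
    # Mimics df.columns[0] when columns is Index[Any].
    return "a"


df = Frame()
# With an Any key, the checker cannot reliably pick a single overload.
ambiguous = df[first_column_label()]
# Annotating a temporary variable pins the key type to Scalar.
column: Scalar = first_column_label()
series = df[column]
```

How exactly an `Any` argument is matched against overloads differs between mypy and pyright, so the comment on `ambiguous` should be read as the general problem rather than a precise description of either tool.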

Given that both mypy and pyright do not limit a bounded TypeVar to its bound when it is unknown/Any, we have only these options:

- make people cast the Index to the appropriate type
- return Scalar
- return Scalar now; in the future, make pd.DataFrame generic in terms of the Index and return S1

I have no particular preference, except that making DataFrame generic might not happen anytime soon

Btw, I will be offline from Friday to Sunday; you are welcome to push changes to this PR or to merge it. Since this is a large PR and the tests might not cover everything, I would prefer if you or others could run this version of pandas-stubs on your internal codebase before the next release to catch potential regressions.

Unrelated: it seems that the newest version of numexpr broke CI/pandas.

> Given that both mypy and pyright do not limit a bounded TypeVar to its bound when it is unknown/Any, we have only these options:
>
> - make people cast the Index to the appropriate type
> - return Scalar
> - return Scalar now; in the future, make pd.DataFrame generic in terms of the Index and return S1
>
> I have no particular preference, except that making DataFrame generic might not happen anytime soon

Would changing DataFrame.columns() to return Index[Scalar] work?

> Btw, I will be offline from Friday to Sunday; you are welcome to push changes to this PR or to merge it. Since this is a large PR and the tests might not cover everything, I would prefer if you or others could run this version of pandas-stubs on your internal codebase before the next release to catch potential regressions.

I will see if I can give this a try.

> Since this is a large PR and the tests might not cover everything, I would prefer if you or others could run this version of pandas-stubs on your internal codebase before the next release to catch potential regressions.

I tried the version I placed in your repo with the PR on two large code bases and no new errors appeared, so I think once you merge that in, I can approve and merge in this PR.

Thank you for testing it and finding a way to let mypy (and pyright) clearly indicate the unintended calls!

tests/test_indexes.py

Dr-Irv

Can you test if using DataFrame.columns() returning Index[Scalar] would solve the problem noted in another comment so that the expression df[df.columns[0]] would work without having to do a cast or creating a temporary variable?

Dr-Irv · 2023-08-09T12:14:22Z

tests/test_scalars.py

+        assert_type(
+            md_int64_index // td, Never  # pyright: ignore[reportGeneralTypeIssues]
+        )
+        assert_type(
+            md_float_index // td, Never  # pyright: ignore[reportGeneralTypeIssues]
+        )


Concerned about the above change. Previously, mypy was seeing that this is an invalid operation. If you make the above change, mypy is no longer detecting that. That's not a good thing.

I agree, I don't know how to fix it. I think the reason why it worked previously was a behavior of Never when it is being used on input arguments: it is interpreted by mypy/pyright to indicate that that call is invalid. Now, with the generic class, we need to have annotations on the input arguments (we don't have the luxury of using Never there anymore). We can still return Never (but that has a slightly different semantic meaning).

I found a fix. I created a PR against your branch at twoertwein#3

Key was to remove __floordiv__() from OpsMixin and let any subclass of OpsMixin define only the valid values.
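A rough sketch of that approach with toy classes (the class names are hypothetical and the real stubs are more involved). The mixin does not declare `__floordiv__` at all, and only the subclass for which the operation is valid defines it, so an invalid call has no matching operator:

```python
from datetime import timedelta


class OpsMixin:
    """Toy mixin that deliberately does not define __floordiv__."""

    def __init__(self, values):
        self.values = list(values)


class TimedeltaLikeIndex(OpsMixin):
    # Floor division by a timedelta is valid here, so this subclass declares it.
    def __floordiv__(self, other: timedelta) -> "IntLikeIndex":
        return IntLikeIndex(v // other for v in self.values)


class IntLikeIndex(OpsMixin):
    # No __floordiv__ is declared: `int_index // td` has no matching operator
    # for a type checker and raises TypeError at runtime.
    pass


td = timedelta(hours=1)
result = TimedeltaLikeIndex([timedelta(hours=3)]) // td  # valid
```

Because nothing is inherited from the mixin, there is no base-class signature that a subclass would need to "override" with Never.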

I think it's a mypy bug. See python/mypy#15861

So the mypy folks say it's not a bug, and pyright says that its implementation was incorrect. So I think the only way to do this is with what I did - you can't use Never or NoReturn to "override" the implementation in a subclass that matches Any

Dr-Irv · 2023-08-09T12:14:45Z

tests/test_scalars.py

+        assert_type(
+            md_int64_index / td, Never  # pyright: ignore[reportGeneralTypeIssues]
+        )
+        assert_type(
+            md_float_index / td, Never  # pyright: ignore[reportGeneralTypeIssues]
+        )


Same as above

twoertwein · 2023-08-09T13:36:50Z

> Can you test if using DataFrame.columns() returning Index[Scalar] would solve the problem noted in another comment so that the expression df[df.columns[0]] would work without having to do a cast or creating a temporary variable?

It sounds as if it should work but it creates different problems:

- Scalar is a Union: using df.columns[0] for comparisons or other operations would fail (we have a few tests for that; they work because it is currently Any)
- Scalar is actually not compatible with S1 (I believe np.timedelta64 and np.datetime64 are missing in S1)
- it makes columns: Index[str] = df.columns impossible (see Bug with iterating df.columns #502)
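The first point can be illustrated with a small sketch. The `Scalar` alias below is a simplified assumption, and `is_positive` is a hypothetical helper. A value typed as a Union has to be narrowed before an operation that only some members support:

```python
from typing import Union

Scalar = Union[str, int, float]  # simplified; the real alias is wider


def is_positive(label: Scalar) -> bool:
    # `label > 0` on its own is rejected by a checker, because the str
    # member of the Union does not support comparison with an int.
    # Narrowing to the numeric members first makes the comparison well-typed.
    if isinstance(label, (int, float)):
        return label > 0
    return False
```

With `Any` instead of `Scalar`, the comparison is accepted without narrowing, which is why the existing tests pass today.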

I will try to see how far I get on my local branch, but I'm not too optimistic that there is an ideal solution for this problem (except for DataFrame being generic in index and columns).

edit: columns -> Index[str] is a lie, but it passes the tests and is probably also true for most people

Dr-Irv · 2023-08-09T14:48:27Z

> edit: columns -> Index[str] is a lie, but it passes the tests and is probably also true for most people

Yes, I agree. For the cases where that was incorrect, then you have to do a cast, and if we make the type wider, then you always have to do a cast, even in the most common case.
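As a sketch of that trade-off, assume a toy generic `Index` and a hypothetical `get_columns` helper declared to return `Index[str]`. The common case needs nothing extra, and the uncommon case states the real element type with `typing.cast`:

```python
from typing import Generic, List, TypeVar, cast

T = TypeVar("T")


class Index(Generic[T]):
    """Toy stand-in for a generic Index."""

    def __init__(self, data: List[T]) -> None:
        self._data = list(data)

    def __getitem__(self, pos: int) -> T:
        return self._data[pos]


def get_columns() -> "Index[str]":
    # Mimics the stub's choice: columns is declared as Index[str],
    # even though the labels here are ints at runtime.
    return cast("Index[str]", Index([0, 1, 2]))


# The caller knows the labels are ints and says so explicitly.
int_columns = cast("Index[int]", get_columns())
first = int_columns[0]  # a checker now infers int
```

`cast` has no runtime effect; it only changes what the checker believes about the value.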

Dr-Irv · 2023-08-09T14:49:22Z

I will see if I can find some time to test that one issue related to why mypy isn't picking up the incorrect arithmetic operation and to test with some code bases I have. Once done, I'll approve and merge.

twoertwein · 2023-08-09T23:22:52Z

> I will see if I can find some time to test that one issue related to why mypy isn't picking up the incorrect arithmetic operation

I believe you have an open issue at mypy related to that :) In theory, mypy should still warn about unreachable code after those lines and prevent people from using the return value.
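A small sketch of the unreachable-code point, using `typing.NoReturn` and hypothetical function names. Whether a checker reports the trailing statement depends on its configuration (for mypy, the `--warn-unreachable` option):

```python
from typing import NoReturn


def invalid_floordiv() -> NoReturn:
    # Stand-in for an operation annotated as never returning.
    raise TypeError("unsupported operand")


def caller() -> str:
    invalid_floordiv()
    # Statically unreachable: the call above is declared to never return.
    return "unreachable"
```

This only signals the problem indirectly, which matches the concern raised in the review: the call itself is not flagged as invalid.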


Dr-Irv

Suggestion to avoid using npt in the constructors

pandas-stubs/core/indexes/base.pyi


Dr-Irv

Thanks @twoertwein. Great contribution

twoertwein added 4 commits August 3, 2023 08:16

interpolate method

4929ecb

WIP

2775a58

mypy handles Never differently - at least assert that the method never returns

9e40ddf

use Self

68abe4d

twoertwein mentioned this pull request Aug 3, 2023

Create __iter__() for DatetimeIndex #723

Closed

twoertwein commented Aug 3, 2023

tests/test_scalars.py (outdated)

twoertwein added 2 commits August 3, 2023 14:30

a few more S1/Self

2cfbdb0

fix tests

ebcc389

Dr-Irv reviewed Aug 3, 2023

pandas-stubs/core/frame.pyi (outdated)

pandas-stubs/core/indexes/base.pyi

pandas-stubs/core/indexes/base.pyi

tests/test_indexes.py

more tests

24d42fa

twoertwein mentioned this pull request Aug 3, 2023

Generic type for Indexes #340

Closed

twoertwein added 3 commits August 3, 2023 16:09

add an ignore to let mypy pass

f3802d3

compat with old python versions

92eab41

fix CategoricalIndex; MultiIndex should probably be Index[tuple[S1, ...]] but keeping it as Index[Any]

6e04d8e

twoertwein added 4 commits August 3, 2023 19:32

Revert "interpolate method"

e7a2d3b

This reverts commit 4929ecb.

many more overloads for subclasses (I wish pandas would not handle subclasses in parent classes)

d12a72a

remove PandasObject - it caused the pyright issues but it provides no methods that aren't already provided by object

87c62d0

works (except for np.ndarray)

9ce5fc4

twoertwein commented Aug 4, 2023

pandas-stubs/core/accessor.pyi

twoertwein commented Aug 4, 2023

pandas-stubs/core/computation/ops.pyi

twoertwein commented Aug 4, 2023

pandas-stubs/core/indexes/base.pyi (outdated)

Dr-Irv reviewed Aug 8, 2023

twoertwein added 4 commits August 8, 2023 09:33

Merge remote-tracking branch 'upstream/main' into iter_interpolate

05b9c15

address the easy-to-fix comments

d32f534

overloads for numpy

4f13109

did numexpr break the CI?

bd39625

twoertwein marked this pull request as ready for review August 9, 2023 00:46

Dr-Irv reviewed Aug 9, 2023


twoertwein added 2 commits August 9, 2023 10:01

lie: df.columns -> Index[str]

8c48567

new ruff has far more pyi rules

67f5e8d

Make it clear that both mypy&pyright infer Never; but only pyright detects it as invalid

566c918

Dr-Irv requested changes Aug 10, 2023

pandas-stubs/core/indexes/base.pyi (outdated)

pandas-stubs/core/indexes/base.pyi (outdated)

pandas-stubs/core/indexes/base.pyi (outdated)

twoertwein and others added 4 commits August 10, 2023 10:32

use type aliases instead of npt

ff5ee58

fix issue with floordiv for mypy

c8d2019

fix comment in arraylike

585c93b

Merge pull request #3 from Dr-Irv/pr760

7ac1ea5

fix issue with floordiv for mypy

Dr-Irv approved these changes Aug 14, 2023


Dr-Irv merged commit f7621f4 into pandas-dev:main Aug 14, 2023
13 checks passed

twoertwein deleted the iter_interpolate branch August 14, 2023 17:08

twoertwein mentioned this pull request Aug 16, 2023

TYP: Add DatetimeIndex.intersection() method and test #745

Closed
