Eagerly populate the class dict for cudf.pandas proxy types #14534

Merged on May 17, 2024 (95 commits)
Changes shown from 89 commits

Commits
dba3586
Fix accessor registration for proxy types
shwina Nov 16, 2023
bd69575
Initial stab at eager dict eval
shwina Nov 20, 2023
16b9340
Include properties and cached_properties
shwina Nov 21, 2023
8953944
Merge branch 'branch-23.12' of github.com:rapidsai/cudf into eagerly-…
shwina Nov 21, 2023
3a27c54
make some more progress but style checks are broken
shwina Nov 21, 2023
19e3038
Merge branch 'branch-24.02' of github.com:rapidsai/cudf into eagerly-…
shwina Nov 28, 2023
cef9adc
Restore meta properties
shwina Nov 30, 2023
73e714a
Merge branch 'branch-24.02' of github.com:rapidsai/cudf into eagerly-…
shwina Dec 1, 2023
483872a
Progress
shwina Dec 1, 2023
1066c2b
Progress
shwina Dec 1, 2023
68df373
Fix handling doc/dir
shwina Dec 2, 2023
cdef427
Private attrs
shwina Dec 3, 2023
8c14c0c
Merge branch 'branch-24.02' of github.com:rapidsai/cudf into eagerly-…
shwina Dec 11, 2023
fef2768
Merge branch 'branch-24.02' of github.com:rapidsai/cudf into eagerly-…
shwina Dec 12, 2023
b810679
Merge branch 'branch-24.02' of github.com:rapidsai/cudf into eagerly-…
shwina Dec 14, 2023
d0f094b
Profiler changes
shwina Dec 28, 2023
7ef0757
Merge branch 'branch-24.04' of github.com:rapidsai/cudf into eagerly-…
shwina Feb 22, 2024
5d53462
Style
shwina Feb 22, 2024
de5cb7a
Style
shwina Feb 22, 2024
4d928eb
Use qualname
shwina Feb 23, 2024
a8f3222
importorskip
shwina Feb 23, 2024
8e9eaad
Merge branch 'branch-24.04' of github.com:rapidsai/cudf into eagerly-…
shwina Feb 23, 2024
180228d
Proxy underscore attributes too
shwina Feb 23, 2024
fbefc7f
Try handling private attrs
shwina Feb 26, 2024
eba403c
Merge branch 'branch-24.04' of github.com:rapidsai/cudf into eagerly-…
shwina Feb 26, 2024
16a0f21
Intermediates too
shwina Feb 26, 2024
764bd72
Merge branch 'branch-24.04' of github.com:rapidsai/cudf into eagerly-…
shwina Feb 29, 2024
a29df18
Merge branch 'branch-24.04' into eagerly-populate-class-dict
shwina Mar 12, 2024
3290cc9
Merge branch 'branch-24.04' into eagerly-populate-class-dict
galipremsagar Mar 12, 2024
fd6adce
Merge branch 'branch-24.04' of github.com:rapidsai/cudf into eagerly-…
shwina Mar 12, 2024
2f4fdb6
Merge branch 'branch-24.04' into eagerly-populate-class-dict
galipremsagar Mar 13, 2024
964d95f
Merge branch 'branch-24.04' of github.com:rapidsai/cudf into eagerly-…
shwina Mar 25, 2024
1923bda
Add a test for accessing base class attributes via super()
shwina Mar 25, 2024
ef5784c
Merge branch 'eagerly-populate-class-dict' of github.com:shwina/cudf …
shwina Mar 25, 2024
f6b6d4a
Merge branch 'branch-24.04' into eagerly-populate-class-dict
shwina Apr 3, 2024
f56deb9
Merge branch 'branch-24.06' of github.com:rapidsai/cudf into eagerly-…
shwina Apr 3, 2024
917a0b5
Merge branch 'eagerly-populate-class-dict' of github.com:shwina/cudf …
shwina Apr 3, 2024
3d3ff0a
Remove slow IPython note
shwina Apr 3, 2024
c95289a
Merge branch 'branch-24.06' into eagerly-populate-class-dict
shwina Apr 8, 2024
16a647d
Merge branch 'branch-24.06' into eagerly-populate-class-dict
mroeschke Apr 9, 2024
9891545
Merge branch 'branch-24.06' into eagerly-populate-class-dict
mroeschke Apr 11, 2024
e3dc345
Update run-pandas-tests.sh
galipremsagar Apr 12, 2024
c2ca0c4
Merge branch 'branch-24.06' into eagerly-populate-class-dict
galipremsagar Apr 12, 2024
a078724
Update run-pandas-tests.sh
galipremsagar Apr 12, 2024
fa542bb
Merge branch 'branch-24.06' into eagerly-populate-class-dict
galipremsagar Apr 12, 2024
9526807
Merge branch 'branch-24.06' into eagerly-populate-class-dict
galipremsagar Apr 15, 2024
e5bdfd2
Merge remote-tracking branch 'upstream/branch-24.06' into eagerly-pop…
galipremsagar Apr 16, 2024
624c71e
Merge branch 'branch-24.06' into eagerly-populate-class-dict
vyasr Apr 16, 2024
442593a
ignore tests
galipremsagar Apr 17, 2024
bee2c92
Merge branch 'eagerly-populate-class-dict' of https://github.com/shwi…
galipremsagar Apr 17, 2024
e3f7393
Merge branch 'branch-24.06' into eagerly-populate-class-dict
galipremsagar Apr 17, 2024
2e38b7d
Update run-pandas-tests.sh
galipremsagar Apr 17, 2024
30e9b59
Merge branch 'branch-24.06' into eagerly-populate-class-dict
galipremsagar Apr 17, 2024
616b206
Update run-pandas-tests.sh
galipremsagar Apr 17, 2024
169148b
Merge branch 'branch-24.06' into eagerly-populate-class-dict
galipremsagar Apr 17, 2024
840c34e
Merge branch 'branch-24.06' into eagerly-populate-class-dict
galipremsagar Apr 18, 2024
979926b
ignore 1 more test
galipremsagar Apr 18, 2024
e79568e
Merge remote-tracking branch 'upstream/branch-24.06' into eagerly-pop…
galipremsagar Apr 19, 2024
106ea90
Fix accessing attributes created after instantiation
galipremsagar Apr 19, 2024
761ab6d
merge
galipremsagar Apr 19, 2024
9d7bea2
Merge branch 'branch-24.06' into eagerly-populate-class-dict
galipremsagar Apr 20, 2024
6b70dd3
Merge remote-tracking branch 'upstream/branch-24.06' into eagerly-pop…
galipremsagar Apr 22, 2024
c32c437
Merge remote-tracking branch 'upstream/branch-24.06' into eagerly-pop…
galipremsagar Apr 23, 2024
e4dd410
Merge branch 'branch-24.06' into eagerly-populate-class-dict
galipremsagar Apr 23, 2024
64905ad
Handle bound methods by not bounding them again
galipremsagar Apr 24, 2024
01e5efa
Merge remote-tracking branch 'upstream/branch-24.06' into eagerly-pop…
galipremsagar Apr 24, 2024
a8969de
Merge branch 'eagerly-populate-class-dict' of https://github.com/shwi…
galipremsagar Apr 24, 2024
1fe2627
Another round of fixes for groupby
galipremsagar Apr 26, 2024
c9c5e65
Merge remote-tracking branch 'upstream/branch-24.06' into eagerly-pop…
galipremsagar Apr 26, 2024
a8220b9
Merge remote-tracking branch 'upstream/branch-24.06' into eagerly-pop…
galipremsagar Apr 26, 2024
f31f9bd
Fix ops
galipremsagar Apr 29, 2024
0e7c843
Merge remote-tracking branch 'upstream/branch-24.06' into eagerly-pop…
galipremsagar Apr 29, 2024
b37dd00
Return NotImplemented for missing attributes
galipremsagar Apr 30, 2024
040fdb6
Merge remote-tracking branch 'upstream/branch-24.06' into eagerly-pop…
galipremsagar Apr 30, 2024
fa4367c
Another round of fixes
galipremsagar Apr 30, 2024
13f9e48
Merge remote-tracking branch 'upstream/branch-24.06' into eagerly-pop…
galipremsagar Apr 30, 2024
9495db3
Merge remote-tracking branch 'upstream/branch-24.06' into eagerly-pop…
galipremsagar May 1, 2024
de4d5ec
Add isub, iadd and __new__ for Timestamp and Timedelta
galipremsagar May 2, 2024
5cbfcfa
Merge remote-tracking branch 'upstream/branch-24.06' into eagerly-pop…
galipremsagar May 2, 2024
c9126ed
Fix __contains__, enable Holidays, Fix get_indexer
galipremsagar May 3, 2024
aa03e09
Merge remote-tracking branch 'upstream/branch-24.06' into eagerly-pop…
galipremsagar May 3, 2024
2399eb2
Merge remote-tracking branch 'upstream/branch-24.06' into eagerly-pop…
galipremsagar May 3, 2024
9e580e7
Add NumpyExtensionArray
galipremsagar May 4, 2024
0b5ef86
Merge
galipremsagar May 16, 2024
9ac1363
Revert my changes
galipremsagar May 16, 2024
fe7cb14
Merge remote-tracking branch 'upstream/branch-24.06' into eagerly-pop…
galipremsagar May 16, 2024
1b91665
update name
galipremsagar May 16, 2024
61034bd
undo ignore
galipremsagar May 16, 2024
db4d356
Update ci/cudf_pandas_scripts/pandas-tests/run.sh
galipremsagar May 16, 2024
f6a7042
Update python/cudf/cudf/pandas/scripts/run-pandas-tests.sh
galipremsagar May 16, 2024
9c9dc95
Merge branch 'branch-24.06' into eagerly-populate-class-dict
galipremsagar May 16, 2024
8488a02
Merge remote-tracking branch 'upstream/branch-24.06' into eagerly-pop…
galipremsagar May 16, 2024
129cd81
Make attributes private
galipremsagar May 16, 2024
c6914fd
Apply suggestions from code review
galipremsagar May 17, 2024
925374e
Merge branch 'branch-24.06' into eagerly-populate-class-dict
galipremsagar May 17, 2024
12 changes: 0 additions & 12 deletions docs/cudf/source/cudf_pandas/faq.md
Expand Up @@ -151,15 +151,3 @@ for testing or benchmarking purposes. To do so, set the
```bash
CUDF_PANDAS_FALLBACK_MODE=1 python -m cudf.pandas some_script.py
```

- ## Slow tab completion in IPython?
-
- You may experience slow tab completion when inspecting the
- methods/attributes of large dataframes. We expect this issue to be
- resolved in an upcoming release. In the mean time, you may execute the
- following command in IPython before loading `cudf.pandas` to work
- around the issue:
-
- ```
- %config IPCompleter.jedi_compute_type_timeout=0
- ```
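
Note on the PR title: "eagerly populate the class dict" can be read as creating one forwarding descriptor per attribute when the proxy class is built, rather than resolving names lazily at lookup time. The sketch below illustrates that reading in plain Python. `ForwardingAttr`, `make_proxy_type`, and `_wrapped` are hypothetical names chosen for illustration; this is not cudf's implementation, which is not shown in this part of the diff.

```python
class ForwardingAttr:
    """Hypothetical descriptor: forwards attribute access to a wrapped object."""

    def __init__(self, name):
        self._name = name

    def __get__(self, obj, owner=None):
        if obj is None:
            # Class-level access returns the descriptor itself.
            return self
        return getattr(obj._wrapped, self._name)


def make_proxy_type(name, wrapped_type):
    """Build a proxy class whose __dict__ is filled in up front."""
    namespace = {}
    for attr in dir(wrapped_type):
        if attr.startswith("__") and attr.endswith("__"):
            continue  # this sketch leaves dunder names alone
        namespace[attr] = ForwardingAttr(attr)

    def __init__(self, wrapped):
        self._wrapped = wrapped

    namespace["__init__"] = __init__
    return type(name, (), namespace)


ListProxy = make_proxy_type("ListProxy", list)
proxy = ListProxy([3, 1, 2])
proxy.sort()  # forwarded to the wrapped list
print(proxy._wrapped, "append" in vars(ListProxy))
```

Because every forwarded name is present in the class `__dict__`, tools that inspect the class (for example `dir()` or a completer) can see the attributes without a lazy lookup. That may be related to the removal of the slow-tab-completion FAQ entry above, though the diff itself does not say so.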
8 changes: 4 additions & 4 deletions python/cudf/cudf/pandas/_wrappers/common.py
@@ -1,4 +1,4 @@
- # SPDX-FileCopyrightText: Copyright (c) 2023 NVIDIA CORPORATION & AFFILIATES.
+ # SPDX-FileCopyrightText: Copyright (c) 2023-2024, NVIDIA CORPORATION & AFFILIATES.
# All rights reserved.
# SPDX-License-Identifier: Apache-2.0

Expand All @@ -17,9 +17,9 @@ def array_method(self: _FastSlowProxy, *args, **kwargs):

def array_function_method(self, func, types, args, kwargs):
try:
- return _FastSlowAttribute("__array_function__").__get__(self)(
- func, types, args, kwargs
- )
+ return _FastSlowAttribute("__array_function__").__get__(
+ self, type(self)
+ )(func, types, args, kwargs)
except Exception:
# if something went wrong with __array_function__ we
# attempt to call the function directly on the slow
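
The change above passes the owner class explicitly when the descriptor is invoked by hand. As a reminder of what that second argument is, here is a small sketch in plain Python. `Recorder` and `Thing` are hypothetical names, and nothing here is taken from `_FastSlowAttribute`, whose implementation is not shown in this part of the diff.

```python
class Recorder:
    """Hypothetical descriptor that reports the arguments __get__ received."""

    def __get__(self, obj, owner=None):
        return (obj, owner)


class Thing:
    attr = Recorder()


t = Thing()

# Normal attribute access: the interpreter supplies both instance and owner.
via_instance = t.attr
via_class = Thing.attr

# Manual invocation: the owner is only present if the caller passes it.
manual_without_owner = Recorder().__get__(t)
manual_with_owner = Recorder().__get__(t, type(t))
```

A descriptor that inspects its owner behaves the same under manual invocation as under normal lookup only when the caller supplies `type(obj)`.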
4 changes: 3 additions & 1 deletion python/cudf/cudf/pandas/_wrappers/numpy.py
@@ -1,4 +1,4 @@
- # SPDX-FileCopyrightText: Copyright (c) 2023 NVIDIA CORPORATION & AFFILIATES.
+ # SPDX-FileCopyrightText: Copyright (c) 2023-2024, NVIDIA CORPORATION & AFFILIATES.
# All rights reserved.
# SPDX-License-Identifier: Apache-2.0

Expand All @@ -10,6 +10,7 @@
import numpy.core.multiarray

from ..fast_slow_proxy import (
+ _FastSlowAttribute,
make_final_proxy_type,
make_intermediate_proxy_type,
)
Expand Down Expand Up @@ -122,6 +123,7 @@ def wrap_ndarray(cls, arr: cupy.ndarray | numpy.ndarray, constructor):
"__iter__": custom_iter,
# Special wrapping to handle scalar values
"_fsproxy_wrap": classmethod(wrap_ndarray),
+ "base": _FastSlowAttribute("base", True),
},
)

117 changes: 100 additions & 17 deletions python/cudf/cudf/pandas/_wrappers/pandas.py
Expand Up @@ -107,14 +107,16 @@ class _AccessorAttr:
"""

def __init__(self, typ):
- self.__typ = typ
+ self._typ = typ

+ def __set_name__(self, owner, name):
+ self._name = name

Review comment (Contributor): TIL a new part of the descriptors object model!
https://docs.python.org/3/howto/descriptor.html#closing-thoughts

def __get__(self, obj, cls=None):
if obj is None:
- return self.__typ
+ return self._typ
else:
- # allow __getattr__ to handle this
- raise AttributeError()
+ return _FastSlowAttribute(self._name).__get__(obj, type(obj))
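
For readers who, like the reviewer, had not met `__set_name__` before: Python calls it on each descriptor found in a class body when the class is created, passing the attribute name the descriptor was assigned to. A minimal sketch, loosely modelled on the shape of `_AccessorAttr` above. `AccessorAttr`, `TextAccessor`, and `Series` here are illustrative stand-ins, not the cudf classes.

```python
class AccessorAttr:
    """Illustrative descriptor that learns its own attribute name."""

    def __init__(self, typ):
        self._typ = typ

    def __set_name__(self, owner, name):
        # Invoked once, while the owning class is being created.
        self._name = name

    def __get__(self, obj, cls=None):
        if obj is None:
            # Class-level access returns the accessor type.
            return self._typ
        # Instance-level access can use the recorded name.
        return f"{self._name} accessor for {obj!r}"


class TextAccessor:
    pass


class Series:
    text = AccessorAttr(TextAccessor)

    def __repr__(self):
        return "Series()"


print(Series.text is TextAccessor)
print(Series().text)
```

`__set_name__` is only invoked automatically at class creation; a descriptor attached later with `setattr` does not receive the call.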


def Timestamp_Timedelta__new__(cls, *args, **kwargs):
Expand Down Expand Up @@ -214,6 +216,7 @@ def _DataFrame__dir__(self):
"__dir__": _DataFrame__dir__,
"_constructor": _FastSlowAttribute("_constructor"),
"_constructor_sliced": _FastSlowAttribute("_constructor_sliced"),
+ "_accessors": set(),
},
)

Expand All @@ -236,6 +239,7 @@ def _DataFrame__dir__(self):
"cat": _AccessorAttr(_CategoricalAccessor),
"_constructor": _FastSlowAttribute("_constructor"),
"_constructor_expanddim": _FastSlowAttribute("_constructor_expanddim"),
+ "_accessors": set(),
},
)

Expand Down Expand Up @@ -273,6 +277,9 @@ def Index__new__(cls, *args, **kwargs):
"__new__": Index__new__,
"_constructor": _FastSlowAttribute("_constructor"),
"__array_ufunc__": _FastSlowAttribute("__array_ufunc__"),
+ "_accessors": set(),
+ "_data": _FastSlowAttribute("_data"),
+ "_mask": _FastSlowAttribute("_mask"),
},
)

Expand Down Expand Up @@ -337,7 +344,11 @@ def Index__new__(cls, *args, **kwargs):
fast_to_slow=lambda fast: fast.to_pandas(),
slow_to_fast=cudf.from_pandas,
bases=(Index,),
- additional_attributes={"__init__": _DELETE},
+ additional_attributes={
+ "__init__": _DELETE,
+ "_data": _FastSlowAttribute("_data"),
+ "_mask": _FastSlowAttribute("_mask"),
+ },
)

DatetimeArray = make_final_proxy_type(
Expand All @@ -346,6 +357,10 @@ def Index__new__(cls, *args, **kwargs):
pd.arrays.DatetimeArray,
fast_to_slow=_Unusable(),
slow_to_fast=_Unusable(),
+ additional_attributes={
+ "_data": _FastSlowAttribute("_data"),
+ "_mask": _FastSlowAttribute("_mask"),
+ },
)

DatetimeTZDtype = make_final_proxy_type(
Expand All @@ -364,7 +379,11 @@ def Index__new__(cls, *args, **kwargs):
fast_to_slow=lambda fast: fast.to_pandas(),
slow_to_fast=cudf.from_pandas,
bases=(Index,),
- additional_attributes={"__init__": _DELETE},
+ additional_attributes={
+ "__init__": _DELETE,
+ "_data": _FastSlowAttribute("_data"),
+ "_mask": _FastSlowAttribute("_mask"),
+ },
)

NumpyExtensionArray = make_final_proxy_type(
Expand All @@ -385,6 +404,10 @@ def Index__new__(cls, *args, **kwargs):
pd.arrays.TimedeltaArray,
fast_to_slow=_Unusable(),
slow_to_fast=_Unusable(),
+ additional_attributes={
+ "_data": _FastSlowAttribute("_data"),
+ "_mask": _FastSlowAttribute("_mask"),
+ },
)

PeriodIndex = make_final_proxy_type(
Expand All @@ -394,7 +417,11 @@ def Index__new__(cls, *args, **kwargs):
fast_to_slow=_Unusable(),
slow_to_fast=_Unusable(),
bases=(Index,),
- additional_attributes={"__init__": _DELETE},
+ additional_attributes={
+ "__init__": _DELETE,
+ "_data": _FastSlowAttribute("_data"),
+ "_mask": _FastSlowAttribute("_mask"),
+ },
)

PeriodArray = make_final_proxy_type(
Expand All @@ -403,6 +430,11 @@ def Index__new__(cls, *args, **kwargs):
pd.arrays.PeriodArray,
fast_to_slow=_Unusable(),
slow_to_fast=_Unusable(),
+ additional_attributes={
+ "_data": _FastSlowAttribute("_data"),
+ "_mask": _FastSlowAttribute("_mask"),
+ "__array_ufunc__": _FastSlowAttribute("__array_ufunc__"),
+ },
)

PeriodDtype = make_final_proxy_type(
Expand Down Expand Up @@ -464,6 +496,10 @@ def Index__new__(cls, *args, **kwargs):
pd.arrays.StringArray,
fast_to_slow=_Unusable(),
slow_to_fast=_Unusable(),
+ additional_attributes={
+ "_data": _FastSlowAttribute("_data"),
+ "_mask": _FastSlowAttribute("_mask"),
+ },
)

StringDtype = make_final_proxy_type(
Expand All @@ -472,7 +508,10 @@ def Index__new__(cls, *args, **kwargs):
pd.StringDtype,
fast_to_slow=_Unusable(),
slow_to_fast=_Unusable(),
- additional_attributes={"__hash__": _FastSlowAttribute("__hash__")},
+ additional_attributes={
+ "__hash__": _FastSlowAttribute("__hash__"),
+ "storage": _FastSlowAttribute("storage"),
+ },
)

BooleanArray = make_final_proxy_type(
Expand All @@ -482,7 +521,9 @@ def Index__new__(cls, *args, **kwargs):
fast_to_slow=_Unusable(),
slow_to_fast=_Unusable(),
additional_attributes={
- "__array_ufunc__": _FastSlowAttribute("__array_ufunc__")
+ "_data": _FastSlowAttribute("_data"),
+ "_mask": _FastSlowAttribute("_mask"),
+ "__array_ufunc__": _FastSlowAttribute("__array_ufunc__"),
},
)

Expand All @@ -502,7 +543,9 @@ def Index__new__(cls, *args, **kwargs):
fast_to_slow=_Unusable(),
slow_to_fast=_Unusable(),
additional_attributes={
- "__array_ufunc__": _FastSlowAttribute("__array_ufunc__")
+ "__array_ufunc__": _FastSlowAttribute("__array_ufunc__"),
+ "_data": _FastSlowAttribute("_data"),
+ "_mask": _FastSlowAttribute("_mask"),
},
)

Expand Down Expand Up @@ -586,7 +629,11 @@ def Index__new__(cls, *args, **kwargs):
fast_to_slow=lambda fast: fast.to_pandas(),
slow_to_fast=cudf.from_pandas,
bases=(Index,),
- additional_attributes={"__init__": _DELETE},
+ additional_attributes={
+ "__init__": _DELETE,
+ "_data": _FastSlowAttribute("_data"),
+ "_mask": _FastSlowAttribute("_mask"),
+ },
)

IntervalArray = make_final_proxy_type(
Expand All @@ -595,6 +642,10 @@ def Index__new__(cls, *args, **kwargs):
pd.arrays.IntervalArray,
fast_to_slow=_Unusable(),
slow_to_fast=_Unusable(),
+ additional_attributes={
+ "_data": _FastSlowAttribute("_data"),
+ "_mask": _FastSlowAttribute("_mask"),
+ },
)

IntervalDtype = make_final_proxy_type(
Expand Down Expand Up @@ -622,7 +673,9 @@ def Index__new__(cls, *args, **kwargs):
fast_to_slow=_Unusable(),
slow_to_fast=_Unusable(),
additional_attributes={
- "__array_ufunc__": _FastSlowAttribute("__array_ufunc__")
+ "__array_ufunc__": _FastSlowAttribute("__array_ufunc__"),
+ "_data": _FastSlowAttribute("_data"),
+ "_mask": _FastSlowAttribute("_mask"),
},
)

Expand Down Expand Up @@ -798,6 +851,14 @@ def Index__new__(cls, *args, **kwargs):
pd_Styler,
fast_to_slow=_Unusable(),
slow_to_fast=_Unusable(),
+ additional_attributes={
+ "css": _FastSlowAttribute("css"),
+ "ctx": _FastSlowAttribute("ctx"),
+ "index": _FastSlowAttribute("index"),
+ "data": _FastSlowAttribute("data"),
+ "_display_funcs": _FastSlowAttribute("_display_funcs"),
+ "table_styles": _FastSlowAttribute("table_styles"),
+ },
)
except ImportError:
# Styler requires Jinja to be installed
Expand All @@ -813,7 +874,7 @@ def _get_eval_locals_and_globals(level, local_dict=None, global_dict=None):
return local_dict, global_dict


- @register_proxy_func(pd.eval)
+ @register_proxy_func(pd.core.computation.eval.eval)
@nvtx.annotate(
"CUDF_PANDAS_EVAL",
color=_CUDF_PANDAS_NVTX_COLORS["EXECUTE_SLOW"],
Expand Down Expand Up @@ -843,6 +904,24 @@ def _eval(
)


+ _orig_df_eval_method = DataFrame.eval
+
+
+ @register_proxy_func(pd.core.accessor.register_dataframe_accessor)
+ def _register_dataframe_accessor(name):
+ return pd.core.accessor._register_accessor(name, DataFrame)
+
+
+ @register_proxy_func(pd.core.accessor.register_series_accessor)
+ def _register_series_accessor(name):
+ return pd.core.accessor._register_accessor(name, Series)
+
+
+ @register_proxy_func(pd.core.accessor.register_index_accessor)
+ def _register_index_accessor(name):
+ return pd.core.accessor._register_accessor(name, Index)
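
The three wrappers above delegate to pandas' private `_register_accessor` with the proxy class as the target, and the proxy types in this diff now carry an `_accessors` set. The general pattern of decorator-based accessor registration can be sketched in plain Python as follows. All names (`CachedAccessor`, `register_accessor`, `Frame`, `StatsAccessor`) are hypothetical, and the behaviour is an assumption about the pattern rather than a copy of pandas internals.

```python
class CachedAccessor:
    """Hypothetical descriptor that builds the accessor on first use."""

    def __init__(self, name, accessor_cls):
        self._name = name
        self._accessor_cls = accessor_cls

    def __get__(self, obj, owner=None):
        if obj is None:
            return self._accessor_cls
        accessor = self._accessor_cls(obj)
        # Cache on the instance so later lookups bypass the descriptor.
        obj.__dict__[self._name] = accessor
        return accessor


def register_accessor(name, target_cls):
    """Return a decorator that attaches accessor_cls to target_cls."""

    def decorator(accessor_cls):
        setattr(target_cls, name, CachedAccessor(name, accessor_cls))
        # The target class is assumed to expose a set of accessor names.
        target_cls._accessors.add(name)
        return accessor_cls

    return decorator


class Frame:
    _accessors = set()

    def __init__(self, rows):
        self.rows = rows


@register_accessor("stats", Frame)
class StatsAccessor:
    def __init__(self, frame):
        self._frame = frame

    def total(self):
        return sum(self._frame.rows)


print(Frame([1, 2, 3]).stats.total())
```

In this sketch the registration fails with an `AttributeError` if the target class has no `_accessors` attribute, which is one plausible reason for adding `"_accessors": set()` to the proxy types.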


@nvtx.annotate(
"CUDF_PANDAS_DATAFRAME_EVAL",
color=_CUDF_PANDAS_NVTX_COLORS["EXECUTE_SLOW"],
Expand All @@ -853,11 +932,14 @@ def _df_eval_method(self, *args, local_dict=None, global_dict=None, **kwargs):
local_dict, global_dict = _get_eval_locals_and_globals(
level, local_dict, global_dict
)
- return super(type(self), self).__getattr__("eval")(
- *args, local_dict=local_dict, global_dict=global_dict, **kwargs
+ return _orig_df_eval_method(
+ self, *args, local_dict=local_dict, global_dict=global_dict, **kwargs
)


+ _orig_query_eval_method = DataFrame.query


@nvtx.annotate(
"CUDF_PANDAS_DATAFRAME_QUERY",
color=_CUDF_PANDAS_NVTX_COLORS["EXECUTE_SLOW"],
Expand All @@ -870,8 +952,8 @@ def _df_query_method(self, *args, local_dict=None, global_dict=None, **kwargs):
local_dict, global_dict = _get_eval_locals_and_globals(
level, local_dict, global_dict
)
- return super(type(self), self).__getattr__("query")(
- *args, local_dict=local_dict, global_dict=global_dict, **kwargs
+ return _orig_query_eval_method(
+ self, *args, local_dict=local_dict, global_dict=global_dict, **kwargs
)


Expand Down Expand Up @@ -1277,6 +1359,7 @@ def holiday_calendar_factory_wrapper(*args, **kwargs):
additional_attributes={"__hash__": _FastSlowAttribute("__hash__")},
)


MonthBegin = make_final_proxy_type(
"MonthBegin",
_Unusable,