FIX-#1503: Proper implementation of Series.values
#5469
Signed-off-by: Anatoly Myachev <anatoly.myachev@intel.com>
modin/pandas/series.py
if not is_numeric_dtype(self.dtype):
    return self._default_to_pandas("values")
What are these non-numeric dtypes whose behavior deviates from a common .to_numpy()? If it's only categories, can we then just apply the same approach as for .ravel()?:
Lines 1502 to 1505 in 3e314d8
if isinstance(self.dtype, pandas.CategoricalDtype):
    data = pandas.Categorical(data, dtype=self.dtype)
return data
Code snippet from pandas:
Overview:
dtype       | values        | _values        | array          |
----------- | ------------- | -------------- | -------------- |
Numeric     | ndarray       | ndarray        | PandasArray    |
Category    | Categorical   | Categorical    | Categorical    |
dt64[ns]    | ndarray[M8ns] | DatetimeArray  | DatetimeArray  |
dt64[ns tz] | ndarray[M8ns] | DatetimeArray  | DatetimeArray  |
td64[ns]    | ndarray[m8ns] | TimedeltaArray | ndarray[m8ns]  |
Period      | ndarray[obj]  | PeriodArray    | PeriodArray    |
Nullable    | EA            | EA             | EA             |
There seems to be a case with the Nullable type where EA (extended array?) would be returned.
EA = ExtensionArray
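As a rough illustration of the `values` column in the overview above, the following sketch inspects what plain pandas returns from `Series.values` for a few dtypes. The sample data and the `samples` and `kinds` names are assumptions made for this example only; they are not part of the PR.

```python
import numpy as np
import pandas as pd
from pandas.api.extensions import ExtensionArray

# Hypothetical sample Series, one per dtype family from the overview table.
samples = {
    "numeric": pd.Series([1, 2, 3]),
    "category": pd.Series(["a", "b", "a"], dtype="category"),
    "datetime": pd.Series(pd.date_range("2023-01-01", periods=3)),
    "datetime tz": pd.Series(pd.date_range("2023-01-01", periods=3, tz="UTC")),
    "timedelta": pd.Series(pd.to_timedelta([1, 2, 3], unit="D")),
    "period": pd.Series(pd.period_range("2023-01", periods=3, freq="M")),
    "nullable": pd.Series([1, 2, None], dtype="Int64"),
}

# Record the concrete container type that .values returns for each case.
kinds = {name: type(s.values).__name__ for name, s in samples.items()}
for name, kind in kinds.items():
    print(f"{name:12} -> {kind}")
```

Only the Category and Nullable rows come back as something other than a NumPy array here, which matches the `values` column of the table.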
Co-authored-by: Dmitry Chigarev <dmitry.chigarev@intel.com>
LGTM!
@anmyachev, please check #4187 to see if this PR resolves it, or maybe partially.
data = self.to_numpy()
if isinstance(self.dtype, pd.CategoricalDtype):
    data = pd.Categorical(data, dtype=self.dtype)
return data
this is still going to mess up a bunch of EA cases. The correct way to handle this is #4187 (comment)
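To make the concern concrete, here is a minimal sketch that applies the same logic as the snippet above to a plain pandas Series. The helper name `values_via_to_numpy` and the sample data are hypothetical; the sketch only illustrates the behavior being discussed and is not Modin's implementation.

```python
import numpy as np
import pandas as pd
from pandas.api.extensions import ExtensionArray


def values_via_to_numpy(series: pd.Series):
    """Hypothetical helper mirroring the reviewed snippet on a pandas Series."""
    data = series.to_numpy()
    # Only the categorical case is re-wrapped; other extension dtypes are not.
    if isinstance(series.dtype, pd.CategoricalDtype):
        data = pd.Categorical(data, dtype=series.dtype)
    return data


cat = pd.Series(["a", "b", "a"], dtype="category")
nullable = pd.Series([1, 2, None], dtype="Int64")

cat_result = values_via_to_numpy(cat)            # a Categorical, as in pandas
nullable_result = values_via_to_numpy(nullable)  # a NumPy array, not an EA
```

For the nullable `Int64` Series, pandas itself returns an ExtensionArray from `.values`, while this approach yields a NumPy array, so the two disagree for that dtype.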
You are right, I will continue to work on this separately. At the moment, Modin doesn't have an internal interface to implement your suggestion; I've tried that before.
@jbrockmendel Is there some simple condition by which we can determine that this is an EA case?
isinstance(self.dtype, ExtensionDtype)
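A small sketch of how that check behaves on plain pandas Series, assuming `ExtensionDtype` is imported from `pandas.api.extensions`. The `is_extension_case` helper is a hypothetical name used only for illustration.

```python
import pandas as pd
from pandas.api.extensions import ExtensionDtype


def is_extension_case(series: pd.Series) -> bool:
    """Hypothetical helper: True when the Series is backed by an extension dtype."""
    return isinstance(series.dtype, ExtensionDtype)


numpy_int = pd.Series([1, 2, 3])                                  # NumPy int64
nullable_int = pd.Series([1, 2, None], dtype="Int64")             # nullable integer
categorical = pd.Series(["a", "b"], dtype="category")             # categorical
tz_aware = pd.Series(pd.date_range("2023-01-01", periods=2, tz="UTC"))

checks = {
    "numpy_int": is_extension_case(numpy_int),
    "nullable_int": is_extension_case(nullable_int),
    "categorical": is_extension_case(categorical),
    "tz_aware": is_extension_case(tz_aware),
}
```

Only the NumPy-backed Series fails the check; the nullable, categorical, and tz-aware Series all carry an `ExtensionDtype`.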
thanks @jbrockmendel! Should be fixed in #5493.
What do these changes do?
flake8 modin/ asv_bench/benchmarks scripts/doc_checker.py
black --check modin/ asv_bench/benchmarks scripts/doc_checker.py
git commit -s
docs/development/architecture.rst is up-to-date