String dtype: restrict options.mode.string_storage to python|pyarrow (remove pyarrow_numpy) #59376
@@ -14,6 +14,7 @@

from collections.abc import Callable
import os
from typing import Any

import pandas._config.config as cf
from pandas._config.config import (
@@ -455,12 +456,27 @@ def is_terminal() -> bool:
    ``future.infer_string`` is set to True.
"""


def is_valid_string_storage(value: Any) -> None:
mroeschke marked this conversation as resolved.
    legal_values = ["python", "pyarrow"]
    if value not in legal_values:
        msg = "Value must be one of python|pyarrow"
        if value == "pyarrow_numpy":
            # TODO: we can remove extra message after 3.0
            msg += (
                ". 'pyarrow_numpy' was specified, but this option should be "
                "enabled using pandas.options.future.infer_string instead"
            )
Comment on lines +464 to +469:
Added this custom message to the error in case someone actually did end up using
        raise ValueError(msg)


with cf.config_prefix("mode"):
    cf.register_option(
        "string_storage",
        "python",
        string_storage_doc,
        validator=is_one_of_factory(["python", "pyarrow", "pyarrow_numpy"]),
        # validator=is_one_of_factory(["python", "pyarrow"]),
Suggested change
I would prefer to leave this line in so it is easier to go back to it: the extra message is only meant to stay for 2.3, and for 3.0 we should remove it again (the infer_string option will be enabled by default in 3.0, so there is no point in letting users set that manually).
        validator=is_valid_string_storage,
    )
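The validator in this hunk can be exercised on its own, outside of the pandas option machinery. The following is a minimal sketch that copies the function body from the diff. The module-level name `LEGAL_VALUES` and the helper `error_message_for` are hypothetical and only exist here for illustration; they are not part of the PR.

```python
from typing import Any, Optional

# Assumption: mirrors the ``legal_values`` list inside the function in the diff.
LEGAL_VALUES = ["python", "pyarrow"]


def is_valid_string_storage(value: Any) -> None:
    """Raise ValueError unless ``value`` is one of the legal storage names."""
    if value not in LEGAL_VALUES:
        msg = "Value must be one of python|pyarrow"
        if value == "pyarrow_numpy":
            # Extra hint for users who set the removed value explicitly.
            msg += (
                ". 'pyarrow_numpy' was specified, but this option should be "
                "enabled using pandas.options.future.infer_string instead"
            )
        raise ValueError(msg)


def error_message_for(value: Any) -> Optional[str]:
    """Hypothetical helper: return the validator's message, or None if accepted."""
    try:
        is_valid_string_storage(value)
    except ValueError as err:
        return str(err)
    return None


if __name__ == "__main__":
    for candidate in ["python", "pyarrow", "pyarrow_numpy", "numpy"]:
        print(candidate, "->", error_message_for(candidate))
```

Both accepted values return `None` from the helper, any other value yields the generic message, and only `"pyarrow_numpy"` gets the additional pointer to `future.infer_string`.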
@@ -897,3 +897,12 @@ def test_astype_to_string_not_modifying_input(string_storage, val):
    with option_context("mode.string_storage", string_storage):
        df.astype("string")
    tm.assert_frame_equal(df, expected)


@pytest.mark.parametrize("val", [None, 1, 1.5, np.nan, NaT])
def test_astype_to_string_dtype_not_modifying_input(any_string_dtype, val):
    # GH#51073 - variant of the above test with explicit dtype instances
Comment on lines +903 to +904:
Added this variant because the test above with the string_storage option and
    df = DataFrame({"a": ["a", "b", val]})
    expected = df.copy()
    df.astype(any_string_dtype)
    tm.assert_frame_equal(df, expected)
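The pattern used by the new test (convert with an explicit dtype instance, then check that the input frame is unchanged) can be tried outside the pandas test suite. The sketch below is an illustration under stated assumptions, not the test from the PR: it uses only `pd.StringDtype("python")` so that pyarrow is not required, whereas the `any_string_dtype` fixture in pandas supplies several dtype instances. The function name `astype_leaves_input_untouched` is hypothetical, and a reasonably recent pandas (2.0 or later) is assumed.

```python
import numpy as np
import pandas as pd


def astype_leaves_input_untouched(val, dtype) -> bool:
    """Return True if casting to ``dtype`` does not mutate the source frame."""
    df = pd.DataFrame({"a": ["a", "b", val]})
    expected = df.copy()
    # The result is deliberately discarded: only the side effect on ``df`` matters.
    df.astype(dtype)
    # Raises AssertionError if ``df`` was modified in place by the cast.
    pd.testing.assert_frame_equal(df, expected)
    return True


# Assumption: one explicit dtype instance stands in for the fixture's variants.
python_string_dtype = pd.StringDtype("python")
values = [None, 1, 1.5, np.nan, pd.NaT]
results = [astype_leaves_input_untouched(v, python_string_dtype) for v in values]
```

Passing a dtype instance rather than the `"string"` alias means the check does not depend on the current value of `mode.string_storage`.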
So the string_storage fixture will no longer cover the "pyarrow + NaN" variant of the string dtype directly with this change, but:
pandas/tests/io submodule, which had an override of this fixture to not include "pyarrow_numpy" anyway (and I removed this override)