Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

TST (string dtype): fix IO dtype_backend tests for storage of str dtype of columns' Index #59509

Merged

Conversation

jorisvandenbossche
Copy link
Member

@jorisvandenbossche jorisvandenbossche commented Aug 14, 2024

Quick follow-up on #59481 (I missed that with the last merge of main there was actually one relevant failure, maybe some interaction with one of my other merged PRs)

This was caused because of the merge of #59488, that changed to honor the mode.string_storage option for the future default string dtype. That meant that DataFrames with string column names that were created in a with pd.option_context("mode.string_storage", string_storage): context would now also follow the specified storage for the column names.

Comment on lines +473 to +478
expected = DataFrame(
{
"a": pd.array(["a", "b"], dtype=pd.StringDtype(string_storage)),
"b": pd.array(["x", pd.NA], dtype=pd.StringDtype(string_storage)),
},
)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What this is changing is that the expected result is also created in the with pd.option_context("mode.string_storage", string_storage): context. What this actually changes is that when future.infer_string is enabled, that affects the storage used by the Index object with str dtype used for the columns of the resulting dataframe.

@jorisvandenbossche jorisvandenbossche changed the title TST (string dtype): fix failure in csv test_dtype_backend_string test TST (string dtype): fix IO dtype_backend tests for storage of str dtype of columns' Index Aug 14, 2024
Comment on lines -715 to +717
tm.assert_frame_equal(result, expected)
# the storage of the str columns' Index is also affected by the
# string_storage setting -> ignore that for checking the result
tm.assert_frame_equal(result, expected, check_column_type=False)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This changes the tests to essentially just ignore the issue.

We could also verify it explicitly (although it's not the main thing that this test is meant to test) by doing something like:

if using_infer_string:
    expected.columns = expected.columns.astype(pd.StringDtype(string_storage, na_value=np.nan))

Which, now I comment that here, is maybe also perfectly fine (not too verbose, compared to the comment I had to add :))

@mroeschke mroeschke added this to the 3.0 milestone Aug 14, 2024
@mroeschke mroeschke merged commit eba59fa into pandas-dev:main Aug 14, 2024
47 checks passed
@mroeschke
Copy link
Member

Thanks @jorisvandenbossche

@jorisvandenbossche jorisvandenbossche deleted the string-dtype-tests-fixup2 branch August 14, 2024 18:47
@jorisvandenbossche jorisvandenbossche modified the milestones: 3.0, 2.3 Aug 20, 2024
WillAyd pushed a commit to WillAyd/pandas that referenced this pull request Aug 22, 2024
…pe of columns' Index (pandas-dev#59509)

* TST (string dtype): fix failure in csv test_dtype_backend_string test

* ignore storage of Index str dtype
WillAyd added a commit to WillAyd/pandas that referenced this pull request Aug 22, 2024
WillAyd added a commit to WillAyd/pandas that referenced this pull request Aug 22, 2024
WillAyd added a commit to WillAyd/pandas that referenced this pull request Aug 22, 2024
WillAyd added a commit to WillAyd/pandas that referenced this pull request Aug 27, 2024
WillAyd added a commit to WillAyd/pandas that referenced this pull request Sep 20, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Strings String extension data type and string data
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants