Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[ArrowStringArray] TST: parametrize str.extractall tests #41419

Merged
merged 1 commit into from
May 11, 2021

Conversation

simonjayhawkins
Copy link
Member

No description provided.

@simonjayhawkins simonjayhawkins added Testing pandas testing functions or related to the test suite Strings String extension data type and string data labels May 11, 2021
@simonjayhawkins simonjayhawkins added this to the 1.3 milestone May 11, 2021
@jbrockmendel jbrockmendel merged commit ff134cc into pandas-dev:master May 11, 2021
@jbrockmendel
Copy link
Member

thanks @simonjayhawkins

@jbrockmendel
Copy link
Member

for future PRs of this sort, it would be somewhat easier to review if you did variable renaming (which I heartily approve of, and in fact would ask you to go one step further and rename s to ser to avoid 1-letter variable names) in a pre-cursor PR

@simonjayhawkins simonjayhawkins deleted the extractall-tests branch May 12, 2021 08:52
@simonjayhawkins
Copy link
Member Author

I think changes I may need to be made to _wrap_result for the expand path in the string accessor so this affects split/rsplit, extract/extractall, partition/rpartition and get_dummies. I want to make sure that all dtypes are tested before that change is made.

so these test parameterisation PRs are 'blockers' for the next step of the pyarrow implementation. So not keen to do precursor to the precursors, so can maybe leave out the other cleanups and just parameterise on dtype?

@simonjayhawkins
Copy link
Member Author

we should maybe discuss in today's dev meeting, but I'm getting concerned that I won't get this done for 1.3.

the rc is only a few weeks away and if it doesn't make the rc it'll have to wait for 1.4

In the original PR adding the dtype #27949 there was some discussion on why the StringDtype was even needed and the gist, as I understand it, is that it was added so that the api could be established/tested ready to swap out the memory data model in the future.

#35169 has an updated discussion where having a parameterised dtype has been proposed (but AFAICT not agreed), leading to breaking api change for users.

so far all the changes merged for ArrowStringArray have the same behaviour as StringArray (I hope). There is draft PRs, that I don't want to merge yet if a) performance using native pyarrow is slower than object fallback, or b) the behaviour of ArrowStringArray is different from StringArray. Doing this may preclude the simple swap out memory model option.

@jbrockmendel
Copy link
Member

So not keen to do precursor to the precursors, so can maybe leave out the other cleanups and just parameterise on dtype?

that would have the same effect vis-a-vis making review easier

JulianWgs pushed a commit to JulianWgs/pandas that referenced this pull request Jul 3, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Strings String extension data type and string data Testing pandas testing functions or related to the test suite
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants