[ArrowStringArray] TST: parametrize str.extractall tests #41419

simonjayhawkins · 2021-05-11T14:01:24Z

No description provided.

jbrockmendel · 2021-05-11T21:50:54Z

jbrockmendel · 2021-05-11T21:52:05Z

for future PRs of this sort, it would be somewhat easier to review if you did variable renaming (which I heartily approve of, and in fact would ask you to go one step further and rename s to ser to avoid 1-letter variable names) in a pre-cursor PR

simonjayhawkins · 2021-05-12T09:21:56Z

I think changes I may need to be made to _wrap_result for the expand path in the string accessor so this affects split/rsplit, extract/extractall, partition/rpartition and get_dummies. I want to make sure that all dtypes are tested before that change is made.

so these test parameterisation PRs are 'blockers' for the next step of the pyarrow implementation. So not keen to do precursor to the precursors, so can maybe leave out the other cleanups and just parameterise on dtype?

simonjayhawkins · 2021-05-12T09:41:32Z

we should maybe discuss in today's dev meeting, but I'm getting concerned that I won't get this done for 1.3.

the rc is only a few weeks away and if it doesn't make the rc it'll have to wait for 1.4

In the original PR adding the dtype #27949 there was some discussion on why the StringDtype was even needed and the gist, as I understand it, is that it was added so that the api could be established/tested ready to swap out the memory data model in the future.

#35169 has an updated discussion where having a parameterised dtype has been proposed (but AFAICT not agreed), leading to breaking api change for users.

so far all the changes merged for ArrowStringArray have the same behaviour as StringArray (I hope). There is draft PRs, that I don't want to merge yet if a) performance using native pyarrow is slower than object fallback, or b) the behaviour of ArrowStringArray is different from StringArray. Doing this may preclude the simple swap out memory model option.

jbrockmendel · 2021-05-12T16:01:17Z

So not keen to do precursor to the precursors, so can maybe leave out the other cleanups and just parameterise on dtype?

that would have the same effect vis-a-vis making review easier

…41419)

[ArrowStringArray] TST: parametrize str.extractall tests

a6731b1

simonjayhawkins added Testing pandas testing functions or related to the test suite Strings String extension data type and string data labels May 11, 2021

simonjayhawkins added this to the 1.3 milestone May 11, 2021

jbrockmendel merged commit ff134cc into pandas-dev:master May 11, 2021

simonjayhawkins deleted the extractall-tests branch May 12, 2021 08:52

JulianWgs pushed a commit to JulianWgs/pandas that referenced this pull request Jul 3, 2021

[ArrowStringArray] TST: parametrize str.extractall tests (pandas-dev#…

b810b45

…41419)

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[ArrowStringArray] TST: parametrize str.extractall tests #41419

[ArrowStringArray] TST: parametrize str.extractall tests #41419

simonjayhawkins commented May 11, 2021

jbrockmendel commented May 11, 2021

jbrockmendel commented May 11, 2021

simonjayhawkins commented May 12, 2021

simonjayhawkins commented May 12, 2021

jbrockmendel commented May 12, 2021

[ArrowStringArray] TST: parametrize str.extractall tests #41419

[ArrowStringArray] TST: parametrize str.extractall tests #41419

Conversation

simonjayhawkins commented May 11, 2021

jbrockmendel commented May 11, 2021

jbrockmendel commented May 11, 2021

simonjayhawkins commented May 12, 2021

simonjayhawkins commented May 12, 2021

jbrockmendel commented May 12, 2021