Initial Backport of string changes for 2.3 release #59513

Merged
merged 52 commits into from
Oct 9, 2024

Commits on Oct 3, 2024

  1. PDEP-14: Dedicated string data type for pandas 3.0 (pandas-dev#58551)

    Co-authored-by: Simon Hawkins <simonjayhawkins@gmail.com>
    Co-authored-by: Irv Lustig <irv@princeton.com>
    Co-authored-by: William Ayd <william.ayd@icloud.com>
    Co-authored-by: Richard Shadrach <45562402+rhshadrach@users.noreply.github.com>
    Co-authored-by: Patrick Hoefler <61934744+phofl@users.noreply.github.com>
    6 people committed Oct 3, 2024
    b1b8eed
  2. 5778049

Commits on Oct 7, 2024

  1. a494ed8
  2. 06dbb7a
  3. String dtype: rename the storage options and add na_value keyword in `StringDtype()` (pandas-dev#59330)
    
    * rename storage option and add na_value keyword
    
    * update init
    
    * fix propagating na_value to Array class + fix some tests
    
    * fix more tests
    
    * disallow pyarrow_numpy as option + fix more cases of checking storage to be pyarrow_numpy
    
    * restore pyarrow_numpy as option for now
    
    * linting
    
    * try fix typing
    
    * try fix typing
    
    * fix dtype equality to take into account the NaN vs NA
    
    * fix pickling of dtype
    
    * fix test_convert_dtypes
    
    * update expected result for dtype='string'
    
    * suppress typing error with _metadata attribute
    jorisvandenbossche committed Oct 7, 2024
    925c21c
  4. TST (string dtype): xfail all currently failing tests with future.infer_string (pandas-dev#59329)
    
    * TST (string dtype): xfail all currently failing tests with future.infer_string
    
    * more xfails
    
    * more xfails
    
    * add missing strict=False
    
    * also run slow and single cpu tests
    
    * fix single_cpu tests
    
    * xfail some slow tests
    
    * stop suppressing non-zero exit code from pytest on string CI build
    
    * remove accidentally added xlsx file
    
    ---------
    
    Co-authored-by: Matthew Roeschke <10647082+mroeschke@users.noreply.github.com>
    2 people authored and jorisvandenbossche committed Oct 7, 2024
    431b246
  5. TST (string dtype): follow-up on pandas-devGH-59329 fixing new xfails (pandas-dev#59352)
    
    * TST (string dtype): follow-up on pandas-devGH-59329 fixing new xfails
    
    * add missing strict
    jorisvandenbossche committed Oct 7, 2024
    6882ef9
  6. TST (string dtype): change any_string_dtype fixture to use actual dtype instances (pandas-dev#59345)
    
    * TST (string dtype): change any_string_dtype fixture to use actual dtype instances
    
    * avoid pyarrow import error during test collection
    
    * fix dtype equality in case pyarrow is not installed
    
    * keep using mode.string_storage as default for NA variant + more xfails
    
    * fix test_series_string_inference_storage_definition
    
    * remove no longer necessary xfails
    
    ---------
    
    Co-authored-by: Matthew Roeschke <10647082+mroeschke@users.noreply.github.com>
    jorisvandenbossche and mroeschke committed Oct 7, 2024
    99ebd18
  7. TST (string dtype): remove usage of arrow_string_storage fixture (pandas-dev#59368)
    
    * TST (string dtype): remove usage of arrow_string_storage fixture
    
    * fixup
    jorisvandenbossche committed Oct 7, 2024
    1566042
  8. TST (string dtype): replace string_storage fixture with explicit storage/na_value keyword arguments for dtype creation (pandas-dev#59375)

    jorisvandenbossche committed Oct 7, 2024
    1d77d0e
  9. String dtype: restrict options.mode.string_storage to python|pyarrow (remove pyarrow_numpy) (pandas-dev#59376)
    
    * String dtype: restrict options.mode.string_storage to python|pyarrow (remove pyarrow_numpy)
    
    * add type annotation
    jorisvandenbossche committed Oct 7, 2024
    2465a6d
  10. 35ebe68
  11. String dtype: implement object-dtype based StringArray variant with NumPy semantics (pandas-dev#58451)
    
    Co-authored-by: Patrick Hoefler <61934744+phofl@users.noreply.github.com>
    2 people authored and jorisvandenbossche committed Oct 7, 2024
    463fd91
  12. REF (string dtype): de-duplicate _str_map methods (pandas-dev#59443)

    * REF: de-duplicate _str_map methods
    
    * mypy fixup
    WillAyd authored and jorisvandenbossche committed Oct 7, 2024
    397cb09
  13. dd2680c
  14. String dtype: fix alignment sorting in case of python storage (pandas-dev#59448)
    
    * String dtype: fix alignment sorting in case of python storage
    
    * add test
    jorisvandenbossche committed Oct 7, 2024
    a9fd6f1
  15. TST (string dtype): add test build with future strings enabled without pyarrow (pandas-dev#59437)
    
    * TST (string dtype): add test build with future strings enabled without pyarrow
    
    * ensure the build doesn't override the default ones
    
    * uninstall -> remove
    
    * avoid jobs with same env being cancelled
    
    * use different python version for both future jobs
    
    * add some xfails
    
    * fixup xfails
    
    * less strict
    WillAyd authored and jorisvandenbossche committed Oct 7, 2024
    bf7fb01
  16. REF (string dtype): de-duplicate _str_map (2) (pandas-dev#59451)

    * REF (string): de-duplicate _str_map (2)
    
    * mypy fixup
    jbrockmendel authored and jorisvandenbossche committed Oct 7, 2024
    81850c8
  17. REF (string): de-duplicate str_map_nan_semantics (pandas-dev#59464)

    REF: de-duplicate str_map_nan_semantics
    jbrockmendel authored and jorisvandenbossche committed Oct 7, 2024
    078c5a0
  18. fdbd473
  19. String dtype: fix convert_dtypes() to convert NaN-string to NA-string (pandas-dev#59470)
    
    * String dtype: fix convert_dtypes() to convert NaN-string to NA-string
    
    * fix CoW tracking for conversion to python storage strings
    
    * remove xfails
    jorisvandenbossche committed Oct 7, 2024
    2346acf
  20. String dtype: honor mode.string_storage option (and change default to None) (pandas-dev#59488)
    
    * String dtype: honor mode.string_storage option (and change default to None)
    
    * fix test + explicitly test default
    
    * use 'auto' instead of None
    jorisvandenbossche committed Oct 7, 2024
    1bd3ce8
  21. BUG (string): ArrowEA comparisons with mismatched types (pandas-dev#59505)
    
    * BUG: ArrowEA comparisons with mismatched types
    
    * move whatsnew
    
    * GH ref
    jbrockmendel authored and jorisvandenbossche committed Oct 7, 2024
    7e50b16
  22. fa14a19
  23. 036e9da
  24. 4d26bed
  25. REF (string): Move StringArrayNumpySemantics methods to base class (pandas-dev#59514)
    
    * REF (string): Move StringArrayNumpySemantics methods to base class
    
    * mypy fixup
    jbrockmendel authored and jorisvandenbossche committed Oct 7, 2024
    31153c1
  26. REF (string): remove _str_na_value (pandas-dev#59515)

    * REF (string): remove _str_na_value
    
    * mypy fixup
    jbrockmendel authored and jorisvandenbossche committed Oct 7, 2024
    721bf1e
  27. REF (string): move ArrowStringArrayNumpySemantics methods to base class (pandas-dev#59501)
    
    * REF: move ArrowStringArrayNumpySemantics methods to parent class
    
    * REF: move methods to ArrowStringArray
    
    * mypy fixup
    
    * Fix incorrect double-unpacking
    
    * move methods to subclass
    jbrockmendel authored and jorisvandenbossche committed Oct 7, 2024
    ceee52d
  28. API (string): return str dtype for .dt methods, DatetimeIndex methods (pandas-dev#59526)
    
    * API (string): return str dtype for .dt methods, DatetimeIndex methods
    
    * mypy fixup
    jbrockmendel authored and jorisvandenbossche committed Oct 7, 2024
    38f5b61
  29. a35481f
  30. 172af49
  31. 7946df1
  32. 6909c47
  33. Skip niche issue

    WillAyd authored and jorisvandenbossche committed Oct 7, 2024
    b70cd48
  34. 1718e4b
  35. 3467d26
  36. 9142e5e
  37. String dtype: still return nullable NA-variant in object inference (`maybe_converts_object`) if requested (pandas-dev#59487)
    
    * String dtype: maybe_converts_object give precedence to nullable dtype
    
    * update datetimelike input validation
    
    * update tests and remove xfails
    
    * explicitly test pd.array() behaviour (remove xfail)
    
    * fixup allow_2d
    
    * undo changes related to datetimelike input validation
    
    * fix test for str on current main
    
    ---------
    
    Co-authored-by: Matthew Roeschke <10647082+mroeschke@users.noreply.github.com>
    jorisvandenbossche and mroeschke committed Oct 7, 2024
    b61bd23
  38. c3d3980
  39. e3728c7
  40. 732aa90
  41. 66e26d1
  42. e9806c1
  43. db9aa77
  44. b3257e7
  45. cecef0e
  46. 4c0d118
  47. fc6bd39
  48. bae9be1
  49. 94b797d
  50. a10c5c0
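
Commit 925c21c in the list above adds an na_value keyword to `StringDtype()` and mentions fixing "dtype equality to take into account the NaN vs NA". The sketch below is a toy illustration of why that needs care, not pandas' implementation: `ToyStringDtype`, `NA`, and `_na_key` are hypothetical names invented here. It assumes a dtype is identified by its storage plus its missing-value sentinel, and shows that a float NaN sentinel cannot be compared directly because NaN is never equal to itself.

```python
import math
from dataclasses import dataclass


class _NAType:
    """Hypothetical stand-in for a missing-value sentinel such as pandas.NA."""

    def __repr__(self) -> str:
        return "<NA>"


NA = _NAType()


@dataclass(frozen=True, eq=False)
class ToyStringDtype:
    """Toy model: a string dtype identified by storage and NA sentinel."""

    storage: str = "python"
    na_value: object = NA

    def _na_key(self) -> str:
        # Map the sentinel to a comparable label. Any float NaN becomes
        # "nan", since comparing NaN objects with == would always be False.
        if isinstance(self.na_value, float) and math.isnan(self.na_value):
            return "nan"
        return "na"

    def __eq__(self, other: object) -> bool:
        if not isinstance(other, ToyStringDtype):
            return NotImplemented
        return (self.storage, self._na_key()) == (other.storage, other._na_key())

    def __hash__(self) -> int:
        # Keep hashing consistent with the equality rule above.
        return hash((self.storage, self._na_key()))


nan_variant = ToyStringDtype("python", float("nan"))
na_variant = ToyStringDtype("python", NA)
print(nan_variant == ToyStringDtype("python", float("nan")))
print(nan_variant == na_variant)
```

In this toy model two NaN-backed dtypes with the same storage compare equal even though they hold distinct NaN objects, while the NaN and NA variants stay distinct. How pandas itself implements the comparison is not shown in this commit list.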