ENH: Update Series/DataFrame Constructor args to Enable dtype Forced Conversion on Creation #44117

Open
adamgranthendry opened this issue Oct 20, 2021 · 3 comments
Labels
Astype, Constructors (Series/DataFrame/Index/pd.array Constructors)

Comments

@adamgranthendry

adamgranthendry commented Oct 20, 2021

Users often know the data types they want to convert their columns to at creation.

  1. Can the pd.Series constructor accept an additional argument errors (default "raise") that optionally converts all values to the specified dtype at creation, substituting pd.NA or np.nan for values that cannot be cast?

e.g. Currently, the following raises a ValueError:

>>> a = pd.Series(['a', 1, 3, ''], dtype=int)
ValueError: invalid literal for int() with base 10: 'a'

However, it would be nice to have the following capability:

>>> a = pd.Series(['a', 1, 3, ''], dtype=int, errors="coerce")
>>> a
0    NaN
1    1
2    3
3    NaN
dtype: int32
  2. Extending this to the pd.DataFrame constructor, can the dtype argument be widened to something like Union[pd._typing.Dtype, Dict[str, pd._typing.Dtype]] | None, so that the user can pass a dictionary mapping column names (strings) to the dtypes they should be converted to?
    (Again, an errors argument (default "raise") should be added so that ValueErrors are still raised unless the user explicitly sets errors="coerce".)
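Until something like this exists, the requested dict-plus-errors behavior can be approximated in user code. The sketch below is only an illustration: coerce_columns is a hypothetical helper, not a pandas API, and with errors="coerce" it only handles numeric target dtypes (other dtypes fall back to a plain astype).

```python
import pandas as pd


def coerce_columns(df, dtypes, errors="raise"):
    """Hypothetical helper (not part of pandas): cast columns per a dict.

    dtypes maps column names to target dtypes. With errors="coerce",
    values that cannot be parsed as numbers become missing values.
    """
    out = df.copy()
    for col, dtype in dtypes.items():
        dtype = pd.api.types.pandas_dtype(dtype)
        if errors == "coerce" and pd.api.types.is_numeric_dtype(dtype):
            # to_numeric turns unparseable values into NaN; casting to a
            # nullable dtype such as "Int64" then stores them as pd.NA.
            out[col] = pd.to_numeric(out[col], errors="coerce").astype(dtype)
        else:
            # Default path: behaves like a plain astype and may raise.
            out[col] = out[col].astype(dtype)
    return out


df = pd.DataFrame({"a": ["a", 1, 3, ""], "b": ["w", "x", "y", "z"]})
result = coerce_columns(df, {"a": "Int64"}, errors="coerce")
```

A nullable target such as "Int64" is used here because a NumPy integer dtype cannot hold missing values.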

Currently, users commonly have to pull each column out of the DataFrame after it is created, convert its dtype with a to_...() function using errors="coerce", and assign the result back to the column (since the to_...() functions have no inplace argument).
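For reference, that manual workflow looks roughly like this. It is a minimal sketch using only existing pandas functions (pd.to_numeric, pd.to_datetime); the column names and values are made up for illustration.

```python
import pandas as pd

# Hypothetical data: each column mixes valid values with junk.
df = pd.DataFrame(
    {
        "count": ["1", "x", "3"],
        "when": ["2021-10-20", "not a date", "2021-10-22"],
    }
)

# Each column is converted separately and reassigned, because the
# to_...() functions return a new object rather than working in place.
df["count"] = pd.to_numeric(df["count"], errors="coerce")
df["when"] = pd.to_datetime(df["when"], errors="coerce")
```

Unparseable entries end up as NaN in the numeric column and NaT in the datetime column.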

This feature would put the functionality shared by the various to_...() functions in one place. read_excel, for example, has a converters argument, which gives the kind of behavior this request is asking for when creating pd.DataFrame objects.

(ASIDE: Note that DataFrame.convert_dtypes has no coerce option, so the Series a in the above example would simply be left as dtype object.)
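To make the aside concrete, here is a small sketch contrasting convert_dtypes with an explicit coercion on the same values. It uses only existing pandas calls; the choice of the nullable "Int64" dtype is an assumption about what a coerced integer result should look like.

```python
import pandas as pd

a = pd.Series(["a", 1, 3, ""])

# convert_dtypes only infers a better dtype; it has no errors= keyword,
# so a column with mixed strings and integers stays object.
inferred = a.convert_dtypes()

# Explicit coercion: unparseable values become NaN, then the nullable
# Int64 dtype stores them as pd.NA alongside real integers.
coerced = pd.to_numeric(a, errors="coerce").astype("Int64")
```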

@adamgranthendry changed the title from "Feature Request: DataFrame constructor dtypes dict and coerce bool args" to "Feature Request: Update Series/DataFrame Constructor args to Enable Column dtype Forced Conversion on Creation" on Oct 20, 2021
@adamgranthendry changed the title from "Feature Request: Update Series/DataFrame Constructor args to Enable Column dtype Forced Conversion on Creation" to "ENH: Update Series/DataFrame Constructor args to Enable Column dtype Forced Conversion on Creation" on Oct 20, 2021
@jbrockmendel
Member

potential keyword for astype xref #22384

@adamgranthendry changed the title from "ENH: Update Series/DataFrame Constructor args to Enable Column dtype Forced Conversion on Creation" to "ENH: Update Series/DataFrame Constructor args to Enable dtype Forced Conversion on Creation" on Oct 20, 2021
@mroeschke
Member

Since this request overlaps with the scope of #22384, closing this issue in favor of that to centralize the discussion there

@jorisvandenbossche
Member

Extra keywords are somewhat out of scope for the discussion in #22384, I would say (the discussion is already broad, so whatever we can leave out, the better :)). So reopening this.

@jorisvandenbossche added the Astype and Constructors (Series/DataFrame/Index/pd.array Constructors) labels on Feb 2, 2022