API: value-dependent behaviour in concat with all-NA data #40893
Labels: API Design, Dtype Conversions, Needs Discussion, Reshaping
In general, we want to get rid of value-dependent behaviour in concat operations: the resulting dtype of a concat operation should depend only on the input dtypes, and not on the exact content (the exact values) of the inputs.
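As a minimal sketch of what "value-independent" means here (the variable names are illustrative and not taken from the discussion): concatenating an int64 Series with a float64 Series gives float64, whether or not the float values happen to be whole numbers.

```python
import pandas as pd

ints = pd.Series([1, 2], dtype="int64")
# These float values would fit in int64, but that must not matter.
whole_floats = pd.Series([3.0, 4.0], dtype="float64")
fractional_floats = pd.Series([3.5, 4.5], dtype="float64")

# The result dtype follows from the input dtypes (int64 + float64) alone.
result_whole = pd.concat([ints, whole_floats], ignore_index=True)
result_fractional = pd.concat([ints, fractional_floats], ignore_index=True)

print(result_whole.dtype, result_fractional.dtype)
```

Both results share one dtype because only the input dtypes were consulted; the all-NA case discussed in this issue is where that property currently breaks.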
This has been discussed on several occasions, eg in #33607 when adding the general EA interface for concat (there is still one value-dependent special case for Categorical involving integer categories / missing values, encoded in `core/dtypes/concat.py::cast_to_common_type`), or in #39122 concerning empty Series/DataFrames. One other case (which came up recently in eg #39574 and #39612) is related to all-NA/NaN objects.
For DataFrames, when there is an all-missing column, its dtype gets ignored when determining the result dtype (which, however, requires inspecting the values of the column). Small example:
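A minimal sketch of such a case, reconstructed for illustration rather than copied from the original snippet (the frame and column names are assumptions). The resulting dtype of column "b" depends on the pandas version and on whether the all-NA special case applies, so no output is shown.

```python
import pandas as pd

# A frame with a datetime64 column "b".
df_dates = pd.DataFrame(
    {"a": [1, 2], "b": pd.date_range("2012-01-01", periods=2)}
)

# Reindexing the columns adds "b" as an all-NaN float64 column.
df_all_nan = pd.DataFrame({"a": [3, 4]}).reindex(columns=["a", "b"])
print(df_all_nan.dtypes)

# With the all-NA special case, "b" keeps its datetime64 dtype.
# Without it, float64 + datetime64 has no common dtype other than object.
result = pd.concat([df_dates, df_all_nan], ignore_index=True)
print(result.dtypes)
```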
This can be useful, as you can get such object/float dtype columns depending on how those "empty" all-NaN DataFrames are created (eg when constructing a DataFrame with a given index/columns but without data, or by reindexing the rows of an actually empty DataFrame, or reindexing the columns of a non-empty DataFrame).
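The three construction routes mentioned above can be sketched as follows (variable names are illustrative). Each yields an all-NaN column; printing the dtypes shows which routes give object and which give float64 in the installed pandas version.

```python
import pandas as pd

# 1. Index and columns given, but no data.
no_data = pd.DataFrame(index=[0, 1], columns=["a"])

# 2. Reindexing the rows of an actually empty DataFrame.
reindexed_rows = pd.DataFrame(columns=["a"]).reindex([0, 1])

# 3. Reindexing the columns of a non-empty DataFrame (adds column "b").
reindexed_cols = pd.DataFrame({"a": [1, 2]}).reindex(columns=["a", "b"])

columns = {
    "no_data": no_data["a"],
    "reindexed_rows": reindexed_rows["a"],
    "reindexed_cols": reindexed_cols["b"],
}
for name, col in columns.items():
    # Report the dtype and confirm that every value is missing.
    print(name, col.dtype, col.isna().all())
```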
However, it does introduce annoying value-dependent behaviour, and is also not very consistent throughout pandas. For example, Series does not check for this, and will actually result in object dtype:
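A sketch of the Series counterpart, again a reconstruction with illustrative names rather than the original snippet. According to the description above, the all-NaN float64 Series is not ignored here, so the printed dtype is expected to be object.

```python
import numpy as np
import pandas as pd

s_all_nan = pd.Series([np.nan, np.nan])  # float64, all missing
s_dates = pd.Series(pd.date_range("2012-01-01", periods=2))

# Series concat does not special-case the all-NaN input.
result = pd.concat([s_all_nan, s_dates], ignore_index=True)
print(result.dtype)
```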
Further, this is also not consistent across data types. For example, we don't check for all-NA for the new nullable dtypes.
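To illustrate, a sketch using the nullable `Int64` dtype in place of float64 (names are illustrative, and the exact output may vary by pandas version). Since no all-NA check is done for nullable dtypes, the all-`pd.NA` column is expected to take part in dtype resolution like any other column.

```python
import pandas as pd

# An all-missing column with a nullable integer dtype.
df_nullable = pd.DataFrame({"b": pd.array([pd.NA, pd.NA], dtype="Int64")})
df_dates = pd.DataFrame({"b": pd.date_range("2012-01-01", periods=2)})

# The Int64 dtype is not ignored, even though every value is missing.
result = pd.concat([df_dates, df_nullable], ignore_index=True)
print(result.dtypes)
```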
For ArrayManager, I haven't yet implemented any special-case value-dependent behaviour (#39612; so in this respect it diverges from the BlockManager behaviour), as it would be good to first decide on the desired long-term behaviour.