BUG: DataFrame.insert() fails to insert a 2D python list when pandas doesn't
#5531
Closed
Labels
bug 🦗
Something isn't working
P1
Important tasks that we should complete soon
pandas concordance 🐼
Functionality that does not match pandas
Modin version checks
I have checked that this issue has not already been reported.
I have confirmed this bug exists on the latest released version of Modin.
I have confirmed this bug exists on the main branch of Modin. (In order to do this you can follow this guide.)
Reproducible Example
Issue Description
It appears that pandas always treats a Python list as a 1D object (even if it is naturally a 2D one), thus allowing such objects to be inserted as a column. Modin, on the other hand, applies an optimization (#5226) that converts a list-like value being inserted into a NumPy array, which speeds up the column deserialization significantly.
modin/modin/core/storage_formats/pandas/query_compiler.py
Lines 2368 to 2369 in 8b40350
This behaves badly when value is a 2D Python list (remember, pandas sees this as a 1D object), because np.array(value) converts the value into a literal 2D matrix. Insertion then fails: pandas can now see that the value is 2D and aborts the insertion.
A simple workaround is to manually create a proper 1D NumPy array and insert that into the frame, which avoids the mistaken conversion.
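To illustrate the mechanism described above, here is a minimal sketch using only pandas and NumPy (not Modin itself). The frame contents, the column names, and the helper to_1d_object_array are made up for illustration and are not taken from the Modin code base; the exact error raised for the 2D array may differ between pandas versions.

```python
import numpy as np
import pandas as pd

# Hypothetical data: a 2-row frame and a "2D" Python list with one item per row.
df = pd.DataFrame({"a": [10, 20]})
value = [[1, 2], [3, 4]]

# pandas treats the list as a 1D sequence of objects, so this is accepted
# and each cell of the new column holds one inner list.
df.insert(1, "as_list", value)

# Converting the same list with np.array yields a genuine 2D matrix ...
as_matrix = np.array(value)

# ... which pandas is expected to reject on insertion (assumed to be a ValueError).
rejected = False
try:
    df.insert(1, "as_matrix", as_matrix)
except ValueError:
    rejected = True


def to_1d_object_array(items):
    """Hypothetical helper: build a 1D object array whose elements are the inner lists."""
    result = np.empty(len(items), dtype=object)
    # Assign element by element so NumPy never interprets the nesting as a second dimension.
    for i, item in enumerate(items):
        result[i] = item
    return result


# Workaround: insert a proper 1D object array instead of the 2D matrix.
one_d = to_1d_object_array(value)
df.insert(1, "as_1d_array", one_d)
```

The element-wise assignment is a deliberately conservative choice: it keeps each inner list as a single cell regardless of how NumPy would otherwise infer the shape of the nested input.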
Expected Behavior
To work
Error Logs
Installed Versions
Replace this line with the output of pd.show_versions()