Improve scaling of to_pandas #2814
Labels: P2 (minor bugs or low-priority feature requests), pandas 🤔, Performance 🚀
There are a few issues behind the poor scaling of `to_pandas`. In particular, it relies on `pandas.concat()`, which in turn does a lot of memory copying due to how its internal structure is managed.

I wasn't able to find a sequence of incantations that stops Pandas from copying blocks around during concatenation, so I believe that to improve performance we have to manually assemble the underlying Pandas block structure from a set of pre-allocated blocks. We're solving a much simpler task than regular `pandas.concat()`: we know that we split the dataframe perfectly when distributing it, so there should not be any conflicts in column names or indices.
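To make the pre-allocation idea concrete, here is a minimal sketch, not the actual implementation. It assumes the partitions are row-wise slices of a frame whose columns are all `float64`, so that a single 2D buffer can stand in for one Pandas block. The helper name `concat_row_partitions` is hypothetical, and the sketch only uses public NumPy and Pandas calls rather than touching the internal block manager.

```python
import numpy as np
import pandas as pd


def concat_row_partitions(parts):
    """Stitch row-wise partitions into one DataFrame via a pre-allocated buffer.

    Assumption (for illustration only): every partition has the same columns
    in the same order, and every column is float64.
    """
    columns = parts[0].columns
    total_rows = sum(len(part) for part in parts)

    # Allocate the destination once, instead of letting concat grow/copy blocks.
    out = np.empty((total_rows, len(columns)), dtype=np.float64)

    start = 0
    for part in parts:
        stop = start + len(part)
        # Each partition's values are written exactly once into their final place.
        out[start:stop, :] = part.to_numpy(dtype=np.float64)
        start = stop

    # The split was clean, so the index pieces can simply be joined in order.
    index = np.concatenate([part.index.to_numpy() for part in parts])

    # copy=False asks pandas to wrap the buffer rather than copy it again.
    return pd.DataFrame(out, index=index, columns=columns, copy=False)


# Usage: split a frame into three row partitions and stitch it back together.
df = pd.DataFrame(np.arange(12, dtype=np.float64).reshape(6, 2), columns=["a", "b"])
parts = [df.iloc[:2], df.iloc[2:5], df.iloc[5:]]
result = concat_row_partitions(parts)
```

A real solution would need one buffer per dtype (mirroring how Pandas groups columns into blocks) and would have to handle column-wise partitions as well. The point of the sketch is only that each partition is copied once, directly into its final location.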