ENH: Allow get_dummies to return SparseDataFrame #8823
Force-pushed from 81bd50e to ebafb72.
@artemyk can you rebase this? You will also need to compare vs. Sparse (or not) in the tests.
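One way to read the request above is a test that builds both the dense and the sparse result and checks that they agree. The following is only a minimal sketch, assuming a recent pandas release in which `sparse=True` returns a regular `DataFrame` with sparse-backed columns (`SparseDataFrame` was removed in pandas 1.0). It is not the test code from this PR, and the sample data is made up for illustration.

```python
import pandas as pd

# Hypothetical sample data, not taken from the PR's test suite.
s = pd.Series(list("abca"))

dense = pd.get_dummies(s)
sparse = pd.get_dummies(s, sparse=True)

# Column labels and their order should not depend on the sparse flag.
assert list(dense.columns) == list(sparse.columns)

# Densify the sparse result and compare its values with the dense one.
sparse_as_dense = sparse.sparse.to_dense()
assert (sparse_as_dense.to_numpy() == dense.to_numpy()).all()
```

Comparing after densifying keeps the check independent of the container type that the sparse path returns.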
Force-pushed from f04a698 to c54bb97.
Force-pushed from 0ad8e08 to ba28f40.
@jreback This should be ready once #8822 is merged.
Force-pushed from 7496d5d to 0c556df.
ENH: Allow get_dummies to return sparse dataframe

Fix
Fix
Fixes
Bug in order of columns
Slight speed improvement
get_dummies update
Release notes update
Remove convert dummies test
Force-pushed from 0c556df to 7173395.
@jreback Ready to merge?
@@ -48,6 +48,7 @@ Enhancements
df.drop(['A', 'X'], axis=1, errors='ignore')

- Allow conversion of values with dtype ``datetime64`` or ``timedelta64`` to strings using ``astype(str)`` (:issue:`9757`)
- ``get_dummies`` function now accepts ``sparse`` keyword. If set to ``True``, the return DataFrame is sparse. (:issue:`8823`)
put backticks around DataFrame. say "the return `DataFrame` is sparse (e.g. `SparseDataFrame`)"
merged via 4673225 thanks!
For dataframes with a large number of unique values, `get_dummies` can use enormous amounts of memory. This provides a `sparse` flag to `get_dummies` which returns a much more memory-efficient structure.

For example:
returns
Performance could probably be improved a lot.
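
To give a rough sense of the memory difference described above, here is a minimal sketch, assuming a recent pandas release where `sparse=True` yields a `DataFrame` with sparse-backed columns rather than the `SparseDataFrame` this PR originally returned. The data is hypothetical (every row has its own unique label, the worst case for dense dummies), and the exact byte counts depend on the pandas version and dtypes, so none are quoted here.

```python
import numpy as np
import pandas as pd

# Hypothetical worst case: 1,000 rows, each with a distinct label,
# so the dummy matrix is 1,000 x 1,000 with a single non-zero per column.
n = 1_000
s = pd.Series(np.arange(n).astype(str))

dense = pd.get_dummies(s)
sparse = pd.get_dummies(s, sparse=True)

# Sum per-column memory usage, ignoring the (shared) index.
dense_bytes = dense.memory_usage(index=False).sum()
sparse_bytes = sparse.memory_usage(index=False).sum()

print(f"dense:  {dense_bytes} bytes")
print(f"sparse: {sparse_bytes} bytes")
```

The sparse result only stores the non-zero entries and their positions, which is why the gap widens as the number of unique values grows.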
Fails `pandas/tests/test_reshape.py:TestGetDummiesSparse.test_include_na` due to #8822.