ENH: Add categorical support for Stata export #8767
Some questions about categoricals:
Could also probably use some feedback to simplify the Python2/3 bytes/strings code. A few small things left to do:
self.off = []
for vl in self.value_labels:
    category = vl[1]
    if not isinstance(category, string_types):
you can stringify if non-string?
1873f71 to 08ad0a6
@jreback Should be pretty much ready. In the end I decided to stringify categoricals and provide a warning to check the Stata file.
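A minimal sketch of the "stringify and warn" approach described above. The function name `stringify_categories` and the warning text are hypothetical and are not taken from the PR; the sketch only illustrates the idea of converting non-string categories to text labels and telling the user to check the result.

```python
import warnings


def stringify_categories(categories):
    """Return the categories as strings, warning if any had to be converted.

    Hypothetical helper: Stata value labels must be text, so non-string
    categories are converted with str() and the caller is warned.
    """
    converted = []
    changed = False
    for category in categories:
        if not isinstance(category, str):
            category = str(category)
            changed = True
        converted.append(category)

    if changed:
        warnings.warn(
            "Non-string categories were converted to strings; "
            "check the resulting Stata file.",
            UserWarning,
            stacklevel=2,
        )
    return converted
```

Calling `stringify_categories([1, "b", 2.5])` returns `["1", "b", "2.5"]` and emits a single `UserWarning`.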
:class:`~pandas.io.stata.StataWriter` and
:func:`~pandas.core.frame.DataFrame.to_stata` only support fixed width
strings containing up to 244 characters, a limitation imposed by the version
115 dta file format. Attempting to write *Stata* dta files with strings
longer than 244 characters raises a ``ValueError``.

.. warning::

   *Stata* data files only support text labels for categroical data. Exporting
typo: "categroical"
08ad0a6 to 6d0e8cc
null_byte = b'\x00'
# len
bio.write(struct.pack(byteorder + 'i', self.len))
# labname
can you put a blank line before each 'block' (e.g. have a blank line, comment line, then code)
@bashtage otherwise looks good. ping when pushed and green.
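The requested layout (blank line, comment line, then code) might look like the sketch below. The helper name `pack_label_header` is hypothetical, and the field widths (a 33-byte null-padded label name followed by 3 bytes of padding) are assumptions made for illustration; this is not the PR's actual writer code.

```python
import struct
from io import BytesIO


def pack_label_header(labname, length, byteorder="<"):
    """Pack the start of a value-label record (illustrative layout only)."""
    bio = BytesIO()
    null_byte = b"\x00"

    # len
    bio.write(struct.pack(byteorder + "i", length))

    # labname, null-padded to an assumed width of 33 bytes
    name = labname.encode("ascii")[:32]
    bio.write(name + null_byte * (33 - len(name)))

    # padding
    bio.write(null_byte * 3)

    return bio.getvalue()
```

Each block starts with a blank line and a one-line comment naming the field, which makes the byte layout easy to scan against a format description.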
6d0e8cc to 47d3d7e
@jreback ready
47d3d7e to ffa73b9
"""Check for categorical columns, retain categorical information for
Stata file and convert categorical data to int"""

is_cat = [True if com.is_categorical_dtype(data[col]) else False
nitpick: [com.is_categorical_dtype(data[col]) for col in data]. The True/False are superfluous.
minor comment. ping when green.
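A short sketch of why the `True if ... else False` wrapper is redundant. It uses `isinstance(..., pd.CategoricalDtype)` from the public pandas API as a stand-in for the internal `com.is_categorical_dtype` helper used in the PR; the example frame and its column names are made up.

```python
import pandas as pd

# Example data (hypothetical): one categorical column, one float column.
data = pd.DataFrame(
    {
        "grade": pd.Categorical(["a", "b", "a"]),
        "score": [1.0, 2.0, 3.0],
    }
)

# Verbose form: the conditional expression only re-wraps a value that is
# already a bool.
verbose = [
    True if isinstance(data[col].dtype, pd.CategoricalDtype) else False
    for col in data
]

# Concise form suggested in the review: use the predicate's result directly.
concise = [isinstance(data[col].dtype, pd.CategoricalDtype) for col in data]

# Both forms produce the same list of booleans.
assert verbose == concise
```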
ffa73b9 to 3a713fc
@jreback Should be ready
original = pd.concat([original[col].astype('category') for col in original], axis=1)

with tm.ensure_clean() as path:
    original.to_stata(path)
I didn't look in detail, but shouldn't we also test that the file is written correctly? (by reading it in again and comparing it to the original?)
@jorisvandenbossche Good idea - I had only been checking that the files are correct in Stata, which is probably more important but cannot be automated. This showed there is a bug in the reader code that doesn't correctly handle missing values. In general writing and reading it back in isn't that useful since both the writer and reader can agree but still be incorrect (this happened in the past). So @jreback hold off on this one for a while until I can fix the reader.
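A rough sketch of the kind of round-trip check being discussed, using only the public `DataFrame.to_stata` and `pd.read_stata` calls. The column name and data are made up, and this is not the test added in the PR. As noted above, a round trip only shows that the writer and reader agree with each other; it does not prove that Stata itself reads the file correctly.

```python
import os
import tempfile

import pandas as pd

# Hypothetical example frame with a single categorical column.
original = pd.DataFrame({"grade": pd.Categorical(["a", "b", "a"])})

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "categorical.dta")

    # Write without the index so only the categorical column is stored.
    original.to_stata(path, write_index=False)

    # Read the file back; value labels are converted to categoricals by default.
    roundtrip = pd.read_stata(path)

# Compare the labels as plain strings to sidestep dtype details such as
# whether the categories come back ordered.
original_labels = list(original["grade"].astype(str))
roundtrip_labels = list(roundtrip["grade"].astype(str))
assert original_labels == roundtrip_labels
```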
ping when ready
3a713fc to 3b4787b
Add support for exporting DataFrames containing categorical data. closes pandas-dev#8633 xref pandas-dev#7621
3b4787b to 204b50e
@jreback ready
ENH: Add categorical support for Stata export
thanks! this is excellent!
Add support for exporting DataFrames containing categorical data.
closes #8633
xref #7621