ENH: Allow export of mixed columns to Stata strl #23692

bashtage · 2018-11-14T12:26:42Z

Enable export of large columns to Stata strls when the column
contains None as a null value

closes #23633

- closes #23633 (StataWriter for version 117 fails on None in a string column long enough to be a Stata StrL.)
- tests added / passed
- passes `git diff upstream/master -u -- "*.py" | flake8 --diff`
- whatsnew entry
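The following plain-Python sketch shows the idea behind the change. `build_strl_table` is a hypothetical, simplified helper, not a pandas function. It assumes that a strL column is written as a lookup table in which each distinct string is stored once and referenced by a (variable, observation) key, with the empty string reserved as (0, 0). Mapping None to '' before the lookup is what lets a mixed str/None column be written instead of failing.

```python
from typing import Dict, List, Optional, Tuple

# A (variable, observation) reference into the strL table.
Key = Tuple[int, int]


def build_strl_table(
    column: List[Optional[str]], var_index: int
) -> Tuple[Dict[str, Key], List[Key]]:
    """Hypothetical, simplified model of writing one strL column.

    Returns the lookup table (string -> key) and one key per observation.
    """
    # The empty string is pre-registered with the reserved key (0, 0).
    table: Dict[str, Key] = {"": (0, 0)}
    refs: List[Key] = []
    for obs, value in enumerate(column):
        # Stata has no missing value for strings, so None is stored as ''.
        if value is None:
            value = ""
        key = table.get(value)
        if key is None:
            # New string: key it by 1-based variable and observation numbers.
            key = (var_index + 1, obs + 1)
            table[value] = key
        refs.append(key)
    return table, refs


table, refs = build_strl_table(["x" * 3000, None, "x" * 3000], var_index=0)
```

Here `refs` is `[(1, 1), (0, 0), (1, 1)]`. The repeated 3000-character string is stored once, and the None row points at the reserved empty-string key.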


pep8speaks · 2018-11-14T12:26:48Z

Hello @bashtage! Thanks for submitting the PR.

There are no PEP8 issues in the file `pandas/io/stata.py`!
There are no PEP8 issues in the file `pandas/tests/io/test_stata.py`!

codecov · 2018-11-14T13:03:59Z

Codecov Report

Merging #23692 into master will not change coverage.
The diff coverage is n/a.

@@           Coverage Diff           @@
##           master   #23692   +/-   ##
=======================================
  Coverage   92.24%   92.24%           
=======================================
  Files         161      161           
  Lines       51318    51318           
=======================================
  Hits        47339    47339           
  Misses       3979     3979

Flag	Coverage Δ
#multiple	`90.63% <ø> (ø)`	⬆️
#single	`42.31% <ø> (ø)`	⬆️

Legend:
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update a197837...75f9d80.

jreback · 2018-11-14T13:32:57Z

Does this break idempotency? (I think the answer is no; this is round-trippable.)

kylebarron · 2018-11-14T14:15:21Z

I think yes, because Stata doesn't have a missing value for strings. When the Stata file is read back into pandas, the None values become ''.

This already happens with shorter strings in the Stata 114 writer. This PR allows the same to happen in the 117 writer with strings long enough to be stored as strLs (more than 2045 characters).

>>> import pandas as pd
>>> df = pd.DataFrame({'a': ['abc', None]})
>>> df.to_stata('test.dta')
>>> pd.read_stata('test.dta')
   index    a
0      0  abc
1      1
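The sketch below checks the same None -> '' conversion for the strL case that this PR enables. It assumes a pandas version that includes this change, along with the `version=117` and `write_index=False` options of `DataFrame.to_stata`. The 3000-character string is an arbitrary choice: it is long enough that the column is written as a strL rather than as a fixed-width string.

```python
import os
import tempfile

import pandas as pd

# 3000 characters is more than a fixed-width str# column holds in format 117,
# so this column is written as a strL.
long_text = "x" * 3000
df = pd.DataFrame({"a": [long_text, None]})

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "test.dta")
    df.to_stata(path, version=117, write_index=False)
    reread = pd.read_stata(path)

# Stata has no missing value for strings, so None comes back as ''.
values = reread["a"].tolist()
```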

jreback · 2018-11-14T14:17:04Z

pandas/tests/io/test_stata.py

+             'number': 1}
+        ]
+
+        output = pd.DataFrame(output)


Is it worth having a test for other versions?

bashtage: Not relevant for the other version (114), which doesn't support strLs.
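This sketch shows why format 114 needs no strL test, assuming current `DataFrame.to_stata` behavior. With `version=114`, explicitly requesting a strL through `convert_strl` raises a `ValueError`. Strings too long for a fixed-width 114 column are also rejected rather than written as strLs. `write_error` is a hypothetical helper used only for this illustration.

```python
import os
import tempfile

import pandas as pd


def write_error(frame: pd.DataFrame, **kwargs) -> str:
    """Hypothetical helper: return the ValueError raised by to_stata, or ''."""
    with tempfile.TemporaryDirectory() as tmp:
        try:
            frame.to_stata(os.path.join(tmp, "v114.dta"), **kwargs)
        except ValueError as err:
            return str(err)
    return ""


short_df = pd.DataFrame({"a": ["abc", None]})
long_df = pd.DataFrame({"a": ["x" * 3000, None]})

# Format 114 has no strL type, so explicitly requesting one is rejected.
strl_error = write_error(short_df, version=114, convert_strl=["a"])

# Strings too long for a fixed-width 114 column raise instead of becoming strLs.
long_error = write_error(long_df, version=114)
```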

bashtage · 2018-11-14T14:41:48Z

> does this break idempotency? (i think the answer is no, this is round-trippable)

Yes. Essentially, you get a None -> '' conversion in mixed columns, since Stata has no missing value for strings.

The only way to have idempotency would be to raise on mixed string columns, so that None is not allowed in string columns. Users would then need to convert None to '' before saving, so that '' -> file -> ''.
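The workaround described above can be sketched as follows, assuming a pandas version with strL support in `to_stata(version=117)`. Filling missing strings with '' before saving means the file holds exactly what was in the DataFrame, so reading it back reproduces the data.

```python
import os
import tempfile

import pandas as pd

df = pd.DataFrame({"a": ["x" * 3000, None]})

# Convert None to '' up front, so nothing is silently changed on export.
clean = df.assign(a=df["a"].fillna(""))

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "clean.dta")
    clean.to_stata(path, version=117, write_index=False)
    reread = pd.read_stata(path)

# '' -> file -> '' round-trips unchanged.
round_trips = reread["a"].tolist() == clean["a"].tolist()
```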

jreback · 2018-11-14T17:06:48Z

thanks @bashtage


ENH: Allow export of mixed columns to Stata strl

75f9d80

Enable export of large columns to Stata strls when the column contains None as a null value closes pandas-dev#23633

bashtage mentioned this pull request Nov 14, 2018

StataWriter for version 117 fails on None in a string column long enough to be a Stata StrL. #23633

Closed

jreback added the Enhancement and IO Stata (read_stata, to_stata) labels Nov 14, 2018

jreback added this to the 0.24.0 milestone Nov 14, 2018

jreback mentioned this pull request Nov 14, 2018

df.to_stata fails when a column of type object contains only None #23572

Closed

jreback reviewed Nov 14, 2018


jreback merged commit fcb8403 into pandas-dev:master Nov 14, 2018

tm9k1 pushed a commit to tm9k1/pandas that referenced this pull request Nov 19, 2018

ENH: Allow export of mixed columns to Stata strl (pandas-dev#23692)

b93de32

Enable export of large columns to Stata strls when the column contains None as a null value closes pandas-dev#23633

bashtage deleted the strl-none branch March 21, 2019 13:27
