ENH: Allow export of mixed columns to Stata strl #23692
Enable export of large columns to Stata strls when the column contains None as a null value. Closes pandas-dev#23633.
Hello @bashtage! Thanks for submitting the PR.
Codecov Report
@@ Coverage Diff @@
## master #23692 +/- ##
=======================================
Coverage 92.24% 92.24%
=======================================
Files 161 161
Lines 51318 51318
=======================================
Hits 47339 47339
Misses 3979 3979
Does this break idempotency? (I think the answer is no, this is round-trippable.)
I think yes, because Stata doesn't have a string missing value. When the Stata file is read back into pandas, the None comes back as an empty string (''). This already happens with shorter strings with the Stata 114 writer. This PR allows the same to happen with strings longer than 245 characters in the 117 writer.
>>> import pandas as pd
>>> df = pd.DataFrame({'a': ['abc', None]})
>>> df.to_stata('test.dta')
>>> pd.read_stata('test.dta')
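For the strl case this PR targets, a minimal sketch of the same round trip might look like the following. The column name, the 3000-character value, and the temporary path are made up for illustration; `version=117` and `convert_strl` are the existing `to_stata` options for writing strls, and the column is forced to object dtype so that the missing value really is `None`.

```python
import os
import tempfile

import pandas as pd

# Hypothetical data: one value long enough to need a strl, plus a None.
long_text = "x" * 3000
df = pd.DataFrame({"a": pd.Series([long_text, None], dtype=object)})

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "test.dta")
    # version=117 selects the writer that supports strls;
    # convert_strl names the columns to store as strl.
    df.to_stata(path, version=117, convert_strl=["a"], write_index=False)
    round_tripped = pd.read_stata(path)

values = round_tripped["a"].tolist()
print(values[0] == long_text)  # the long string should survive unchanged
print(repr(values[1]))         # the None is expected to come back as ''
```

If the behaviour described above holds, the second value is an empty string rather than `None`, which is the loss of idempotency being discussed.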
'number': 1}
]

output = pd.DataFrame(output)
is it worth having a test for other versions?
Not relevant for the other version (114) which doesn't support strls.
Yes. Essentially you get None->'' conversion in mixed columns since there is no missing value for strings in Stata. The only way to have idempotency would be to raise on mixed string columns so that None is not allowed in string columns. Then users would need to convert None to '' before saving, so that '' -> file -> ''.
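For users who do want a stable round trip today, the conversion mentioned above can be done by hand before saving. A rough sketch, assuming object-dtype string columns; `normalize_missing_strings` is a hypothetical helper name, not part of pandas:

```python
import pandas as pd


def normalize_missing_strings(df, columns):
    """Return a copy of df with missing values in `columns` replaced by ''.

    Hypothetical helper for illustration only; it is not a pandas API.
    """
    out = df.copy()
    for col in columns:
        # fillna treats None (and NaN) as missing and swaps in ''.
        out[col] = out[col].fillna("")
    return out


df = pd.DataFrame({"a": pd.Series(["abc", None], dtype=object)})
clean = normalize_missing_strings(df, ["a"])
print(clean["a"].tolist())
```

After this step the column holds only strings, so what is written to the file and what is read back should agree ('' -> file -> ''). Applying the helper a second time changes nothing.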
thanks @bashtage
Enable export of large columns to Stata strls when the column
contains None as a null value
closes #23633
git diff upstream/master -u -- "*.py" | flake8 --diff