ENH: Add support for writing variable labels to Stata files #13631

Closed
wants to merge 1 commit into from

@bashtage bashtage commented Jul 12, 2016

Add support for writing variable labels
Fix documentation for to_stata
Clean up function name to improve readability

closes #13536
closes #13535

case the current time is used.
dataset_label : str
A label for the data set. Should be 80 characters or smaller.
variable_labels : dict

Maybe add a note here about the variables needing to be latin-1 encodable? Also perhaps note that a ValueError is raised if not, or if any label is too long.
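The two conditions mentioned in this review note (labels must be latin-1 encodable and no longer than 80 characters, otherwise a ValueError is raised) can be sketched as below. This is only an illustration of the rule, not the code in this PR; the function name `validate_variable_labels` and its `max_length` parameter are hypothetical.

```python
def validate_variable_labels(variable_labels, max_length=80):
    """Hypothetical helper: raise ValueError for labels Stata cannot store.

    `variable_labels` maps column names to label strings, mirroring the
    `variable_labels` dict described in the docstring under review.
    """
    for column, label in variable_labels.items():
        # Rule 1: each label must be 80 characters or smaller.
        if len(label) > max_length:
            raise ValueError(
                "Label for %r is longer than %d characters" % (column, max_length)
            )
        # Rule 2: each label must be representable in latin-1.
        try:
            label.encode("latin-1")
        except UnicodeEncodeError:
            raise ValueError(
                "Label for %r cannot be encoded as latin-1" % (column,)
            ) from None


def is_valid(variable_labels):
    """Return True if the labels pass both checks, False otherwise."""
    try:
        validate_variable_labels(variable_labels)
        return True
    except ValueError:
        return False
```

For instance, `is_valid({'a': 'City rank'})` passes, while a label of 81 characters or one containing a character outside latin-1 is rejected.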

@TomAugspurger TomAugspurger added Enhancement IO Stata read_stata, to_stata labels Jul 12, 2016
@TomAugspurger TomAugspurger added this to the 0.19.0 milestone Jul 12, 2016
@TomAugspurger

Just the one minor note, plus a whatsnew entry (this can go in 0.19). Other than that this looks great assuming Travis passes.

@bashtage

It will fail on Python 2.7 I can see now. But should be easy enough to fix.

@codecov-io
codecov-io commented Jul 12, 2016

Current coverage is 84.38%

Merging #13631 into master will increase coverage by <.01%

@@             master     #13631   diff @@
==========================================
  Files           142        142          
  Lines         51223      51235    +12   
  Methods           0          0          
  Messages          0          0          
  Branches          0          0          
==========================================
+ Hits          43223      43235    +12   
  Misses         8000       8000          
  Partials          0          0          

Powered by Codecov. Last updated by d7c028d...1e1e1bf

@@ -1113,6 +1113,56 @@ def test_read_chunks_columns(self):
tm.assert_frame_equal(from_frame, chunk, check_dtype=False)
pos += chunksize

def test_write_variable_labels(self):
original = pd.DataFrame({'a': [1, 2, 3, 4],

add issue number

@jreback

jreback commented Jul 13, 2016

Is this forward compatible? In other words, should files written with the labels still be readable by older versions of pandas?

variable_labels : dict
Dictionary containing columns as keys and variable labels as
values. Each label must be 80 characters or smaller. Raises a
ValueError if a label is too long or contains characters that

add versionadded tag

@bashtage

is this forward compat? IOW, writing the labels should then be readable by older versions of pandas?

Should be -- the reader code always has to read these labels even if it doesn't do anything with them. One can read them using StataReader but not sure when the method to read the labels was added.
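A minimal round-trip sketch of the behaviour discussed here, assuming a pandas version that includes this change: it writes labels with `to_stata(..., variable_labels=...)` and reads them back through `StataReader.variable_labels()`. The frame contents, label text, and file name are made up for illustration.

```python
import os
import tempfile

import pandas as pd

# Hypothetical data and labels, used only for this illustration.
frame = pd.DataFrame({"a": [1, 2, 3, 4], "b": [1.0, 3.0, 27.0, 81.0]})
labels = {"a": "City rank", "b": "City exponent"}

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "labelled.dta")
    # write_index=False keeps the file limited to the two labelled columns.
    frame.to_stata(path, variable_labels=labels, write_index=False)
    # iterator=True returns a StataReader instead of a DataFrame.
    with pd.read_stata(path, iterator=True) as reader:
        read_back = reader.variable_labels()
```

Here `read_back` is a dict keyed by column name, so it can be compared directly with the `labels` dict that was written.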

@bashtage

@TomAugspurger @jreback Sort of an unrelated issue, but do you have any idea why my Travis fails when running the pandas test suite? My Travis build always fails, even when the Travis build for the pull request into pydata/pandas works fine.

Most recent example:

https://travis-ci.org/bashtage/pandas/builds/144722347

I see lots of errors like:

======================================================================
ERROR: pandas.io.tests.test_pickle.TestPickle.test_pickles('0.18.1',)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/travis/miniconda/envs/pandas/lib/python3.5/site-packages/nose/case.py", line 382, in setUp
    try_run(self.inst, ('setup', 'setUp'))
  File "/home/travis/miniconda/envs/pandas/lib/python3.5/site-packages/nose/util.py", line 471, in try_run
    return func()
  File "/home/travis/build/bashtage/pandas/pandas/io/tests/test_pickle.py", line 38, in setUp
    self.data = create_pickle_data()
  File "/home/travis/build/bashtage/pandas/pandas/io/tests/generate_legacy_storage_files.py", line 168, in create_pickle_data
    if _loose_version < '0.14.1':
  File "/home/travis/miniconda/envs/pandas/lib/python3.5/distutils/version.py", line 52, in __lt__
    c = self._cmp(other)
  File "/home/travis/miniconda/envs/pandas/lib/python3.5/distutils/version.py", line 337, in _cmp
    if self.version < other.version:
TypeError: unorderable types: str() < int()

@jreback

jreback commented Jul 14, 2016

@bashtage you have an older fork; git does not update tags with a fork (not really sure why this doesn't go along with cloning, but it doesn't).

so

git push tags yourbranch master --tags

will push them up to your fork of master, so future branches should be ok
you can also do this on your current branch.

there is a setup section in the Travis log that shows the git tags (after you update, it will show them)
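The `TypeError` in the traceback above can be reproduced without a fork. The sketch below re-implements the component splitting that `distutils.version.LooseVersion` performs, purely for illustration. It assumes that a build without tags reports a version string shaped like `0+untagged.1.gabcdef`; that string is a hypothetical example, not taken from the failing build.

```python
import re

# Same component pattern LooseVersion uses: digit runs, letter runs, or dots.
_component_re = re.compile(r"(\d+|[a-z]+|\.)")


def loose_components(vstring):
    """Split a version string into ints and strs, as LooseVersion does."""
    parts = [p for p in _component_re.split(vstring) if p and p != "."]
    components = []
    for part in parts:
        try:
            components.append(int(part))
        except ValueError:
            components.append(part)
    return components


def compares_cleanly(left, right):
    """Return True if `left < right` can be evaluated without a TypeError."""
    try:
        left < right
        return True
    except TypeError:
        return False


reference = loose_components("0.14.1")
tagged = loose_components("0.18.1")
# Hypothetical version string for a checkout that has no tags.
untagged = loose_components("0+untagged.1.gabcdef")
```

With tags present every compared component is an integer, so `compares_cleanly(tagged, reference)` holds. In the untagged case the second component is the string `'+'`, which Python 3 refuses to order against the integer `14`, matching the `str() < int()` error in the log.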

@bashtage bashtage changed the title ENH: Add support for writing variable labels ENH: Add support for writing variable labels to Stata files Jul 14, 2016
@bashtage

@jreback All green.

encoding : str
Default is latin-1. Note that Stata does not support unicode.
byteorder : str
Can be ">", "<", "little", or "big". The default is None which uses
`sys.byteorder`
time_stamp : datetime

Were these new entries just left out before?

Contributor Author

Yes. The arguments were present but undocumented. This is mentioned in one of the issues this PR closes.

@jreback

jreback commented Jul 14, 2016

Just some formatting changes. Can you add a whatsnew entry as well, plus doc updates if you think they are needed? Ping on green.

@bashtage

@jreback Should be ready

@jreback jreback closed this in fafef5d Jul 19, 2016
@jreback

jreback commented Jul 19, 2016

thanks @bashtage

@bashtage bashtage deleted the stata-data-labels branch January 24, 2017 21:30
Labels
Enhancement IO Stata read_stata, to_stata
Development

Successfully merging this pull request may close these issues.

ENH: Support Stata variable labels
Document pandas.DataFrame.to_stata data_label
5 participants