Conform Series.to_csv to DataFrame.to_csv #19715

dahlbaek · 2018-02-15T14:11:03Z

Thank you for this awesome package!

Problem description

When indexing/selecting/slicing a DataFrame, pandas may return a Series. Both classes support the to_csv method, but the two methods have subtle differences. These differences may break your code or require a call to to_frame. Perhaps it would be beneficial to conform the Series.to_csv method to the DataFrame.to_csv method?

At least, the following two examples seem relevant (and are the reason I came here):

DataFrame.to_csv accepts the keyword parameter path_or_buf. This seems to be completely analogous to the differently named keyword parameter path of Series.to_csv.
DataFrame.to_csv accepts the keyword parameter line_terminator. There does not seem to be an analogous keyword parameter of Series.to_csv.

Addendum: Other differences

The following two keyword parameters are implemented for DataFrame.to_csv but not for Series.to_csv: compression, chunksize. I suppose these are mostly relevant for very large dataframes, but I do not see any reason why Series.to_csv should not have analogous parameters.

Some of the keyword parameters of DataFrame.to_csv are specific to multi-column data and may therefore not make immediate sense for Series.to_csv. On the other hand, a user may wish to use quoting in a single-column file in order to stay consistent with other parts of a bigger project. In this regard, the following keyword parameters are implemented for DataFrame.to_csv but not for Series.to_csv: quoting, quotechar, doublequote, escapechar.
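Until Series.to_csv accepts these parameters, the to_frame workaround mentioned in the problem description reaches them. A minimal sketch, assuming a single named column; the Series and its values are made up for illustration:

```python
import csv

import pandas as pd

# Illustrative single-column data (not taken from the issue).
s = pd.Series(["a", "b"], name="col")

# Workaround: convert to a one-column DataFrame so that the quoting-related
# keyword parameters of DataFrame.to_csv become available.
text = s.to_frame().to_csv(
    index=False,
    quoting=csv.QUOTE_ALL,  # quote every field, header included
    quotechar="'",          # single quotes instead of the default double quote
)
print(text)
```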

Output of `pd.show_versions()`

INSTALLED VERSIONS

commit: None
python: 3.6.3.final.0
python-bits: 64
OS: Linux
OS-release: 4.13.0-32-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL:
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8

pandas: 0.22.0
pytest: None
pip: 9.0.1
setuptools: 38.5.1
Cython: None
numpy: 1.14.0
scipy: None
pyarrow: None
xarray: None
IPython: 6.2.1
sphinx: None
patsy: None
dateutil: 2.6.1
pytz: 2017.3
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: None
openpyxl: 2.5.0
xlrd: 1.1.0
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: None
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: None
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None
None

jorisvandenbossche · 2018-02-15T14:17:08Z

@dahlbaek Thanks for opening this issue. I think this all sounds very sensible, and it should not be too hard to solve, since Series.to_csv currently just creates a DataFrame and then calls DataFrame.to_csv:

pandas/pandas/core/series.py (lines 2924 to 2932 in 2fdf1e2):

        df = DataFrame(self)
        # result is only a string if no path provided, otherwise None
        result = df.to_csv(path, index=index, sep=sep, na_rep=na_rep,
                           float_format=float_format, header=header,
                           index_label=index_label, mode=mode,
                           encoding=encoding, compression=compression,
                           date_format=date_format, decimal=decimal)
        if path is None:
            return result

So I think we could simply pass **kwargs through to DataFrame.to_csv.
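A minimal sketch of that pass-through idea, written as a standalone function rather than as a patch to pandas; the name series_to_csv is hypothetical and the actual change may look different:

```python
import pandas as pd


def series_to_csv(series, path_or_buf=None, **kwargs):
    """Hypothetical helper: write a Series through DataFrame.to_csv.

    Every keyword argument is forwarded unchanged, so parameters such as
    sep, quoting or chunksize behave exactly as they do for DataFrame.to_csv.
    """
    df = pd.DataFrame(series)
    # DataFrame.to_csv returns a string only when no path or buffer is given
    # (None otherwise), so its result can be returned as-is.
    return df.to_csv(path_or_buf, **kwargs)


s = pd.Series([1, 2], name="x")
print(series_to_csv(s, sep=";"))
```

Because the helper never names the forwarded parameters, any option DataFrame.to_csv gains later is available for Series as well; the trade-off is that a misspelled keyword is only reported by DataFrame.to_csv itself.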

dahlbaek · 2018-02-15T14:37:31Z

I see! If you would like, I would be happy to try my hand at a pull request.

jorisvandenbossche · 2018-02-15T17:51:35Z

Sure!

gfyoung · 2018-02-15T19:54:28Z

xref #18958: I 100% agree that this should be done.

gfyoung · 2018-07-13T20:29:18Z

@jreback @jorisvandenbossche: Three PRs for resolving this issue:

#19745 (@dahlbaek)
#21868 (@gfyoung)
#21896 (@toobaz)

All of them have different implementations. Thoughts?

dahlbaek · 2018-07-14T10:10:14Z

As far as I understand, the solution by @gfyoung (#21868) takes care of everything except the reordering of positional arguments. As @toobaz pointed out (see #21896 for a proof of concept), the ordering problem can be solved by noting that index should be True or False, while sep must be a one-character string. That is, one can determine whether the user is calling the old or the new signature by checking the type of the argument in the sep position (the first positional argument following path_or_buf in the new signature).
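The type-based check described above can be sketched as follows. This is a hypothetical helper for illustration, not the code from #21896; it only inspects the value that follows the path in a positional call:

```python
def looks_like_new_signature(second_positional):
    """Guess which Series.to_csv signature a positional call targets.

    Old signature: to_csv(path, index, sep, ...), so the value after the path
    is normally True or False.  New signature: to_csv(path_or_buf, sep, ...),
    so the value after the path is a one-character string.
    """
    return isinstance(second_positional, str) and len(second_positional) == 1


print(looks_like_new_signature(";"))    # True: read as sep (new signature)
print(looks_like_new_signature(False))  # False: read as index (old signature)
print(looks_like_new_signature("y"))    # True: an old-style index="y" is misread as sep
```

The last call shows the weak spot: a one-character string passed as an old-style index is indistinguishable from a separator.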

However, as of right now, index also accepts one-character strings (which evaluate to True), so there may be code out there that passes one-character strings to index. Would it be possible to pave the way for solving this by first having a release that forces users to pass boolean values to index?

If one were to first implement something like

if not isinstance(index, bool):
    raise ValueError(
        "{0} is not a valid value for index. "
        "Use True, False or bool({0}) instead."
        .format(repr(index))
    )

in Series.to_csv, then the solution of @toobaz would be ensured to correctly classify which signature the user had in mind. But maybe this is overthinking it.
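Assuming a prior release had already rejected non-boolean index values with the ValueError above, the classification would no longer need to guess. A hypothetical sketch of that follow-up step, illustrating the proposal only:

```python
def classify_positional_call(second_positional):
    """Return "old" or "new" for the value that follows the path argument.

    Assumes an earlier release already made Series.to_csv raise ValueError
    for any non-bool index, so a string can no longer be an old-style index.
    """
    if isinstance(second_positional, bool):
        return "old"  # can only be ``index`` from the old signature
    if isinstance(second_positional, str) and len(second_positional) == 1:
        return "new"  # can only be ``sep`` from the new signature
    raise ValueError(
        "{0!r} is neither a bool (old index) nor a one-character "
        "separator (new sep)".format(second_positional)
    )


print(classify_positional_call(True))  # old
print(classify_positional_call(","))   # new
print(classify_positional_call("y"))   # new: "y" can only be a separator now
```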

toobaz · 2018-07-14T10:14:56Z

> then the solution of @toobaz would be ensured to correctly classify which signature the user had in mind

Doesn't it already? (see last rebase, and my reply to your comment)

toobaz · 2018-07-14T10:19:14Z

> Doesn't it already? (see last rebase, and my reply to your comment)

No, it does not, sorry: the ambiguity is unavoidable.

> If one were to first implement something like

No, I really don't think we want to do this: it is not worth the effort (the "y" behavior is undocumented).

jorisvandenbossche added the IO CSV (read_csv, to_csv) label Feb 15, 2018

gfyoung added the Compat (pandas objects compatability with Numpy or Python functions) label Feb 15, 2018

dahlbaek mentioned this issue Feb 18, 2018

Conform Series.to_csv to DataFrame.to_csv #19745

Closed

jreback added this to the 0.23.0 milestone Feb 18, 2018

jorisvandenbossche modified the milestones: 0.23.0, Next Major Release Mar 29, 2018

toobaz added a commit to toobaz/pandas that referenced this issue Jul 13, 2018

Proof of concept for pandas-dev#19715 based on pandas-dev#21868

3ea765a

toobaz added a commit to toobaz/pandas that referenced this issue Jul 13, 2018

Proof of concept for pandas-dev#19715 based on pandas-dev#21868

1fa5123

toobaz mentioned this issue Jul 13, 2018

DEPR: Deprecate Series.to_csv signature #21896

Merged

toobaz added a commit to toobaz/pandas that referenced this issue Jul 13, 2018

API: Proof of concept for pandas-dev#19715 based on pandas-dev#21868

5175907

closes pandas-dev#19715

toobaz added a commit to toobaz/pandas that referenced this issue Jul 14, 2018

API: Proof of concept for pandas-dev#19715 based on pandas-dev#21868

7d655b2

closes pandas-dev#19715

toobaz added a commit to toobaz/pandas that referenced this issue Jul 14, 2018

API: Proof of concept for pandas-dev#19715 based on pandas-dev#21868

446cd2c

closes pandas-dev#19715

toobaz added a commit to toobaz/pandas that referenced this issue Jul 25, 2018

API: Deprecate old Series.to_csv signature

191d2ab

closes pandas-dev#19715

dhimmel mentioned this issue Jul 27, 2018

Default to_* methods to compression='infer' #22011

Merged

toobaz added a commit to toobaz/pandas that referenced this issue Aug 2, 2018

API: Deprecate old Series.to_csv signature

e81e147

closes pandas-dev#19715

toobaz added a commit to toobaz/pandas that referenced this issue Aug 2, 2018

API: Deprecate old Series.to_csv signature

e69c5ca

closes pandas-dev#19715

jorisvandenbossche closed this as completed in #21896 Aug 13, 2018

jorisvandenbossche pushed a commit that referenced this issue Aug 13, 2018

API: Deprecate old Series.to_csv signature (#21896)

eb0ac54

closes #19715

Sup3rGeo pushed a commit to Sup3rGeo/pandas that referenced this issue Oct 1, 2018

API: Deprecate old Series.to_csv signature (pandas-dev#21896)

6a46d08

closes pandas-dev#19715