BUG: UnicodeEncodeError in test_to_latex_filename (pandas.tests.test_format.TestDataFrameFormatting) #12337

dhomeier · 2016-02-15T23:48:58Z

Getting this error if (and I think only if) LANG is not defined or not set to any UTF-8-conforming value on 0.18.0rc1 (Mac OS X 10.10, Python 3.4.4, numpy 1.11.0b3):

ERROR: test_to_latex_filename (pandas.tests.test_format.TestDataFrameFormatting)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/scratch.noindex/fink.build/pandas-py34-0.18.0rc1-1/pandas-0.18.0rc1/pandas/tests/test_format.py", line 2614, in test_to_latex_filename
    df.to_latex(path)
  File "/scratch.noindex/fink.build/pandas-py34-0.18.0rc1-1/pandas-0.18.0rc1/pandas/core/frame.py", line 1593, in to_latex
    encoding=encoding)
  File "/scratch.noindex/fink.build/pandas-py34-0.18.0rc1-1/pandas-0.18.0rc1/pandas/core/format.py", line 641, in to_latex
    latex_renderer.write_result(f)
  File "/scratch.noindex/fink.build/pandas-py34-0.18.0rc1-1/pandas-0.18.0rc1/pandas/core/format.py", line 877, in write_result
    buf.write(' & '.join(crow))
UnicodeEncodeError: 'ascii' codec can't encode character '\xdf' in position 7: ordinal not in range(128)

I don't know the inner workings of this test; if any open files are involved, this might be relevant: scipy/scipy#5694


jreback · 2016-02-16T00:51:42Z

hmm, I can repro this, but only on Mac (using 3.5 and the latest numpy, but I don't think numpy matters)

cc @nbonnotte

jreback · 2016-02-16T01:46:29Z

@dhomeier, if you want to experiment on this one: I'm not really sure what the issue is.

dhomeier · 2016-02-16T02:00:33Z

It seems to me the problem is this, at test_format.py:2611:

        # test with utf-8 without encoding option
        if compat.PY3:  # python3 default encoding is utf-8

I believe this is not correct; the default encoding is whatever the LANG (or possibly one of the LC_*) environment variable specifies. If that's not set, it falls back to 'ascii'. At least, that's what the docs for the builtin open() state:

    In text mode, if encoding is not specified the encoding used is platform
    dependent: locale.getpreferredencoding(False) is called to get the
    current locale encoding.
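To illustrate the failure mode without depending on the machine's locale, here is a minimal sketch that opens an in-memory text stream with encoding="ascii" (standing in for what open() picks under a C/POSIX locale) and writes a LaTeX-style row containing 'ß' ('\xdf'), the character from the traceback. The row content is made up for illustration; the same text goes through cleanly once the stream is explicitly UTF-8.

```python
import io

# Emulate a text-mode file opened under a C/POSIX locale:
# io.TextIOWrapper with encoding="ascii" encodes every write as ASCII.
ascii_buf = io.BytesIO()
ascii_stream = io.TextIOWrapper(ascii_buf, encoding="ascii")

row = " & ".join(["0", "Stra\xdfe"])  # hypothetical row with a non-ASCII cell

try:
    ascii_stream.write(row)
    ascii_stream.flush()
    error = None
except UnicodeEncodeError as exc:
    error = exc  # "'ascii' codec can't encode character '\xdf' ..."

# The same text written through an explicit UTF-8 wrapper succeeds.
utf8_buf = io.BytesIO()
utf8_stream = io.TextIOWrapper(utf8_buf, encoding="utf-8")
utf8_stream.write(row)
utf8_stream.flush()

print(error)
```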

nbonnotte · 2016-02-16T07:32:59Z

Uh, I did write that part, but what I meant was "the default for pandas in a Python 3 environment is utf8", not "the default in Python 3 is utf8". This was to be consistent with to_csv:

encoding : string, optional
A string representing the encoding to use in the output file, defaults to ‘ascii’ on Python 2 and ‘utf-8’ on Python 3.

It would be interesting to compare to_latex and to_csv, because I don't see any reason why it should work in one case and not in another. I may have missed something, I'll have a look.

nbonnotte · 2016-02-16T10:34:37Z

Uh, I can't repro this, even though I'm on Mac OS X 10.10, python 3.5.1, numpy 1.10.4, and my LANG is empty.

dhomeier · 2016-02-16T12:25:28Z

So is the default supposed to be initialised somewhere in the pandas setup independently of the system settings? Otherwise, do you have LC_ALL set? What do you get for this?

>>> import locale
>>> locale.getdefaultlocale()
(None, None)
>>> locale.getpreferredencoding(False)
'US-ASCII'
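When comparing machines, it may help to collect every setting involved in one place. A small sketch, using only the standard library; the helper name encoding_report is hypothetical. The "preferred" entry is what open() uses for text files when no encoding is passed, while sys.getdefaultencoding() is the interpreter-wide default and does not follow the locale.

```python
import locale
import os
import sys


def encoding_report():
    """Collect the settings that decide which codec a text-mode file uses."""
    return {
        # Locale environment variables (None when unset).
        "LANG": os.environ.get("LANG"),
        "LC_ALL": os.environ.get("LC_ALL"),
        "LC_CTYPE": os.environ.get("LC_CTYPE"),
        # Codec open() uses for text files when encoding is not given.
        "preferred": locale.getpreferredencoding(False),
        # Interpreter-wide str/bytes default; "utf-8" on Python 3.
        "default": sys.getdefaultencoding(),
    }


for key, value in encoding_report().items():
    print(f"{key}: {value}")
```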

jreback · 2016-02-16T13:48:46Z

This gets the pandas encoding.

In [1]: pd.get_option('display.encoding')
Out[1]: 'UTF-8'

I think this is correct in this case; rather, it's the comparison that's the issue. You need to do it like it's done above:

    with codecs.open(path, 'r', encoding='utf-8') as f:
        self.assertEqual(df.to_latex(), f.read())
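The idea of the comparison above, as a standalone sketch: write and read the file with the same explicit encoding, so the check no longer depends on LANG or LC_ALL. To keep it runnable without pandas, render_latex() is a hypothetical stand-in for df.to_latex(), and the builtin open() is used in place of codecs.open(); both accept an encoding argument.

```python
import os
import tempfile


def render_latex():
    """Hypothetical stand-in for df.to_latex(); returns a tiny tabular."""
    return (
        "\\begin{tabular}{ll}\n"
        "0 & Stra\xdfe \\\\\n"
        "\\end{tabular}\n"
    )


def roundtrip_matches(path):
    # Write and read with the same explicit encoding, so the comparison
    # gives the same answer under any locale.
    with open(path, "w", encoding="utf-8") as f:
        f.write(render_latex())
    with open(path, "r", encoding="utf-8") as f:
        return f.read() == render_latex()


with tempfile.TemporaryDirectory() as tmp:
    ok = roundtrip_matches(os.path.join(tmp, "test.tex"))

print(ok)  # True
```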

nbonnotte · 2016-02-16T14:27:24Z

Hmm, I think I did overlook the default behaviour in Python 3, as pointed out by @dhomeier.

See this line: the parameter encoding can be None, in which case we fall back to the default Python behavior (which depends on the locale), but both the documentation and the tests expect the default pandas behavior (i.e. UTF-8).

A simple solution could be to replace encoding=None with either 'ascii' or 'utf-8', depending on the Python version being used. Although I don't much like hardcoding it that way...
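A minimal sketch of that hardcoded option, assuming the goal is to match the to_csv docstring quoted earlier ('ascii' on Python 2, 'utf-8' on Python 3). The helper name resolve_encoding is hypothetical, not a pandas function.

```python
import sys


def resolve_encoding(encoding=None):
    """Hypothetical helper: fill in the documented pandas default.

    Returns the caller's encoding unchanged, otherwise 'ascii' on
    Python 2 and 'utf-8' on Python 3, instead of deferring to the locale.
    """
    if encoding is not None:
        return encoding
    return "ascii" if sys.version_info[0] < 3 else "utf-8"


print(resolve_encoding())           # utf-8 on Python 3
print(resolve_encoding("latin-1"))  # latin-1
```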

dhomeier · 2016-02-16T14:32:11Z

AFAICS to_csv ultimately gets the default from UnicodeWriter, which sets the default encoding to "utf-8" regardless of the Python version.
I thought to_latex would get it from codecs.open(), which is supposed to use sys.getdefaultencoding().
But this indeed returns 'ascii' in Python 2.7, and 'utf-8' in Python 3, regardless of the environment setting. Just added this check to make sure:

        # test with utf-8 without encoding option
        if compat.PY3:  # python3 default encoding is utf-8
            self.assertEqual(sys.getdefaultencoding(), 'utf-8')
            with tm.ensure_clean('test.tex') as path:
                df.to_latex(path)
                with codecs.open(path, 'r') as f:
                    self.assertEqual(df.to_latex(), f.read())

It still throws the error in

    with codecs.open(self.buf, 'w', encoding=encoding) as f:
        latex_renderer.write_result(f)

so maybe the codecs.open() encoding is not passed on to LatexFormatter.write_result(f) (should it be?)...
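One way to see why the write still fails: when encoding is None, CPython's codecs.open() does not wrap the file at all and simply returns the builtin text-mode file object, so the locale's preferred encoding applies rather than sys.getdefaultencoding(). The sketch below checks that on the current machine (normalizing codec names with codecs.lookup) and shows that an explicit encoding writes 'ß' regardless of the locale. The helper name codecs_open_default_encoding is hypothetical.

```python
import codecs
import locale
import os
import tempfile


def codecs_open_default_encoding(path):
    """Return the codec a file from codecs.open(path, "w") really uses."""
    # With encoding=None, codecs.open() returns the builtin text-mode file
    # object, so the locale's preferred encoding applies, not UTF-8.
    with codecs.open(path, "w") as f:
        return f.encoding


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "test.tex")
    used = codecs_open_default_encoding(path)
    preferred = locale.getpreferredencoding(False)

    # An explicit encoding makes codecs.open() wrap the file in a UTF-8
    # reader/writer, so "\xdf" is written whatever the locale is.
    with codecs.open(path, "w", encoding="utf-8") as f:
        f.write("0 & Stra\xdfe \\\\\n")
    with codecs.open(path, "r", encoding="utf-8") as f:
        text = f.read()

print(used, preferred)
```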

dhomeier · 2016-02-16T14:41:11Z

@nbonnotte, you could perhaps explicitly call sys.getdefaultencoding() in to_latex() if encoding is None. Though I am now wondering whether it makes sense to allow non-ASCII characters in LaTeX output; whether they are accepted might still depend on your TeX installation? But for consistency with to_csv, one should perhaps use the same default as there (or have both fall back to sys.getdefaultencoding()).

nbonnotte · 2016-02-16T14:49:52Z

@dhomeier Of course we want non-ascii characters in LaTeX, they're handled by the inputenc package (or directly by XeLaTeX).

Explicitly calling sys.getdefaultencoding() seems a good idea to me, if that's not what codecs.open looks for implicitly. Can you do a PR?
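A sketch of what that fallback could look like, assuming to_latex keeps writing through codecs.open() as in the snippet quoted earlier; write_latex is a hypothetical function, not pandas code. On Python 3, sys.getdefaultencoding() is "utf-8", so the output no longer depends on LANG. (The builtin open() accepts the same encoding argument if codecs.open() is not wanted.)

```python
import codecs
import os
import sys
import tempfile


def write_latex(path, text, encoding=None):
    """Hypothetical sketch of the proposed to_latex fallback.

    With no encoding given, use sys.getdefaultencoding() ("utf-8" on
    Python 3) instead of letting codecs.open() fall back to the locale.
    """
    if encoding is None:
        encoding = sys.getdefaultencoding()
    with codecs.open(path, "w", encoding=encoding) as f:
        f.write(text)
    return encoding


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "test.tex")
    used = write_latex(path, "0 & Stra\xdfe \\\\\n")
    with open(path, "rb") as f:
        raw = f.read()

print(used, raw)
```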

dhomeier · 2016-02-16T15:27:51Z

Yes, I may have misread the codecs docstring; for codecs.open it does not explicitly state any default for the encoding. Should we use sys.getdefaultencoding() then, or pd.get_option('display.encoding') as suggested by @jreback? And for to_latex only, or the same for to_csv? The latter would default to csv.writer with encoding=None, which again would accept Unicode characters in Python 3, but not in Python 2; but I don't see any tests for this.

jreback · 2016-02-16T15:36:11Z

Ideally you could do something to make this fail on Travis (with the code as it is currently). I suspect that in one of the alternate encoding builds (where we override the LOCALE), we may need to set some other variable to get this to fail. That way you can test whether a fix works.

It may be that we need to set an alternate py3 build to use LOCALE (e.g. you can do this with the 3.4 slow build)
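A possible way to reproduce the C-locale condition in a test or CI job is to start a child interpreter with LC_ALL=C and ask it for its preferred encoding. This is a sketch under assumptions: it targets Python 3.7+ (where PYTHONCOERCECLOCALE=0 and PYTHONUTF8=0 stop the C locale from being coerced to UTF-8 or switching on UTF-8 mode), the reported name is platform dependent (an ASCII alias on most Unix systems; LC_ALL has no effect on Windows), and the helper name is hypothetical.

```python
import codecs
import os
import subprocess
import sys

CHILD = "import locale; print(locale.getpreferredencoding(False))"


def preferred_encoding_under_c_locale():
    """Report a child interpreter's preferred encoding under LC_ALL=C."""
    env = dict(os.environ)
    env.update(
        LC_ALL="C",
        LANG="C",
        # Python 3.7+ would otherwise coerce the C locale to UTF-8
        # (PEP 538) or enable UTF-8 mode (PEP 540), hiding the ASCII default.
        PYTHONCOERCECLOCALE="0",
        PYTHONUTF8="0",
    )
    result = subprocess.run(
        [sys.executable, "-c", CHILD],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


enc = preferred_encoding_under_c_locale()
print(enc, codecs.lookup(enc).name)
```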

dhomeier · 2016-02-17T22:58:47Z

Not sure I understand what you intend - add tests (for to_latex and to_csv?) that will fail if no LOCALE is set in the environment?

yarikoptic · 2016-04-18T14:21:02Z

FWIW -- running into the same issue while building the package for 0.18.0-114-g6c692ae on debian sid. Will skip this test for now

0-wiz-0 · 2016-08-19T07:04:00Z

I see this too when running the tests on NetBSD with the default LC_ALL=C (which is ASCII) and python-3.5.2.

xref #12337
Author: Nicolas Bonnotte <nicolas.bonnotte@gmail.com>
Closes #14114 from nbonnotte/unicode-to_latex-12337 and squashes the following commits:
dadf73c [Nicolas Bonnotte] New tentative with C locale
b876296 [Nicolas Bonnotte] Base matrix configuration
c825f86 [Nicolas Bonnotte] New files requirements-3.5_ASCII.*
3b4c6a5 [Nicolas Bonnotte] Travis conf: new test with python 3.5 and LC_ALL=C
3b859ce [Nicolas Bonnotte] Test for Python 3.4 with C locale

mroeschke · 2020-04-04T20:36:50Z

Since we don't support Python versions less than 3.6.1 and the CI hasn't had issues with this test, I imagine this is no longer an issue. Happy to reopen if anyone else experiences issues.

jreback added Bug IO LaTeX to_latex labels Feb 16, 2016

jreback added this to the 0.18.0 milestone Feb 16, 2016

jreback modified the milestones: 0.18.1, 0.18.0 Feb 21, 2016

jreback modified the milestones: 0.18.1, 0.18.2 Apr 26, 2016

0-wiz-0 mentioned this issue Aug 19, 2016: TypeError: 'NoneType' object is not callable #14043 (closed)

jorisvandenbossche modified the milestones: 0.20.0, 0.19.0 Aug 21, 2016

nbonnotte mentioned this issue Aug 28, 2016: Test for Python 3.5 with C locale #14114 (closed)

jreback added the Unicode Unicode strings label Aug 31, 2016

jorisvandenbossche modified the milestones: 0.19.0, 0.20.0 Sep 12, 2016

jreback mentioned this issue Sep 21, 2016: TST: 3.5 c-locale #14275 (closed)

jreback modified the milestones: 0.19.0, 0.19.1 Sep 28, 2016

jorisvandenbossche modified the milestones: 0.20.0, 0.19.1 Oct 22, 2016

jreback modified the milestones: 0.20.0, 0.21.0, Next Major Release Mar 23, 2017

jbrockmendel added the Unreliable Test Unit tests that occasionally fail label Dec 19, 2019

mroeschke closed this as completed Apr 4, 2020

