Decoding Error when reading license file. #35

Closed
harahu opened this issue Jun 7, 2019 · 5 comments

@harahu

harahu commented Jun 7, 2019

I ran pip-licenses inside one of my Docker containers and got the following exception:

Traceback (most recent call last):
  File "/usr/local/bin/pip-licenses", line 11, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.5/dist-packages/piplicenses.py", line 598, in main
    output_string = create_output_string(args)
  File "/usr/local/lib/python3.5/dist-packages/piplicenses.py", line 386, in create_output_string
    table = create_licenses_table(args, output_fields)
  File "/usr/local/lib/python3.5/dist-packages/piplicenses.py", line 200, in create_licenses_table
    for pkg in get_packages(args):
  File "/usr/local/lib/python3.5/dist-packages/piplicenses.py", line 185, in get_packages
    pkg_info = get_pkg_info(pkg)
  File "/usr/local/lib/python3.5/dist-packages/piplicenses.py", line 141, in get_pkg_info
    (license_file, license_text) = get_pkg_license_file(pkg)
  File "/usr/local/lib/python3.5/dist-packages/piplicenses.py", line 128, in get_pkg_license_file
    file_lines = license_file_handle.readlines()
  File "/usr/lib/python3.5/encodings/ascii.py", line 26, in decode
    return codecs.ascii_decode(input, self.errors)[0]
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc2 in position 10: ordinal not in range(128)

The offending license file was:
/usr/local/lib/python3.5/dist-packages/Werkzeug-0.14.1.dist-info/LICENSE.txt

I was able to resolve the issue by editing line 127 in piplicenses.py:

with open(test_file, encoding='utf-8', errors='ignore') as license_file_handle:

I'm not sure always assuming UTF-8 is a good idea, though.
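To make the trade-off concrete, here is a minimal sketch (not pip-licenses code) of how the UTF-8 codec treats the Werkzeug copyright line, compared with the same line saved as Latin-1, under the `strict`, `ignore` and `replace` error handlers. The Latin-1 variant is an assumed example of a license file that is not UTF-8.

```python
# The Werkzeug copyright line as stored on disk (UTF-8), and the same line
# saved as Latin-1, an assumed example of a license file that is not UTF-8.
line = "Copyright \u00a9 2007"
utf8_bytes = line.encode("utf-8")      # b'Copyright \xc2\xa9 2007'
latin1_bytes = line.encode("latin-1")  # b'Copyright \xa9 2007'

# Valid UTF-8 decodes cleanly, so errors='ignore' never changes anything here.
from_utf8 = utf8_bytes.decode("utf-8", errors="ignore")

# For non-UTF-8 input, the errors handler decides what happens.
try:
    latin1_bytes.decode("utf-8")  # errors='strict' is the default
    strict_reason = None
except UnicodeDecodeError as exc:
    strict_reason = exc.reason
ignored = latin1_bytes.decode("utf-8", errors="ignore")    # byte silently dropped
replaced = latin1_bytes.decode("utf-8", errors="replace")  # byte becomes U+FFFD

# ascii() keeps the output printable even on an ASCII-only terminal.
for label, value in [("utf-8 file", from_utf8), ("strict", strict_reason),
                     ("ignore", ignored), ("replace", replaced)]:
    print(label, ascii(value))
```

So with `encoding='utf-8'` the Werkzeug file reads correctly, and `errors='ignore'` only matters for files that are not UTF-8, where it drops the © without any trace; `errors='replace'` at least leaves a visible U+FFFD marker.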

@raimon49
Owner

raimon49 commented Jun 8, 2019

Thank you for the report.

I could not reproduce it in my environment.

$ pip install "Werkzeug==0.14.1"
$ pip install pip-licenses

$ pip-licenses -l --format=j
[
  {
    "License": "BSD",
    "LicenseFile": "/home/raimon49/.anyenv/envs/pyenv/versions/3.6.4/envs/venv-3.6.4-Werkzeug/lib/python3.6/site-packages/Werkzeug-0.14.1.dist-info/LICENSE.txt",
    "LicenseText": "Copyright \u00a9 2007 by the Pallets team.\n\nSome rights reserved.\n\nRedistribution and use in source and binary forms, with or without\nmodification, are permitted provided that the following conditions are\nmet:\n\n* Redistributions of source code must retain the above copyright notice,\n  this list of conditions and the following disclaimer.\n\n* Redistributions in binary form must reproduce the above copyright\n  notice, this list of conditions and the following disclaimer in the\n  documentation and/or other materials provided with the distribution.\n\n* Neither the name of the copyright holder nor the names of its\n  contributors may be used to endorse or promote products derived from\n  this software without specific prior written permission.\n\nTHIS SOFTWARE AND DOCUMENTATION IS PROVIDED BY THE COPYRIGHT HOLDERS AND\nCONTRIBUTORS \"AS IS\" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING,\nBUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND\nFITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE\nCOPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,\nINCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT\nNOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF\nUSE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON\nANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT\n(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF\nTHIS SOFTWARE AND DOCUMENTATION, EVEN IF ADVISED OF THE POSSIBILITY OF\nSUCH DAMAGE.\n",
    "Name": "Werkzeug",
    "Version": "0.14.1"
  }
]

If you can reproduce this exception, I will incorporate your suggestion into the open() call that reads the license file.

Is there any other information you can share?

@harahu
Author

harahu commented Jun 8, 2019

I did some more research. The documentation for Python's built-in open function states that

... in text mode, if encoding is not specified the encoding used is platform dependent: locale.getpreferredencoding(False) is called to get the current locale encoding.

Whereas most out-of-the-box personal computers have sensible locale settings, Docker containers often do not. Calling locale.getpreferredencoding(False) on my VM outside the container returns 'UTF-8', while inside the container it returns 'ANSI_X3.4-1968' (another name for ASCII). I suspect this is the cause of the problem. The culprit is the copyright symbol in the first line of the LICENSE.txt file, which UTF-8 encodes as the two bytes 0xC2 0xA9, neither of which is valid ASCII:

Copyright © 2007 by the Pallets team.

I replicated this problem outside of the container with this script, passing the file path as an argument:

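import sys

# Decode with the codec the container's locale reports ('ANSI_X3.4-1968' is an alias of ASCII)
with open(sys.argv[1], encoding='ANSI_X3.4-1968') as license_file:
    lines = license_file.readlines()  # raises UnicodeDecodeError on the byte 0xc2 of the ©
    for line in lines:
        print(line, end='')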
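For anyone without the Werkzeug package at hand, the sketch below reproduces the same failure in a self-contained way: it writes the copyright line to a temporary file as UTF-8 and reads it back with the codec the container's locale reports. The temporary file is an assumed stand-in for LICENSE.txt; everything else is standard library.

```python
import os
import tempfile

LINE = "Copyright \u00a9 2007 by the Pallets team.\n"

# Write the line as UTF-8, the way the real LICENSE.txt is stored.
fd, path = tempfile.mkstemp(suffix=".txt")
with os.fdopen(fd, "wb") as fh:
    fh.write(LINE.encode("utf-8"))

try:
    # 'ANSI_X3.4-1968' is what the C/POSIX locale reports; Python maps it to ASCII.
    try:
        with open(path, encoding="ANSI_X3.4-1968") as fh:
            fh.readlines()
    except UnicodeDecodeError as exc:
        error = exc
    else:
        error = None

    # Reading with an explicit UTF-8 encoding recovers the text intact.
    with open(path, encoding="utf-8") as fh:
        text = fh.read()
finally:
    os.remove(path)

print(error)
print(ascii(text))
```

The printed error should match the one in the original traceback: byte 0xc2 at position 10, right after "Copyright ".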

I am not sure what the preferred solution is. As suggested in my initial post, I think you want to either specify an encoding explicitly (such as UTF-8) or handle decoding errors through the errors argument. Each option has different consequences, and I am unsure how they would affect the usability and stability of your package across the different contexts in which it is used.
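One way to combine both options, sketched here under the assumption that a readable but imperfect license text is better than a crash, is to try strict UTF-8 first and fall back to a lossy errors handler only when that fails. The function name read_license_text is hypothetical and not part of pip-licenses.

```python
import os
import tempfile


def read_license_text(path):
    """Read a license file, preferring strict UTF-8 (hypothetical helper).

    Falls back to errors='replace' so that a non-UTF-8 file still yields text,
    with undecodable bytes shown as U+FFFD instead of raising.
    """
    try:
        with open(path, encoding="utf-8") as fh:
            return fh.read()
    except UnicodeDecodeError:
        with open(path, encoding="utf-8", errors="replace") as fh:
            return fh.read()


# Usage: one UTF-8 file and one Latin-1 file (an assumed non-UTF-8 case).
sample = "Copyright \u00a9 2007\n"
with tempfile.TemporaryDirectory() as tmp:
    utf8_path = os.path.join(tmp, "utf8.txt")
    latin1_path = os.path.join(tmp, "latin1.txt")
    with open(utf8_path, "wb") as fh:
        fh.write(sample.encode("utf-8"))
    with open(latin1_path, "wb") as fh:
        fh.write(sample.encode("latin-1"))
    clean = read_license_text(utf8_path)    # decoded intact
    lossy = read_license_text(latin1_path)  # the 0xA9 byte becomes U+FFFD

print(ascii(clean), ascii(lossy))
```

Compared with errors='ignore', 'replace' makes any lost bytes visible in the report instead of dropping them silently, and the result no longer depends on the container's locale.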

@raimon49
Owner

raimon49 commented Jun 9, 2019

Ah, I understand.

The environment variable LANG is set to C.UTF-8 in many Docker container images. The official Python image, which I use as my base image, sets it as well.

For example:

# execute env command in my docker container
/opt/piplicenses # env
HOSTNAME=e6b341a614a9
PYTHON_PIP_VERSION=19.0.3
SHLVL=1
HOME=/root
GPG_KEY=0D96DF4D4110E5C43FBFB17F2D347EA6AA65421D
TERM=xterm
PATH=/usr/local/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
LANG=C.UTF-8
PYTHON_VERSION=3.7.2
PWD=/opt/piplicenses

I recommend setting the locale and LANG in your Dockerfile.
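To check from Python whether a given container's locale is UTF-8, one option is to normalize the name reported by locale.getpreferredencoding(False) through codecs.lookup, which resolves aliases. The helper name locale_is_utf8 is hypothetical; the check relies only on the standard library.

```python
import codecs
import locale


def locale_is_utf8(encoding=None):
    """Return True if `encoding` (default: the current locale's) is UTF-8.

    codecs.lookup() normalizes aliases, so 'UTF-8', 'utf8' and 'UTF8' all
    resolve to 'utf-8', while 'ANSI_X3.4-1968' resolves to 'ascii'.
    """
    if encoding is None:
        # The locale encoding discussed above
        encoding = locale.getpreferredencoding(False)
    return codecs.lookup(encoding).name == "utf-8"


print(locale.getpreferredencoding(False), locale_is_utf8())
```

In a container where LANG=C.UTF-8 is set, as in the env output above, this should report True.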

Thank you for your detailed research. Does this answer solve your problem?

@harahu
Author

harahu commented Jun 9, 2019

Mine is based on the official TensorFlow one, for reference. But yeah, I can close this.

@reactive-firewall

Important

PEP 263 (2001) introduced source-file encoding declarations, and PEP 3120 later replaced ASCII with UTF-8 as the default source encoding for Python 3, providing strong historical support for expecting UTF-8.
Further, after this issue was resolved, PEP 686 standardized UTF-8 as the default for the open function by making UTF-8 Mode the default starting with Python 3.15, so on those versions the fix discussed here will no longer be necessary.
TL;DR My rationale for commenting on a closed issue is that this issue is referenced in the pip-licenses documentation, and the discussion here shows general confusion about defaulting to UTF-8 as an encoding. I hope this helps someone in the future.
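PEP 686 works by making Python's UTF-8 Mode (added by PEP 540 in Python 3.7) the default. On Python 3.7 and later the same behavior can already be opted into with the -X utf8 command-line option or PYTHONUTF8=1. The sketch below, which assumes Python 3.7+, starts a child interpreter under the plain C locale with -X utf8 and reports which encoding it would prefer.

```python
import codecs
import os
import subprocess
import sys

# Child process prints its UTF-8 Mode flag and the locale's preferred encoding.
PROBE = (
    "import locale, sys; "
    "print(sys.flags.utf8_mode, locale.getpreferredencoding(False))"
)


def probe_utf8_mode():
    """Run a child interpreter with -X utf8 under the C locale (Python 3.7+)."""
    env = dict(os.environ, LC_ALL="C", LANG="C")
    out = subprocess.run(
        [sys.executable, "-X", "utf8", "-c", PROBE],
        env=env,
        stdout=subprocess.PIPE,
        check=True,
        universal_newlines=True,
    ).stdout
    flag, encoding = out.split()
    # Normalize the codec name so 'UTF-8' and 'utf-8' compare equal
    return int(flag), codecs.lookup(encoding).name


utf8_mode, encoding = probe_utf8_mode()
print(utf8_mode, encoding)
```

Even though the C locale would normally yield ASCII, the child reports UTF-8 Mode as enabled and a UTF-8 preferred encoding, which is what the reporter's container would have needed.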
