Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Encoding detection can lead to LookupError #2732

Closed
gilbsgilbs opened this issue Feb 13, 2018 · 3 comments · Fixed by #2733
Closed

Encoding detection can lead to LookupError #2732

gilbsgilbs opened this issue Feb 13, 2018 · 3 comments · Fixed by #2733
Labels
Milestone

Comments

@gilbsgilbs
Copy link
Contributor

gilbsgilbs commented Feb 13, 2018

Long story short

Some encoding detected by cchardet are unsupported by python (e.g. VISCII). This makes text() function raise a LookupError when such encoding is detected, even when errors parameter is set to 'ignore' (which I would have assumed to be safe).

Expected behaviour

Not sure about this. Maybe get_encoding() should codecs.lookup for the detected encoding to ensure it is known, and if it isn't, fallback to UTF-8. Or document properly that text() function might raise LookupError or that get_encoding() result is not safe to pass to text(encoding) or .decode(encoding) directly.

Actual behaviour

LookupError is thrown.

Steps to reproduce

async def test_text_viscii(loop, session):
    response = ClientResponse('get', URL('http://def-cl-resp.org'))
    response._post_init(loop, session)

    def side_effect(*args, **kwargs):
        fut = loop.create_future()
        body = b'''\
\x43\x68\xe6\x20\x51\x75\xaf\x63\x20\x6e\x67\xe6\x20\x6c\xe0\x20
\x68\xae\x20\x63\x68\xe6\x20\x76\x69\xaa\x74\x20\x74\x68\xaf\x6e
\x67\x20\x6e\x68\xa4\x74\x20\x63\x68\xed\x6e\x68\x20\x74\x68\xd1
\x63\x20\x68\x69\xae\x6e\x20\x6e\x61\x79\x20\x63\xfc\x61\x20\x74
\x69\xaa\x6e\x67\x20\x56\x69\xae\x74\x2c\x20\x73\xd8\x0a\x64\xf8
\x6e\x67\x20\x6b\xfd\x20\x74\xf1\x20\x4c\x61\x20\x54\x69\x6e\x68
\x2c\x20\x64\xf1\x61\x20\x74\x72\xea\x6e\x20\x63\xe1\x63\x20\x62
\xe4\x6e\x67\x20\x63\x68\xe6\x20\x63\xe1\x69\x20\x63\xfc\x61\x20
\x6e\x68\xf3\x6d\x20\x6e\x67\xf4\x6e\x20\x6e\x67\xe6\x20\x52\xf4
\x6d\x61\x6e\x2c\x5b\x31\x5d\x20\xf0\xa3\x63\x0a\x62\x69\xae\x74
\x20\x6c\xe0\x20\x62\xe4\x6e\x67\x20\x63\x68\xe6\x20\x63\xe1\x69
\x20\x42\xb0\x20\xd0\xe0\x6f\x20\x4e\x68\x61\x2c\x5b\x32\x5d\x20
\x76\xbe\x69\x20\x63\xe1\x63\x20\x64\xa4\x75\x20\x70\x68\xf8\x20
\x63\x68\xfc\x20\x79\xaa\x75\x20\x74\xd7\x20\x62\xe4\x6e\x67\x20
\x63\x68\xe6\x20\x63\xe1\x69\x20\x48\x79\x0a\x4c\xd5\x70\x2e\x0a'''
        fut.set_result(body)
        return fut

    response.headers = {
        'Content-Type': 'text/plain'}
    content = response.content = mock.Mock()
    content.read.side_effect = side_effect
    await response.text(errors='ignore')  # crash here

Your environment

  • aiohttp 3.0.1
  • client
@asvetlov
Copy link
Member

I think mentioning LookupError in documentation is just fine.
Fallback to UTF-8 is not reliable: it produces a garbage or even exception most likely.
Would you make a Pull Request?

gilbsgilbs added a commit to gilbsgilbs/aiohttp that referenced this issue Feb 13, 2018
Sometimes, cchardet might detect encodings that Python doesn't know. In
such cases, `.text()` function might raise a `LookupError`, and
`get_encoding` may return values that are unsafe to pass to `bytes.decode()`
or to `.text()` functions.

Closes aio-libs#2732
@gilbsgilbs
Copy link
Contributor Author

gilbsgilbs commented Feb 13, 2018

@asvetlov I agree, but that seems inconsistent with the current behavior. If the client passes charset=<some unknown encoding> in content type and cchardet can't detect an encoding, get_encoding will fallback to utf-8. It shouldn't.

asvetlov pushed a commit that referenced this issue Feb 13, 2018
Sometimes, cchardet might detect encodings that Python doesn't know. In
such cases, `.text()` function might raise a `LookupError`, and
`get_encoding` may return values that are unsafe to pass to `bytes.decode()`
or to `.text()` functions.

Closes #2732
@asvetlov asvetlov added this to the 3.1 milestone Feb 13, 2018
@lock
Copy link

lock bot commented Oct 28, 2019

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a [new issue] for related bugs.
If you feel like there's important points made in this discussion, please include those exceprts into that [new issue].
[new issue]: https://github.com/aio-libs/aiohttp/issues/new

@lock lock bot added the outdated label Oct 28, 2019
@lock lock bot locked as resolved and limited conversation to collaborators Oct 28, 2019
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants