Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

return type for lxml.etree.fromstring should be Optional (?) #63

Closed
DMRobertson opened this issue May 3, 2022 · 2 comments · Fixed by #64
Closed

return type for lxml.etree.fromstring should be Optional (?) #63

DMRobertson opened this issue May 3, 2022 · 2 comments · Fixed by #64

Comments

@DMRobertson
Copy link
Contributor

At least with lxml 4.8.0, I see the following:

$ python
Python 3.10.4 (main, Mar 25 2022, 00:00:00) [GCC 11.2.1 20220127 (Red Hat 11.2.1-9)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import lxml.etree as etree
>>> parser = etree.HTMLParser(recover=True, encoding="utf-8")
>>> tree = etree.fromstring(b"", parser)
>>> tree
>>> tree is None
True

But the type annotation says we'll always return an _Element:

def fromstring(
text: _AnyStr, parser: XMLParser = ..., *, base_url: _AnyStr = ...
) -> _Element: ...

Aside: I'm passing an HTMLParser here instead of an XMLParser. The type stub for fromstring doesn't allow that either, but again I'm not sure if that's a mistake in the annotations.

For context, I came across this whilst trying to use the stubs to typecheck this snippet of Synapse.

@scoder
Copy link
Member

scoder commented May 3, 2022

The None return value is a rare artefact of using recover=True with no data, but given that it exists, I agreed on both points, thanks for bringing this up. Maybe you can send a PR for these changes? We might actually be lacking a test for something that passes a parser at all. seems worth adding.

DMRobertson added a commit to DMRobertson/lxml-stubs that referenced this issue May 3, 2022
`parser` seems like it can be an `HTMLParser. There's code in the wild
that does this without obviously being wrong. As far as I can tell from
the lxml Cython source, `parser` is a `_Baseparser`. I wasn't sure if
the `iterparse` class ought to be allowed here too. I erred on the side
of caution.

It's possible for this function to return None, if the parser has
`recover=True`. I couldn't find a good way to express this without
affecting every other call to `fromstring`. (Perhaps one could make the
Parser generic over some kind of `Recoverable` indicator type... but
that seemed like overkill)

Resolves lxml#63.
@DMRobertson
Copy link
Contributor Author

I've had a go at this in #64---though I'm not massively familiar with lxml's machinery. Hopefully it's a useful drive-by though.

We might actually be lacking a test for something that passes a parser at all.

For completeness: I interpreted this as a test in the stubs repository here, rather than a test in lxml proper.

scoder pushed a commit that referenced this issue Jul 26, 2022
…H-64)

`parser` can also be an HTMLParser, not just XMLParser.

It's possible for these functions to return None, if the parser has
`recover=True` or a custom parser target was provided.
Otherwise, they always return an `_Element`.

Closes #63.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants